Sample Programs of String Split in Excel
Though Excel functions are good at handling simple string split tasks, they become awkward for handling certain special string splits and those having complicated conditions. This article gives examples of the difficult tasks, explains the challenges and offers solutions in esProc SPL. SPL is the abbreviation of Structured Process Language used by esProc, the professional data computing engine. It’s convenient to handle structured data processing, including string split of course, with esProc.
I. Split by words
We have a product purchase list. In the list, each item includes the brand and type we want. What we want is to split each item to respectively list the brand and type after it.
Below is productlist.xlsx:
The expected result:
We can use Excel’s text-to-columns feature to do the splitting. During the process, both the brand string and type string will be split into multiple cells and then combined again. It seems that this is over-splitting and a waste of resource and time. We can also use flash fill in Excel to automatically fill the data according to a given pattern, to extract PANASONIC for the first row, for instance, and then get the type string using MID function. It’s still inconvenient.
SPL can read in an Excel file directly. To do a real-time analysis, it copies the Excel data onto the clipboard, uses the clipboard function to get the data, writes the processed data back to the clipboard and then pastes the result set to Excel. The whole process is smooth, seamless, convenient and efficient.
SPL script:
A |
B |
|
1 |
=clipboard().import@i() |
/ Import the product list directly form the clipboard |
2 |
=A1.(~.split@1("")) |
/ Split each item of the product list by the first space to generate a sequence of sequences |
3 |
=A2.concat@n("\t") |
/ Concatenate A2 into a string of values of two-dimensional table, where members in each subsequence are delimited by tab and member sequences are separated by carriage return |
4 |
=clipboard(A3) |
/ Put the string of values onto the clipboard |
After the SPL script is executed, we just need to paste the result set onto B1 of the Excel file to get the desired result.
II. Split away digits
There are strings made up of both digits and characters. We want to separate the digitss from the characters.
Below is numbers.xlsx:
The expected result:
To split away the digits only, we can use flash fill by defining a pattern on the first row. But to pick up the rest of the characters, there isn’t a way in Excel because some digits are mixed with the characters and MID function cannot play a role. The only way out is to write a program in VBA to loop over each character to check their type and split them one by one. The process is roundabout and complicated.
SPL can directly split a string into individual characters and categorize them by type:
A |
B |
|
1 |
=clipboard().split@n() |
/Get data from the clipboard, split it into a sequence by carriage return and then split each member character by character |
2 |
=A1.(~.align@a([true,false],isdigit(~)).(~.concat())) |
/Group each sequence of characters according to whether a character is a digit or not and concatenate each group into a string to split digits from the characters |
3 |
=A2.concat@n("\t") |
/Concatenate the two levels of sequences into a two-dimensional table string by tab and carriage return respectively |
4 |
=clipboard(A3) |
/ Paste A3’s string onto the clipboard |
III. Split away dates
Here are rows of words containing dates. We want to split away all dates from each row and separate the dates by semicolon there are more than one date.
Below is multidates.xlsx:
The expected result:
Excel hasn’t a special way to split a date away from a row, particularly when the number of dates in the rows varies. We can sue MID function or flash fill only when all dates have fixed positions or lengths in the rows. In our case, we have to turn to VBA to write a program and use a regular expression to match and analyze each row. This requires relatively high skills and produces complicated code.
SPL’s way is to split a string into a sequence of words by spaces and convert members according to the specified date format:
A |
B |
|
1 |
=clipboard().split@n(“ “) |
/ Get data from the clipboard, split it into a sequence by carriage return and then split each member into a sequence of words |
2 |
=A1.(~. (date(~,"dd.MM.yy"))) |
/ Convert members in the sequence of words into date type data according to the specified format |
3 |
=A2.(~.select(ifdate(~))) |
/ Get date type values from the sequence |
4 |
=clipboard(A3.concat@n(“;”)) |
/ Concatenate the date string into a two-dimensional data string and paste it onto the clipboard |
After the SPL script is executed, we just need to paste the result set onto B1 of the Excel file to get the desired result.
IV. Split by character
The following table contains a column of numbers of different lengths. We want to split each digit away from each numeric value to become an individual column.
Below is number.xlsx:
The expected result:
Text to column comes first for this case. But to split each character away requires specifying the splitting position each time a character is split away. It’s particularly tedious to do a lot of repeated specifications for a number with lots of digits. An alternative is to use functions to do the splitting, such as =MID($A1,COLUMN(A1),1) for B1, and then drag mouse to fill rows and columns. But it’s inconvenient to decide the maximum number of columns and to handle a table with lots of rows.
SPL can do the splitting by characters directly:
A |
B |
|
1 |
=clipboard().split@n() |
/ Get data from the clipboard, split it into a sequence by carriage return and then, by default, split each member into a sequence of individual characters |
2 |
=A1.concat@n("\t") |
/ Concatenate the two levels of sequences into a two-dimensional table string |
3 |
=clipboard(A2) |
/ Paste the result set onto the clipboard |
After the SPL script is executed, we just need to paste the result set onto B1 of the Excel file to get the desired result.
V. Split away attribute tables & file names
Here is a log file of complicated structure. It contains description of nodes that are similar to an attribute table. Now we want to split away the values of PublicKeyToken and the file names from the attribute description.
Below is log.xlsx:
The expected result:
To split such a comprehensive string to get a comma-separated description string at the first level and an attribute-node-like description string in Excel from the second item, we need a number of text-to-columns operations and multiple flash fills. There will be many steps and a given pattern is needed to do the flash fill. The whole process is extremely complicated.
SPL has the special function to directly get values of attribute strings and get different parts of a file name:
A |
B |
|
1 |
=clipboard().split@nc() |
/ Get data from the clipboard, split it into a sequence by carriage return and then, by default, split each member into a sequence by commas |
2 |
=A1.([replace(~(2),"\"","").property("PublicKeyToken"),filename(replace(~(3),"\"",""))]) |
/ Remove quotes at both ends, use property function to get PublicKeyToken value from item 2 and filename function to get the file name from the third item; then join the two values into a sequence |
3 |
=clipboard(A2.concat@n("\t")) |
/ Paste the result set onto the clipboard |
Find more examples in SPL Cookbook.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL