"Text handling is an integral and indispensable part of any programming language. SPL (Structured .."

pjf RaqForum 22 No.
1 Reply • 442 View • 3 Years ago

SPL: Text Handling

Text handling is an integral and indispensable part of any programming language. SPL (Structured Process Language) provides abundant functions for handling text. There are four categories – character handling, simple string handling, sequence-related handling, and complex analysis. They cover extracting a substring from a string, type judgment and conversion, character set handling, data type parsing, search and replacement, format handling, string generation/splitting/combination, the use of regular expression matching, extracting words and numbers, SQL statement analysis, a HTML analysis, etc.

1. character handling

1.1 Type judgment

Task	Code	Return value
Check if all characters are numbers	isdigit(“12345”)	true
Check if all characters are letters	isalpha(“abcde”)	true
Check if all characters are lower-case letters	islower(“aBcd”)	false
Check if all characters are upper-case letters	isupper(“ABCD”)	true

1.2 Conversion

Task	Code	Return value
Convert all characters to lower-case	lower(“aB’cD’e”)	ab’cd’e
Skip a string put in quotes when converting to lower-case	lower@q(“aB’cD’e”)	ab’cD’e
Convert all characters to upper-case	upper(“aB’cD’e”)	AB’CD’E
Skip a string put in quotes when converting to upper-case	upper@q(“aB’cD’e”)	AB’cD’E
Convert row number and column number to an Excel-style cell name	cellname(8,32)	AF8

1.3 Character set handling

Task	Code	Return value
Get the Unicode value for the 2^nd character	asc(“abc”,2)	98
Get the Unicode value for the 1^st character	asc(“中国”)	20013
Get the corresponding character for the Unicode value	char(98)	B
Get the corresponding character for the Unicode value	char(22269)	国
Encode a URL string	urlencode("a=b+c>0?1:-1","UTF-8")	a%3Db%2Bc%3E0%3F1%3A-1
Decode an encoded URL string	urlencode@r(“a%3Db%2Bc”,”UTF-8”)	a=b+c
Base64-encode a byte array	base64(bytes)	ixb7x1dxfbxefxff
Convert to an MD5-encrypted string	md5(“abcde”)	AB56B4D92B40713ACC5AF89985D4B786

1.4 Data type parsing

Parse a string value into one of the corresponding data type.

Task	Code	Return value
Parse a numeric string into a number	number("1234.56")	1234.56
Parse a currency string into a number	number("$10,234.55","$#,###.#")	10234.55
Parse a time format string into a time object	parse("10:20:30")	A time object
Parse a sequence type string into a sequence	parse("[1,2,3]")	Sequence [1,2,3]
Remove quotes and escape characters at parsing	parse@e("\"a\tb\"")	a b
Only parse the numeric characters beginning from the left-most one	parse@n("123adsdx234")	123
Parse until the second half the quotation marks appears when a given string begins with the quotation marks	parse@q("\'sadsdxc\'+23")	sadsdxc

1.5 Format handling

Task	Code	Return value
Convert a numeric value to a corresponding format string	string(12345.678,”#,##0.00”)	12,345.68
Convert a date value to a corresponding format string	string(date(“2020-03-15”),”MM/dd/yyyy”)	03/15/2020
Convert a time value to a corresponding format string	string(time(“16:18:54”),”h:mm a”)	4:18 PM
Convert a datetime value to a corresponding format string	string(now(),”MM/dd/yyyy HH:mm”)	2020-05-28 14:06
Enclose a string with quotation marks	string@q(“ab cd”)	“ab cd”
Escape a character	string@e(“ab\”c d”)	ab\"c\td
Escape a character, and convert characters in a large character set to Unicode values	string@u(“ab\”c 中国”)	ab\"c\t\u4E2D\u56FD
Parse a format specifier substring of a string according to the second parameter	format("The price is $%.2f",8.5)	The price is $8.50

2. Simple string handling

2.1 Getting a substring

Get a substring from a source string. Related functions are mid(), left() and right(). Below are some examples:

Task	Code	Return value
Get a substring from the 3^rd character	mid(”abcdef”,3)	cdef
Get a substring of two characters from the 3^rd character	mid(”abcdef”,3,2)	cd
Get a substring of the leftmost three characters	left(”abcdef”,3)	abc
Get a substring by removing the last two characters	left(”abcdef”,-2)	abcd
Get a substring of the rightmost three characters	right(”abcdef”,3)	def
Get a substring by removing the first two characters	right(”abcdef”,-2)	cdef

substr(s1,s2)

The function searches source string s1 for the target substring s2, and returns the substring before or after s2 or null if s2 cannot be found.

Task	Code	Return value
Get the substring after a given substring	substr(“abCDcdef”,”cd”)	ef
Get the substring before a given substring	substr@l(“abCDcdef”,”cd”)	abCD
Get the case-insensitive substring after a given substring	substr@c(“abCDcdef”,”cd”)	cdef
Get a substring with a string put in quotes skipped	substr@q(“ab\”acd\”cdef”,”cd”)	ef

2.2 Search and matching

pos(s1,s2{,begin})

The function finds the position of the target substring s2 in source string s1, and returns the original position of s2 in s1 or null if s2 cannot be found.

Task	Code	Return value
The ordinary search	pos(”abcdef”,”aa”)	null
The ordinary search	pos(”abcDedefgh”,”de”)	6
Perform the search from the 5^th character	pos(”abcdedefgh”,”de”,5)	6
Perform the search backwards	pos@z(”abcdedefgh”,”de”)	6
Perform a case-insensitive search	pos@c(”abcDedefgh”,”de”)	4
Search for the target string at the beginning	pos@h(”abcdefgh”,”ab”)	1
Search for the target string at the tail end	pos@hz(”abcdefgh”,”fgh”)	6
Perform the search with a string put in quotes skipped	pos@q(”ab\”cde\”fcd”,”cd”)	9
Check if the source string matches a given format string	like("abc123", "abc*")	true
Case-insensitive for format string matching	like@c("abc123", "ABC*")	true

2.3 Replacement

replace(s,a,b)

The function replaces substring a in source string s with string b and returns the new s.

Task	Code	Return value
The ordinary replacement	replace("abc’ab’deA","a","ss")	ssbc'ssb’deA
Replace the first-found target substring only	replace@1("abc’ab’deA","a","ss")	ssbc'ab’deA
The case-insensitive replacement	replace@c("abc’ab’deA","a","ss")	ssbc'ssb’dess
Perform the replacement with a string put in quotes skipped	replace@q("abc’ab’deA","a","ss")	ssbc'ab’deA

2.4 Others

Task	Code	Return value
Repeat a given string n times to generate a new string	fill(“abc”,4)	abcabcabcabc
Delete white spaces at both ends of a given string	trim(" a bc ")	"a bc"
Delete white spaces on the left of a given string	trim@l(" a bc ")	"a bc "
Delete white spaces on the right of a given string	trim@r(" a bc ")	" a bc"
Delete unwanted white spaces from a given string	trim@a(" a bc ")	"a bc"
Patch a string before an existing one to extend the latter to the desired length	pad("Soth","Miss",10)	MissMiSoth
Patch a string after an existing one to extend the latter to the desired length	pad@r("Soth","er",8)	Sotherer
Generate a numeric string with the length of 1 randomly	rands(“0123456789”,11)	44238061662

3. Sequence-related handling

3.1 Splitting

s.split(d)

The function splits string s into a sequence using delimiter d, or into a sequence of single characters when d is absent.

Task	Code	Return value
Split a given string using the vertical line	“aa\|bb\|cc”.split(“\|”)	[aa,bb,cc]
Split a given string using the comma	"1,[a,b],(2,c),'5,6'".split@c()	[“1”,”[a,b]”,”(2,c)”,’5,6’]
Split a given string into two parts by the first-found delimiter	"1,[a,b],(2,c),'5,6'".split@c1()	[“1”,”[a,b],(2,c),’5,6’”]
Do not handle parentheses/brackets and quotation marks handling at splitting	"1,[a,b],(2,c),'5,6'".split@cb()	[“1”,”[a”,”b]”,”(2”,”c)”,”’5”,”6’”]
Parse each member in the result sequence as an object	"1,[a,b],(2,c),'5,6'".split@cp()	[1,[a,b],”(2,c)”,”5,6”]
Delete white spaces at both ends of each member in the result sequence	“192.168.0.3”.split@t(“.”)	[“192”,”168”,”0”,”3”]
Treat parameter d as a regular expression, and split the string, for instance, by digits	"a1b2c57d".split@r("\\d")	[“a”,”b”,”c”,””,”d”]
Split a given string by line break, and then perform the ordinary or the option-enabled splitting on each line	"s,a,y\ngood,morning".split@nc()	[[“s”,”a”,”y”],[“good”,”morning”]]

3.2 Concatenation

A.concat(d)

The function concatenates members of sequence A into a string through delimiter d, during which the sub-sequences are handled in the same way, or joins them up directly when d is absent.

In the following example functions, A1 is [1, ["a","b"],[2,"c"]].

Task	Code	Return value
Concatenate members without a delimiter	A1.concat()	1[ab][2c]
Concatenate members using the colon	A1.concat(“:”)	1:[a:b]:[2:c]
Concatenate members using the comma	A1.concat@c()	1,[a,b],[2,c]
Enclose each member with double quotation marks at concatenation	A1.concat@q()	1[“a””b”][2”c”]
Enclose each member with single quotation marks at concatenation	A1.concat@i()	1[‘a’’b’][2’c’]
With a two-level sequence, concatenate the outer level by line break and then perform concatenation on each sub-sequence	[[1,2,3],["a","b"],[2,"c"]].concat@n(“-“)	1-2-3 a-b 2-c
Concatenate parameters into a string and do not enclose each result members with quotes	concat(2,["a","b"],"cd")	2abcd

4. Complex analysis

4.1 Regular expression matching

Task	Code	Return value
Match a given string with the regular expression	“4,23,a,test”.regex("(\\d),([0-9]),([a-z]),([a-z])")	[“4”,”23”,”a”,”test”]
Match a given string with the regular expression	“4,23,a,test”.regex("(\\d),([a-z]),([0-9]),([a-z])")	null
Case-insensitive for the matching	"a:Test:B".regex@c("([a-z]):([a-z])")	[“a”,”T”,”t”,”B”]

Learn more about regular expressions.

4.2 Word and number extraction

In the following example functions, s is "hi-10 hello!2020-01-01A8,3.14".

Task	Code	Return value
Extract English words	s.words()	[”hi”,”hello”,”A”]
Extract digits strings	s.words@d()	[”10”,”2020”,”01”,”01”,”8”,”3”,”14”]
Extract English words and strings of digits	s.words@a()	[”hi”,”10”,”hello”,”2020”,”01”,”01”,”A”,”8”,”3”,”14”]
Extract all: words, strings of digits and other character strings	s.words@w()	[”hi”,”-”,”10”,” ”,”hello”,”!”,”2020”,”-”,”01”,”-”,”01”,”A”,”8”,”,”,”3”,”.”,”14”]
Extract all, where strings beginning with digits will be identified as numbers or datetimes as a whole	s.words@wp()	[”hi”,”-10”,” ”,”hello”,”!”,”2020-01-01”,”A”,”8”,”,”,”3.14”]
Treat continuous English letters and digits as a word	s.words@i()	[”hi”,”hello”,”A8”]

4.3 SQL statement analysis

sql.sqlparse(part)

The function splits a SQL query according to the order of select clause, from clause (including the join), group by clause, having clause and order by clause, make these syntactic units parameters members, and returns a sequence of them. When parameter part is present, replace the corresponding syntactic parameter in the source SQL query with it, and return a new SQL.

The sql in the following example functions is as follows:

select emp.EID,emp.NAME,emp.BIRTHDAY,emp.SALARY,dep.DEPT,dep.MANAGER
    from EMPLOYEE emp left join DEPARTMENT dep on emp.DEPT=dep.DEPT
    where emp.EID<100 and dep.DEPT='R&D'
    order by emp.SALARY,emp.EID

sql2：select * from dept

Task	Code	Return value
Get all syntactic units of a given SQL query	sql.sqlparse()	Omitted; please refer to Function Reference
Get select clause	sql.sqlparse@s()	emp.EID,emp.NAME,emp.BIRTHDAY,emp.SALARY,dep.DEPT,dep.MANAGER
Get from clause	sql.sqlparse@f()	EMPLOYEE emp left join DEPARTMENT dep on emp.DEPT=dep.DEPT
Get where clause	sql.sqlparse@w()	emp.EID<100 and dep.DEPT='R&D'
Get group by clause	sql.sqlparse@g()	null
Get having clause	sql.sqlparse@h()	null
Get order by clause	sql.sqlparse@o()	emp.SALARY,emp.EID
Get syntactic units of a given SQL query and split each unit into a sequence of smaller ones	sql.sqlparse@a()	Omitted; please refer to Function Reference
Replace select clause with the parameter	sql2.sqlparse@s(“deptid”)	select deptid from dept
Replace from clause with the parameter	sql2.sqlparse@f(“sales”)	select * from sales
Convert a SQL query to the counterpart in a specified database	sql.sqltranslate(“MYSQL”)	Omitted

4.4 Node string handling

Node strings are key/value strings. In node strings in SPL, a key and its value are connected by the equals sign. Different nodes are separated by the white space.

xs.property(n,v)

The function reads value of the node named n from node string xs and returns the result. When string v is present, make v n’s value; and delete n when v is absent.

In the following example functions, xs is a set of key=value pairs: color=red size=20 price=500.

Task	Code	Return value
Get a two-column table sequence made up of all properties	xs.property()	Omitted
Get value of a given property	xs.property(“size”)	“20”
Get value of a given property and enclose it	xs.property@q(“size”)	“\”20\””
Get value of a given property and parse it into the corresponding data type	xs.property@v(“size”)	20
Modify value of a given property	xs.property(“size”,”30”)	color=red size=30 price=500
Delete a given property	xs.property(“size”,null)	color=red price=500
Get value of a given property form node strings separated by semicolon	“color=red;size=30“.property@cv(“size”)	30
Get value of a given property form node strings where the property name and value are comma-separated	“color:red size:30“.property@j(“size”)	“30”

4.5 HTML string handling

s.htmlparse(tag:i:j,…)

The function gets the j^th text under the i^th tag in HTML string s; or gets all text from s when all parameters are absent.

Below is the HTML string in the following example functions:

Task	Code	Return value
Get the 3^rd text under the 3^rd div tag	html.htmlparse("div":2:2)	ddd
Get the 1^st text under the 2^nd div tag and the 1^st text under the 1^st span tag	html.htmlparse("div":1:0,"span":0:0)	[aaa,ccc]
Get the content of the 1^st table tag	html.htmlparse("table":0)	[[1,2,3],[a,b,c]]

Learn more about text handling through string() functions.

SPL Official Website 👉 https://www.scudata.com

SPL Feedback and Help 👉 https://www.reddit.com/r/esProcSPL

SPL Learning Material 👉 https://c.scudata.com

SPL Source Code and Package 👉 https://github.com/SPLWare/esProc

Discord 👉 https://discord.gg/cFTcUNs7

Youtube 👉 https://www.youtube.com/@esProc_SPL

esProc

pjf • 442 View • 3 Years ago