Find Strings Between Given Separators
【Question】
Hi friends, I want to ask a question:
How can I extract all text between two given strings str1 and str2? The results should be separated by something such as a line break.
For example, my "input.txt" is:
abdeb ..... str1 aaaaaaaaaaaaaaaaa str2 bbbbbbbbbbbb str1 cccccccccccccccccc str2 ddddddddddddddddd ....
My output.txt should be:
aaaaaaaaaaaaaaaaa
cccccccccccccccccc
Note that the text between str1 and str2 can be of any length.
I think a regular expression with egrep should be able to do this, but I am not familiar with either. I hope someone can help. Thanks!
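Since the question asks about regular expressions, here is a minimal sketch of that route in Python. It is not part of the original answer, and the function name extract_between and its default arguments are illustrative. It assumes the separators are literal text, that a match may span line breaks, and that the surrounding spaces should be trimmed to match the expected output:

```python
import re


def extract_between(text: str, start: str = "str1", end: str = "str2") -> list[str]:
    """Return every piece of text found between `start` and `end`.

    re.escape makes the separators match literally, the non-greedy `.*?`
    stops at the first `end` after each `start`, and re.DOTALL lets `.`
    match newlines so a piece may span several lines.
    """
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    # findall returns the captured group of each match; strip() drops the
    # spaces that sit next to the separators in the sample input.
    return [piece.strip() for piece in re.findall(pattern, text, flags=re.DOTALL)]


if __name__ == "__main__":
    sample = ("abdeb ..... str1 aaaaaaaaaaaaaaaaa str2 bbbbbbbbbbbb "
              "str1 cccccccccccccccccc str2 ddddddddddddddddd ....")
    # One result per line, as in the expected output.txt
    print("\n".join(extract_between(sample)))
```

For real files, the same function could be fed open("input.txt").read() and its joined result written to output.txt. A str1 that is never followed by str2 produces no match here.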
【Answer】
Regular expressions can do this, but the pattern is not intuitive and can be tricky for beginners. Try SPL instead. SPL (Structured Process Language) supports regular expression matching as well as string-processing functions such as pos (find the position of a substring) and mid (extract a substring starting at a given position). Below is an SPL script for your question:
  | A
1 | =file("D:\\input.txt").read().split("str2")
2 | =A1.select(pos(~,"str1"))
3 | =A2.(mid(~,pos(~,"str1")+4))
4 | =file("D:\\output.txt").write(A3)
A1: Read the file as a single string and split it into a sequence at every “str2”.
A2: Keep only the members that contain the substring “str1”.
A3: From each of A2’s members, take the substring that follows “str1”; the +4 skips the four characters of “str1” itself.
A4: Write A3’s result to output.txt.
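For readers without esProc, the A1 to A3 steps can be mirrored in plain Python. This is a sketch of the same split/select/substring idea, not an SPL API; the name extract_like_spl is hypothetical, and len(start) stands in for the hard-coded +4 (Python's str.index is 0-based while SPL's pos is 1-based, but in both cases adding the separator length lands just after it):

```python
def extract_like_spl(text: str, start: str = "str1", end: str = "str2") -> list[str]:
    """Mirror the A1-A3 cells: split on `end`, keep segments with `start`,
    then return what follows the first `start` in each kept segment."""
    # A1: split the whole text into segments at every occurrence of `end`
    segments = text.split(end)
    # A2: keep only the segments that contain `start`
    kept = [seg for seg in segments if start in seg]
    # A3: take the text after the first `start`; len(start) replaces the "+4"
    return [seg[seg.index(start) + len(start):] for seg in kept]


if __name__ == "__main__":
    sample = ("abdeb ..... str1 aaaaaaaaaaaaaaaaa str2 bbbbbbbbbbbb "
              "str1 cccccccccccccccccc str2 ddddddddddddddddd ....")
    # A4 equivalent: one piece per line
    print("\n".join(extract_like_spl(sample)))
```

Two behaviors follow from this logic, and by the same reasoning apply to the SPL cells: the spaces next to the separators are kept (add .strip() if they are unwanted), and text after a final str1 with no closing str2 is still returned, because it ends up in the last segment of the split.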
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL