Check Whether Certain Strings Appear in a Text File
【Question】
I’m digging Python sooo much. I love it!
I have come upon my first snag: scanning a file for some strings, then correctly building separate arrays of matched, unmatched, and “empty” strings, and then printing each array.
I’ve tried this several ways with several different Python file and sequence iteration constructs (6, I think), with both UTF-8 HTML and ASCII text files.
I have had mixed results, many positive, but none of file.read(), file.readline(), file.readlines(), or file.xreadlines() works as expected after opening the file for reading via file = open('afile', 'r').
I can get part of a given array built and loop through it, but for some reason each array is only partially built, because the various read*() functions do not behave in the script the way I expect, even after I tested them successfully in the interpreter!
Code to follow soon, but basically:
tagsf = ['<tagsfloat1>', '<tagsfloat2>']
tagso = ['<openingtag1>', '<openingtag2>',]
tagsc = ['<closingtag1>', '<closingtag2>',]
tagp = tagsf + tagso + tagsc
doc = open('afile.html', 'r')
page = []
tagy = []
tagn = []
for lines in doc.read():
    page.append(lines)
for line in page:
    for tag in tagp:
        if tag in tagy:
            break
        if tag in tagn:
            break
        if tag in line:
            if tag not in tagy:
                tagy.append(tag)
        if tag not in tagn:
            tagn.append(tag)
for tagyes in tagy:
    print tagyes, 'found'
for tagno in tagn:
    print tagno, 'not found!'
All I ever get is the first tag found, or all tags NOT found!
【Answer】
The algorithm is simple. Read the file in as one large string, match it against the full list of keywords (tagall, which is tagp in the question) to get the sequence of tags found (tagy), and then take the difference between tagall and tagy to get the tags not found (tagn). Besides a loop in Python, you can also use SPL (Structured Process Language) to do this. Below is the SPL script, which is simple and easy to understand:
|   | A |
|---|---|
| 1 | =["tagsfloat1","tagsfloat2","openingtag1","openingtag2"] |
| 2 | =file("E:\\afile.html").read() |
| 3 | =A1.select(pos(A2,~)) |
| 4 | =A1\A3 |
A3 uses the select() function to loop over A1’s members and return the matching ones; the condition is whether a member of A1 is contained in A2’s large string. A4 (A1\A3) calculates the difference between A1 and A3, that is, the members that were not found.
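For readers who want to stay in Python, the same three steps (read the file as one string, select the tags it contains, take the difference) can be sketched as follows. This is a minimal Python 3 sketch, not the original poster's code: the function names classify_tags and classify_tags_in_file, the encoding parameter, and the sample string are all hypothetical and chosen only for illustration. It also avoids iterating over the result of read(), because iterating over a string yields single characters, and a multi-character tag can never be contained in one character.

```python
def classify_tags(text, tags):
    """Split tags into those contained in text and those that are not.

    Hypothetical helper: mirrors A3 (select) and A4 (difference) of the SPL script.
    """
    # Keep each tag that occurs somewhere in the text, in list order.
    found = [tag for tag in tags if tag in text]
    # The difference: every tag that was not found.
    missing = [tag for tag in tags if tag not in found]
    return found, missing


def classify_tags_in_file(path, tags, encoding="utf-8"):
    """Read the whole file as one string (like A2), then classify the tags.

    The utf-8 default is an assumption; pass the file's real encoding.
    """
    with open(path, "r", encoding=encoding) as handle:
        return classify_tags(handle.read(), tags)


# Small in-memory demonstration, so no file is needed to try it out.
sample = "<openingtag1>some text<closingtag1> <tagsfloat2>"
tags = ["<tagsfloat1>", "<tagsfloat2>",
        "<openingtag1>", "<openingtag2>",
        "<closingtag1>", "<closingtag2>"]

found, missing = classify_tags(sample, tags)
for tag in found:
    print(tag, "found")
for tag in missing:
    print(tag, "not found!")
```

For a real file, classify_tags_in_file('afile.html', tags) would return the same pair of lists. A plain substring test is used here to match pos() in the SPL script; it does not parse HTML, so a tag written with attributes or different spacing would be reported as not found.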