9.23 Parse HTML file and analyze data
Parse an HTML file and analyze data in it.
Find all numbers in the body of an HTML file. Below is part of the file:
<!DOCTYPE html>
<html class="html__responsive html__unpinned-leftnav">
<head>
<title>Stack Overflow - Where Developers Learn, Share, & Build Careers</title>
<link rel="shortcut icon" href="https://cdn.sstatic.net/Sites/stackoverflow/Img/favicon.ico?v=ec617d715196">
<link rel="apple-touch-icon" href="https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon.png?v=c78bd457575a">
<link rel="image_src" href="https://cdn.sstatic.net/Sites/stackoverflow/Img/apple-touch-icon.png?v=c78bd457575a">
…
</html>
SPL provides the s.htmlparse() function, which extracts all the text from an HTML string s.
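For readers who want to see what this text-extraction step involves outside SPL, here is a rough Python analogue built on the standard library's html.parser module. It is not how htmlparse() is implemented. Dropping whitespace-only text and skipping <script> and <style> contents are assumptions of this sketch, and the names TextCollector and html_texts are hypothetical.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect non-blank text nodes; <script>/<style> contents are skipped (an assumption)."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.texts = []
        self._skip_depth = 0  # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.texts.append(text)


def html_texts(html):
    """Return the text strings of an HTML document, in document order."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.texts


sample = ("<html><head><title>Stack Overflow</title></head>"
          "<body><p>30 members</p><script>var x = 1;</script></body></html>")
print(html_texts(sample))  # ['Stack Overflow', '30 members']
```

To process a real file, the string could come from something like open("sof.html", encoding="utf-8").read(); the UTF-8 encoding is an assumption about the file.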
SPL script:
 | A |
---|---|
1 | =file("sof.html").read() |
2 | =A1.htmlparse() |
3 | =A2.(~.words@d()).conj() |
A1: Read the HTML file as a string.
A2: Use the htmlparse() function to parse the HTML string and return a sequence of text strings.
A3: Loop through the parsed text sequence, extract the numbers in each string with words@d(), and concatenate them into a single sequence with conj().
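The per-string extraction and concatenation in A3 can be approximated in Python as below. This is a sketch, not SPL's behavior: the regular expression (unsigned integers and decimals only, so negative numbers are not recognized) is an assumption about what counts as a number, and words@d() may tokenize differently. The names NUMBER, to_number, and all_numbers are hypothetical.

```python
import re

# Assumed definition of a number: digits with an optional decimal fraction.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def to_number(token):
    """Convert a matched token to int, or to float if it has a decimal point."""
    return float(token) if "." in token else int(token)


def all_numbers(texts):
    """Extract the numbers from each string, then concatenate them into one flat list."""
    result = []
    for text in texts:
        result.extend(to_number(t) for t in NUMBER.findall(text))
    return result


print(all_numbers(["30 members", "Rated 3 of 5", "16.5k views"]))  # [30, 3, 5, 16.5]
```

Flattening with extend() plays the role of conj() here: each string contributes its own list of numbers, and the lists are joined in order.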
Execution result:
Members |
---|
30 |
3 |
16.5 |
5 |
… |
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL