CSV File Manipulation
【Question】
I have data in a csv file which looks like this:
fromaddress, toaddress, timestamp
sender1@email.com, recipient1@email.com, recipient2@email.com, 8-1-2015
sender2@email.com, recipient1@email.com, 8-2-2015
sender3@email.com, recipient1@email.com, recipient2@email.com, recipient3@email.com, recipient4@email.com, 8-3-2015
sender1@email.com, recipient1@email.com, recipient2@email.com, recipient3@email.com, 8-4-2015
Using Python, I would like to produce a txt file that looks like:
sender1_email.com, recipient1_email.com
sender1_email.com, recipient2_email.com
sender2_email.com, recipient1_email.com
sender3_email.com, recipient1_email.com
sender3_email.com, recipient2_email.com
sender3_email.com, recipient3_email.com
sender3_email.com, recipient4_email.com
sender1_email.com, recipient1_email.com
sender1_email.com, recipient2_email.com
sender1_email.com, recipient3_email.com
Ultimately, I imagine this whole process will take several steps. After reading in the csv file, I will need to create separate lists for fromaddress and toaddress (I am ignoring the timestamp column altogether). There is only 1 email address per row in the fromaddress column, however there are any number of email addresses per row in the toaddress column. I need to duplicate the fromaddress email address for each toaddress email address listed for each row. Once this done I need to replace all of the @ symbols with underscore (_) symbols. Finally, when I write the txt file, I need to add an extra space between each row so that it is "double-spaced"
I have not gotten very far as I'm a Python newbie and I'm stuck on the first step. The following code is duplicating the fromaddress for each individual character in the toaddress column instead of each individual email address. I also need help with the toaddress list as well. Can anyone help?
import csv
fromaddress = []
toaddress = []
with open("filename.csv", 'r') as f:
c = csv.reader(f, delimiter = ",")
for row in c:
for item in row[1]:
fromaddress.append(row[0]);
print(fromaddress)
Everyone, thanks for all of your help! I tried all your code but unfortunately I'm not getting the output I need. Instead of getting this (what I want):
sender1_email.com, recipient1_email.com
sender1_email.com, recipient2_email.com
sender1_email.com, recipient3_email.com
sender2_email.com, recipient1_email.com
sender3_email.com, recipient1_email.com
sender3_email.com, recipient2_email.com
I'm getting this:
sender1_email.com,"recipient1_email.com, recipient2_email.com, recipient3_email.com"
sender2_email.com,"recipient1_email.com"
sender3_email.com,"recipient1_email.com, recipient2_email.com"
There is only 1 element in each "fromaddress" row, but there are multiple elements in each "toaddress" row. Basically, I have to pair each recipient address with the correct sender address. I think I'm not getting the right output because of the (") double quotation marks in the csv file to surround all of the sender addresses in each row.
【Answer】
Retrieve data from the second row to the Nth row, and, for each row, make the first value the value of the first column and rotate values from the second to the second-to-last values of the second column to compose a multirow two-dimensional table. Meanwhile replace the sign "@" in the string with "_".
It’s complicated to perform so many set operations in Python. But it’s simple to get this done in SPL:
A |
|
1 |
=file("d:\\input.csv").read@n().(replace(~,"@","_")) |
2 |
=A1.to(2,).(~.array()) |
3 |
=A2.news(~.to(2,~.len()-1);A2.~(1),~) |
4 |
=file("d:\\result.txt").export@c(A3) |
A1: Read in the csv file, join up values of each row into a string to generate a sequence, and replace "@" in the string with "_";
A2: Generate a new sequence with members from the second to the last in A1’s sequence, split members of the sequence into a new sequence, and finally return a sequence of sequences. This step retrieves and manipulates the data.
A3: Get from each sequence member the members from the second to the second-to-last, arrange them in two columns, and return a new table sequence;
A4: Export A3’s result set to a comma separated text file.
SPL Official Website 👉 https://www.scudata.com
SPL Feedback and Help 👉 https://www.reddit.com/r/esProc_SPL
SPL Learning Material 👉 https://c.scudata.com
SPL Source Code and Package 👉 https://github.com/SPLWare/esProc
Discord 👉 https://discord.gg/cFTcUNs7
Youtube 👉 https://www.youtube.com/@esProc_SPL