"【Question】 I have data in a csv file which looks like this: fromaddress, toaddress, timestamp se .."

blackduckie RaqForum 28 No.
389 View • 5 Years ago

CSV File Manipulation

text(125)

【Question】

I have data in a csv file which looks like this:

fromaddress, toaddress, timestamp

sender1@email.com, recipient1@email.com, recipient2@email.com, 8-1-2015

sender2@email.com, recipient1@email.com, 8-2-2015

sender3@email.com, recipient1@email.com, recipient2@email.com, recipient3@email.com, recipient4@email.com, 8-3-2015

sender1@email.com, recipient1@email.com, recipient2@email.com, recipient3@email.com, 8-4-2015

Using Python, I would like to produce a txt file that looks like:

sender1_email.com, recipient1_email.com

sender1_email.com, recipient2_email.com

sender2_email.com, recipient1_email.com

sender3_email.com, recipient1_email.com

sender3_email.com, recipient2_email.com

sender3_email.com, recipient3_email.com

sender3_email.com, recipient4_email.com

sender1_email.com, recipient1_email.com

sender1_email.com, recipient2_email.com

sender1_email.com, recipient3_email.com

Ultimately, I imagine this whole process will take several steps. After reading in the csv file, I will need to create separate lists for fromaddress and toaddress (I am ignoring the timestamp column altogether). There is only 1 email address per row in the fromaddress column, however there are any number of email addresses per row in the toaddress column. I need to duplicate the fromaddress email address for each toaddress email address listed for each row. Once this done I need to replace all of the @ symbols with underscore (_) symbols. Finally, when I write the txt file, I need to add an extra space between each row so that it is "double-spaced"

I have not gotten very far as I'm a Python newbie and I'm stuck on the first step. The following code is duplicating the fromaddress for each individual character in the toaddress column instead of each individual email address. I also need help with the toaddress list as well. Can anyone help?

import csv

fromaddress = []

toaddress = []

with open("filename.csv", 'r') as f:

c = csv.reader(f, delimiter = ",")

for row in c:

for item in row[1]:

fromaddress.append(row[0]);

print(fromaddress)

Everyone, thanks for all of your help! I tried all your code but unfortunately I'm not getting the output I need. Instead of getting this (what I want):

sender1_email.com, recipient1_email.com

sender1_email.com, recipient2_email.com

sender1_email.com, recipient3_email.com

sender2_email.com, recipient1_email.com

sender3_email.com, recipient1_email.com

sender3_email.com, recipient2_email.com

I'm getting this:

sender1_email.com,"recipient1_email.com, recipient2_email.com, recipient3_email.com"

sender2_email.com,"recipient1_email.com"

sender3_email.com,"recipient1_email.com, recipient2_email.com"

There is only 1 element in each "fromaddress" row, but there are multiple elements in each "toaddress" row. Basically, I have to pair each recipient address with the correct sender address. I think I'm not getting the right output because of the (") double quotation marks in the csv file to surround all of the sender addresses in each row.

【Answer】

Retrieve data from the second row to the Nth row, and, for each row, make the first value the value of the first column and rotate values from the second to the second-to-last values of the second column to compose a multirow two-dimensional table. Meanwhile replace the sign "@" in the string with "_".

It’s complicated to perform so many set operations in Python. But it’s simple to get this done in SPL:

	A
1	=file("d:\\input.csv").read@n().(replace(~,"@","_"))
2	=A1.to(2,).(~.array())
3	=A2.news(~.to(2,~.len()-1);A2.~(1),~)
4	=file("d:\\result.txt").export@c(A3)

A1: Read in the csv file, join up values of each row into a string to generate a sequence, and replace "@" in the string with "_";

A2: Generate a new sequence with members from the second to the last in A1’s sequence, split members of the sequence into a new sequence, and finally return a sequence of sequences. This step retrieves and manipulates the data.

undefined