Creating lists from a CSV file whose rows have different numbers of entries

Time: 2015-08-05 22:21:22

Tags: python

I have data in a CSV file that looks like this:

fromaddress, toaddress, timestamp
sender1@email.com, recipient1@email.com, recipient2@email.com, 8-1-2015
sender2@email.com, recipient1@email.com, 8-2-2015
sender3@email.com, recipient1@email.com, recipient2@email.com, recipient3@email.com, recipient4@email.com, 8-3-2015
sender1@email.com, recipient1@email.com, recipient2@email.com, recipient3@email.com, 8-4-2015

Using Python, I want to produce a txt file that looks something like this:
sender1_email.com, recipient1_email.com
sender1_email.com, recipient2_email.com
sender2_email.com, recipient1_email.com
sender3_email.com, recipient1_email.com
sender3_email.com, recipient2_email.com
sender3_email.com, recipient3_email.com
sender3_email.com, recipient4_email.com
sender1_email.com, recipient1_email.com
sender1_email.com, recipient2_email.com
sender1_email.com, recipient3_email.com

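Ultimately, I expect this whole process will take several steps. After reading the CSV file, I will need to create separate lists for fromaddress and toaddress (I am ignoring the timestamp column completely). The fromaddress column has only one email address per row, but the toaddress column has several email addresses per row. I need to duplicate the fromaddress email address once for each toaddress email address listed in that row. Once that is done, I need to replace every @ symbol with an underscore (_). Finally, when I write the txt file, I need to add an extra blank line between lines so that the file is "double-spaced".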
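The steps above can be sketched as one pass over the file that yields sender/recipient pairs and then writes them with a blank line after each. This is a minimal sketch, not code from any answer, assuming the file has a header row, the sender is always the first field and the timestamp always the last, and each comma is followed by a space. The function names sender_recipient_pairs and write_double_spaced are hypothetical.

```python
import csv

def sender_recipient_pairs(lines):
    """Yield (sender, recipient) pairs with '@' replaced by '_'.

    Assumes a header row, the sender in the first field and the timestamp in the last.
    """
    reader = csv.reader(lines, skipinitialspace=True)  # drop the space after each comma
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 3:
            continue  # skip blank or malformed rows (need sender, recipient(s), timestamp)
        sender = row[0].replace('@', '_')
        for recipient in row[1:-1]:  # every field between the sender and the timestamp
            yield sender, recipient.replace('@', '_')

def write_double_spaced(pairs, out_path):
    """Write 'sender, recipient' lines with a blank line after each one."""
    with open(out_path, 'w') as outf:
        for sender, recipient in pairs:
            outf.write(f"{sender}, {recipient}\n\n")
```

Usage (file names hypothetical): open the input with open('filename.csv', newline='') as f and call write_double_spaced(sender_recipient_pairs(f), 'output.txt') inside that with block, so the generator is consumed while the file is still open. Separate fromaddress/toaddress lists turn out to be unnecessary here, because each pair is produced directly from its row.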

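I haven't gotten very far, because I'm a Python beginner and I'm already stuck on the first step. The code below duplicates fromaddress once for every character in the toaddress column instead of once for every individual email address. I also need help with the toaddress list. Can anyone help?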

import csv
fromaddress = []
toaddress = []

with open("filename.csv", 'r') as f:
    c = csv.reader(f, delimiter = ",")
    for row in c:
        for item in row[1]:
            fromaddress.append(row[0]);

print(fromaddress)
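The duplication happens because row[1] is a single string, and a for loop over a string in Python visits its characters one at a time. A small illustration, using a hard-coded row (an assumption standing in for one row returned by csv.reader) instead of the real file:

```python
row = ['sender2@email.com', ' recipient1@email.com', ' 8-2-2015']

# row[1] is one string, so a for loop over it visits each character
chars = [item for item in row[1]]
print(chars[:4])  # the first few characters, including the leading space

# row[1:-1] is a list of fields, so a for loop over it visits each recipient address
recipients = [item.strip() for item in row[1:-1]]
print(recipients)
```

Slicing from index 1 up to (but not including) the last field skips both the sender and the timestamp, which is what the loop needs to repeat the sender once per recipient.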

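Thank you all for your help! I tried all of the code, but unfortunately I'm not getting the output I need. Instead of getting this (which is what I want):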

sender1_email.com, recipient1_email.com
sender1_email.com, recipient2_email.com
sender1_email.com, recipient3_email.com
sender2_email.com, recipient1_email.com
sender3_email.com, recipient1_email.com
sender3_email.com, recipient2_email.com

I got this:

sender1_email.com,"recipient1_email.com, recipient2_email.com, recipient3_email.com"
sender2_email.com,"recipient1_email.com"
sender3_email.com,"recipient1_email.com, recipient2_email.com"

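Each "fromaddress" row has only 1 element, but each "toaddress" row has several. Basically, I have to pair each recipient address with the correct sender address. I think I'm not getting the right output because, in the CSV file, double quotes (") surround all of the recipient addresses in each row.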
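If the input really does wrap each row's recipients in double quotes, csv.reader returns the whole quoted group as one field (commas inside quotes are not treated as delimiters), so that field has to be split a second time. A minimal sketch, assuming a row shaped like the quoted output above with the addresses inside the quotes separated by commas:

```python
import csv

line = 'sender1@email.com,"recipient1@email.com, recipient2@email.com, recipient3@email.com"'
row = next(csv.reader([line]))
print(len(row))  # the quoted recipients arrive as a single field

sender = row[0].replace('@', '_')
# split the quoted field on commas, then strip the space that followed each comma
pairs = [(sender, r.strip().replace('@', '_')) for r in row[1].split(',')]
for s, r in pairs:
    print(f"{s}, {r}")
```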

4 Answers:

Answer 0 (score: 0)

for row in c:   
    for item in row[1]:
        fromaddress.append(row[0]);

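for item in row[1] only looks at the second element of each row, and because that element is a string, the loop visits its individual characters. If you want to loop over each row and assign its column elements to variables, you need something like this: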

for row in c:
    fromaddress.append(row[0])
    toaddress.append(row[1])
    # etc...
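Extending this idea to the question's first step: to keep fromaddress and toaddress aligned, append the sender once per recipient field rather than once per row. A sketch assuming the header row and trailing timestamp column from the question; the hard-coded lines stand in for the opened CSV file.

```python
import csv

lines = [
    "fromaddress, toaddress, timestamp",
    "sender1@email.com, recipient1@email.com, recipient2@email.com, 8-1-2015",
    "sender2@email.com, recipient1@email.com, 8-2-2015",
]

fromaddress = []
toaddress = []

reader = csv.reader(lines, skipinitialspace=True)  # drop the space after each comma
next(reader)  # skip the header row
for row in reader:
    for recipient in row[1:-1]:  # every field between the sender and the timestamp
        fromaddress.append(row[0])  # repeat the sender once per recipient
        toaddress.append(recipient)

# The two lists now line up index by index
for sender, recipient in zip(fromaddress, toaddress):
    print(sender.replace('@', '_') + ', ' + recipient.replace('@', '_'))
```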

Answer 1 (score: 0)

This should solve your problem:

# open the input and output files; both are closed automatically at the end of the block
with open("file.txt", "r") as f, open("output.txt", "w") as output:
    for line in f.readlines()[1:]:  # split the file into lines, excluding the first one (header)
        fields = line.split(", ")  # split the line into separate fields

        for mail in fields[1:-1]:  # iterate over the recipients (excluding the first and last elements of "fields")
            output.write(fields[0] + " " + mail + "\n")

Answer 2 (score: 0)

If the first email address is always the sender and the last item in a row is always a value you don't need (the date), you can do the following.

Input file test.txt

sender1@email.com, recipient1@email.com, recipient2@email.com, 8-1-2015
sender2@email.com, recipient1@email.com, 8-2-2015
sender3@email.com, recipient1@email.com, recipient2@email.com, recipient3@email.com, recipient4@email.com, 8-3-2015
sender1@email.com, recipient1@email.com, recipient2@email.com, recipient3@email.com, 8-4-2015

Python code

import csv

storage = []  # here we store our sender-receiver pairs

with open('test.txt', newline='') as f:  # open the input text file (text mode, as csv expects in Python 3)
    for row in csv.reader(f, delimiter=','):  # loop through every line of the input file
        # add one sender-receiver pair per recipient; row[1:-1] skips the sender and the trailing date
        storage.extend([(row[0].replace('@', '_'), x.replace('@', '_')) for x in row[1:-1]])

# here we loop through the storage list and write line by line to the new.txt file
with open("new.txt", "w") as f:  # open once, overwriting any previous output
    for pair in storage:
        f.write(pair[0] + pair[1] + ' \n\n')  # double new line!

Output new.txt

sender1_email.com recipient1_email.com 

sender1_email.com recipient2_email.com 

sender2_email.com recipient1_email.com 

sender3_email.com recipient1_email.com 

sender3_email.com recipient2_email.com 

sender3_email.com recipient3_email.com 

sender3_email.com recipient4_email.com 

sender1_email.com recipient1_email.com 

sender1_email.com recipient2_email.com 

sender1_email.com recipient3_email.com 
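In this output the space between the two addresses is not written by the code: pair[0] + pair[1] has no separator, and the space is the one that follows each comma in the input, which csv.reader keeps by default. A small comparison on one input line; skipinitialspace is a standard csv option, while the explicit ", " separator is an assumption based on the format the question asks for.

```python
import csv

line = "sender2@email.com, recipient1@email.com, 8-2-2015"

default_row = next(csv.reader([line]))
stripped_row = next(csv.reader([line], skipinitialspace=True))

print(repr(default_row[1]))   # keeps the leading space
print(repr(stripped_row[1]))  # leading space removed

# With the space stripped, the separator has to be written explicitly
print(stripped_row[0].replace('@', '_') + ', ' + stripped_row[1].replace('@', '_'))
```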

Answer 3 (score: 0)

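You don't need to create separate lists to do what you want. You can also use the csv module both to read the input CSV file and to write the txt output file: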

import csv

# open both files in text mode with newline='', as the csv module expects in Python 3
with open('mail.csv', newline='') as inf, open('mail.txt', 'w', newline='') as outf:
    reader = csv.reader(inf)
    next(reader)  # skip header row
    writer = csv.writer(outf)
    for row in reader:
        row = row[:-1]  # remove trailing date
        sender = row[0].replace('@', '_')
        writer.writerows([sender, recipient.replace('@', '_')]
                         for recipient in row[1:])
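This writes one pair per line but not the double spacing the question asks for. One option, sketched here rather than taken from the answer, is to pass lineterminator='\n\n' to csv.writer so that every row is followed by a blank line. The example writes to an in-memory io.StringIO buffer (a stand-in for mail.txt) and uses hard-coded rows so it runs on its own.

```python
import csv
import io

rows = [
    ["sender1@email.com", "recipient1@email.com", "recipient2@email.com", "8-1-2015"],
    ["sender2@email.com", "recipient1@email.com", "8-2-2015"],
]

outf = io.StringIO()  # stands in for the opened mail.txt
writer = csv.writer(outf, lineterminator='\n\n')  # each row ends with a blank line
for row in rows:
    sender = row[0].replace('@', '_')
    # row[1:-1] skips the sender and the trailing date
    writer.writerows([sender, recipient.replace('@', '_')] for recipient in row[1:-1])

print(outf.getvalue())
```

Note that csv.writer joins fields with a bare comma, so each line reads sender1_email.com,recipient1_email.com rather than the comma-plus-space form shown in the question.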