I have an input CSV that looks like this:
email,trait1,trait2,trait3
foo@gmail,biz,baz,buzz
bar@gmail,bizzy,bazzy,buzzy
foobars@gmail,bizziest,bazziest,buzziest
I need the output to look like this:
Indv,AttrName,AttrValue,Start,End
foo@gmail,"trait1",biz,,,
foo@gmail,"trait2",baz,baz,,
foo@gmail,"trait3",buzz,,,
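For each row in the input file, I need to write one output row for each of the N-1 non-email columns of the input CSV. In some cases the Start and End fields in the output file can be empty.
I'm trying to use DictReader to read the data. This is what I have so far: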
import unicodecsv
import os
import codecs
with open('test.csv') as csvfile:
    reader = unicodecsv.csv.DictReader(csvfile)
    outfile = codecs.open("test-write", "w", "utf-8")
    outfile.write("Indv", "ATTR", "Value", "Start","End\n")
    for row in reader:
        outfile.write([row['email'],"trait1",row['trait1'],'',''])
        outfile.write([row['email'],"trait2",row['trait2'],row['trait2'],''])
        outfile.write([row['email'],"trait3",row['trait3'],'',''])
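This doesn't work. I think I need to convert the lists to strings, and the code is very brittle because I hard-code the column names for every row. The bigger problem is that the data from the for loop is never written to "test-write". Only the line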
outfile.write("Indv", "ATTR", "Value", "Start","End\n")
actually gets written. Is DictReader the right tool for my case?
Answer 0 (score: 3)
This uses unicodecsv.DictWriter and the zip() function to do what you want, and I think the code is quite readable.
import unicodecsv

# unicodecsv reads and writes bytes, so both files are opened in binary mode.
with open('test.csv', 'rb') as infile, \
        open('test-write.csv', 'wb') as outfile:
    reader = unicodecsv.DictReader(infile)
    fieldnames = 'Indv,AttrName,AttrValue,Start,End'.split(',')
    writer = unicodecsv.DictWriter(outfile, fieldnames)
    writer.writeheader()
    for row in reader:
        email = row['email']
        trait1, trait2, trait3 = row['trait1'], row['trait2'], row['trait3']
        writer.writerows([  # writes three rows of output from each row of input
            dict(zip(fieldnames, [email, 'trait1', trait1])),
            dict(zip(fieldnames, [email, 'trait2', trait2, trait2])),
            dict(zip(fieldnames, [email, 'trait3', trait3]))])
Here are the contents of the test-write.csv file it generates from your sample input CSV file:
Indv,AttrName,AttrValue,Start,End
foo@gmail,trait1,biz,,
foo@gmail,trait2,baz,baz,
foo@gmail,trait3,buzz,,
bar@gmail,trait1,bizzy,,
bar@gmail,trait2,bazzy,bazzy,
bar@gmail,trait3,buzzy,,
foobars@gmail,trait1,bizziest,,
foobars@gmail,trait2,bazziest,bazziest,
foobars@gmail,trait3,buzziest,,
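The lists passed to zip() in this answer are shorter than the header. That works because zip() stops at the shorter input, and DictWriter fills any missing keys with its restval, an empty string by default. Here is a minimal sketch of that behaviour. It assumes Python 3 and swaps in the standard-library csv module and an in-memory buffer for unicodecsv and a real file, purely for illustration:

```python
import csv
import io

fieldnames = ['Indv', 'AttrName', 'AttrValue', 'Start', 'End']

# zip() stops at the shorter sequence, so only the first three keys are set.
row = dict(zip(fieldnames, ['foo@gmail', 'trait1', 'biz']))

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames, lineterminator='\n')
writer.writeheader()
writer.writerow(row)  # the missing 'Start' and 'End' keys are filled with restval ('')

output = buf.getvalue()
print(output)
```

Passing a four-element list, as the trait2 row does, sets Start as well and leaves only End to be filled in.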
Answer 1 (score: 2)
I may be completely off, since I haven't done much work with unicode, but it seems to me that the following should work:
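import csv

# Python 3: open CSV files in text mode with newline='' and an explicit encoding.
with open('test.csv', newline='', encoding='utf-8') as csvin, \
        open('test-write', 'w', newline='', encoding='utf-8') as csvout:
    reader = csv.DictReader(csvin)
    writer = csv.DictWriter(csvout, fieldnames=['Indv', 'AttrName',
                                                'AttrValue', 'Start', 'End'])
    writer.writeheader()  # emit the header row the question asks for
    for row in reader:
        for traitnum in range(1, 4):
            key = "trait{}".format(traitnum)
            # Start and End are omitted, so DictWriter writes them as empty fields.
            writer.writerow({'Indv': row['email'], 'AttrName': key,
                             'AttrValue': row[key]})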
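The question also mentions that hard-coding the column names is brittle. One possible way around that is to take the attribute names from the reader's fieldnames instead of a fixed range. The sketch below assumes Python 3; the function name unpivot and its id_column parameter are hypothetical, and in-memory buffers stand in for the real files. It leaves Start and End empty for every row, so the trait2 rows differ from the question's sample, which repeats the value in Start.

```python
import csv
import io


def unpivot(infile, outfile, id_column='email'):
    """Write one output row per non-id column of each input row."""
    reader = csv.DictReader(infile)
    # Every column except the id column is treated as an attribute.
    attr_names = [name for name in reader.fieldnames if name != id_column]
    writer = csv.DictWriter(
        outfile,
        fieldnames=['Indv', 'AttrName', 'AttrValue', 'Start', 'End'],
        lineterminator='\n')
    writer.writeheader()
    for row in reader:
        for name in attr_names:
            # 'Start' and 'End' are left out and default to empty strings.
            writer.writerow({'Indv': row[id_column], 'AttrName': name,
                             'AttrValue': row[name]})


# Usage with in-memory data instead of files on disk.
sample = io.StringIO('email,trait1,trait2,trait3\n'
                     'foo@gmail,biz,baz,buzz\n')
result = io.StringIO()
unpivot(sample, result)
print(result.getvalue())
```

Because the attribute names come from the header, adding a trait4 column to the input would need no code change.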
Answer 2 (score: 1)
import pandas as pd
pd1 = pd.read_csv('input_csv.csv')
pd2 = (pd.melt(pd1, id_vars=['email'], value_vars=['trait1', 'trait2', 'trait3'],
               var_name='AttrName', value_name='AttrValue')
       .rename(columns={'email': 'Indv'})
       .sort_values(by=['Indv', 'AttrName'])  # replaces the removed DataFrame.sort(columns=...)
       .reset_index(drop=True))
pd2.to_csv('output_csv.csv', index=False)
It's not clear what the Start and End fields represent, but this gives you everything else.
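If Start and End only need to exist as blank columns, they could be appended after the melt. This is a sketch under that assumption, not a statement of what the fields mean. It reads the sample data from an in-memory string rather than a file, and it omits value_vars so that every column other than email is unpivoted.

```python
import io

import pandas as pd

csv_text = ('email,trait1,trait2,trait3\n'
            'foo@gmail,biz,baz,buzz\n'
            'bar@gmail,bizzy,bazzy,buzzy\n')
df = pd.read_csv(io.StringIO(csv_text))

# With value_vars omitted, melt() unpivots every column not listed in id_vars.
long_df = (df.melt(id_vars=['email'], var_name='AttrName', value_name='AttrValue')
           .rename(columns={'email': 'Indv'})
           .sort_values(by=['Indv', 'AttrName'])
           .reset_index(drop=True))

# Assumption: Start and End are simply left blank for every row.
long_df['Start'] = ''
long_df['End'] = ''

output = long_df.to_csv(index=False)
print(output)
```

Sorting by Indv reorders the rows alphabetically by email, so the output order differs from the input order.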