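The first column is unique and has only one entry. Each of the following columns contains multiple embedded newlines, and I'd like to put every value on its own row.
There are about 50,000 more lines in this file to loop through.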
What I currently have:
Type, Animal, Age
Animals,Dog\nZebra\nPanda\nBear,40\n26\n18\n59
What I'm aiming for:
Type, Animal, Age
Animals,Dog,40
Animals,Zebra,26
Animals,Panda,18
Animals,Bear,59
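Honestly, I don't know where to start and hope someone can point me in the right direction. Ideally I'd do this with something powerful in the shell, but I'm open to anything.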
Answer 0 (score: 0)
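import itertools

raw = 'Animals,Dog\nZebra\nPanda\nBear,40\n26\n18\n59'
# split into the three columns: type, animals, ages
categories = raw.split(',')
# repeat the single "Type" value alongside each animal/age pair;
# list() is needed in Python 3 because zip() returns a lazy iterator
result = list(zip(itertools.repeat(categories[0]),
                  categories[1].split('\n'),
                  categories[2].split('\n')))
print(result)  # >>> [('Animals', 'Dog', '40'), ('Animals', 'Zebra', '26'), ('Animals', 'Panda', '18'), ('Animals', 'Bear', '59')]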
Some assumptions:
Answer 1 (score: 0)
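Run .split(',') on each line of the file, then loop over the resulting list and split each multi-line field on '\n':
fields = line.split(',')
# keep the first column as-is and split the multi-line columns on '\n'
fields = fields[:1] + [field.split('\n') for field in fields[1:]]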
Now you should have lists that look like this:
list_line1 = [Type, Animal, Age]
list_line2 = [Animals,[Dog,Zebra,Panda,Bear],[40,26,18,59]]
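This makes the data much easier to loop over: just iterate through the nested lists and save the rows however you like!
# pair each animal with its age and repeat the type for every row
for animal, age in zip(list_line2[1], list_line2[2]):
    print(list_line2[0], animal, age)  # save the way you like it here!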
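A self-contained sketch of this split-then-loop approach, run on the sample line from the question (the variable names `line`, `fields`, and `rows` are illustrative, not from the answer):

```python
line = 'Animals,Dog\nZebra\nPanda\nBear,40\n26\n18\n59'

# first split on commas, then split each multi-line column on '\n'
fields = line.split(',')
list_line2 = fields[:1] + [field.split('\n') for field in fields[1:]]

# walk the animals and ages in parallel, repeating the type for each row
rows = []
for animal, age in zip(list_line2[1], list_line2[2]):
    rows.append([list_line2[0], animal, age])

for row in rows:
    print(','.join(row))
```

This prints `Animals,Dog,40` through `Animals,Bear,59`, one row per line. Note that when reading a real file, each record must arrive as one string; iterating a file opened in the default mode would already break the record at every embedded `\n`.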
I hope this helps.
Answer 2 (score: 0)
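Since your original CSV does not put quotes around the fields, the file needs to be opened with newline='\r\n' so that only \r\n is treated as a line break and a bare \n is not: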
import csv
from itertools import repeat

# assuming the lines look like
# Type, Animal, Age\r\n
# Animals,Dog\nZebra\nPanda\nBear,40\n26\n18\n59\r\n

# specifically set newline to '\r\n' so a bare '\n' does not end a line
with open('file.csv', 'r', newline='\r\n') as fin:
    with open('new_file.csv', 'w', newline='') as fout:
        writer = csv.writer(fout)
        for line in fin:
            # manually split the row on commas (the fields are unquoted)
            row = line.rstrip().split(',')
            # repeat the first column for every animal/age pair
            for newrow in zip(repeat(row[0]), row[1].split('\n'), row[2].split('\n')):
                writer.writerow(newrow)
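To see why the `newline='\r\n'` argument matters, the sketch below writes assumed sample bytes (records ending in `\r\n`, multi-line fields using a bare `\n`) to a temporary stand-in for `file.csv` and reads it back both ways:

```python
import os
import tempfile

# hypothetical sample data: records end in \r\n, while the multi-line
# fields use a bare \n (the layout this answer assumes)
raw = b'Type, Animal, Age\r\nAnimals,Dog\nZebra\nPanda\nBear,40\n26\n18\n59\r\n'

# write the bytes unchanged to a temporary stand-in for file.csv
with tempfile.NamedTemporaryFile(delete=False, suffix='.csv') as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # only '\r\n' ends a line; a bare '\n' stays inside the line
    with open(path, 'r', encoding='utf-8', newline='\r\n') as f:
        split_on_crlf = f.readlines()
    # default universal-newlines mode: every '\n' ends a line
    with open(path, 'r', encoding='utf-8') as f:
        split_on_any = f.readlines()
finally:
    os.remove(path)

print(len(split_on_crlf))  # 2 lines: header + one full record
print(len(split_on_any))   # 8 lines: the record is broken apart
```

With `newline='\r\n'` each record survives as a single line that the manual `split(',')` can handle; in the default mode the second record is torn into seven fragments.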
If the original CSV were properly quoted, your code would look like this:
import csv
from itertools import repeat

# assuming the lines look like
# Type, Animal, Age
# Animals,"Dog\nZebra\nPanda\nBear","40\n26\n18\n59"\r\n
with open('file.csv', 'r', newline='') as fin:
    with open('new_file.csv', 'w', newline='') as fout:
        reader = csv.reader(fin, delimiter=',')
        writer = csv.writer(fout, delimiter=',')
        for row in reader:
            # the csv module keeps quoted newlines inside a field,
            # so each row already has exactly three fields
            for newrow in zip(repeat(row[0]),
                              row[1].split('\n'),
                              row[2].split('\n')):
                writer.writerow(newrow)
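A quick in-memory check of the quoted variant, using `io.StringIO` in place of `file.csv` and `new_file.csv` (the sample string is an assumption modeled on the question's data):

```python
import csv
import io
from itertools import repeat

# hypothetical input with the multi-line fields quoted (stand-in for file.csv)
data = 'Type,Animal,Age\r\nAnimals,"Dog\nZebra\nPanda\nBear","40\n26\n18\n59"\r\n'

# newline='' disables newline translation, as the csv module expects
out = io.StringIO(newline='')
reader = csv.reader(io.StringIO(data, newline=''))
writer = csv.writer(out)
for row in reader:
    # expand each record into one output row per animal/age pair
    for newrow in zip(repeat(row[0]), row[1].split('\n'), row[2].split('\n')):
        writer.writerow(newrow)

result = out.getvalue()
print(result)
```

Because `csv.reader` understands the quotes, the embedded newlines stay inside their fields and no manual line splitting is needed.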