I have a CSV file of interview transcripts that was exported from an h5 file. When I read the rows into Python, the output looks like this:
line[0]=['title,date,responses']
line[1]=['[\'Transcript 1 title\'],"[\' July 7, 1997\']","[ '\nms. vogel: i look at all sectors of insurance, although to date i\nhaven\'t really focused on the reinsurers and the brokers.\n']']
line[2]=['[\'Transcript 2 title\'],"[\' July 8, 1997\']","[ '\nmr. tozzi: i formed cambridge in 1981. we are top-down sector managers,\nconstantly searching for non-consensus companies and industries.\n']']
etc...
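I want to extract only the text from the "responses" column into a separate .txt file for each row of the CSV file, save the .txt files to a specified directory, and name them "t1.txt", "t2.txt", etc. according to the row number. The CSV file has about 30K rows.
Drawing on what I've been able to find online, this is the code I have so far: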
import csv
with open("twst.csv", "r") as f:
    reader = csv.reader(f)
    rownumber = 0
    for row in reader:
        g=open("t"+str(rownumber)+".txt","w")
        g.write(row)
        rownumber = rownumber + 1
        g.close()
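My biggest problem is that this pulls all the columns of the row into the .txt file, but I only want the text from the "responses" column. With this, I know I can iterate over the individual rows in the file (right now it is set up just to test the first row), but I haven't found any guidance in the Python documentation on extracting a specific column. I'm also not familiar enough with Python to figure out the code myself.
Thanks in advance for your help!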
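Answer 0 (score: 1)
The built-in csv module may have some functionality for this. However, if the format of the CSV doesn't change, the following code should work using only a for loop and the built-in read/write.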
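with open('test.csv', 'r') as file:
    # splitlines() avoids a trailing empty entry when the file ends with a newline
    data = file.read().splitlines()
# Start at 1 to skip the header row
for row in range(1, len(data)):
    # Naive split: only correct if no field contains a comma or line break itself
    third_col = data[row].split(',')
    with open('t' + str(row) + '.txt', 'w') as output:
        output.write(third_col[2])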
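The sample rows in the question contain commas inside quoted fields (for example the date " July 7, 1997"), so a plain split on ',' may not land on the right column. As an alternative, here is a minimal sketch using the built-in csv module, which parses quoted fields for you. The function name export_column, the output directory argument, and the UTF-8 encoding are assumptions for illustration, not something given in the question; the column name "responses" comes from the header row shown there.

```python
import csv
import os


def export_column(csv_path, out_dir, column="responses"):
    """Write one column of each CSV row to its own file: t1.txt, t2.txt, ..."""
    # Create the target directory if it does not exist yet
    os.makedirs(out_dir, exist_ok=True)
    written = 0
    # newline="" lets the csv module handle line breaks inside quoted fields
    # encoding="utf-8" is an assumption; change it if the file uses another encoding
    with open(csv_path, "r", newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)  # the first row is used as the header
        for rownumber, row in enumerate(reader, start=1):
            out_path = os.path.join(out_dir, "t" + str(rownumber) + ".txt")
            with open(out_path, "w", encoding="utf-8") as g:
                g.write(row[column])  # only the requested column is written
            written += 1
    return written
```

Calling it as export_column("twst.csv", "transcripts") would write one file per data row into a "transcripts" folder (a hypothetical directory name). Each file then contains the raw field as stored in the CSV, including the surrounding "[ '...']" wrapper seen in the sample output; stripping that wrapper would be a separate step.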