How can I scan the reader CSV for any item from the reader2 CSV and write a new CSV with the matching rows?
newCIK.csv:
66740,1800,1001463,1467373,896159
search.file:
1001385|NORTHWEST PIPE CO|10-Q|2015-05-06|edgar/data/1001385/0001193125-15-174814.txt
1001426|PERICOM SEMICONDUCTOR CORP|10-Q|2015-05-05|edgar/data/1001426/0001145443-15-000628.txt
1001463|Acacia Diversified Holdings, Inc.|10-K|2015-05-20|edgar/data/1001463/0001185185-15-001386.txt
1001463|Acacia Diversified Holdings, Inc.|10-K|2015-05-20|edgar/data/1001463/0001185185-15-001394.txt
1001463|Acacia Diversified Holdings, Inc.|10-Q|2015-05-20|edgar/data/1001463/0001185185-15-001388.txt
1001463|Acacia Diversified Holdings, Inc.|10-Q|2015-05-20|edgar/data/1001463/0001185185-15-001390.txt
1001463|Acacia Diversified Holdings, Inc.|10-Q|2015-05-20|edgar/data/1001463/0001185185-15-001392.txt
1001463|Acacia Diversified Holdings, Inc.|10-Q|2015-05-20|edgar/data/1001463/0001185185-15-001396.txt
with open('newCIK.csv') as reader2:
    reader2 = csv.reader(reader2)
    with open('search.file') as f_in, open('SP500_10K.csv', 'w') as f_out:
        reader = csv.reader(f_in, delimiter='|')
        writer = csv.writer(f_out, delimiter='|')
        for line in reader:
            for cik in reader2:
                if cik in line:
                    writer.writerow(line)
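Answer 0 (score: 1)
You are trying to treat a file object like a list and loop over it multiple times. That won't work without extra effort. Moreover, you are not looping over the columns of a row; you are testing whether a whole row from one CSV file is contained in a row of the other. You need to test each value individually, and only against the last column of the rows in the search.file CSV data.
File objects have a file position; as you read from a file, the position moves from the start to the end. Once it gets there, it does not automatically start over.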
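To see this behaviour in isolation, here is a minimal sketch. It uses io.StringIO as a stand-in for a real open file, and the sample text is only illustrative:

```python
import csv
from io import StringIO

# StringIO stands in for an open file; the contents are illustrative only.
fake_file = StringIO("66740,1800,1001463\n")
reader = csv.reader(fake_file)

first_pass = list(reader)   # reads to the end of the "file"
second_pass = list(reader)  # position is already at the end, nothing left

fake_file.seek(0)           # rewind the underlying file object
third_pass = list(reader)   # the reader yields the row again

print(first_pass)
print(second_pass)
print(third_pass)
```

The second pass comes back empty because the position is already at the end; only the explicit seek(0) makes the data readable again.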
You could seek the file object back to the start each time:
import csv

with open('newCIK.csv') as reader2_file:
    reader2 = csv.reader(reader2_file)
    with open('search.file') as f_in, open('SP500_10K.csv', 'w') as f_out:
        reader = csv.reader(f_in, delimiter='|')
        writer = csv.writer(f_out, delimiter='|')
        for line in reader:
            reader2_file.seek(0)  # rewind to the start
            for cik in reader2:
                if cik in line:
                    writer.writerow(line)
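However, reading a file over and over again is slow. You are better off reading the whole thing into memory once, at the start. The code above also does not address the other problem: you are testing each row of newCIK.csv rather than each column.
Read the one row into memory, then loop over that: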
import csv

with open('newCIK.csv', newline='') as reader2:
    reader2 = csv.reader(reader2)
    cik_values = next(reader2)  # first row

with open('search.file', newline='') as f_in, open('SP500_10K.csv', 'w', newline='') as f_out:
    reader = csv.reader(f_in, delimiter='|')
    writer = csv.writer(f_out, delimiter='|')
    for line in reader:
        for cik in cik_values:
            if cik in line[-1]:  # test only the last column
                writer.writerow(line)
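Note that I added newline='' arguments to the open() calls; the csv module needs to control newline handling itself. Leaving the argument out can cause problems on Windows and when handling values that contain newlines.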
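As a rough illustration of why this matters, the sketch below round-trips one row whose second field contains an embedded newline. The row contents and the temporary file name are made up for the example and are not part of the original data:

```python
import csv
import os
import tempfile

# Illustrative row; the embedded "\n" in the second field is deliberate.
rows = [["1001463", "Acacia Diversified\nHoldings, Inc.", "10-K"]]

# A temporary file keeps the sketch self-contained; the name is arbitrary.
path = os.path.join(tempfile.mkdtemp(), "newline_demo.csv")

with open(path, "w", newline="") as f_out:
    csv.writer(f_out, delimiter="|").writerow(rows[0])

with open(path, newline="") as f_in:
    read_back = list(csv.reader(f_in, delimiter="|"))

with open(path, "rb") as f_raw:
    raw = f_raw.read()

print(read_back == rows)      # the embedded newline survives the round trip
print(raw.endswith(b"\r\n"))  # the csv module wrote its own line terminator
```

With newline='' the file object passes line endings through untouched, so the csv module's default '\r\n' terminator and the quoted '\n' inside the field both reach the file as written.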
Demo:
>>> from io import StringIO
>>> import csv, sys
>>> newcik = '''\
... 66740,1800,1001463,1467373,896159
... '''
>>> search_file = '''\
... 1001385|NORTHWEST PIPE CO|10-Q|2015-05-06|edgar/data/1001385/0001193125-15-174814.txt
... 1001426|PERICOM SEMICONDUCTOR CORP|10-Q|2015-05-05|edgar/data/1001426/0001145443-15-000628.txt
... 1001463|Acacia Diversified Holdings, Inc.|10-K|2015-05-20|edgar/data/1001463/0001185185-15-001386.txt
... 1001463|Acacia Diversified Holdings, Inc.|10-K|2015-05-20|edgar/data/1001463/0001185185-15-001394.txt
... 1001463|Acacia Diversified Holdings, Inc.|10-Q|2015-05-20|edgar/data/1001463/0001185185-15-001388.txt
... 1001463|Acacia Diversified Holdings, Inc.|10-Q|2015-05-20|edgar/data/1001463/0001185185-15-001390.txt
... 1001463|Acacia Diversified Holdings, Inc.|10-Q|2015-05-20|edgar/data/1001463/0001185185-15-001392.txt
... 1001463|Acacia Diversified Holdings, Inc.|10-Q|2015-05-20|edgar/data/1001463/0001185185-15-001396.txt
... '''
>>> with StringIO(newcik) as reader2:
...     reader2 = csv.reader(reader2)
...     cik_values = next(reader2)  # first row
...
>>> with StringIO(search_file) as f_in:
...     reader = csv.reader(f_in, delimiter='|')
...     writer = csv.writer(sys.stdout, delimiter='|')
...     for line in reader:
...         for cik in cik_values:
...             if cik in line[-1]:  # test only the last column
...                 writer.writerow(line)
...
1001463|Acacia Diversified Holdings, Inc.|10-K|2015-05-20|edgar/data/1001463/0001185185-15-001386.txt
103
1001463|Acacia Diversified Holdings, Inc.|10-K|2015-05-20|edgar/data/1001463/0001185185-15-001394.txt
103
1001463|Acacia Diversified Holdings, Inc.|10-Q|2015-05-20|edgar/data/1001463/0001185185-15-001388.txt
103
1001463|Acacia Diversified Holdings, Inc.|10-Q|2015-05-20|edgar/data/1001463/0001185185-15-001390.txt
103
1001463|Acacia Diversified Holdings, Inc.|10-Q|2015-05-20|edgar/data/1001463/0001185185-15-001392.txt
103
1001463|Acacia Diversified Holdings, Inc.|10-Q|2015-05-20|edgar/data/1001463/0001185185-15-001396.txt
103
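The 103 numbers are the return values of each writer.writerow() call (the number of characters written, including the '\r\n' line terminator), echoed by the REPL.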
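A possible variation, sketched here under the assumption that the first column of search.file always holds the CIK (as it does in the sample rows): compare that column for equality against a set of CIK values instead of doing a substring test on the path. This avoids accidental matches such as a short CIK appearing inside an accession number, and writes each row at most once. The helper name filter_by_cik is hypothetical, and StringIO objects stand in for the real files:

```python
import csv
from io import StringIO

# Sample data taken from the question; StringIO stands in for the real files.
newcik = "66740,1800,1001463,1467373,896159\n"
search_file = (
    "1001385|NORTHWEST PIPE CO|10-Q|2015-05-06|"
    "edgar/data/1001385/0001193125-15-174814.txt\n"
    "1001463|Acacia Diversified Holdings, Inc.|10-K|2015-05-20|"
    "edgar/data/1001463/0001185185-15-001386.txt\n"
)


def filter_by_cik(cik_source, search_source, out):
    """Hypothetical helper: copy rows whose first column is a known CIK."""
    cik_values = set(next(csv.reader(cik_source)))  # first row only
    writer = csv.writer(out, delimiter='|')
    matched = 0
    for line in csv.reader(search_source, delimiter='|'):
        # Assumption: column 0 is the CIK; skip blank rows.
        if line and line[0] in cik_values:
            writer.writerow(line)
            matched += 1
    return matched


out = StringIO()
count = filter_by_cik(StringIO(newcik), StringIO(search_file), out)
print(count)
print(out.getvalue(), end="")
```

With real files, the same function can be called with the objects returned by open(..., newline='') in place of the StringIO instances.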