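I'm a newbie, so please go easy on me. I have a 100,000-row CSV file (mock_data.csv) in which I search for keywords from keywords.csv and write the rows that contain a match to a new CSV file (filename.csv). By cobbling together code from other Stack Overflow threads, I can accomplish this if I change the code to specify, via a locator, the column in which to look for keywords.
I've attached new images of the mock data that should better explain the expected result, which is to return the first 5 rows of the mock_data file (the id_number ends up being displayed as false).
Where I'm having trouble is that I'm trying to find the keywords anywhere in a row. I've been tinkering with the lines of code below, but haven't managed to get it working.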
with open('mock_data.csv', 'r') as infile:
    reader = csv.reader(infile, delimiter = ',')
    for row in reader:
        found = False
        for keyword in allkeywords:
            if keyword in row:
                found = True
        if found == True:
The original code, in which I have to specify the column to search for keywords:
import csv
import time
import sys
from collections import defaultdict
import pandas as pd

# import keywords csv (1st col has keyword and 2nd col indicates keyword type)
columns = defaultdict(list)
with open('mock_keywords.csv','r') as f:
    # lower-case every line (header included) before parsing
    reader = csv.DictReader(map(lambda line:line.lower(),f),delimiter = ',')
    for row in reader:
        for (k,v) in row.items():
            columns[k].append(v)
allkeywords = (columns['keywords'])

# timestamped output file name: %H = hour, %M = minute, %S = second
timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output' +str(timestr)+'.csv'
csvout = open(filename, 'wb')
fieldnames =['id_number', 'country', 'cargo_description', 'supplier', 'transport', 'transport_id']
writer = csv.DictWriter(csvout, fieldnames = fieldnames)
writer.writeheader()

with open('mock_data.csv', 'r') as infile:
    reader = csv.reader(infile, delimiter = ',')
    for row in reader:
        found = False
        for keyword in allkeywords:
            # only the third column (cargo_description) is searched
            locator = row[2].lower().find(keyword)
            if locator != -1:
                found = True
        if found == True:
            writer.writerow({'id_number':row[0], 'country':row[1], 'cargo_description':row[2], 'supplier':row[3], 'transport':row[4], 'transport_id':row[5]})
csvout.close()

# I imagine that when I look for keywords in any part of the csv, the writer might write the same row multiple times.
#toclean = pd.read_csv(filename)
#deduped = toclean.drop_duplicates()
#deduped.to_csv(filename)
Answer 0 (score: 0)
This line of code is wrong: allkeywords is empty.
#Change from
allkeywords = (columns['keywords'])
#to
allkeywords = columns.keys()
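To see what each of the two expressions returns, here is a small sketch that runs the question's loading loop on made-up in-memory data. The sample contents and the misspelled header 'keyword' are assumptions for illustration only. Note that columns.keys() gives the header names rather than the keyword values, and that a defaultdict silently returns an empty list for a header that does not exist, which would be one way allkeywords could end up empty.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-in for mock_keywords.csv (contents are assumed).
sample = io.StringIO("Keywords,Type\nSteel,material\nCOPPER,material\n")

columns = defaultdict(list)
# Lower-case every line, header included, as in the question's code.
reader = csv.DictReader(map(str.lower, sample))
for row in reader:
    for k, v in row.items():
        columns[k].append(v)

header_names = list(columns.keys())  # the column headers
keywords = columns['keywords']       # the values in the 'keywords' column
missing = columns['keyword']         # misspelled header: defaultdict returns []

print(header_names)
print(keywords)
print(missing)
```

Which of the two you want depends on how your keywords file is laid out; if the keywords are the values in the first column, the column lookup is the one that returns them.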
Don't use binary mode 'wb' for CSV files.
Change it to csvout = open(filename, 'w')
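A minimal sketch of opening the output file in text mode, assuming Python 3 (where the csv module writes str and a file opened with 'wb' raises a TypeError). The csv documentation also recommends passing newline='' so that no extra blank lines appear on Windows. The file name and the sample row are made up, and the with block stands in for the manual close call.

```python
import csv
import os
import tempfile

fieldnames = ['id_number', 'country', 'cargo_description']
# Hypothetical output path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'output_example.csv')

# Text mode plus newline='' is the combination the csv docs recommend.
with open(path, 'w', newline='') as csvout:
    writer = csv.DictWriter(csvout, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerow({'id_number': '1', 'country': 'NL',
                     'cargo_description': 'steel pipes'})

# Read the file back to check what was written.
with open(path, newline='') as f:
    rows = list(csv.reader(f))

print(rows)
```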
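Question: ... I am trying to find keywords in any row. Where I'm running into trouble is that I think I will have to convert the lotsofstuff.csv file to lowercase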
Why not use DictReader?
You can access every column by its header name (the key), and you can write the whole row at once.
for row in csv.DictReader(infile, delimiter = ','):
    if row['id_number'] in allkeywords:
        writer.writerow(row)
Edit the code in your question accordingly, then come back and explain what you mean by "trouble ... lowercase".
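For the "any column" case, one possible approach is sketched below; it is not necessarily what the asker ended up with. It lower-cases each field at comparison time, so the data file itself never has to be converted, and it calls writerow at most once per row, so no de-duplication pass should be needed. The helper row_matches and the sample data are hypothetical. Also note that `keyword in row` on a list from csv.reader tests whether a whole field equals the keyword, not whether the keyword occurs inside a field.

```python
import csv
import io


def row_matches(row, keywords):
    """Return True if any keyword is a substring of any field (case-insensitive)."""
    fields = [str(value).lower() for value in row.values()]
    return any(kw in field for kw in keywords for field in fields)


# Made-up stand-in for mock_data.csv.
data = io.StringIO(
    "id_number,country,cargo_description,supplier\n"
    "1,Chile,Copper wire,Steel Corp\n"   # matches both keywords
    "2,Peru,Bananas,Steelworks Ltd\n"    # matches 'steel' in the supplier column
    "3,Spain,Oranges,Citrus Co\n"        # no match
)
keywords = ['copper', 'steel']  # assumed to be lower-case already

out = io.StringIO()
reader = csv.DictReader(data)
writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator='\n')
writer.writeheader()
for row in reader:
    if row_matches(row, keywords):
        writer.writerow(row)  # whole row written in one call, once per row

matched_ids = [r['id_number'] for r in csv.DictReader(io.StringIO(out.getvalue()))]
print(matched_ids)
```

With real files, the two StringIO objects would be replaced by open('mock_data.csv', newline='') and open(filename, 'w', newline=''). An empty string in the keyword list would match every row, so it may be worth filtering blanks out first.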