How to find keywords in a multi-column CSV file using Python 2.7

Asked: 2017-04-20 14:41:16

Tags: python csv

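I'm a newbie, so please go easy on me. I have a 100,000-row CSV file (mock_data.csv) in which I search for keywords from keywords.csv and write the matching rows to a new CSV file (filename.csv). Using code pieced together from other Stack Overflow threads, I can accomplish this as long as the code specifies the column in which the locator looks for the keywords.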

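I have attached new images of the mock data, which should better explain the expected result: the first 5 rows of the mock_data file are returned (id_number is shown as false at the end).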

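Where I'm stuck is looking for the keywords anywhere in a row rather than in one specified column. I have been tinkering with the lines of code below, but have not gotten it to work.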

with open('mock_data.csv', 'r') as infile:
    reader = csv.reader(infile, delimiter = ',')
    for row in reader:
        found = False
        for keyword in allkeywords:
            if keyword in row:
                found = True
            if found == True:

The original code, in which I have to specify the column to search for keywords:

import csv
import time
import sys
from collections import defaultdict
import pandas as pd

#import keywords csv (1st col has keyword and 2nd col indicates keyword type)
columns = defaultdict(list)
with open('mock_keywords.csv','r') as f:
   reader = csv.DictReader(map(lambda line:line.lower(),f),delimiter = ',')
   for row in reader:
       for (k,v) in row.items():
           columns[k].append(v)
allkeywords = (columns['keywords'])

timestr = time.strftime("%Y-%m-%d-(%H-%M-%S)")
filename = 'output' +str(timestr)+'.csv'

csvout = open(filename, 'wb')
fieldnames =['id_number', 'country', 'cargo_description', 'supplier', 'transport', 'transport_id']
writer = csv.DictWriter(csvout, fieldnames = fieldnames)
writer.writeheader()

with open('mock_data.csv', 'r') as infile:
    reader = csv.reader(infile, delimiter = ',')
    for row in reader:
        found = False
        for keyword in allkeywords:
            locator = row[2].lower().find(keyword)
            if locator != -1:
                found = True
            if found == True:
                writer.writerow({'id_number':row[0], 'country':row[1], 'cargo_description':row[2], 'supplier':row[3], 'transport':row[4], 'transport_id':row[5]})
csvout.close()

#i imagine that when i look for keywords in any part of the csv that writer might write the same row multiple times.
#toclean = pd.read_csv(filename)
#deduped = toclean.drop_duplicates()
#deduped.to_csv(filename)
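
The duplicate-row concern in the comments above comes from the write sitting inside the keyword loop. A minimal sketch of one way around it, assuming the goal is a case-insensitive substring match in any column: decide once per row with `any()`, then emit the row a single time. The helper names `row_matches` and `filter_rows` and the sample rows are hypothetical and do not come from the original post.

```python
import csv


def row_matches(row, keywords):
    """Return True if any keyword is a substring of any cell (case-insensitive)."""
    cells = [cell.lower() for cell in row]
    return any(keyword in cell for keyword in keywords for cell in cells)


def filter_rows(rows, keywords):
    """Yield each matching row exactly once, however many keywords it contains."""
    # Lowercase the keywords once; drop empty strings, which would match every cell.
    keywords = [k.lower() for k in keywords if k]
    for row in rows:
        if row_matches(row, keywords):
            yield row


# Hypothetical sample lines standing in for mock_data.csv (header row omitted).
sample_lines = [
    "1,Canada,steel pipes,Acme,ship,S-100",
    "2,Peru,bananas,Steelworks Ltd,truck,T-200",
    "3,Chile,copper wire,Andes Co,rail,R-300",
]

# csv.reader accepts any iterable of lines, so a list works as well as a file.
matches = list(filter_rows(csv.reader(sample_lines), ["steel", "pipes"]))
for row in matches:
    print(row)
```

Row 1 contains both keywords but is still produced only once, so no pandas de-duplication pass would be needed with this approach.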

1 Answer:

Answer 0 (score: 0)

  1. This line of code is wrong; allkeywords is empty

    #Change from  
        allkeywords = (columns['keywords'])
    #to  
        allkeywords = columns.keys()  
    
  2. Do not use binary mode 'wb' for CSV files

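    Change to csvout = open(filename, 'w')
    (This applies to Python 3, where newline='' should also be passed. Under Python 2.7 the csv module expects binary mode, so 'wb' is the correct choice there.)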

  3.   

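    Question: ... I am trying to find keywords in any row. Where I'm running into trouble is that I think I would have to convert the lotsofstuff.csv file to lowercase

    Why not use DictReader?
    You can access every column by its header name as the key, and you can write the whole row at once.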

    for row in csv.DictReader(infile, delimiter = ','):
        if row['id_number'] in allkeywords:
            writer.writerow(row)
    

    Edit your question code accordingly, then come back and explain what you mean by "trouble ... lowercase".
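
One way the DictReader suggestion could be extended to search every column is sketched below, under stated assumptions: each value is lowercased only for the comparison, so the source file itself never needs converting, and the original row is written once with `writerow`. The helper names `open_csv_for_writing` and `filter_dict_rows`, the sample lines, and the output path are hypothetical. The sketch also assumes every data row has exactly as many fields as the header.

```python
import csv
import os
import sys
import tempfile


def open_csv_for_writing(path):
    """Open an output file in the mode the csv module expects on this Python."""
    if sys.version_info[0] < 3:
        return open(path, 'wb')           # Python 2: binary mode
    return open(path, 'w', newline='')    # Python 3: text mode, no newline translation


def filter_dict_rows(lines, keywords, out_path):
    """Write rows whose values contain any keyword; return the number written."""
    keywords = [k.lower() for k in keywords if k]
    reader = csv.DictReader(lines)
    written = 0
    out = open_csv_for_writing(out_path)
    try:
        # Reuse the input header so the whole row can be written unchanged.
        writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Lowercase copies are used for matching only.
            values = [value.lower() for value in row.values()]
            if any(k in value for k in keywords for value in values):
                writer.writerow(row)
                written += 1
    finally:
        out.close()
    return written


# Hypothetical sample lines standing in for mock_data.csv.
sample_lines = [
    "id_number,country,cargo_description",
    "1,Canada,Steel pipes",
    "2,Peru,bananas",
]
out_path = os.path.join(tempfile.mkdtemp(), "output.csv")
count = filter_dict_rows(sample_lines, ["STEEL"], out_path)
print(count)
```

Because both the keywords and the cell values are lowercased before comparing, "STEEL" matches "Steel pipes", while the output file keeps the original casing.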