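I have a large list of data in CSV format, and I need to delete rows based on two matching parameters.
The list of data I want to delete looks like this: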
London,James Smith
London,John Oliver
London,John-Smith-Harrison
Paris,Hermione
Paris,Trevor Wilson
New York City,Charlie Chaplin
New York City,Ned Stark
New York City,Thoma' Becket
New York City,Ryan-Dover
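A row in the main CSV should then be deleted when the city name matches its second column and the name matches the name in its 9th column.
If both match, delete the row from the main CSV (note that no sample of that CSV is provided here).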
Answer 0 (score: 4)
I verified the following against the kind of data you provided/described:
import csv
from cStringIO import StringIO

# parse the data you're about to filter with
with open('filters.csv', 'rb') as f:
    filters = {(row[0], row[1]) for row in csv.reader(f, delimiter=',')}

out_f = StringIO()  # use e.g. `with open('out.csv', 'wb') as out_f` for real file output
out = csv.writer(out_f, delimiter=',')

# go thru your rows and see if the pair (row[1], row[8]) is
# found in the previously parsed set of filters; if yes, skip the row
with open('data.csv', 'rb') as f:
    for row in csv.reader(f, delimiter=','):
        if (row[1], row[8]) not in filters:
            out.writerow(row)

# for debugging only
print out_f.getvalue()  # prints the resulting filtered CSV data
Note: {... for ... in ...} is set-comprehension syntax; on Python versions older than 2.7 you will need to change it to the equivalent set(... for ... in ...) for it to work.
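The code in this answer is written for Python 2 (cStringIO, binary file modes, the print statement). A minimal sketch of the same filtering idea for Python 3 might look like the following. The function names load_filters and filter_rows, the column-index parameters, and the sample rows are illustrative assumptions, not part of the original answer.

```python
import csv
import io


def load_filters(filter_file):
    """Read (city, name) pairs from an open text file into a set."""
    return {(row[0], row[1]) for row in csv.reader(filter_file) if len(row) >= 2}


def filter_rows(data_file, out_file, filters, city_col=1, name_col=8):
    """Copy rows to out_file unless their (city, name) pair is in filters."""
    writer = csv.writer(out_file)
    for row in csv.reader(data_file):
        # Rows too short to hold both columns are kept unchanged.
        has_both = len(row) > max(city_col, name_col)
        if has_both and (row[city_col], row[name_col]) in filters:
            continue  # both city and name match: drop this row
        writer.writerow(row)


# In-memory stand-ins for filters.csv and data.csv (made-up sample rows)
sample_filters = load_filters(io.StringIO("London,James Smith\nParis,Hermione\n"))
sample_data = io.StringIO(
    "1,London,a,b,c,d,e,f,James Smith\n"
    "2,London,a,b,c,d,e,f,Hermione\n"
)
sample_out = io.StringIO()
filter_rows(sample_data, sample_out, sample_filters)
print(sample_out.getvalue())
```

With real files in Python 3, the csv module documentation recommends opening them in text mode with newline='', for example open('data.csv', newline='') for reading and open('out.csv', 'w', newline='') for writing.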
Answer 1 (score: 1)
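You can read the data line by line and append each row to a list unless the elements in columns 2 and 9 are both found in the lists L1 and L2, respectively.
ext = r"C:\Users\Me\Desktop\test.txt"
# L1 and L2 are assumed to already hold the city names and the person names to filter on
readL = []
with open(ext) as f:
    for line in f:
        listLine = line.strip().split(',')
        # columns 2 and 9 sit at indices 1 and 8 (0-based)
        if listLine[1] in L1 and listLine[8] in L2:
            continue  # both match: skip (delete) this row
        readL.append(listLine)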
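One caveat with two separate lists: the city and the name are tested independently, whether the tests are joined with `and` or `or`, so a row can be dropped even when its city and name come from different lines of the filter list. The sketch below compares that check with matching on (city, name) pairs. The filter entries come from the question's list, while the rows standing in for columns 2 and 9 of the main CSV are made up for illustration.

```python
# Filter entries taken from the question's list
pairs = {("London", "James Smith"), ("Paris", "Hermione")}
cities = {city for city, _ in pairs}  # plays the role of L1
names = {name for _, name in pairs}   # plays the role of L2

# Hypothetical (city, name) values pulled from columns 2 and 9 of the main CSV
rows = [
    ("London", "James Smith"),  # a listed pair
    ("London", "Hermione"),     # city and name come from different filter lines
    ("Berlin", "James Smith"),  # city not listed at all
]

# Independent membership tests versus exact pair matching
dropped_by_lists = [r for r in rows if r[0] in cities and r[1] in names]
dropped_by_pairs = [r for r in rows if r in pairs]

print(dropped_by_lists)  # includes ("London", "Hermione") as well
print(dropped_by_pairs)  # only ("London", "James Smith")
```

If only exact city/name combinations should be removed, a set of tuples is probably the safer structure to test against.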