Python: simple CSV substring filter

Date: 2016-05-10 14:21:34

Tags: python csv

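I want to search the rows of a CSV file for substrings. This is what I have so far. I know it doesn't actually perform the search, and I'm not writing the output correctly.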

import csv

def filterCSVfile (path):
    filterSubstrings = ['signal1', 'signal2']
    csvData = open (path)
    filereader = csv.reader(csvData, delimiter=',')

    rows = [row for row in filereader if row in filterSubstrings]

    outFileHandle = open("output.csv", "w")
    outFileHandle.write(rows)
    outFileHandle.close()

filterCSVfile('history.csv')

Edit

The CSV file has two columns, a human-readable date-time and a URL, for example:

2016-02-12 15:37:15,http://www.youtube.com/watch?v=wt60lVB8sHo
2016-02-12 15:37:15,https://www.youtube.com/watch?v=wt60lVB8sHo
2016-02-12 15:54:33,http://kizi.com/games/paintworld-2-monsters
2016-02-12 16:12:56,http://kizi.com/games/u/icycle
2016-02-12 16:13:03,http://kizi.com/games/u/iron-turtle
2016-02-12 16:13:46,http://www.armorgames.com/
2016-02-12 16:13:46,http://armorgames.com/

I want to extract the rows whose URL contains 'signal1' or 'signal2' (for example, http://signal1.com).
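A minimal sketch of why the original filter never matches, using an in-memory sample (io.StringIO stands in for history.csv; the two sample rows are made up for illustration). csv.reader yields each row as a list of strings, so `row in filterSubstrings` asks whether the whole list equals one of the strings, which is always False. Separately, write() on a text file only accepts a str, so passing a list raises TypeError.

```python
import csv
import io

filterSubstrings = ['signal1', 'signal2']

# In-memory stand-in for history.csv (hypothetical sample data)
sample = io.StringIO("date1,http://signal1.com\ndate2,http://www.youtube.com/\n")
rows = list(csv.reader(sample))

print(rows[0])                      # ['date1', 'http://signal1.com'] (a list, not a string)
print(rows[0] in filterSubstrings)  # False: a list never equals a str

# write() on a text stream needs a str, so passing the list fails
try:
    io.StringIO().write(rows)
except TypeError as exc:
    print("TypeError:", exc)
```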

1 answer:

Answer 0 (score: 0)

Replace the line

rows = [row for row in filereader if row in filterSubstrings]

with

rows = [row for row in filereader if any([word in row[1] for word in filterSubstrings])]
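The new condition looks only at the second column (row[1], the URL) and keeps a row if any of the substrings occurs in it. A small sketch applying the same test to two rows from the question plus one made-up matching row. It passes a generator expression to any() instead of a list, which avoids building the intermediate list and lets any() stop at the first match; the result is the same.

```python
filterSubstrings = ['signal1', 'signal2']

rows = [
    ['2016-02-12 15:37:15', 'http://www.youtube.com/watch?v=wt60lVB8sHo'],
    ['date1', 'http://signal1.com'],  # hypothetical matching row
    ['2016-02-12 16:13:46', 'http://armorgames.com/'],
]

# Same condition as the replacement line: substring search in the URL column,
# True as soon as one of the substrings is found.
kept = [row for row in rows if any(word in row[1] for word in filterSubstrings)]
print(kept)  # [['date1', 'http://signal1.com']]
```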

Full source code:

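import csv

def filterCSVfile(path):
    # substrings to look for in the URL column
    filterSubstrings = ['signal1', 'signal2']

    # newline='' is what the csv module docs recommend for files used with reader/writer (Python 3)
    with open(path, 'r', newline='') as csvData:
        filereader = csv.reader(csvData, delimiter=',')
        # the changed line: keep a row if any substring occurs in column 2 (the URL);
        # len(row) > 1 skips blank lines, which would otherwise raise IndexError
        rows = [row for row in filereader
                if len(row) > 1 and any(word in row[1] for word in filterSubstrings)]

    with open('output.csv', 'w', newline='') as outFileHandle:
        writer = csv.writer(outFileHandle)  # csv.writer joins fields and quotes them as needed
        writer.writerows(rows)

    return rows

filterCSVfile('history.csv')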
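As a variation that goes beyond the answer (it is not what the answer proposes), the filter can also stream: read one row, test it, and write it immediately, so the whole file never has to sit in a list. A minimal sketch; the function name filter_csv_stream and its parameters are hypothetical. It takes already-open file objects so it can be demonstrated with io.StringIO instead of real files.

```python
import csv
import io

def filter_csv_stream(src, dst, substrings=('signal1', 'signal2')):
    """Copy rows from file object `src` to `dst` when column 2 contains any substring.

    Rows are written one at a time instead of being collected in a list.
    Returns the number of rows written. (Hypothetical helper, for illustration.)
    """
    reader = csv.reader(src)
    writer = csv.writer(dst)
    count = 0
    for row in reader:
        # skip blank or one-column lines, then test the URL column
        if len(row) > 1 and any(s in row[1] for s in substrings):
            writer.writerow(row)
            count += 1
    return count

# Demo with in-memory files. With real files, open both with newline='':
#   with open('history.csv', newline='') as src, open('output.csv', 'w', newline='') as dst:
#       filter_csv_stream(src, dst)
src = io.StringIO("date1,http://signal1.com\n"
                  "2016-02-12 16:13:46,http://armorgames.com/\n"
                  "date2,http://signal2.com\n")
dst = io.StringIO()
written = filter_csv_stream(src, dst)
print(written)  # 2
print(dst.getvalue())
```

The trade-off: the streaming version uses constant memory, while the list version in the answer keeps the matched rows around, which is convenient if you also want to inspect them afterwards.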

Test

Contents of history.csv:
date1,http://signal1.com
2016-02-12 15:37:15,http://www.youtube.com/watch?v=wt60lVB8sHo
2016-02-12 15:37:15,https://www.youtube.com/watch?v=wt60lVB8sHo
date2,http://signal2.com

Output (rows):

[['date1', 'http://signal1.com'], ['date2', 'http://signal2.com']]