I have a CSV file containing random data, and I want to filter it: I want to keep only the rows that start with `$` and end with `#`. Here is a sample:

```
2017-09-07 03:11:03,5,hello
2017-09-07 03:11:16,6,yellow
2017-09-07 03:11:22,28,some other stuff with spaces
2017-09-08 20:24:36,157,
2017-10-28 04:39:25,54,$SITE0011,1654,0000,0000,0000,00000000,000000^A^A^A^A^A^A^@^@#
2017-10-28 04:39:48,108,$SITE0011,1654,0000,0000,0000,00000000,000000^A^A^A^A^A^A^@^@#$SITE0011,1654,0000,0000,0000,00000000,000000^A^A^A^A^A^A^@^@#
2017-10-28 04:40:26,54,$SITE0011,1654,0000,0000,0000,00000000,000000^A^A^A^A^A^A^@^@#
2017-10-28 04:40:29,54,$SITE0011,1654,0000,0000,0000,00000000,000000^A^A^A^A^A^A^@^@#
```
Answer (score: 2)
I think this is a good use case for a filtering generator function:
```python
import re
import csv


def filter_lines(f):
    """This generator function uses a regular expression
    to include only lines that contain a `$` and end with a `#`.
    """
    filter_regex = r'.*\$.*#$'
    for line in f:
        line = line.strip()  # drop the trailing newline and surrounding whitespace
        m = re.match(filter_regex, line)
        if m:
            yield line


# CSV_FILENAME is the path to your data file.
with open(CSV_FILENAME) as f:
    filter_generator = filter_lines(f)
    csv_reader = csv.reader(filter_generator)
    for row in csv_reader:
        pass  # process each filtered row here
```
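To see what this first version actually hands to `csv.reader`, here is a minimal sketch that runs it on an in-memory copy of a few sample lines. `io.StringIO` stands in for the real file, and `SAMPLE` is a hypothetical, shortened excerpt of the question's data (the `^A`/`^@` padding and some fields are omitted).

```python
import csv
import io
import re


def filter_lines(f):
    """Yield stripped lines that contain a `$` and end with a `#`."""
    filter_regex = r'.*\$.*#$'
    for line in f:
        line = line.strip()
        if re.match(filter_regex, line):
            yield line


# Hypothetical, abbreviated excerpt of the sample data, held in memory.
SAMPLE = (
    "2017-09-07 03:11:03,5,hello\n"
    "2017-09-08 20:24:36,157,\n"
    "2017-10-28 04:39:25,54,$SITE0011,1654,0000#\n"
    "2017-10-28 04:39:48,108,$SITE0011,1654,0000#$SITE0011,1654,0000#\n"
)

# csv.reader accepts any iterable of strings, so the generator plugs in directly.
rows = list(csv.reader(filter_lines(io.StringIO(SAMPLE))))
for row in rows:
    print(row)
```

The timestamp and counter columns are kept, and the line holding two records comes through as a single merged row, with `'0000#$SITE0011'` ending up in one field.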
Edit:
I now realize that in your example a single "line" can contain more than one match (as in line 6). This slightly modified version handles that as well:
```python
import re
import csv


def filter_lines(f):
    """This generator function uses a regular expression
    to yield every `$...#` record found in each line.
    """
    filter_regex = r'(\$[^#]*\#)'
    for line in f:
        line = line.strip()
        # findall returns every non-overlapping match, so a line
        # holding several records yields each one separately
        matches = re.findall(filter_regex, line)
        for m in matches:
            yield m


# CSV_FILENAME is the path to your data file.
with open(CSV_FILENAME) as f:
    filter_generator = filter_lines(f)
    csv_reader = csv.reader(filter_generator)
    for row in csv_reader:
        print(row)
```
Output generated from the sample input:

```
['$SITE0011', '1654', '0000', '0000', '0000', '00000000', '000000^A^A^A^A^A^A^@^@#']
['$SITE0011', '1654', '0000', '0000', '0000', '00000000', '000000^A^A^A^A^A^A^@^@#']
['$SITE0011', '1654', '0000', '0000', '0000', '00000000', '000000^A^A^A^A^A^A^@^@#']
['$SITE0011', '1654', '0000', '0000', '0000', '00000000', '000000^A^A^A^A^A^A^@^@#']
['$SITE0011', '1654', '0000', '0000', '0000', '00000000', '000000^A^A^A^A^A^A^@^@#']
```
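Here is a minimal sketch for checking the `findall`-based version without a file on disk. `io.StringIO` stands in for the file, and `SAMPLE` is a hypothetical, abbreviated excerpt of the question's data. It precompiles the pattern and drops the capturing group, which `re.findall` doesn't need here (with no groups it returns the whole matches). As with the original pattern, `[^#]*` assumes a record never contains a `#` of its own.

```python
import csv
import io
import re

# Equivalent to the edited pattern: a `$`, then any non-`#` characters, then `#`.
RECORD_RE = re.compile(r'\$[^#]*#')


def filter_lines(f):
    """Yield every `$...#` record found in each line of f."""
    for line in f:
        yield from RECORD_RE.findall(line.strip())


# Hypothetical, abbreviated excerpt of the sample data, held in memory.
SAMPLE = (
    "2017-09-07 03:11:03,5,hello\n"
    "2017-09-08 20:24:36,157,\n"
    "2017-10-28 04:39:25,54,$SITE0011,1654,0000#\n"
    "2017-10-28 04:39:48,108,$SITE0011,1654,0000#$SITE0011,1654,0000#\n"
)

rows = list(csv.reader(filter_lines(io.StringIO(SAMPLE))))
for row in rows:
    print(row)
```

The line carrying two records now contributes two rows, and the timestamp and counter columns are dropped because only the `$...#` span is passed on to `csv.reader`.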