Find elements of a set in lines read from a file

Time: 2018-07-10 08:20:40

Tags: python python-2.7

I have a text file, file1.txt, that uses | as the delimiter:

ID|Name|Date
1|A|2017-12-19   
2|B|2017-12-20
3|C|2017-12-21

I also have the following sets (<type 'set'>):

id_set = set(['1','2'])
date_set = set(['2017-12-19', '2017-12-20'])

I want to find the records in file1.txt whose values match elements of these sets, and write only those records to output.txt.

Expected output: output.txt should contain the following data:

ID|Name|Date
1|A|2017-12-19   
2|B|2017-12-20

2 answers:

Answer 0: (score: 3)

You can try the following solution:

id_set = {'1','2'}
date_set = {'2017-12-19', '2017-12-20'}

# open files for reading and writing
with open('file1.txt') as in_file, open('output.txt', 'w') as out_file:

    # write headers
    out_file.write(next(in_file))

    # go over lines in file
    for line in in_file:

        # extract id and date (rec_id avoids shadowing the built-in id)
        rec_id, _, date = line.rstrip().split('|')

        # keep lines whose id or date is in one of the sets
        if rec_id in id_set or date in date_set:
            out_file.write(line)

This produces the following output.txt:

ID|Name|Date
1|A|2017-12-19
2|B|2017-12-20
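The question does not say whether a record must match both sets or only one of them, so the loop above can also be wrapped in a small function that makes the choice explicit. This is only a sketch: the name filter_records and the require_both flag are made up for illustration, and it assumes the ID is always the first field and the date the third. Each field is stripped individually so that trailing spaces, as in the first sample row, do not prevent a date from matching.

```python
def filter_records(in_path, out_path, id_set, date_set, require_both=False):
    """Copy the header and every matching record from in_path to out_path.

    Hypothetical helper: returns the number of records written.
    """
    kept = 0
    with open(in_path) as in_file, open(out_path, 'w') as out_file:
        # copy the header line unchanged
        out_file.write(next(in_file))

        for line in in_file:
            # skip blank lines instead of failing on them
            if not line.strip():
                continue

            # strip each field so trailing spaces do not break the lookup
            fields = [field.strip() for field in line.split('|')]
            rec_id, rec_date = fields[0], fields[2]

            id_ok = rec_id in id_set
            date_ok = rec_date in date_set

            # require_both=True means "id AND date", otherwise "id OR date"
            matched = (id_ok and date_ok) if require_both else (id_ok or date_ok)
            if matched:
                out_file.write(line)
                kept += 1
    return kept
```

Calling filter_records('file1.txt', 'output.txt', id_set, date_set) keeps the "or" behaviour of the answer; passing require_both=True keeps only records whose ID and date are both found.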

Answer 1: (score: 2)

If you are willing to use a third-party library, you can use pandas:

import pandas as pd
from io import StringIO

mystr = StringIO("""ID|Name|Date
1|A|2017-12-19
2|B|2017-12-20
3|C|2017-12-21""")

# replace mystr with 'file1.txt'
df = pd.read_csv(mystr, sep='|')

# criteria
id_set = {'1', '2'}
date_set = {'2017-12-19', '2017-12-20'}

# apply criteria
df2 = df[df['ID'].astype(str).isin(id_set) | df['Date'].isin(date_set)]

print(df2)

#   ID Name        Date
# 0  1    A  2017-12-19
# 1  2    B  2017-12-20

# export to csv (index=False keeps the row index out of the file)
df2.to_csv('file1_out.txt', sep='|', index=False)
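A possible variation, sketched here assuming a reasonably recent pandas on Python 3: reading every column as text with dtype=str makes the astype(str) conversion unnecessary, and stripping the Date column lets values with trailing spaces (as in the first sample row) match the set. The variable names data, mask, result and out_text are illustrative only.

```python
import pandas as pd
from io import StringIO

# stand-in for 'file1.txt'; the first record has trailing spaces after the date
data = StringIO(
    "ID|Name|Date\n"
    "1|A|2017-12-19   \n"
    "2|B|2017-12-20\n"
    "3|C|2017-12-21\n"
)

# read every column as text so IDs can be compared with the string set directly
df = pd.read_csv(data, sep='|', dtype=str)

# remove stray whitespace so '2017-12-19   ' can match '2017-12-19'
df['Date'] = df['Date'].str.strip()

id_set = {'1', '2'}
date_set = {'2017-12-19', '2017-12-20'}

# keep rows whose ID or Date appears in the corresponding set
mask = df['ID'].isin(id_set) | df['Date'].isin(date_set)
result = df[mask]

# with no path given, to_csv returns the text instead of writing a file
out_text = result.to_csv(sep='|', index=False)
print(out_text)
```

To write a file instead of building a string, pass a path as the first argument of to_csv. Note that the stripped dates are what gets written, so the trailing spaces of the input are not preserved.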