Question

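I have a list of records in file A, and I want to extract rows from it based on the numbers listed in file B. Given the values 4 and 5, every row of file A whose fourth column is 4 or 5 should be extracted. How can I do this in Python? Can anyone help? The code below only extracts rows based on the index having the value 4.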

with open("B.txt", "rt") as f:
    classes = [int(line) for line in f.readlines()]
    with open("A.txt", "rt") as f:
        lines = [line for index, line in enumerate(f.readlines()) if classes[index]== 4]
        lines_all= "".join(lines)

with open("C.txt", "w") as f:
        f.write(lines_all)

A.txt

hg17_ct_ER_ER_1003  36  42  1
hg17_ct_ER_ER_1003  109 129 2
hg17_ct_ER_ER_1003  110 130 2
hg17_ct_ER_ER_1003  129 149 2
hg17_ct_ER_ER_1003  130 150 2
hg17_ct_ER_ER_1003  157 163 3
hg17_ct_ER_ER_1003  157 165 3
hg17_ct_ER_ER_1003  179 185 4
hg17_ct_ER_ER_1003  197 217 5
hg17_ct_ER_ER_1003  220 226 6

B.txt

4
5

Desired output

hg17_ct_ER_ER_1003  179 185 4
hg17_ct_ER_ER_1003  197 217 5

Answer 1

Build a set of the lines/numbers from file B, then compare the last element of each row in file A against the elements of that set:

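import csv

with open("A.txt") as f, open("B.txt") as f2:
    # Values to keep, read from B.txt (kept as strings to match the csv fields).
    st = set(line.rstrip() for line in f2)
    r = csv.reader(f, delimiter=" ")
    # Keep non-empty rows whose last column is one of the wanted values.
    data = [row for row in r if row and row[-1] in st]
    print(data)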

[['hg17_ct_ER_ER_1003', '179', '185', '4'], ['hg17_ct_ER_ER_1003', '197', '217', '5']]

Set delimiter= to whatever separator your file actually uses, or leave it out entirely if your file is comma-separated.
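To illustrate why the delimiter matters, here is a small sketch. The sample line, with two spaces after the first column, is an assumption about how the file might look and is not taken from the real data. With delimiter=" ", csv.reader treats every single space as a separator, so a run of spaces yields empty fields, whereas str.split() with no arguments collapses any run of whitespace.

```python
import csv

# Hypothetical sample line: columns separated by runs of spaces (assumption).
sample = "hg17_ct_ER_ER_1003  179 185 4"

# csv.reader accepts any iterable of strings, so a one-element list works here.
csv_fields = next(csv.reader([sample], delimiter=" "))

# str.split() with no arguments splits on any run of whitespace.
split_fields = sample.split()

print(csv_fields)    # contains an empty string where the two spaces meet
print(split_fields)  # only the four real columns
```

In both cases the last element is still the class column, so the membership test keeps working as long as the line has no trailing spaces.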

Or:

with open("A.txt") as f, open("B.txt") as f2:
    st = set(line.rstrip() for line in f2)
    # rsplit(None, 1) splits once from the right on whitespace; [1] is the last column.
    data = [line.rstrip() for line in f if line.rsplit(None, 1)[1] in st]
    print(data)

['hg17_ct_ER_ER_1003 179 185 4', 'hg17_ct_ER_ER_1003 197 217 5']
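The snippets above only print the matches, while the question writes them to C.txt. A minimal sketch of that last step could look like the following; the function name extract_rows and its parameters are hypothetical, and it assumes the class value is always the last whitespace-separated field.

```python
def extract_rows(data_path, keys_path, out_path):
    """Copy lines of data_path whose last field appears in keys_path to out_path."""
    with open(keys_path) as keys_file:
        # Skip blank lines so an empty trailing line is not treated as a key.
        wanted = {line.strip() for line in keys_file if line.strip()}

    kept = 0
    with open(data_path) as data_file, open(out_path, "w") as out_file:
        for line in data_file:
            fields = line.split()
            # Blank lines have no fields; skip them instead of raising IndexError.
            if fields and fields[-1] in wanted:
                out_file.write(line)
                kept += 1
    return kept
```

Calling extract_rows("A.txt", "B.txt", "C.txt") would then write the matching lines unchanged and return how many were kept.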

Answer 2

with open("B.txt", "r") as target_file:
    target = [i.strip() for i in target_file]

with open("A.txt", "r") as data_file:
    # In Python 3, filter() is lazy, so build the list before the file is closed.
    r = list(filter(lambda x: x.strip().rsplit(None, 1)[1] in target, data_file))

print("".join(r))

Output:

hg17_ct_ER_ER_1003  179 185 4
hg17_ct_ER_ER_1003  197 217 5

As mentioned by @Padraic, I changed split()[-1] to rsplit(None, 1)[1].
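For reference, a small sketch of how the two calls compare on one line of the sample data (the variable names are only for illustration). Both return the same last column; rsplit(None, 1) simply stops after one split from the right instead of splitting every field.

```python
line = "hg17_ct_ER_ER_1003  179 185 4\n"

# split() breaks the line into every whitespace-separated field.
all_fields = line.split()

# rsplit(None, 1) splits at most once, from the right:
# the result is [everything before the last field, last field].
head, last = line.strip().rsplit(None, 1)

print(all_fields[-1], last)
print(repr(head))
```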

Extract rows based on values in a text file using Python
