Question

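I have a file (file1.txt) whose first column contains strings, and I want to write to another file (file2.txt) only the lines whose string exactly matches an entry in the list 'indref' (see the code). The problem is that the resulting file (see the short example) also contains lines whose string merely starts with one of the values I want to keep. I only want to keep the specific strings (the ones in 'indref'). Thanks.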

import numpy as np

indref = ['p1', 'p3']

with open('file1.txt') as oldfile, open('file2.txt', 'w') as newfile:

    for line in oldfile:
        if any(x in line for x in indref):
            newfile.write(line)

Example of file1.txt:

p1        4.252613E+01  
p2        4.245285E+01  
p3        4.272667E+01 
p4        4.255809E+01  
p5        4.284104E+01  
p6        4.292802E+01  
p7        4.295814E+01  
p8        4.286242E+01  
p9        4.286862E+01  
p10       4.258108E+01

file2.txt:

p1        4.252613E+01  
p3        4.272667E+01 
p10       4.258108E+01

Answer 1

You already got a good answer using split, but it can be refined to

indref = ['p1', 'p3']

with open('file1.txt') as oldfile, open('file2.txt', 'w') as newfile:
    newfile.writelines(line for line in oldfile if line.split()[0] in indref)
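
To see why the original check also kept p10: `x in line` is a substring test, so 'p1' matches anywhere inside 'p10 ...'. Below is a minimal sketch of the difference, using a few in-memory lines instead of the real files. The helper name `filter_lines` is hypothetical, and it also skips blank lines, on which `line.split()[0]` would raise an IndexError.

```python
def filter_lines(lines, wanted):
    """Yield lines whose first whitespace-separated field is in wanted."""
    for line in lines:
        fields = line.split()
        # A blank line splits into an empty list; skip it instead of indexing.
        if fields and fields[0] in wanted:
            yield line

# A few sample lines in the same layout as file1.txt, plus one blank line.
sample = [
    "p1        4.252613E+01\n",
    "p3        4.272667E+01\n",
    "\n",
    "p10       4.258108E+01\n",
]
indref = ['p1', 'p3']

# Substring test from the question: 'p1' is found inside 'p10'.
substring_hits = [l for l in sample if any(x in l for x in indref)]
# Exact comparison against the first field only.
exact_hits = list(filter_lines(sample, indref))

print(len(substring_hits))  # 3 (p1, p3 and p10)
print(len(exact_hits))      # 2 (p1 and p3)
```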

Answer 2

You can call split() on each line and then check whether the first element is in indref:

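indref = {'p1', 'p3'}  # a set, for fast membership tests

with open('test.txt') as f:
    # keep only the lines whose first whitespace-separated field is in indref
    data = [i for i in f.read().splitlines() if i.split()[0] in indref]

# write the kept lines once the input file has been closed
with open('test2.txt', 'w') as f:
    f.write('\n'.join(data))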

Output:

p1        4.252613E+01  
p3        4.272667E+01

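I changed indref to a set because lookups in a set take O(1) time on average, whereas lookups in a list are O(n) and can get expensive if the list is very large.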
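
If the input file is large as well, the set lookup can be combined with line-by-line reading so the whole file is never held in memory. This is only a sketch under assumptions: the function name `filter_file` is hypothetical, and the demo writes its sample data to a temporary directory rather than touching real files.

```python
import os
import tempfile

def filter_file(src, dst, wanted):
    """Copy lines from src to dst whose first field is in wanted; return the count."""
    wanted = set(wanted)  # O(1) average-time membership tests
    kept = 0
    with open(src) as infile, open(dst, 'w') as outfile:
        for line in infile:  # streams one line at a time
            fields = line.split()
            # Skip blank lines, then compare the first field exactly.
            if fields and fields[0] in wanted:
                outfile.write(line)
                kept += 1
    return kept

# Demo on a throwaway copy of a few lines from the question's example.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'file1.txt')
    dst = os.path.join(tmp, 'file2.txt')
    with open(src, 'w') as f:
        f.write(
            "p1        4.252613E+01\n"
            "p2        4.245285E+01\n"
            "p3        4.272667E+01\n"
            "p10       4.258108E+01\n"
        )
    kept = filter_file(src, dst, ['p1', 'p3'])
    with open(dst) as f:
        result = f.read()

print(result)
```

Unlike the '\n'.join version, this writes each kept line with its original line ending.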

How to filter a txt file using specific strings
