Question

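I have a very large text file from which I want to filter out some lines. Each block starts with an identifier line, followed by many lines of numbers (one number per line), as in the example below: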

Example:

fixedStep ch=GL000219.1 start=52818 step=1
1.000000
1.000000
1.000000
1.000000
1.000000
1.000000
1.000000
fixedStep ch=GL000320.1 start=52959 step=1
1.000000
1.000000
1.000000
fixedStep ch=M start=52959 step=1
1.000000
1.000000

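This line is an identifier: fixedStep ch=GL000219.1 start=52818 step=1. I want to filter out every identifier line that contains ch=GL000219.1 or ch=GL000320.1, together with the number lines that follow it, and keep the other identifiers with their corresponding number lines. Each identifier appears many times in the file. The output should look like this: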

fixedStep ch=M start=52959 step=1
1.000000
1.000000

I tried this code:

l = ["ch=GL000219.1", "ch=GL000320.1"] # since I have more identifiers that should be removed 

with open('file.txt', 'r') as f:
    with open('outfile.txt', 'w') as outfile:
        good_data = True
        for line in f:
            if line.startswith('fixedStep'):
                for i in l:
                    good_data = i not in line
            if good_data:
                outfile.write(line)

My code does not return what I want. Do you know how I could modify it?

Answer 1

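The problem is in how good_data is updated:

for i in l:
    good_data = i not in line

good_data is reassigned on every pass of this loop, so only the last identifier in l decides whether a block is kept. For a header containing ch=GL000219.1, the first check sets it to False, but the check for ch=GL000320.1 then sets it back to True, so that block is still written out.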
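To see what the question's loop actually does, here is a small reproduction of its flag logic run on a shortened, in-memory copy of the sample data instead of file.txt. The function name original_logic and the sample string are illustrative only; the loop body is copied from the question.

```python
# Reproduction of the question's flag logic on an in-memory sample.
# The sample text is a shortened copy of the example in the question.
sample = """fixedStep ch=GL000219.1 start=52818 step=1
1.000000
fixedStep ch=GL000320.1 start=52959 step=1
1.000000
fixedStep ch=M start=52959 step=1
1.000000
"""

l = ["ch=GL000219.1", "ch=GL000320.1"]


def original_logic(lines):
    """Return the lines the question's code would write to outfile.txt."""
    kept = []
    good_data = True
    for line in lines:
        if line.startswith('fixedStep'):
            for i in l:
                # Each iteration overwrites good_data, so only the
                # last identifier in l ends up deciding the result.
                good_data = i not in line
        if good_data:
            kept.append(line)
    return kept


kept = original_logic(sample.splitlines(keepends=True))
print(''.join(kept))
```

This prints the ch=GL000219.1 block and the ch=M block: only the block matching the last entry of l (ch=GL000320.1) is removed.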

You can write it like this instead, checking all identifiers at once with all():

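l = ["ch=GL000219.1", "ch=GL000320.1"]  # identifiers whose blocks should be removed
flag = False  # True while the current block should be written out

with open('file.txt', 'r') as f, open('outfile.txt', 'w') as outfile:
    for line in f:
        if line.strip().startswith("fixedStep"):
            # New block: keep it only if none of the unwanted identifiers appear in the header
            flag = all(i not in line for i in l)
        if flag:
            outfile.write(line)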
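One caveat with the substring test above: "ch=GL000219.1" is also a substring of a header such as "ch=GL000219.10", so such a block would be dropped too. The sketch below is a variant (not the answerer's code) that parses the key=value fields of each header and compares the ch value exactly against a set. The function name filter_blocks, the choice to pass identifiers without the "ch=" prefix, and the GL000219.10 header in the demo are all hypothetical. It assumes every header starts with fixedStep; a header with no ch= field is kept.

```python
import io


def filter_blocks(lines, unwanted_chroms):
    """Yield the lines of every block whose header ch= value is not in unwanted_chroms."""
    unwanted = set(unwanted_chroms)
    keep = False  # like Answer 1, lines before the first header are dropped
    for line in lines:
        if line.lstrip().startswith("fixedStep"):
            # Parse the "key=value" fields that follow the "fixedStep" keyword.
            fields = dict(
                field.split("=", 1) for field in line.split()[1:] if "=" in field
            )
            # Exact comparison: "GL000219.10" does not match "GL000219.1".
            keep = fields.get("ch") not in unwanted
        if keep:
            yield line


# io.StringIO stands in for an open file; the GL000219.10 block is made up.
sample = io.StringIO(
    "fixedStep ch=GL000219.1 start=52818 step=1\n"
    "1.000000\n"
    "fixedStep ch=GL000219.10 start=1 step=1\n"
    "2.000000\n"
    "fixedStep ch=M start=52959 step=1\n"
    "1.000000\n"
)
result = list(filter_blocks(sample, ["GL000219.1", "GL000320.1"]))
print("".join(result))
```

Because filter_blocks is a generator, it streams the input one line at a time, which suits a very large file. With real files it would be used as outfile.writelines(filter_blocks(f, unwanted)) inside the same with statement as above.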

Answer 2

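After reading the text file into a string, you need to split that string (the contents of the file) into lines. If you run

print(f)

after reading f, you will see that it is a single string rather than a list of lines.

If the file uses Unix line endings, use

f = f.split("\n")

to turn the string into a list; then you can loop over it line by line.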
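The split above is only needed when the whole file has been read into one string (for example with f.read()); iterating over the file object itself, as the question's code does, already yields one line at a time. The comparison below uses io.StringIO as a stand-in for a real open file, and the sample text is taken from the question.

```python
import io

# io.StringIO stands in for an open text file here.
text = "fixedStep ch=M start=52959 step=1\n1.000000\n1.000000\n"

# Iterating a file object yields one line at a time, newline included.
f = io.StringIO(text)
iterated = list(f)

# Reading everything into one string and splitting on "\n" gives lines
# without the newline, plus an empty string after the final "\n".
f = io.StringIO(text)
split_lines = f.read().split("\n")

# str.splitlines() splits the same way but drops that trailing empty string.
f = io.StringIO(text)
splitlines_result = f.read().splitlines()

print(iterated)
print(split_lines)
print(splitlines_result)
```

Note that f.read() loads the entire file into memory at once, whereas iterating over the file object does not, which matters for a very large file.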

Remove certain parts of a text file in Python
