Question

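I am trying to delete lines in a file that start with the same 5 characters, but those first 5 characters are random (I don't know in advance what they will be). How can I do this?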

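I have code that reads the last 5 characters of the first line of the file and matches them against the FIRST 5 characters of some other line in the file. The problem is that when two or more candidate lines start with the same 5 characters, the code gets confused. I need to read all the lines in the file and, whenever two lines share the same first 5 characters, remove one of them.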

Example (the problem):

CCTGGATGGCTTATATAAGAT***GTTAT***

***GTTAT***ATAATATACCACCGGGCTGCTT

***GTTAT***ATAGTTACAGCGGAGTCTTGTGACTGGCTCGAGTCAAAAT

What I need after one of the duplicates has been removed from the file:

CCTGGATGGCTTATATAAGAT***GTTAT***

***GTTAT***ATAATATACCACCGGGCTGCTT

(no third line)

I would appreciate it if you could explain in words how I should do this.

Answer 1

For example, you could do it like this:

FILE_NAME = "data.txt"                       # name of the file to read
NR_MATCHING_CHARS = 5                        # number of leading characters that must match

lines = set()                                # prefixes of the lines that have already been printed
with open(FILE_NAME, "r") as inF:            # open the file for reading
    for line in inF:                         # go through every line
        line = line.strip()                  # drop surrounding whitespace and the newline
        if line == "": continue              # skip empty lines
        beginOfSequence = line[:NR_MATCHING_CHARS]  # the first NR_MATCHING_CHARS characters
        if beginOfSequence not in lines:     # this prefix has not been seen yet
            print(line)                      # print the line
            lines.add(beginOfSequence)       # remember the prefix so later duplicates are skipped
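
In words: walk through the file once, remember the first 5 characters of every line you decide to keep, and skip any later line whose first 5 characters you have already seen. The code above only prints the surviving lines. If the file itself should end up without the duplicates, one possible sketch is to collect the kept lines and write them back. The names drop_repeated_prefixes and rewrite_file are hypothetical helpers introduced here for illustration, and the sketch assumes the file is small enough to hold in memory and that overwriting it in place is acceptable.

```python
def drop_repeated_prefixes(lines, nr_matching_chars=5):
    """Keep only the first line seen for each distinct leading prefix."""
    seen = set()   # prefixes of the lines kept so far
    kept = []      # surviving lines, in their original order
    for line in lines:
        line = line.strip()
        if not line:
            continue                       # ignore empty lines
        prefix = line[:nr_matching_chars]
        if prefix not in seen:
            seen.add(prefix)
            kept.append(line)
    return kept


def rewrite_file(file_name, nr_matching_chars=5):
    """Read the file, drop lines with an already-seen prefix, write it back."""
    with open(file_name, "r") as in_f:
        kept = drop_repeated_prefixes(in_f, nr_matching_chars)
    with open(file_name, "w") as out_f:    # assumption: overwriting in place is fine
        out_f.writelines(line + "\n" for line in kept)
    return kept
```

Calling rewrite_file("data.txt") on the example from the question would keep the first two lines and drop the third, because the third line starts with the same 5 characters (GTTAT) as the second. Which of two matching lines is kept is decided purely by order: the earlier one wins.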
