Text file parsing problem in Python

Time: 2011-08-24 02:03:36

Tags: python parsing text-files

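I'm new to Python, and I'm trying to delete lines from a text file if the word "Lett" is found in the line. Here is a sample of the text file I want to parse: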

<A>Lamb</A> <W>Let. Moxon</W>
<A>Lamb</A> <W>Danger Confound. Mor. w. Personal Deformity</W>
<A>Lamb</A> <W>Gentle Giantess</W>
<A>Lamb</A> <W>Lett., to Wordsw.</W>
<A>Lamb</A> <W>Lett., to Procter</W>
<A>Lamb</A> <W>Let. to Old Gentleman</W>
<A>Lamb</A> <W>Elia Ser.</W>
<A>Lamb</A> <W>Let. to T. Manning</W>

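I know how to open the file, but I don't know how to find the matching text or how to delete the line once it is found. Any help would be greatly appreciated.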

5 answers:

Answer 0 (score: 4)

with open("myfile.txt", "r") as f:
    for line in f:
        # Print every line that does not contain "Lett."
        if "Lett." not in line:
            print(line, end="")

Or, if you want to write the result back to the file:

# Read all lines first, then reopen the same file for writing
with open("myfile.txt", "r") as f:
    lines = f.readlines()

with open("myfile.txt", "w") as f:
    for line in lines:
        if "Lett." not in line:
            f.write(line)
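
The `in` test used above is a plain, case-sensitive substring check, so "Lett." matches the "Lett., to ..." entries but not the "Let. ..." ones. A minimal sketch on a few sample lines from the question, kept in memory so no file is needed (the helper name `drop_lines_containing` is made up for illustration):

```python
def drop_lines_containing(lines, needle):
    """Return the lines that do not contain the substring `needle`."""
    return [line for line in lines if needle not in line]


sample = [
    "<A>Lamb</A> <W>Let. Moxon</W>\n",
    "<A>Lamb</A> <W>Gentle Giantess</W>\n",
    "<A>Lamb</A> <W>Lett., to Wordsw.</W>\n",
    "<A>Lamb</A> <W>Lett., to Procter</W>\n",
    "<A>Lamb</A> <W>Let. to Old Gentleman</W>\n",
]

# Only the two "Lett." entries are dropped; the "Let." entries stay.
kept = drop_lines_containing(sample, "Lett.")
print("".join(kept), end="")
```

Searching for the shorter "Let" instead would also drop the "Let. ..." entries, so the exact search string matters.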

Answer 1 (score: 1)

# Open the input text and a file for the results
with open('in.txt', 'r') as text, open('out.txt', 'w') as out:
    # Go through the file line by line
    for line in text:
        if 'Lett.' not in line:  # This is the crucial line.
            # Write the line to the output only if 'Lett.' is not in it
            out.write(line)
# Both files are closed when the with block ends, which saves the changes
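
If the filtered result should end up under the original file name, one common pattern is to write to a temporary file in the same directory and then swap it in with `os.replace`, so the original is not truncated before the new content is complete. This is only a sketch and not part of the answer above; the function name `rewrite_without` and its parameters are assumptions.

```python
import os
import tempfile


def rewrite_without(path, needle):
    """Rewrite `path` in place, dropping lines that contain `needle`."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so that os.replace
    # stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as out, open(path) as src:
            for line in src:
                if needle not in line:
                    out.write(line)
        # Swap the finished temporary file in for the original.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        os.remove(tmp_path)
        raise
```

A call would look like `rewrite_without("in.txt", "Lett.")`. One caveat: the replaced file takes on the temporary file's permissions, which `mkstemp` restricts to the current user.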

Answer 2 (score: 1)

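I have a general stream-editor framework for this kind of thing. I load the file into memory, apply changes to the in-memory list of lines, and write the file out if anything changed.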

My boilerplate looks like this:

from sed_util import delete_range, insert_range, append_range, replace_range

def sed(filename):
    modified = 0

    # Load file into memory
    with open(filename) as f:
        lines = [line.rstrip() for line in f]

    # magic here...

    if modified:
        with open(filename, "w") as f:
            for line in lines:
                f.write(line + "\n")

In the "# magic here" section, I have:

  1. Modifications to individual lines, such as:

    lines[i] = change_line(lines[i])

  2. Calls to my sed utilities to insert, append, and replace lines, such as:

    lines = delete_range(lines, some_range)

  3. The latter use primitives like these:

    def delete_range(lines, r):
        """
        >>> a = list(range(10))
        >>> b = delete_range(a, (1, 3))
        >>> b
        [0, 4, 5, 6, 7, 8, 9]
        """
        start, end = r
        assert start <= end
        return [line for i, line in enumerate(lines) if not (start <= i <= end)]
    
    def insert_range(lines, line_no, new_lines):
        """
        >>> a = list(range(10))
        >>> b = list(range(11, 13))
        >>> c = insert_range(a, 3, b)
        >>> c
        [0, 1, 2, 11, 12, 3, 4, 5, 6, 7, 8, 9]
        >>> c = insert_range(a, 0, b)
        >>> c
        [11, 12, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
        >>> c = insert_range(a, 9, b)
        >>> c
        [0, 1, 2, 3, 4, 5, 6, 7, 8, 11, 12, 9]
        """
        assert 0 <= line_no < len(lines)
        return lines[0:line_no] + new_lines + lines[line_no:]
    
    def append_range(lines, line_no, new_lines):
        """
        >>> a = list(range(10))
        >>> b = list(range(11, 13))
        >>> c = append_range(a, 3, b)
        >>> c
        [0, 1, 2, 3, 11, 12, 4, 5, 6, 7, 8, 9]
        >>> c = append_range(a, 0, b)
        >>> c
        [0, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9]
        >>> c = append_range(a, 9, b)
        >>> c
        [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 12]
        """
        assert 0 <= line_no < len(lines)
        return lines[0:line_no+1] + new_lines + lines[line_no+1:]
    
    def replace_range(lines, line_nos, new_lines):
        """
        >>> a = list(range(10))
        >>> b = list(range(11, 13))
        >>> c = replace_range(a, (0, 2), b)
        >>> c
        [11, 12, 2, 3, 4, 5, 6, 7, 8, 9]
        >>> c = replace_range(a, (8, 10), b)
        >>> c
        [0, 1, 2, 3, 4, 5, 6, 7, 11, 12]
        >>> c = replace_range(a, (0, 10), b)
        >>> c
        [11, 12]
        >>> c = replace_range(a, (0, 10), [])
        >>> c
        []
        >>> c = replace_range(a, (0, 9), [])
        >>> c
        [9]
        """
        start, end = line_nos
        return lines[:start] + new_lines + lines[end:]
    
    def find_line(lines, regex):
        for i, line in enumerate(lines):
            if regex.match(line):
                return i
    
    if __name__ == '__main__':
        import doctest
        doctest.testmod()
    

For clarity, the tests work on arrays of integers, but the transformations work on arrays of strings too.
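
To illustrate that point, here is `delete_range` applied to a list of strings taken from the question's sample. The function body is copied from the answer; the sample list and the chosen range are only for illustration.

```python
def delete_range(lines, r):
    """Drop the items whose index falls in the inclusive range r = (start, end)."""
    start, end = r
    assert start <= end
    return [line for i, line in enumerate(lines) if not (start <= i <= end)]


lines = [
    "<A>Lamb</A> <W>Gentle Giantess</W>",
    "<A>Lamb</A> <W>Lett., to Wordsw.</W>",
    "<A>Lamb</A> <W>Lett., to Procter</W>",
    "<A>Lamb</A> <W>Elia Ser.</W>",
]

# Indexes 1 and 2 hold the two "Lett." entries.
remaining = delete_range(lines, (1, 2))
print(remaining)
```

The original list is left untouched, because the function builds and returns a new list.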

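Generally, I scan the list of lines to identify the changes I want to apply, usually with regular expressions, and then apply the changes to the matching data. Today, for example, I ended up making about 2000 line changes across 150 files.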

This works better than sed when you need to apply multi-line patterns or additional logic to decide whether a change applies.
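
As a rough sketch of how the boilerplate could be filled in for the question's case: compile a regular expression, keep the lines that do not match, and write the file back only when something changed. The name `remove_matching` and the pattern in the usage note are assumptions, not part of the framework described above. `search` is used rather than `match` because "Lett" appears in the middle of each line, while `match` only matches at the start.

```python
import re


def remove_matching(filename, pattern):
    """Delete every line of `filename` that matches `pattern`.

    Returns the number of lines removed; the file is rewritten only
    when at least one line was removed.
    """
    regex = re.compile(pattern)

    # Load file into memory, without the trailing newlines
    with open(filename) as f:
        lines = [line.rstrip("\n") for line in f]

    # Keep only the lines in which the pattern is not found anywhere
    kept = [line for line in lines if not regex.search(line)]
    removed = len(lines) - len(kept)

    if removed:
        with open(filename, "w") as f:
            for line in kept:
                f.write(line + "\n")
    return removed
```

For the sample data, a call such as `remove_matching("myfile.txt", r"\bLett\.")` would drop the "Lett., to ..." entries and leave the "Let. ..." entries in place.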

Answer 3 (score: 0)

return [l for l in open(fname) if 'Lett' not in l]

Answer 4 (score: 0)

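result = ''
with open('in.txt') as f:
    for line in f:
        # 'in' is case-sensitive, so match the capitalised 'Lett' from the data
        if 'Lett' not in line:
            result += line
# Append the kept lines to out.txt; the with block closes the file
with open('out.txt', 'a') as f:
    f.write(result)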
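
Note that `in` is case-sensitive. If the intent is to match regardless of case, one option is to case-fold both sides before comparing. A small in-memory sketch (the helper name `drop_lines_ignoring_case` is made up for illustration):

```python
def drop_lines_ignoring_case(lines, needle):
    """Return the lines that do not contain `needle`, ignoring case."""
    needle = needle.casefold()
    return [line for line in lines if needle not in line.casefold()]


sample = [
    "<A>Lamb</A> <W>Lett., to Wordsw.</W>\n",
    "<A>Lamb</A> <W>Gentle Giantess</W>\n",
    "<A>Lamb</A> <W>Elia Ser.</W>\n",
]

# The lower-case needle still removes the "Lett." entry.
kept = drop_lines_ignoring_case(sample, "lett")
print("".join(kept), end="")
```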