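Python: deleting a specific line number

Date: 2016-03-24 19:41:33

Tags: python

I am trying to delete a specific line (10884121) from a text file that is about 30 million lines long. This is the approach I tried first, but when I execute it, it runs for about 20 seconds and then gives me a "memory error". Is there a better way? Thanks!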

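4 answers:

Answer 0 (score: 2)

First, you are not using your imports; you are trying to write to the input file; and your code reads the whole file into memory.

Something like this should cause less trouble: we read line by line, using enumerate to count line numbers; for each line, if its number is not in the list of ignored lines, we write it to the output: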

f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'

ignored_lines = [10884121]
with open(f_in, 'r') as fin, open(f_out, 'w') as fout:
    for lineno, line in enumerate(fin, 1):
        if lineno not in ignored_lines:
            fout.write(line)
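
If the goal is for the original file itself to end up without the line, one possible approach is to stream into a temporary file and then swap it into place. This is only a sketch under that assumption; `remove_lines_in_place` is a hypothetical helper name, not something from the question or the answer:

```python
import os
import tempfile

def remove_lines_in_place(path, ignored_lines):
    """Rewrite `path` without the 1-based line numbers in `ignored_lines`."""
    ignored = set(ignored_lines)  # a set keeps the membership test cheap
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so the final
    # os.replace() stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'w') as fout, open(path, 'r') as fin:
            for lineno, line in enumerate(fin, 1):
                if lineno not in ignored:
                    fout.write(line)
        os.replace(tmp_path, path)  # swap the filtered copy into place
    except BaseException:
        # Clean up the partial copy and leave the original untouched.
        os.remove(tmp_path)
        raise
```

Calling `remove_lines_in_place(f_in, [10884121])` would then need enough free disk space for a second copy of the file while it runs, but only one line is held in memory at a time.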

Answer 1 (score: 0)

Please try:

import fileinput

f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'

counter = 0

# The with block closes the output file even if an error occurs.
with open(f_out, 'w') as f:
    # fileinput yields one line at a time, so the file is never fully in memory.
    for line in fileinput.input([f_in]):
        counter += 1
        if counter != 10884121:
            # Each line already ends with its newline, so write it unchanged.
            f.write(line)

Answer 2 (score: 0)

Because you are trying to store the whole file in a list, you are very likely running out of memory. Try the following instead:

f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
linenums = {10884121}  # 1-based line numbers to drop
with open(f_in, 'r') as _fileOne, open(f_out, 'w') as _fileTwo:
    # Start counting at 1 so the numbers match the line numbers in the file.
    for lineNumber, line in enumerate(_fileOne, 1):
        if lineNumber not in linenums:
            _fileTwo.write(line)  # file objects provide write(), not writeLine()

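Here we read the file line by line and leave out the unwanted lines, so this should not run out of memory. You can also try reading the file with buffering. Hope this helps.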
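
The buffering suggestion could look roughly like the sketch below. The function name `copy_without_lines`, the 1 MiB buffer size, and the choice of binary mode are all assumptions made for illustration; whether a larger buffer actually speeds things up would have to be measured on the real file:

```python
def copy_without_lines(src_path, dst_path, skip, buffer_size=1024 * 1024):
    """Copy src_path to dst_path, omitting the 1-based line numbers in skip."""
    skip = set(skip)
    # buffering= sets the size in bytes of the I/O buffer behind each file.
    # Binary mode avoids decoding the text, which is not needed just to
    # count and copy lines.
    with open(src_path, 'rb', buffering=buffer_size) as src, \
         open(dst_path, 'wb', buffering=buffer_size) as dst:
        for line_number, line in enumerate(src, 1):
            if line_number not in skip:
                dst.write(line)
```

With the paths from above this would be called as `copy_without_lines(f_in, f_out, [10884121])`.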

Answer 3 (score: 0)

How about a generic file filter function?

def file_filter(file_path, condition=None):
    """Yield lines from a file if condition(n, line) is true.
       The condition parameter is a callback that receives two
       parameters: the line number (first line is 1) and the 
       line content."""

    if condition is None:
        condition = lambda n, line: True

    with open(file_path) as source:
        for n, line in enumerate(source):
            if condition(n + 1, line):
                yield line

with open(f_out, 'w') as destination:
    condition = lambda n, line: n != 10884121

    for line in file_filter(f_in, condition):
        destination.write(line)
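
Since the condition is just a callback, the same generator could also drop several lines at once. The following is a sketch only: `skip_lines` and `filter_to_file` are hypothetical helpers, and `file_filter` is repeated in condensed form so that the block runs on its own:

```python
def file_filter(file_path, condition=None):
    """Condensed copy of the generator above: yield lines accepted by condition."""
    with open(file_path) as source:
        for n, line in enumerate(source, 1):
            if condition is None or condition(n, line):
                yield line

def skip_lines(numbers):
    """Build a condition that rejects the given 1-based line numbers."""
    numbers = set(numbers)
    return lambda n, line: n not in numbers

def filter_to_file(src_path, dst_path, condition):
    """Write every line of src_path accepted by condition to dst_path."""
    with open(dst_path, 'w') as destination:
        # writelines pulls from the generator one line at a time.
        destination.writelines(file_filter(src_path, condition))
```

For the original question this would be `filter_to_file(f_in, f_out, skip_lines([10884121]))`.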