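I'm trying to delete a specific line (10884121) from a text file that is about 30 million lines long. My first attempt runs for about 20 seconds and then fails with a "memory error". Is there a better way? Thanks!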
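Answer 0 (score: 2)
First, your imports are unused; you also try to write to the input file, and your code reads the entire file into memory.
Something like this should be less trouble: read the file line by line, use enumerate to count line numbers, and write each line to the output unless its number is in the list of ignored lines: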
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
ignored_lines = [10884121]

with open(f_in, 'r') as fin, open(f_out, 'w') as fout:
    # enumerate(fin, 1) numbers lines starting at 1
    for lineno, line in enumerate(fin, 1):
        if lineno not in ignored_lines:
            fout.write(line)
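If the goal is to end up with the original file minus the line, rather than a second file, one common pattern is to write to a temporary file and then swap it in. The following is a minimal sketch under that assumption; the function name `delete_lines` and its parameters are hypothetical, not part of the question.

```python
import os
import tempfile


def delete_lines(path, ignored_lines):
    """Rewrite `path` without the 1-based line numbers in `ignored_lines`."""
    ignored = set(ignored_lines)
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the original so that os.replace
    # does not have to move data across filesystems.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix='.tmp')
    try:
        with os.fdopen(fd, 'w') as fout, open(path, 'r') as fin:
            for lineno, line in enumerate(fin, 1):
                if lineno not in ignored:
                    fout.write(line)
        # Swap the filtered copy in only after it was written completely.
        os.replace(tmp_path, path)
    except BaseException:
        # On any failure, discard the partial copy and keep the original.
        os.remove(tmp_path)
        raise
```

The input is never opened for writing, so a crash halfway through leaves the original file untouched. Note that this needs enough free disk space for a second copy of the file.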
Answer 1 (score: 0)
Try this:
import fileinput

f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'

# The with block closes (and flushes) the output file even if an error occurs.
with open(f_out, 'w') as f:
    counter = 0
    for line in fileinput.input([f_in]):
        counter = counter + 1
        if counter != 10884121:
            # Each line still ends with its newline, so write it unchanged;
            # there is no need to append os.linesep.
            f.write(line)
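The fileinput module used above also has an in-place mode: with `inplace=True` it moves the original to a backup file and redirects standard output into a new file under the original name. A minimal sketch, assuming that overwriting the input is acceptable; the helper name `delete_line_in_place` is hypothetical.

```python
import fileinput


def delete_line_in_place(path, target_lineno):
    """Remove the 1-based line `target_lineno` from `path`, rewriting it in place."""
    # While iterating with inplace=True, print() writes to the new file.
    with fileinput.input(files=[path], inplace=True) as stream:
        for line in stream:
            if stream.lineno() != target_lineno:
                # `line` keeps its own newline, so suppress print's extra one.
                print(line, end='')
```

Because standard output is redirected for the duration of the loop, any debugging `print` calls inside it would end up in the file as well.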
Answer 2 (score: 0)
Because you are trying to store the whole file in a list, you are very likely running out of memory. Try the following instead:
f_in = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned2.txt'
f_out = 'C:\\Users\\Lucas\\Documents\\Python\\Pagelinks\\fullyCleaned3.txt'
linenums = {10884121}  # 1-based line numbers to drop

with open(f_in, 'r') as _fileOne, open(f_out, 'w') as _fileTwo:
    # Start counting at 1 so the numbers match linenums.
    for lineNumber, line in enumerate(_fileOne, 1):
        if lineNumber not in linenums:
            _fileTwo.write(line)
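Here we read the file line by line and skip the unwanted lines, so the whole file is never held in memory at once and memory should not run out. You could also try reading the file with a larger buffer. Hope this helps.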
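As a sketch of the buffering suggestion: the built-in `open` accepts a `buffering` argument that sets the size of the underlying buffer in bytes. The function name `copy_without_lines` and the 1 MiB default below are assumptions chosen for illustration; whether a larger buffer helps at all depends on the system, so it is worth timing.

```python
def copy_without_lines(src, dst, skip, buffer_size=1024 * 1024):
    """Copy `src` to `dst`, omitting the 1-based line numbers in `skip`."""
    skip = set(skip)
    # buffering=<int greater than 1> asks for a buffer of roughly that many bytes.
    with open(src, 'r', buffering=buffer_size) as fin, \
            open(dst, 'w', buffering=buffer_size) as fout:
        for lineno, line in enumerate(fin, 1):
            if lineno not in skip:
                fout.write(line)
```

Memory use is bounded by the two buffers plus the current line, regardless of how many lines the file has.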
Answer 3 (score: 0)
How about a generic file filter function?
def file_filter(file_path, condition=None):
    """Yield lines from a file if condition(n, line) is true.

    The condition parameter is a callback that receives two
    parameters: the line number (first line is 1) and the
    line content."""
    if condition is None:
        condition = lambda n, line: True
    with open(file_path) as source:
        for n, line in enumerate(source, 1):
            if condition(n, line):
                yield line

# f_in and f_out are the input and output paths from the question.
with open(f_out, 'w') as destination:
    condition = lambda n, line: n != 10884121
    for line in file_filter(f_in, condition):
        destination.write(line)
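Because the condition is just a callback, the same generator can filter by line number or by content. The sketch below restates `file_filter` compactly so it runs on its own; the helpers `skip_lines` and `not_blank` and the sample data are hypothetical.

```python
import os
import tempfile


def file_filter(file_path, condition):
    """Compact restatement of the generator above, so this snippet is standalone."""
    with open(file_path) as source:
        for n, line in enumerate(source, 1):
            if condition(n, line):
                yield line


def skip_lines(*numbers):
    """Build a condition that rejects the given 1-based line numbers."""
    unwanted = set(numbers)
    return lambda n, line: n not in unwanted


def not_blank(n, line):
    """Condition that rejects lines containing only whitespace."""
    return bool(line.strip())


# Hypothetical sample data written to a temporary file for the demo.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'sample.txt')
    with open(sample, 'w') as f:
        f.write('alpha\n\nbeta\ngamma\n')
    kept_by_number = list(file_filter(sample, skip_lines(1, 4)))
    kept_non_blank = list(file_filter(sample, not_blank))
```

Here `kept_by_number` holds lines 2 and 3 of the sample, and `kept_non_blank` holds the three lines that contain text.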