
I am trying to apply a regular expression to a 500 MB text file:

import html
import re

import pandas as pd

with open(f'{trn_path}text.00') as file:
    df_trn = file.readlines()

re1 = re.compile(r'  +')
def fixup(x):
    # str.replace returns a new string, so the result has to be assigned back
    x = (x.replace('>', '').replace('<', '').replace('doc', '')
          .replace(':', '').replace('t_up', '')
          .replace('fi.wikipedia.org', '').replace('url="https', '')
          .replace('\xa0', ''))
    return re1.sub(' ', html.unescape(x))

df_trn = pd.Series(df_trn)
df_trn = df_trn.apply(fixup).values.astype(str)

This gives me

MemoryError:
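
For context, one possible contributor to the MemoryError (an assumption, not verified against this data) is `.values.astype(str)`, which builds a fixed-width NumPy unicode array sized by the longest line. A minimal sketch of an alternative that never holds the whole file in memory is to stream it line by line. The helper name `clean_file`, the output file name, the UTF-8 encoding, and the shortened `fixup` are all hypothetical stand-ins:

```python
import html
import os
import re
import tempfile

re1 = re.compile(r'  +')

def fixup(x):
    # Shortened stand-in for the fixup in the question.
    x = x.replace('<', '').replace('>', '').replace('\xa0', '')
    return re1.sub(' ', html.unescape(x))

def clean_file(src_path, dst_path):
    """Stream src_path line by line and write the cleaned lines to dst_path."""
    count = 0
    with open(src_path, encoding='utf-8') as src, \
            open(dst_path, 'w', encoding='utf-8') as dst:
        for line in src:  # only one line is held in memory at a time
            dst.write(fixup(line.rstrip('\n')) + '\n')
            count += 1
    return count

# Demo on a throwaway file (hypothetical sample data).
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, 'text.00')
    dst_path = os.path.join(tmp, 'clean.txt')
    with open(src_path, 'w', encoding='utf-8') as f:
        f.write('<doc>  a  &amp;  b\nplain\n')
    n_lines = clean_file(src_path, dst_path)
    with open(dst_path, encoding='utf-8') as f:
        cleaned = f.read().splitlines()
print(n_lines, cleaned)
```

Whether this fits depends on what happens next: if the cleaned text has to end up in a single in-memory array anyway, streaming only postpones the problem.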

If I read the file with chunksize:

df_trn.to_csv(f'{trn_path}train.csv', header=False, index=False)
df_trn = pd.read_csv(f'{trn_path}train.csv', header=None, chunksize=chunksize)

I get

AttributeError: 'TextFileReader' object has no attribute 'apply'
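
The error arises because `pd.read_csv` called with `chunksize` returns an iterator of DataFrames (a `TextFileReader`) rather than a DataFrame, so `.apply` has to be called on each chunk inside a loop. A minimal sketch, assuming a one-column CSV without a header. The generator name `iter_cleaned`, the shortened `fixup`, and the sample data are hypothetical:

```python
import html
import io
import re

import pandas as pd

re1 = re.compile(r'  +')

def fixup(x):
    # Shortened stand-in for the fixup in the question.
    x = x.replace('<', '').replace('>', '').replace('\xa0', '')
    return re1.sub(' ', html.unescape(x))

def iter_cleaned(source, chunksize):
    """Yield one cleaned Series per chunk of the CSV."""
    reader = pd.read_csv(source, header=None, chunksize=chunksize)
    for chunk in reader:  # each chunk is an ordinary DataFrame
        yield chunk[0].astype(str).apply(fixup)

# Tiny in-memory stand-in for train.csv (hypothetical sample data).
sample = io.StringIO('a  &amp;  b\n<doc>  text\nplain\n')
parts = [part.tolist() for part in iter_cleaned(sample, chunksize=2)]
print(parts)
```

Each yielded Series can be written out or processed before the next chunk is read, which keeps only `chunksize` rows in memory at a time. Concatenating all chunks back together would bring the memory problem back.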

Any ideas? Thanks!

Pandas .apply MemoryError and 'TextFileReader' object has no attribute 'apply'

0 answers: