Question

I need to find a list of strings in a .txt file.

The file has 200k+ lines.

Here is my code:

with open(txtfile, 'rU') as csvfile:
    tp = pd.read_csv(csvfile, iterator=True, chunksize=6000, error_bad_lines=False,
                     header=None, skip_blank_lines=True, lineterminator="\n")
    for chunk in tp:
        if string_to_find in chunk:
            print "hurrà"

The problem is that this code only analyzes the first 9k lines. Why?
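A point worth checking in the loop above: for a DataFrame, "x in df" tests the column labels (much like "key in dict"), not the cell values, so "string_to_find in chunk" really asks whether some column is named string_to_find. The sketch below searches the cell values of each chunk instead. It is a minimal illustration, not a diagnosis of the 9k-line cutoff; the helper find_in_chunks, the in-memory io.StringIO samples standing in for the real file, and the exact whole-cell matching are all assumptions.

```python
import io

import pandas as pd


def find_in_chunks(source, targets, chunksize=6000):
    """Hypothetical helper: return the targets that appear as an exact cell value in source."""
    remaining = set(targets)
    found = set()
    # header=None: no header row, so columns are labelled 0, 1, 2, ...
    # dtype=str keeps every cell as text so comparisons with the targets are string-to-string.
    with pd.read_csv(source, header=None, chunksize=chunksize, dtype=str) as reader:
        for chunk in reader:
            # "value in chunk" would only test column labels; collect the cell values instead.
            values = set(chunk.to_numpy().ravel().tolist())
            hits = remaining & values
            found |= hits
            remaining -= hits
            if not remaining:
                break  # every target already found, so stop reading the file
    return found


demo = pd.read_csv(io.StringIO("apple,banana\ncherry,date\n"), header=None)
print("cherry" in demo)  # False: only the column labels 0 and 1 are checked

sample = io.StringIO("apple,banana\ncherry,date\nfig,grape\n")
print(find_in_chunks(sample, ["cherry", "kiwi"]))  # {'cherry'}
```

Stopping as soon as every target has been seen avoids reading the rest of a 200k-line file when that is not needed.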

Answer 1

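Do you really need to open the file first and then use pandas? If that's an option, you can let pandas read the file itself and concatenate the chunks.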

To do that, just call read_csv directly, concat the chunks, and then loop over the result.

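import pandas as pd

# Read the file in chunks of 6000 rows, skipping malformed lines.
# on_bad_lines='skip' replaces error_bad_lines=False, which was deprecated
# in pandas 1.3 and removed in pandas 2.0.
df = pd.read_csv('data.csv', iterator=True, chunksize=6000, on_bad_lines='skip',
                 header=None, skip_blank_lines=True)
# Stitch all chunks together into a single DataFrame.
df = pd.concat(df)

# start the for loop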
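To check that concatenating the chunk iterator really yields every row, and not just the first chunk, one can compare the row count with the number of lines written. This sketch uses a generated in-memory file of 20,000 one-column lines in place of data.csv; the size and the row labels are illustrative assumptions.

```python
import io

import pandas as pd

# Hypothetical in-memory stand-in for the real file: 20,000 one-column lines.
text = "\n".join(f"row{i}" for i in range(20000)) + "\n"

# Read in chunks of 6000 rows, as in the answer, then stitch them together.
# ignore_index=True renumbers rows 0..N-1 instead of keeping per-chunk labels.
chunks = pd.read_csv(io.StringIO(text), header=None, chunksize=6000)
df = pd.concat(chunks, ignore_index=True)

print(len(df))         # 20000: every line was read, not only the first chunk
print(df.iloc[-1, 0])  # row19999
```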

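It depends on what your for loop does, but pandas very likely has a function that lets you avoid the loop entirely, since explicit loops are slow on large data.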
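As one example of replacing the explicit loop, DataFrame.isin checks every cell against a whole list of exact values in a single call, and Series.str.contains does substring or regex matching on a column. The small sample DataFrame and the targets list below are hypothetical; which approach fits depends on whether each string must match a whole field or only part of one.

```python
import io
import re

import pandas as pd

# Hypothetical sample standing in for the concatenated DataFrame from the answer.
df = pd.read_csv(io.StringIO("apple pie,banana\ncherry,date tart\nfig,grape\n"),
                 header=None, dtype=str)
targets = ["cherry", "tart", "kiwi"]  # hypothetical list of strings to look for

# Exact whole-cell matches: isin tests every cell against the whole list at once.
mask = df.isin(targets).to_numpy()
exact_found = set(df.to_numpy()[mask].tolist())
print(exact_found)  # {'cherry'}

# Substring matches: one escaped regex alternation, applied column by column;
# na=False treats empty cells as non-matches.
pattern = "|".join(re.escape(t) for t in targets)
row_hits = df.apply(lambda col: col.str.contains(pattern, na=False)).any(axis=1)
print(row_hits.tolist())  # [False, True, False]
```

Escaping each target with re.escape keeps characters such as "." or "(" in the search strings from being read as regex syntax.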

Finding strings in a huge text file
