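I am using Python's multiprocessing Pool module to read several tab-delimited files from a directory, based on the answer provided here. Until now that answer worked for me, but it has suddenly stopped working. I know this sounds silly, but I have searched through every solution I could find and still can't figure it out. My code is as follows: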
import csv
import multiprocessing as mp
import os

import pandas as pd


def reading(path):
    # read one tab-delimited file with no header row; keep column 13 as a string
    return pd.read_csv(path, sep='\t', header=None, quoting=csv.QUOTE_NONE, encoding='utf-8',
                       converters={13: str})


def main():
    file_list = []
    # set up your pool
    pool = mp.Pool(processes=8)  # or whatever your hardware can support
    # get a list of file names
    for root, dirs, files in os.walk('c:/Users/kdalal/contentengine/IngestionTrain/Raw Logs'):
        for file in files:
            if file.startswith('2') and os.stat(os.path.join(root, file)).st_size != 0:
                print(os.path.join(root, file))
                file_list.append(os.path.join(root, file))
    # have your pool map the file names to dataframes
    df_list = pool.map(reading, file_list)
    print("Pooling Done")
    # reduce the list of dataframes to a single dataframe
    df = pd.concat(df_list, ignore_index=True)
    return df


if __name__ == '__main__':
    df = main()
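I tested reading by passing it a file path directly, and it works. I'm really frustrated and would greatly appreciate your help. Also, if you would like any other details that would let you help me better, please let me know. Thanks again.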
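For illustration, a single-file check of the kind described above could look roughly like the sketch below. It calls reading directly in the parent process, with no pool involved. The helper names check_single_file and make_sample_file are hypothetical, and the sample file (2 rows, 14 tab-separated columns) is an assumption standing in for one of the real log files, whose layout is not shown here.

```python
import csv
import os
import tempfile

import pandas as pd


def reading(path):
    # Same reader as in the question: tab-separated, no header row, no quote handling
    return pd.read_csv(path, sep='\t', header=None, quoting=csv.QUOTE_NONE,
                       encoding='utf-8', converters={13: str})


def check_single_file(path):
    # Hypothetical helper: call reading() directly, without multiprocessing
    df = reading(path)
    print(path, df.shape)
    return df


def make_sample_file():
    # Hypothetical stand-in for a raw log file: 2 rows, 14 tab-separated columns
    rows = ['\t'.join(str(i) for i in range(14)) for _ in range(2)]
    fd, path = tempfile.mkstemp(suffix='.tsv')
    with os.fdopen(fd, 'w', encoding='utf-8') as handle:
        handle.write('\n'.join(rows) + '\n')
    return path


if __name__ == '__main__':
    sample = make_sample_file()
    try:
        check_single_file(sample)
    finally:
        os.remove(sample)
```

Looping the same call over every path in file_list, one at a time, would be one way to tell whether a particular file or the pool itself is where things go wrong.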