How can I import many .txt files in a loop, without concatenating them?

Time: 2019-08-03 15:54:06

Tags: python pandas dataframe

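I have many (> 40) data files in .txt format (space-delimited), all with the same layout, that I want to read into Python for data processing and plotting. The files are model output from a parameter sweep of a single parameter, which occupies one column in each data file. The parameter is incremented to its next value in each successive file.

My problem is that I don't know how to write a for loop that reads each data file into its own DataFrame.

I have seen many answers that suggest "pandas.read_csv" followed by concatenation, but I don't want to concatenate the files into one DataFrame, because I want to plot each dataset separately. It makes no sense to me to concatenate the DataFrames only to separate the datasets again afterwards.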

import glob
import os
import pandas as pd
from pandas import Series, DataFrame

path = r'D:/user/data-folder/'

# glob returns paths that already include the folder, so each one can be read as-is
files = glob.glob(os.path.join(path, 'data-*.txt')) # Added based on suggestions from similar questions
df1 = []
for f in files:
    df = pd.read_csv(f, sep=' ')
    df1.append(df)

print(df1)

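Ideally, I would like to read each data file into its own DataFrame, numbered incrementally, e.g. 'df1_1', 'df1_2', etc. I could then manipulate each DataFrame separately and plot the datasets against each other for comparison.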

2 answers:

Answer 0: (score: 1)

How about a list of DataFrames? If you have:

../data/a.txt:

firstname,lastname,hobby
niles,crane,wine tasting
martin,crane,sitting in recliner
bob,bulldog,being annoying

../data/b.txt:

firstname,lastname,hobby
john,doe,doing stuff
jane,doe,being anonymous
humphrey,bogart,smoking and drinking

Code:

def main():

    from glob import glob
    from os.path import join
    import pandas as pd
    from pandas import DataFrame
    from contextlib import ExitStack

    local_path = "data/"

    filenames = glob(join(local_path, "*.txt"))

    with ExitStack() as context_manager:
        files = [context_manager.enter_context(open(filename, "r")) for filename in filenames]

        dataframes = []
        for file in files:
            dataframe = pd.read_csv(file)
            dataframes.append(dataframe)

        print(dataframes[0], end="\n\n")
        print(dataframes[1])

    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())

Output:

  firstname lastname                hobby
0     niles    crane         wine tasting
1    martin    crane  sitting in recliner
2       bob  bulldog       being annoying

  firstname lastname                 hobby
0      john      doe           doing stuff
1      jane      doe       being anonymous
2  humphrey   bogart  smoking and drinking
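
As a possible simplification of the code above: pd.read_csv also accepts a path directly and opens and closes the file itself, so the ExitStack may not be needed. The following is only a minimal sketch under that assumption; the temporary folder and the shortened sample data are hypothetical stand-ins so that the example runs on its own.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, shortened from the files shown above.
samples = {
    "a.txt": "firstname,lastname,hobby\nniles,crane,wine tasting\n",
    "b.txt": "firstname,lastname,hobby\njohn,doe,doing stuff\n",
}

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    for name, text in samples.items():
        (folder / name).write_text(text)

    # sorted() gives a reproducible order; glob makes no ordering guarantee.
    filenames = sorted(folder.glob("*.txt"))

    # Given a path, read_csv handles opening and closing the file.
    dataframes = [pd.read_csv(filename) for filename in filenames]

print(len(dataframes))
print(dataframes[0], end="\n\n")
print(dataframes[1])
```

Each element of dataframes is an independent DataFrame, so dataframes[0] and dataframes[1] can be processed and plotted separately without any concatenation.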

Answer 1: (score: 1)

Use pathlib in place of os and glob

from pathlib import Path
import pandas as pd

Get the files

data_path = Path(r'D:/user/data-folder')
data_files = data_path.glob('data-*.txt')

Store them in a dict

df_dict = dict()
for i, file in enumerate(data_files):
    df_dict[f'df_{i}'] = pd.read_csv(file, sep=' ')

Retrieve a DataFrame

df_dict['df_1']
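
One thing to watch with enumerate-based keys is that glob does not guarantee any file order, so 'df_1' may not always refer to the same file. A possible variation, sketched here under the assumption that the files are named data-<integer>.txt (the question does not say what follows "data-"), is to sort by that number and use the file stem as the key. The temporary folder, the file contents and the file_number helper are all hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd


def file_number(path):
    # "data-10" -> 10; assumes the stem ends in "-<integer>" (hypothetical naming).
    return int(path.stem.rsplit("-", 1)[1])


with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp)

    # Hypothetical space-separated files standing in for the real model output.
    for n in (1, 2, 10):
        (data_path / f"data-{n}.txt").write_text(
            f"x y param\n0 {n} {n}\n1 {2 * n} {n}\n"
        )

    # Numeric sort, so data-10 comes after data-2 rather than after data-1.
    data_files = sorted(data_path.glob("data-*.txt"), key=file_number)

    # Key each DataFrame by its file stem so the key identifies the source file.
    df_dict = {file.stem: pd.read_csv(file, sep=" ") for file in data_files}

print(list(df_dict))
print(df_dict["data-10"])
```

With stem-based keys, df_dict["data-10"] always refers to the same file regardless of the order in which the file system returns the paths.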

Plot the DataFrames

for value in df_dict.values():
    value.plot()
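
The loop above draws each DataFrame in its own figure. Since the question asks about plotting the datasets against each other, one option is to draw them all on a shared set of axes. This is only a sketch: the column names "x" and "y" and the two small DataFrames are made-up stand-ins for the real data, and the Agg backend is selected only so that the example runs without a display.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs

import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for the DataFrames stored in df_dict.
df_dict = {
    "df_0": pd.DataFrame({"x": [0, 1, 2], "y": [0, 1, 2]}),
    "df_1": pd.DataFrame({"x": [0, 1, 2], "y": [0, 2, 4]}),
}

fig, ax = plt.subplots()
for name, df in df_dict.items():
    # Passing ax= draws every dataset on the same axes instead of a new figure.
    df.plot(x="x", y="y", ax=ax, label=name)

ax.set_xlabel("x")
ax.set_ylabel("y")
```

In an interactive session, plt.show() would display the figure, and fig.savefig with a file name of your choice would write it to disk.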