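I have many (> 40) space-delimited .txt data files that all share the same layout, and I want to read them into Python for data processing and plotting. The files are model output from a parameter sweep over a single parameter, which occupies one column in each data file. The parameter is incremented to its next value in each successive file.
My problem is that I don't know how to write a for loop that reads each data file into its own DataFrame.
I have seen many answers suggesting pandas.read_csv followed by concatenation, but I don't want to concatenate the files into a single DataFrame, because I want to plot each dataset separately. It makes no sense to me to concatenate the DataFrames only to split the datasets apart again afterwards.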
import glob
import os

import pandas as pd

path = r'D:/user/data-folder/'
# Added based on suggestions from similar questions
files = glob.glob(os.path.join(path, 'data-*.txt'))

df1 = []
for f in files:
    # glob already returns the full path, so f is passed to read_csv as-is
    df = pd.read_csv(f, sep=' ')
    df1.append(df)
print(df1)
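Ideally, I would like to read each data file into its own DataFrame, numbered incrementally, e.g. 'df1_1', 'df1_2', and so on. I could then manipulate each DataFrame separately and plot the datasets against each other for comparison.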
Answer 0 (score: 1)
How about a list of DataFrames? Suppose you have:
../data/a.txt:
firstname,lastname,hobby
niles,crane,wine tasting
martin,crane,sitting in recliner
bob,bulldog,being annoying
../data/b.txt:
firstname,lastname,hobby
john,doe,doing stuff
jane,doe,being anonymous
humphrey,bogart,smoking and drinking
Code:
def main():
    from glob import glob
    from os.path import join
    import pandas as pd
    from contextlib import ExitStack

    local_path = "data/"
    filenames = glob(join(local_path, "*.txt"))

    # ExitStack closes every opened file when the with block exits
    with ExitStack() as context_manager:
        files = [context_manager.enter_context(open(filename, "r")) for filename in filenames]

        dataframes = []
        for file in files:
            dataframe = pd.read_csv(file)
            dataframes.append(dataframe)

    print(dataframes[0], end="\n\n")
    print(dataframes[1])

    return 0


if __name__ == "__main__":
    import sys
    sys.exit(main())
Output:
firstname lastname hobby
0 niles crane wine tasting
1 martin crane sitting in recliner
2 bob bulldog being annoying
firstname lastname hobby
0 john doe doing stuff
1 jane doe being anonymous
2 humphrey bogart smoking and drinking
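The same list-of-DataFrames idea can be adapted to the space-delimited files from the question. What follows is only a minimal sketch under stated assumptions: the sample files, the temporary directory, and the column names param, x and y are hypothetical stand-ins for the real model output. It relies on pd.read_csv accepting a path directly, so the files do not have to be opened by hand.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data standing in for the real model output files.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "data-1.txt").write_text("param x y\n0.1 0 1.0\n0.1 1 2.0\n")
(tmp_dir / "data-2.txt").write_text("param x y\n0.2 0 1.5\n0.2 1 2.5\n")

# sorted() makes the list order reproducible; glob order is otherwise arbitrary.
filenames = sorted(tmp_dir.glob("data-*.txt"))

# read_csv accepts a path directly, so there is no need to open each file first.
dataframes = [pd.read_csv(filename, sep=" ") for filename in filenames]

# Each DataFrame stays separate and is reached by its position in the list.
for filename, dataframe in zip(filenames, dataframes):
    print(filename.name)
    print(dataframe, end="\n\n")
```

Note that sorting by file name is lexicographic, so with more than nine files 'data-10.txt' would sort before 'data-2.txt' unless the numbers are zero-padded.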
Answer 1 (score: 1)
Use pathlib in place of os and glob:
from pathlib import Path

import pandas as pd

data_path = Path(r'D:/user/data-folder')
data_files = data_path.glob('data-*.txt')
Store each DataFrame in a dict:
df_dict = dict()
for i, file in enumerate(data_files):
    df_dict[f'df_{i}'] = pd.read_csv(file, sep=' ')
Access an individual DataFrame by its key:
df_dict['df_1']
Plot each of the DataFrames:
for value in df_dict.values():
    value.plot()
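Because glob does not guarantee any particular order, keys such as 'df_0' and 'df_1' may not follow the parameter sweep. One possible variation, sketched here under assumptions, is to key the dict by the swept parameter value itself, since the question says that value is stored in a column of every file. The column name param and the sample files below are hypothetical.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample files; "param" is an assumed name for the swept column.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "data-2.txt").write_text("param x y\n0.2 0 1.5\n0.2 1 2.5\n")
(tmp_dir / "data-10.txt").write_text("param x y\n1.0 0 3.0\n1.0 1 4.0\n")

frames_by_param = {}
for file in tmp_dir.glob("data-*.txt"):
    df = pd.read_csv(file, sep=" ")
    # Assumes each file holds a single value of the swept parameter,
    # so the first row is enough to identify the file.
    frames_by_param[float(df["param"].iloc[0])] = df

# Iterate in increasing parameter order, regardless of file name or glob order.
for param in sorted(frames_by_param):
    print(param, frames_by_param[param]["y"].mean())
```

For a side-by-side comparison, each frame can then be drawn on a shared matplotlib Axes by passing ax= and label= to DataFrame.plot inside that loop.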