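I wrote this code to load all the files in a folder, process them one by one, and build a single df containing all of them. It does this by appending each result to a list of NumPy arrays, which changes the structure.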
import os
import numpy as np
import pandas as pd

np_array_list = []
for file in folder:
    df = pd.read_csv(file, header=None)
    #
    # here I do more work on the files I import
    #
    merged = df  # placeholder for the processed result
    merged.to_csv('2017_'+str(Time)+'_min_'+os.path.basename(file)+'_merged.csv')
    # as_matrix() returns a bare array, so the index and column names are dropped here
    np_array_list.append(merged.as_matrix())
    print(merged.head(5))
comb_np_array = np.vstack(np_array_list)
#print(comb_np_array)
big_frame = pd.DataFrame(comb_np_array)
big_frame.to_csv('test.csv')
My problem is that big_frame looks like this:
[5 rows x 47 columns]
[[2.00000000e+00 0.00000000e+00 1.25698594e+04 ... 1.64000000e+02
1.25715000e+04 3.00000000e+01]
[2.00000000e+00 1.00000000e+00 1.25775858e+04 ... 2.25000000e+02
1.25795000e+04 4.40000000e+01]
[2.00000000e+00 2.00000000e+00 1.25800000e+04 ... 2.38000000e+02
1.25805000e+04 1.80000000e+01]
whereas I expect it to look like this:
                     hour  minute  k1_UNfiltered  k2_UNfiltered  k3_UNfiltered  k4_UNfiltered  k5_UNfiltered
max
1min
2017-09-19 02:00:00     2       0   12561.604167        12565.5          12559          12565          12556
2017-09-19 02:01:00     2       1   12560.077922        12562.5          12562        12562.5          12557
2017-09-19 02:02:00     2       2       12558.45        12559.5          12557        12559.5          12557
2017-09-19 02:03:00     2       3   12556.253623          12560        12559.5          12560          12553
2017-09-19 02:04:00     2       4   12555.944444          12557        12556.5        12556.5          12555
Please advise how to fix this. Thanks!
Answer 0 (score: 1)
I think you need to append each DataFrame to a list L and then use concat:
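import os
import pandas as pd

L = []
for file in folder:
    df = pd.read_csv(file, header=None)
    #
    # here I do more work on the files I import
    #
    merged = df  # placeholder for the processed result
    merged.to_csv('2017_'+str(Time)+'_min_'+os.path.basename(file)+'_merged.csv')
    # append the DataFrame itself so its index and column names are kept
    L.append(merged)
    print(merged.head(5))
big_frame = pd.concat(L)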
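
To see why this helps, here is a minimal sketch with two small made-up frames (the data and the names df1, df2 and stacked are assumptions for illustration, not taken from the question). Converting each frame to an array and stacking with np.vstack discards the index and column names, while pd.concat keeps them. The sketch uses to_numpy() in place of as_matrix(), which was removed in pandas 1.0.

```python
import numpy as np
import pandas as pd

# Two small stand-in frames (hypothetical data) with a DatetimeIndex and named columns
idx1 = pd.date_range("2017-09-19 02:00:00", periods=2, freq="min")
idx2 = pd.date_range("2017-09-19 02:02:00", periods=2, freq="min")
df1 = pd.DataFrame({"hour": [2, 2], "minute": [0, 1]}, index=idx1)
df2 = pd.DataFrame({"hour": [2, 2], "minute": [2, 3]}, index=idx2)

# Stacking the raw arrays drops the labels: the rebuilt frame gets a
# default integer index and integer column names
stacked = pd.DataFrame(np.vstack([df1.to_numpy(), df2.to_numpy()]))

# Concatenating the DataFrames keeps both the index and the column names
big_frame = pd.concat([df1, df2])

print(stacked)
print(big_frame)
```

If the per-file frames have different column names, pd.concat aligns them by name and fills the gaps with NaN, so it is worth checking that the columns match before concatenating.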