Concatenating DataFrames in a loop without append in Python

Time: 2018-03-19 12:10:46

Tags: python pandas numpy

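I wrote this code to load all the files in a folder, process them one by one, and build a single df containing all of them. It does this by appending each result to a list of NumPy arrays, which changes the structure of the data.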

np_array_list = []
for file in folder:
    df = pd.read_csv(file, header=None)
    #
    #here I do more work on the files I import
    #
    merged = pd.df
    merged.to_csv('2017_'+str(Time)+'_min_'+os.path.basename(file)+'_merged.csv')
    np_array_list.append(merged.as_matrix())
    print(merged.head(5))

comb_np_array = np.vstack(np_array_list)
#print(comb_np_array)
big_frame = pd.DataFrame(comb_np_array)
big_frame.to_csv('test.csv')

My problem is that big_frame looks like this:

[5 rows x 47 columns]
[[2.00000000e+00 0.00000000e+00 1.25698594e+04 ... 1.64000000e+02
  1.25715000e+04 3.00000000e+01]
 [2.00000000e+00 1.00000000e+00 1.25775858e+04 ... 2.25000000e+02
  1.25795000e+04 4.40000000e+01]
 [2.00000000e+00 2.00000000e+00 1.25800000e+04 ... 2.38000000e+02
  1.25805000e+04 1.80000000e+01]

whereas the expected output should look like this:

        hour    minute  k1_UNfiltered   k2_UNfiltered   k3_UNfiltered   k4_UNfiltered   k5_UNfiltered
            max                 
1min                                
2017-09-19  02:00:00    2   0   12561.604167    12565.5 12559   12565   12556
2017-09-19  02:01:00    2   1   12560.077922    12562.5 12562   12562.5 12557
2017-09-19  02:02:00    2   2   12558.45    12559.5 12557   12559.5 12557
2017-09-19  02:03:00    2   3   12556.253623    12560   12559.5 12560   12553
2017-09-19  02:04:00    2   4   12555.944444    12557   12556.5 12556.5 12555

Please advise how to solve this. Thanks!

1 Answer:

Answer 0: (score: 1)

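I think you need to append each DataFrame to a list L and then use concat. Converting to a NumPy array with as_matrix() keeps only the values, so the index and column names are lost before the frames are combined: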

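import os
import pandas as pd

# folder and Time are defined elsewhere, as in the question
L = []
for file in folder:
    df = pd.read_csv(file, header=None)
    #
    # here I do more work on the files I import
    #
    merged = df  # placeholder: the processed DataFrame for this file
    merged.to_csv('2017_' + str(Time) + '_min_' + os.path.basename(file) + '_merged.csv')
    L.append(merged)  # keep the DataFrame itself, so index and columns survive
    print(merged.head(5))

# stack all frames row-wise; index and column labels are preserved
big_frame = pd.concat(L)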
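
To see the difference between the two approaches without the original files, here is a minimal sketch. The two small frames, their column names, and their timestamps are made up for illustration and only stand in for the per-file `merged` results. It uses `to_numpy()` in place of `as_matrix()`, since `as_matrix()` was removed in pandas 1.0.

```python
import numpy as np
import pandas as pd

# Two small hypothetical frames standing in for the per-file `merged` results.
idx1 = pd.to_datetime(["2017-09-19 02:00:00", "2017-09-19 02:01:00"])
idx2 = pd.to_datetime(["2017-09-19 02:02:00", "2017-09-19 02:03:00"])
frame_a = pd.DataFrame({"hour": [2, 2], "minute": [0, 1]}, index=idx1)
frame_b = pd.DataFrame({"hour": [2, 2], "minute": [2, 3]}, index=idx2)
frames = [frame_a, frame_b]

# Going through NumPy keeps only the values; the rebuilt frame gets
# default integer labels for both rows and columns.
via_numpy = pd.DataFrame(np.vstack([f.to_numpy() for f in frames]))

# pd.concat stacks the frames row-wise and keeps index and column labels.
via_concat = pd.concat(frames)

print(via_numpy.columns.tolist())
print(via_concat.columns.tolist())
print(via_concat.index[0])
```

If the original row labels are not needed in the combined result, `pd.concat(frames, ignore_index=True)` gives a fresh 0..n-1 index while still keeping the column names.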