Question

My data looks like this:

       0        1         2         3        4         5         6    
0  09.08.00  31.6875 -0.017442  17.10.00  59.1250  0.002119  24.10.00 ...  
1  10.08.00  31.7031  0.000492  18.10.00  59.1250  0.000000  25.10.00 ...
2  11.08.00  31.7656  0.001971  19.10.00  59.3125  0.003171  26.10.00 ...  
3  14.08.00  31.5625 -0.006394  20.10.00  59.5625  0.004215  27.10.00 ...  
4  15.08.00  31.5000 -0.001980  23.10.00  59.1250 -0.007345  30.10.00 ...  

       413       414     415       416  
0   0.004704  01.05.18  133.48 -0.034991  ......  
1  -0.001725  02.05.18  138.58  0.038208  ...... 
2  0.000247  03.05.18  141.56  0.021504   ......
3  0.000987  04.05.18  139.76 -0.012715   ......
4  0.000493  07.05.18  139.63 -0.000930   .......

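As you can see, the data is divided into multiple groups of 3 columns (one ticker per group in this case). What's more, the data was recorded over different time spans. For example, the first group might have 100 days of data, the second only 25, and so on. This means each group has a different number of rows with data entries.

I want the final dataframe to have 3 columns instead of 416. So I am thinking of somehow appending 3 columns at a time to the existing dataframe. The final result should look like this: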

       0        1         2              
0  09.08.00  31.6875 -0.017442  
1  10.08.00  31.7031  0.000492  
2  11.08.00  31.7656  0.001971  
3  14.08.00  31.5625 -0.006394    
4  15.08.00  31.5000 -0.001980
5  17.10.00  59.1250  0.002119  
6  18.10.00  59.1250  0.000000  
7  19.10.00  59.3125  0.003171  
8  20.10.00  59.5625  0.004215  
9  23.10.00  59.1250 -0.007345

I hope the question is clear enough. How would I program this in Python using pandas etc.? Thanks in advance for your answers.

Best regards,

Elias

Answer 1

Use stack with a MultiIndex on the columns, created with modulo and integer division:

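import numpy as np

a = np.arange(len(df.columns))
# level 0: position within each group (0, 1, 2); level 1: group number
df.columns = [a % 3, a // 3]
# move the group level into the rows, order by group, then drop the index
df = df.stack().sort_index(level=1).reset_index(drop=True)
print(df)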
          0        1         2
0  09.08.00  31.6875 -0.017442
1  10.08.00  31.7031  0.000492
2  11.08.00  31.7656  0.001971
3  14.08.00  31.5625 -0.006394
4  15.08.00  31.5000 -0.001980
5  17.10.00  59.1250  0.002119
6  18.10.00  59.1250  0.000000
7  19.10.00  59.3125  0.003171
8  20.10.00  59.5625  0.004215
9  23.10.00  59.1250 -0.007345
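
To see why modulo and integer division give the right column labels, here is a minimal sketch on a hypothetical frame width of 6 columns (two groups of three). The names n_cols, within_group and group_id are illustrative only and do not appear in the answer above.

```python
import numpy as np

n_cols = 6  # hypothetical width: two groups of three columns
a = np.arange(n_cols)

within_group = a % 3   # position of each column inside its group
group_id = a // 3      # which group each column belongs to

print(within_group)  # [0 1 2 0 1 2]
print(group_id)      # [0 0 0 1 1 1]
```

Columns that share a within_group value end up in the same output column after stacking, while group_id becomes the index level that sort_index(level=1) orders by.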

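A NumPy solution is also possible, but because the array mixes strings with numbers, the numeric columns have to be converted back to float at the end: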

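import numpy as np
import pandas as pd

# split the columns into groups of 3, put the group axis first, flatten to 3 columns
a = np.reshape(df.values, (len(df), -1, 3)).swapaxes(0, 1).reshape(-1, 3)
df = pd.DataFrame(a)
# the mixed array has object dtype, so restore the numeric columns
df[[1, 2]] = df[[1, 2]].astype(float)
print(df)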
          0        1         2
0  09.08.00  31.6875 -0.017442
1  10.08.00  31.7031  0.000492
2  11.08.00  31.7656  0.001971
3  14.08.00  31.5625 -0.006394
4  15.08.00  31.5000 -0.001980
5  17.10.00  59.1250  0.002119
6  18.10.00  59.1250  0.000000
7  19.10.00  59.3125  0.003171
8  20.10.00  59.5625  0.004215
9  23.10.00  59.1250 -0.007345
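
The reshape/swapaxes chain can be hard to follow, so here is a small sketch on a purely numeric toy array (the name arr and its values are made up for illustration). It has 2 rows and 6 columns, i.e. two groups of three columns.

```python
import numpy as np

# Toy input: row 0 is [0..5], row 1 is [6..11]; columns 0-2 are group 0, 3-5 are group 1
arr = np.arange(12).reshape(2, 6)

step1 = arr.reshape(len(arr), -1, 3)   # shape (rows, groups, 3)
step2 = step1.swapaxes(0, 1)           # shape (groups, rows, 3)
result = step2.reshape(-1, 3)          # groups stacked on top of each other

print(result)
# [[ 0  1  2]
#  [ 6  7  8]
#  [ 3  4  5]
#  [ 9 10 11]]
```

All rows of group 0 come first, followed by all rows of group 1, which is the layout asked for in the question.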

Answer 2

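A short alternative using pd.concat + np.split (it does not modify the original dataframe df):

# relabel each chunk's columns as 0, 1, 2 so the chunks line up in concat
f = lambda df: df.T.reset_index(drop=True).T
# split df into 3-column chunks along the column axis and stack them vertically
new_df = pd.concat(map(f, np.split(df, range(3, df.columns.size, 3), axis=1)), ignore_index=True)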
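
In recent pandas releases, calling np.split on a DataFrame may emit a deprecation warning, so a variant that slices with iloc can be more robust. The sketch below is not from the answer above: the toy data, the loop, and the dropna step are assumptions, the last one based on the guess that shorter groups are padded with NaN in the wide frame.

```python
import numpy as np
import pandas as pd

# Toy frame: two groups of 3 columns; the second group is one row shorter
# and is assumed to be padded with NaN.
df = pd.DataFrame([
    ["09.08.00", 31.6875, -0.017442, "17.10.00", 59.1250, 0.002119],
    ["10.08.00", 31.7031, 0.000492, "18.10.00", 59.1250, 0.000000],
    ["11.08.00", 31.7656, 0.001971, np.nan, np.nan, np.nan],
])

parts = []
for start in range(0, df.shape[1], 3):
    block = df.iloc[:, start:start + 3].copy()   # one 3-column group
    block.columns = range(block.shape[1])        # relabel as 0, 1, 2
    parts.append(block.dropna(how="all"))        # drop the NaN padding rows

new_df = pd.concat(parts, ignore_index=True)
print(new_df.shape)  # (5, 3)
```

Like the np.split version, this leaves the original df unchanged because each block is copied before its columns are relabelled.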

Append 3 repeating columns at the end of a dataframe
