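Expand DataFrame rows from a list

Time: 2017-12-13 08:40:05

Tags: python pandas

I want to expand the rows of a DataFrame so that the whole frame is repeated once for each item in a list: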

import numpy as np
import pandas as pd
s = ['01NOV2017', '02NOV2017']
df = pd.DataFrame(np.random.randn(6,4), columns=list('ABCD'), index=range(6))

So that this...

    A   B   C   D
0   -1.451528   -1.665262   1.425986    -0.032988
1   1.376609    -0.337819   -0.513632   -0.595584
2   0.520186    -0.019358   -0.403923   0.713807
3   0.553661    0.682552    1.312556    0.966446
4   1.269042    2.034769    0.574845    0.846175
5   0.007470    1.434704    0.173193    0.895777

...becomes:

Date            A           B           C           D
01Nov2017   0   -1.451528   -1.665262   1.425986    -0.032988
01Nov2017   1   1.376609    -0.337819   -0.513632   -0.595584
01Nov2017   2   0.520186    -0.019358   -0.403923   0.713807
01Nov2017   3   0.553661    0.682552    1.312556    0.966446
01Nov2017   4   1.269042    2.034769    0.574845    0.846175
01Nov2017   5   0.007470    1.434704    0.173193    0.895777
02Nov2017   0   -1.451528   -1.665262   1.425986    -0.032988
02Nov2017   1   1.376609    -0.337819   -0.513632   -0.595584
...

How can this be done?

2 answers:

Answer 0 (score: 3)

Use concat with the keys parameter, which adds the dates as an outer index level:

df = pd.concat([df] * len(s), keys=s)
print(df)
                    A         B         C         D
01NOV2017 0  1.130177 -0.888353  0.316773 -0.434137
          1  1.629171  1.947267 -0.415701 -0.620040
          2 -0.629012  1.357567 -1.966725  0.480601
          3 -2.154263 -1.185177  0.261690  0.188716
          4  2.117664  0.416418  0.339006 -0.643895
          5  1.933276  0.282515  0.859852 -0.448571
02NOV2017 0  1.130177 -0.888353  0.316773 -0.434137
          1  1.629171  1.947267 -0.415701 -0.620040
          2 -0.629012  1.357567 -1.966725  0.480601
          3 -2.154263 -1.185177  0.261690  0.188716
          4  2.117664  0.416418  0.339006 -0.643895
          5  1.933276  0.282515  0.859852 -0.448571
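
The keys argument builds a MultiIndex whose outer level is the date. As a minimal sketch of working with that result (the seed, the `names` argument, the level name `row`, and the variable names `expanded`, `first_block` and `flat` are illustrative additions, not part of the answer above), the levels can be named, one date's block can be selected, and the inner level can be moved into a column to approximate the layout in the question:

```python
import numpy as np
import pandas as pd

s = ['01NOV2017', '02NOV2017']
rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable
df = pd.DataFrame(rng.standard_normal((6, 4)), columns=list('ABCD'))

# names= labels the two index levels: the outer key and the original row index
expanded = pd.concat([df] * len(s), keys=s, names=['Date', 'row'])

# select the block of rows that belongs to one date
first_block = expanded.loc['01NOV2017']

# move only the inner level into a column, keeping Date as the index
flat = expanded.reset_index(level='row')
print(flat.head())
```

Keeping the MultiIndex is convenient for per-date selection; flattening it is closer to the output the question asks for.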

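Edit: to get the dates as a regular column instead of an index level, concatenate with ignore_index=True and insert the column: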

df1 = pd.concat([df] * len(s), ignore_index=True)
df1.insert(0, 'Date', np.repeat(s, len(df)))
print(df1)

         Date         A         B         C         D
0   01NOV2017 -0.489019  1.076954 -0.616073  1.271138
1   01NOV2017  0.758143  0.009106 -1.115460 -0.355548
2   01NOV2017 -0.025088 -0.147855 -0.303579  2.120897
3   01NOV2017 -0.898241 -0.231282  1.100928 -1.519086
4   01NOV2017  0.078057 -0.145468 -0.092385 -0.824499
5   01NOV2017  0.512102 -2.443919 -0.932585  0.088907
6   02NOV2017 -0.489019  1.076954 -0.616073  1.271138
7   02NOV2017  0.758143  0.009106 -1.115460 -0.355548
8   02NOV2017 -0.025088 -0.147855 -0.303579  2.120897
9   02NOV2017 -0.898241 -0.231282  1.100928 -1.519086
10  02NOV2017  0.078057 -0.145468 -0.092385 -0.824499
11  02NOV2017  0.512102 -2.443919 -0.932585  0.088907
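
The Date column lines up because concat stacks whole copies of df one after another, so each date has to appear len(df) times in a row. np.repeat does exactly that, whereas np.tile would alternate the dates and mislabel the rows. A small sketch of the difference (the length `n = 3` is an arbitrary stand-in for len(df)):

```python
import numpy as np

s = ['01NOV2017', '02NOV2017']
n = 3  # stand-in for len(df)

repeated = np.repeat(s, n)  # each element repeated n times in a row
tiled = np.tile(s, n)       # the whole list repeated n times

print(repeated.tolist())
# ['01NOV2017', '01NOV2017', '01NOV2017', '02NOV2017', '02NOV2017', '02NOV2017']
print(tiled.tolist())
# ['01NOV2017', '02NOV2017', '01NOV2017', '02NOV2017', '01NOV2017', '02NOV2017']
```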

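Answer 1 (score: 0)

For anyone interested, here is an alternative that uses a cross join:

# define a temporary key for the cross join
# (the scalar 1 is broadcast to every row, so this works for any len(s))
dates = pd.DataFrame({'Date': s, 'tmp_key': 1})
df['tmp_key'] = 1

# keep the original index as a column
df = df.reset_index()

# merge on the temporary key, then drop it and index by Date
df = df.merge(dates, how='outer', on='tmp_key')
df = df.drop(labels='tmp_key', axis=1)
df = df.set_index(keys='Date', drop=True)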
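
On pandas 1.2 or later, merge accepts how='cross', which removes the need for a temporary key. A sketch under that version assumption (the seed and the names `dates` and `out` are illustrative, not from the answer above):

```python
import numpy as np
import pandas as pd

s = ['01NOV2017', '02NOV2017']
rng = np.random.default_rng(0)  # seed chosen only to make the sketch repeatable
df = pd.DataFrame(rng.standard_normal((6, 4)), columns=list('ABCD'))

dates = pd.DataFrame({'Date': s})

# putting dates on the left keeps the result grouped by date;
# reset_index() preserves the original row labels in an 'index' column
out = dates.merge(df.reset_index(), how='cross').set_index('Date')
print(out)
```

Placing `dates` on the left is a deliberate choice here: the cross join pairs each left row with every right row in order, so the output comes out date by date, as in the question.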