Pandas - Expanding a DataFrame by fetching data based on one column

Time: 2016-09-15 15:33:27

Tags: python pandas dataframe

I start with a configuration DataFrame:

import pandas as pd

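# makeTimeDataFrame(n) returns n rows of random data in columns A-D on a business-day index.
# Note: pd.util.testing has been deprecated since pandas 1.0.
getData = lambda n: pd.util.testing.makeTimeDataFrame(n)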

origDF = pd.DataFrame([{'weight':70, 'name':'GOLD', 'n':3}, {'weight':30, 'name':'SILVER', 'n':4}])

   n    name  weight
0  3    GOLD      70
1  4  SILVER      30

Now I want to expand this configuration DataFrame into a full data DataFrame by fetching data according to the n column. The result I want is:

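res = []
for _, row in origDF.iterrows():
    tmp = getData(row['n'])       # fetch n rows for this config entry
    for c, v in row.items():      # Series.iteritems() was removed in pandas 2.0
        if c != 'n':
            tmp[c] = v            # broadcast the config value to every fetched row

    res.append(tmp)

res = pd.concat(res)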

                   A         B         C         D    name  weight
2000-01-03 -0.084821 -0.345260 -0.789547  0.001570    GOLD      70
2000-01-04 -0.035577 -1.283943 -0.304142 -0.978453    GOLD      70
2000-01-05  0.014727  0.400858 -0.607918  1.769886    GOLD      70
2000-01-03 -0.644647  2.142646  0.617880 -0.178515  SILVER      30
2000-01-04  0.256490 -1.037556 -0.224503  0.148258  SILVER      30
2000-01-05  0.679844  0.976823 -0.403927 -0.459163  SILVER      30
2000-01-06  0.433366  0.429025  0.951633 -0.026547  SILVER      30    
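
For readers who want to run the loop above, here is a compact sketch of the same idea. It assumes a hypothetical stand-in `get_data` (not part of the original post), because `makeTimeDataFrame` may not be available in recent pandas releases. It still calls the fetch function once per config row; it only shortens the column-copying part with `DataFrame.assign`.

```python
import numpy as np
import pandas as pd


def get_data(n):
    """Hypothetical stand-in for getData: n rows of random data on business days."""
    idx = pd.bdate_range("2000-01-03", periods=int(n))
    return pd.DataFrame(np.random.randn(int(n), 4), index=idx, columns=list("ABCD"))


origDF = pd.DataFrame([
    {"weight": 70, "name": "GOLD", "n": 3},
    {"weight": 30, "name": "SILVER", "n": 4},
])

# One fetch per config row; assign() broadcasts the scalar config values.
pieces = [
    get_data(row.n).assign(name=row.name, weight=row.weight)
    for row in origDF.itertuples(index=False)
]
res = pd.concat(pieces)
print(res)
```

Each piece starts again at 2000-01-03, so the date index repeats across the GOLD and SILVER blocks, as in the desired output above.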

Is there a nice pandas routine that gets this directly, without a loop?

1 Answer:

Answer 0: (score: 0)

Here is a solution that loops through origDF:

In [167]: res = getData(origDF.n.sum())

In [168]: res['name'] = 'N/A'

In [169]: res['weight'] = 0

In [170]: res
Out[170]:
                   A         B         C         D name  weight
2000-01-03  1.097798 -1.537407  0.692180 -0.359577  N/A       0
2000-01-04  1.762158  0.568963  0.420136  0.265061  N/A       0
2000-01-05 -0.241067 -0.471753  0.370949  0.533276  N/A       0
2000-01-06  0.099100 -1.757071 -0.680193  0.261295  N/A       0
2000-01-07 -0.818920  0.201746  1.251089  0.834474  N/A       0
2000-01-10  1.551190 -0.329135  0.323669 -0.365978  N/A       0
2000-01-11 -1.941802  0.496720  0.969223 -0.413078  N/A       0

In [171]: i = 0

In [172]: for idx, row in origDF.iterrows():
   .....:         res.iloc[i : i + row.n, res.columns.get_loc('name')] = row['name']
   .....:         res.iloc[i : i + row.n, res.columns.get_loc('weight')] = row.weight
   .....:         i += row.n
   .....:

In [173]: res
Out[173]:
                   A         B         C         D    name  weight
2000-01-03  1.097798 -1.537407  0.692180 -0.359577    GOLD      70
2000-01-04  1.762158  0.568963  0.420136  0.265061    GOLD      70
2000-01-05 -0.241067 -0.471753  0.370949  0.533276    GOLD      70
2000-01-06  0.099100 -1.757071 -0.680193  0.261295  SILVER      30
2000-01-07 -0.818920  0.201746  1.251089  0.834474  SILVER      30
2000-01-10  1.551190 -0.329135  0.323669 -0.365978  SILVER      30
2000-01-11 -1.941802  0.496720  0.969223 -0.413078  SILVER      30
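
The labelling step above can probably also be done without an explicit loop by repeating each config row n times with `Index.repeat`. The following is only a sketch under assumptions: `get_data` is a hypothetical stand-in for `getData`, and, like the session above, it fetches all rows in a single call, so the dates run continuously instead of restarting for each name.

```python
import numpy as np
import pandas as pd


def get_data(n):
    """Hypothetical stand-in for getData: n rows of random data on business days."""
    idx = pd.bdate_range("2000-01-03", periods=int(n))
    return pd.DataFrame(np.random.randn(int(n), 4), index=idx, columns=list("ABCD"))


origDF = pd.DataFrame([
    {"weight": 70, "name": "GOLD", "n": 3},
    {"weight": 30, "name": "SILVER", "n": 4},
])

# Fetch all rows at once, as in the answer.
data = get_data(origDF["n"].sum())

# Repeat each config row n times: row 0 three times, row 1 four times.
labels = origDF.loc[origDF.index.repeat(origDF["n"])].drop(columns="n")

# Align by position: give the labels the same index as the fetched data.
labels.index = data.index
res = data.join(labels)
print(res)
```

Because the labels are matched to the data purely by position, this only works when the fetched frame has exactly `origDF["n"].sum()` rows in the same order as the config rows.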