Create columns from unique column values and fill them

Time: 2018-05-16 19:40:23

Tags: python pandas

I have the following DataFrame:

    Timestamp   id  lat         long
0   665047      a   30.508420   -84.372882
1   665047      b   30.491882   -84.372938
2   2058714     b   30.492026   -84.372938
3   665348      a   30.508420   -84.372882
4   2055292     b   30.491899   -84.372938

My desired result is:

    Timestamp                        a                       b     
0   665047     [30.508420,  -84.372882] [30.491882, -84.372938]
1   665348     [30.508420,  -84.372882]                    NaN
2   2055292                        NaN  [30.491899, -84.372938]
3   2058714                        NaN  [30.492026, -84.372938]

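The unique values found in df.id become the column headers (there may be thousands of them), with the corresponding latitude and longitude as the values.

The closest I have come is: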

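from collections import defaultdict

dct = defaultdict(list)  # assumed: the original snippet never defines dct
for i, r in df.iterrows():
    dct[r.Timestamp].append([r.id, r.lat, r.long])

pd.DataFrame.from_dict(dct, orient='index')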


                                0                                   1
2055292 [b, 30.491899, -84.372938]                               None
2058714 [b, 30.492026, -84.372938]                               None
665348  [a, 30.50842, -84.37288199999999]                        None
665047  [a, 30.50842, -84.37288199999999]   [b, 30.491882, -84.372938]

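But I know that any kind of iteration in pandas is bad practice (and this is nowhere near my desired result anyway), and I am sure there is a simpler way.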

2 answers:

Answer 0: (score: 3)

I think unstack is the tool for this:

(df.groupby(['Timestamp', 'id'])
 .apply(lambda x: x[['lat', 'long']].values.flatten())
 .unstack(level='id'))

id                              a                        b
Timestamp                                                 
665047     [30.50842, -84.372882]  [30.491882, -84.372938]
665348     [30.50842, -84.372882]                     None
2055292                      None  [30.491899, -84.372938]
2058714                      None  [30.492026, -84.372938]
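
For reference, here is a self-contained sketch of a closely related route on the question's sample data. It swaps groupby/apply for DataFrame.pivot: lat and long are first packed into one list-valued column (the name coord is made up for this illustration), which is then pivoted on id. Whether missing cells display as NaN or None may depend on the pandas version.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Timestamp": [665047, 665047, 2058714, 665348, 2055292],
    "id": ["a", "b", "b", "a", "b"],
    "lat": [30.508420, 30.491882, 30.492026, 30.508420, 30.491899],
    "long": [-84.372882, -84.372938, -84.372938, -84.372882, -84.372938],
})

# Pack each row's (lat, long) into a single list; "coord" is a hypothetical name.
coord = pd.Series(df[["lat", "long"]].values.tolist(), index=df.index)

# One row per Timestamp, one column per unique id, list-valued cells.
result = df.assign(coord=coord).pivot(index="Timestamp", columns="id", values="coord")
print(result)
```

Like the unstack versions, this assumes each (Timestamp, id) pair occurs at most once.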

Answer 1: (score: 2)

Option 1

Set the index, then use pipe:

df.set_index(['Timestamp', 'id']).pipe(
    lambda d: pd.Series(d.values.tolist(), d.index).unstack()
)

id                                      a                        b
Timestamp                                                         
665047     [30.50842, -84.37288199999999]  [30.491882, -84.372938]
665348     [30.50842, -84.37288199999999]                     None
2055292                              None  [30.491899, -84.372938]
2058714                              None  [30.492026, -84.372938]

Option 2

cols = ['Timestamp', 'id', 'lat', 'long']
pd.Series({
    t[:2]: list(t[2:])
    for t in df[cols].itertuples(index=False)
}).unstack()

                                      a                        b
665047   [30.50842, -84.37288199999999]  [30.491882, -84.372938]
665348   [30.50842, -84.37288199999999]                     None
2055292                            None  [30.491899, -84.372938]
2058714                            None  [30.492026, -84.372938]
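
One caveat: the question does not say whether a (Timestamp, id) pair can occur more than once. If it can, the unstack call in Option 1 raises a ValueError about duplicate index entries, while the dict in Option 2 silently keeps the last row seen. The sketch below assumes that keeping the first occurrence is acceptable. The duplicated row in the sample is invented for illustration.

```python
import pandas as pd

# Hypothetical data: the pair (665047, "a") appears twice.
df = pd.DataFrame({
    "Timestamp": [665047, 665047, 665047],
    "id": ["a", "b", "a"],
    "lat": [30.508420, 30.491882, 30.600000],
    "long": [-84.372882, -84.372938, -84.400000],
})

# Assumption: the first row for each (Timestamp, id) pair is the one to keep.
deduped = df.drop_duplicates(subset=["Timestamp", "id"], keep="first")

# Same reshaping as Option 1, now safe because the index is unique.
result = deduped.set_index(["Timestamp", "id"]).pipe(
    lambda d: pd.Series(d.values.tolist(), index=d.index).unstack()
)
print(result)
```

If every duplicate should be kept instead, the rows would have to be aggregated first (for example, collected into a list per pair), which is a different design choice from the one sketched here.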