I have a DataFrame named weather with the following structure:
STATION DATE ELEM VALUE
0 US1MNCV0008 20170101 PRCP 0
1 US1MNCV0008 20170101 SNOW 0
2 US1MISW0005 20170101 PRCP 0
3 US1MISW0005 20170101 SNOW 0
4 US1MISW0005 20170101 SNWD 0
I want to merge the rows by station and date to get the following:
STATION DATE ELEM VALUE ELEM VALUE ELEM VALUE
0 US1MNCV0008 20170101 PRCP 0 SNOW 0
1 US1MISW0005 20170101 PRCP 0 SNOW 0 SNWD 0
I am trying to achieve this with:
weather.groupby(['STATION', 'DATE'], as_index=False).agg(lambda x: x.tolist())
but this creates lists, which is not what I want. How should I do the aggregation?
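For reference, the behaviour described above can be reproduced with a small sketch. The frame below is rebuilt by hand from the sample rows shown in the question; the variable name `lists` is only illustrative.

```python
import pandas as pd

# Sample rows copied from the question.
weather = pd.DataFrame({
    "STATION": ["US1MNCV0008", "US1MNCV0008",
                "US1MISW0005", "US1MISW0005", "US1MISW0005"],
    "DATE": [20170101] * 5,
    "ELEM": ["PRCP", "SNOW", "PRCP", "SNOW", "SNWD"],
    "VALUE": [0, 0, 0, 0, 0],
})

# Aggregating with tolist() keeps one row per (STATION, DATE),
# but every ELEM and VALUE cell becomes a Python list.
lists = weather.groupby(["STATION", "DATE"], as_index=False).agg(
    lambda x: x.tolist()
)
print(lists)
```

Each group collapses to a single row, but the values are packed into list objects rather than spread across separate columns, which is why this approach does not give the desired layout.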
Answer 0 (score: 2)
You can use:
df = (df.set_index(['STATION','DATE', df.groupby(['STATION','DATE']).cumcount()])
.unstack()
.sort_index(axis=1, level=1))
df.columns = ['{}_{}'.format(i, j) for i, j in df.columns]
df = df.reset_index()
print (df)
STATION DATE ELEM_0 VALUE_0 ELEM_1 VALUE_1 ELEM_2 VALUE_2
0 US1MISW0005 20170101 PRCP 0.0 SNOW 0.0 SNWD 0.0
1 US1MNCV0008 20170101 PRCP 0.0 SNOW 0.0 NaN NaN
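Explanation:
set_index with STATION, DATE and cumcount (a running counter within each group) creates a MultiIndex
unstack reshapes the counter level of the MultiIndex into columns
sort_index orders the column MultiIndex by the counter level
the list comprehension flattens the column MultiIndex into names such as ELEM_0
reset_index converts the index back to columns
Or use GroupBy.apply to create a DataFrame for each group; the rest of the solution is the same as above: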
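# Select the columns with a list (double brackets); selecting with a
# bare tuple of names is deprecated and fails in recent pandas versions.
df = (df.groupby(['STATION','DATE'])[['ELEM','VALUE']]
        .apply(lambda x: pd.DataFrame(x.values, columns=x.columns))
        .unstack()
        .sort_index(axis=1, level=1))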
df.columns = ['{}_{}'.format(i, j) for i, j in df.columns]
df = df.reset_index()
print (df)
STATION DATE ELEM_0 VALUE_0 ELEM_1 VALUE_1 ELEM_2 VALUE_2
0 US1MISW0005 20170101 PRCP 0 SNOW 0 SNWD 0
1 US1MNCV0008 20170101 PRCP 0 SNOW 0 NaN NaN
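To make the explanation of the first solution easier to follow, here is a self-contained sketch that performs the same steps one at a time. The sample frame is rebuilt by hand from the question, and the intermediate names (`pos`, `indexed`, `wide`) are illustrative, not part of the original answer.

```python
import pandas as pd

# Sample rows copied from the question.
weather = pd.DataFrame({
    "STATION": ["US1MNCV0008", "US1MNCV0008",
                "US1MISW0005", "US1MISW0005", "US1MISW0005"],
    "DATE": [20170101] * 5,
    "ELEM": ["PRCP", "SNOW", "PRCP", "SNOW", "SNWD"],
    "VALUE": [0, 0, 0, 0, 0],
})

# Step 1: number the rows inside each (STATION, DATE) group: 0, 1, 2, ...
pos = weather.groupby(["STATION", "DATE"]).cumcount()

# Step 2: use STATION, DATE and that counter as a three-level MultiIndex.
indexed = weather.set_index(["STATION", "DATE", pos])

# Step 3: move the counter (innermost level) from the rows to the columns.
wide = indexed.unstack()

# Step 4: sort the columns by the counter so ELEM/VALUE pairs sit together.
wide = wide.sort_index(axis=1, level=1)

# Step 5: flatten the two-level column names and restore STATION/DATE columns.
wide.columns = ["{}_{}".format(name, i) for name, i in wide.columns]
wide = wide.reset_index()
print(wide)
```

Groups with fewer rows than the largest group are padded with NaN, which is also why the VALUE columns may be converted to floats.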