My DataFrame currently looks like this:
ID FIELD VALUE
12463634 TEST 22.2
12463634 E_REASON 010
12463634 IN_SCOPE Y
12463635 TEST 99.5
12463635 E_REASON 020
12463635 IN_SCOPE N
I would like my DataFrame to look like this:
ID TEST E_REASON IN_SCOPE
12463634 22.2 010 Y
12463635 99.5 020 N
I tried running this code:
df.pivot_table(index = "ID", columns = "FIELD", values = "VALUE")
However, I get this error:
DataError: No numeric types to aggregate
Please advise. Thanks!
Answer 0 (score: 2)
Use pivot, which reshapes without aggregating:
df = df.pivot(index = "ID", columns = "FIELD", values = "VALUE")
print (df)
FIELD E_REASON IN_SCOPE TEST
ID
12463634 010 Y 22.2
12463635 020 N 99.5
Or, starting again from the original DataFrame, use set_index with unstack:
df = df.set_index(['ID', 'FIELD'])['VALUE'].unstack()
print (df)
FIELD E_REASON IN_SCOPE TEST
ID
12463634 010 Y 22.2
12463635 020 N 99.5
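For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes every VALUE is stored as a string (the question does not show the dtypes), and the name `wide` plus the column reordering and index reset at the end are my own additions to match the layout asked for in the question.

```python
import pandas as pd

# Sample data from the question; VALUE is assumed to hold strings.
df = pd.DataFrame({
    "ID": [12463634] * 3 + [12463635] * 3,
    "FIELD": ["TEST", "E_REASON", "IN_SCOPE"] * 2,
    "VALUE": ["22.2", "010", "Y", "99.5", "020", "N"],
})

# pivot only reshapes; it never aggregates, so non-numeric values are fine.
wide = df.pivot(index="ID", columns="FIELD", values="VALUE")

# Optional: restore the column order from the question and turn ID
# back into an ordinary column.
wide = wide[["TEST", "E_REASON", "IN_SCOPE"]].reset_index()
wide.columns.name = None  # drop the leftover "FIELD" label on the columns

print(wide)
```

Note that pivot only works while each (ID, FIELD) pair occurs once.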
If there are duplicates, you need pivot_table with an aggregate function such as sum or ','.join:
print (df)
ID FIELD VALUE
0 12463634 TEST 22.2
1 12463634 E_REASON 010
2 12463634 IN_SCOPE Y <- same ID and FIELD
3 12463634 IN_SCOPE Y1 <- same ID and FIELD
4 12463635 TEST 99.5
5 12463635 E_REASON 020
6 12463635 IN_SCOPE N
df = df.pivot_table(index = "ID", columns = "FIELD", values = "VALUE", aggfunc='sum')
print (df)
FIELD E_REASON IN_SCOPE TEST
ID
12463634 010 YY1 22.2
12463635 020 N 99.5
Or:
df = df.pivot_table(index = "ID", columns = "FIELD", values = "VALUE", aggfunc=','.join)
print (df)
FIELD E_REASON IN_SCOPE TEST
ID
12463634 010 Y,Y1 22.2
12463635 020 N 99.5
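A runnable sketch of the duplicate case, again assuming VALUE holds strings. The `has_duplicates` check is my own addition, not part of the answer above; it is just one way to decide whether plain pivot would be enough.

```python
import pandas as pd

# Sample data with a repeated (ID, FIELD) pair, as in the example above.
df = pd.DataFrame({
    "ID": [12463634] * 4 + [12463635] * 3,
    "FIELD": ["TEST", "E_REASON", "IN_SCOPE", "IN_SCOPE",
              "TEST", "E_REASON", "IN_SCOPE"],
    "VALUE": ["22.2", "010", "Y", "Y1", "99.5", "020", "N"],
})

# True when at least one (ID, FIELD) pair appears more than once.
has_duplicates = df.duplicated(subset=["ID", "FIELD"]).any()

# Join the duplicated strings with a comma instead of averaging them.
wide = df.pivot_table(index="ID", columns="FIELD", values="VALUE",
                      aggfunc=",".join)

print(has_duplicates)
print(wide)
```

The default aggfunc of pivot_table is the mean, which is why the original call fails on string values.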
Answer 1 (score: 0)
Another option is to do something like:
df.groupby(['ID', 'FIELD']).sum().unstack()
This way you don't lose the 'VALUE' label.
You can groupby and then sum(), which gives you:
In [31]: df.groupby(['ID', 'FIELD']).sum()
Out[31]:
VALUE
ID FIELD
12463634 E_REASON 010
IN_SCOPE Y
TEST 22.2
12463635 E_REASON 020
IN_SCOPE N
TEST 99.5
Then unstack moves the innermost index level (FIELD) into the columns.
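A small sketch of what keeping the 'VALUE' label means in practice. I have swapped sum() for first() here: each (ID, FIELD) pair is unique in this sample, so it picks the same value without depending on how a given pandas version sums strings. The names `stacked` and `flat` are only for illustration.

```python
import pandas as pd

# Sample data from the question; VALUE is assumed to hold strings.
df = pd.DataFrame({
    "ID": [12463634] * 3 + [12463635] * 3,
    "FIELD": ["TEST", "E_REASON", "IN_SCOPE"] * 2,
    "VALUE": ["22.2", "010", "Y", "99.5", "020", "N"],
})

# Grouping the whole frame keeps VALUE as a column, so unstack yields
# two-level columns: ("VALUE", "E_REASON"), ("VALUE", "IN_SCOPE"), ...
stacked = df.groupby(["ID", "FIELD"]).first().unstack()

# Selecting the top level gives the flat frame that pivot would return.
flat = stacked["VALUE"]

print(stacked.columns.tolist())
print(flat)
```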