I am trying to build a pivot table whose result values are strings.
import pandas as pd
df1 = pd.DataFrame({'index' : range(8),
'variable1' : ["A","A","B","B","A","B","B","A"],
'variable2' : ["a","b","a","b","a","b","a","b"],
'variable3' : ["x","x","x","y","y","y","x","y"],
'result': ["on","off","off","on","on","off","off","on"]})
df1.pivot_table(values='result',rows='index',cols=['variable1','variable2','variable3'])
But I get: DataError: No numeric types to aggregate
When I change the result values to numbers, it works fine:
df2 = pd.DataFrame({'index' : range(8),
'variable1' : ["A","A","B","B","A","B","B","A"],
'variable2' : ["a","b","a","b","a","b","a","b"],
'variable3' : ["x","x","x","y","y","y","x","y"],
'result': [1,0,0,1,1,0,0,1]})
df2.pivot_table(values='result',rows='index',cols=['variable1','variable2','variable3'])
And I get what I need:
variable1    A                   B
variable2    a         b         a    b
variable3    x    y    x    y    x    y
index
0            1  NaN  NaN  NaN  NaN  NaN
1          NaN  NaN    0  NaN  NaN  NaN
2          NaN  NaN  NaN  NaN    0  NaN
3          NaN  NaN  NaN  NaN  NaN    1
4          NaN    1  NaN  NaN  NaN  NaN
5          NaN  NaN  NaN  NaN  NaN    0
6          NaN  NaN  NaN  NaN    0  NaN
7          NaN  NaN  NaN    1  NaN  NaN
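I know I could map the strings to numeric values and then reverse the operation afterwards, but maybe there is a more elegant solution?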
Answer 0 (score: 24)
My original reply was based on Pandas 0.14.1, and since then many things have changed in the pivot_table function (rows -> index, cols -> columns, ...).
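As a minimal sketch of the renamed keywords, here is the numeric call from the question rewritten with index= and columns=. It assumes a pandas version recent enough to accept those names; the variable name table is only illustrative.

```python
import pandas as pd

df2 = pd.DataFrame({
    'index': range(8),
    'variable1': ["A", "A", "B", "B", "A", "B", "B", "A"],
    'variable2': ["a", "b", "a", "b", "a", "b", "a", "b"],
    'variable3': ["x", "x", "x", "y", "y", "y", "x", "y"],
    'result': [1, 0, 0, 1, 1, 0, 0, 1],
})

# rows= is now index=, cols= is now columns=.
# With numeric values the default aggregation (mean) applies.
table = df2.pivot_table(values='result', index='index',
                        columns=['variable1', 'variable2', 'variable3'])
print(table)
```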
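Additionally, the original lambda trick I posted no longer seems to work on Pandas 0.18. You have to provide a reducing function (even if it is min, max or mean). But even that seemed improper, because we are not reducing the data set, just transforming it. So I looked harder at unstack: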
import pandas as pd
df1 = pd.DataFrame({'index' : range(8),
'variable1' : ["A","A","B","B","A","B","B","A"],
'variable2' : ["a","b","a","b","a","b","a","b"],
'variable3' : ["x","x","x","y","y","y","x","y"],
'result': ["on","off","off","on","on","off","off","on"]})
# these are the columns to end up in the multi-index columns.
unstack_cols = ['variable1', 'variable2', 'variable3']
First, set the index of the data to the index column plus the columns you want to unstack, then call unstack with the level argument.
df1.set_index(['index'] + unstack_cols).unstack(level=unstack_cols)
The resulting dataframe is below.
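For anyone who wants to check the shape of that result, the following is a self-contained sketch of the same set_index plus unstack steps. The droplevel call and the name wide are additions for illustration, not part of the original answer; they remove the outer 'result' level so the columns match the layout in the question.

```python
import pandas as pd

df1 = pd.DataFrame({
    'index': range(8),
    'variable1': ["A", "A", "B", "B", "A", "B", "B", "A"],
    'variable2': ["a", "b", "a", "b", "a", "b", "a", "b"],
    'variable3': ["x", "x", "x", "y", "y", "y", "x", "y"],
    'result': ["on", "off", "off", "on", "on", "off", "off", "on"],
})

unstack_cols = ['variable1', 'variable2', 'variable3']

# Move the three variables from the row index into the columns.
# No aggregation happens, so string values pass through unchanged.
wide = df1.set_index(['index'] + unstack_cols).unstack(level=unstack_cols)

# The columns now carry an extra outer level holding 'result'; drop it
# (illustrative step, not in the original answer).
wide.columns = wide.columns.droplevel(0)
print(wide)
```

Cells with no matching row in df1 come out as NaN, just as in the numeric pivot table.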
Answer 1 (score: 2)
I think the best compromise is to replace on/off with True/False, which lets pandas "understand" the data better and behave in an intelligent, expected way.
df2 = df1.replace({'on': True, 'off': False})
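You basically conceded this in your question. My answer is that I don't think there is a better way, and you should replace 'on'/'off' anyway for whatever comes next.
As Andy Hayden pointed out in the comments, you will get better performance if you replace on/off with 1/0.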
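A rough sketch of the 1/0 round trip follows, assuming a pandas version that uses the index= and columns= keywords. The names to_num, to_str, table and restored are hypothetical, and the step that maps back to strings is an illustration of the "reverse the operation" idea from the question rather than something this answer spelled out.

```python
import pandas as pd

df1 = pd.DataFrame({
    'index': range(8),
    'variable1': ["A", "A", "B", "B", "A", "B", "B", "A"],
    'variable2': ["a", "b", "a", "b", "a", "b", "a", "b"],
    'variable3': ["x", "x", "x", "y", "y", "y", "x", "y"],
    'result': ["on", "off", "off", "on", "on", "off", "off", "on"],
})

to_num = {'on': 1, 'off': 0}
to_str = {1.0: 'on', 0.0: 'off'}

# Encode the strings as numbers so pivot_table has something to aggregate.
df2 = df1.assign(result=df1['result'].map(to_num))
table = df2.pivot_table(values='result', index='index',
                        columns=['variable1', 'variable2', 'variable3'])

# Optionally decode back to strings, column by column; NaN cells stay NaN.
restored = table.apply(lambda col: col.map(to_str))
print(restored)
```

The decode step is only safe here because every cell aggregates a single row, so the mean is exactly 0 or 1.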