My goal is to pivot a DataFrame that contains duplicate records. I worked around the problem by adding a unique column to the DataFrame:
my_df['id_column'] = range(1, len(my_df.index)+1)
df_pivot = my_df.pivot(index ='id_column', columns = 'type', values = 'age_16_18').fillna(0).astype(int)
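I want to figure out how to apply pivot to the DataFrame without dropping the duplicates or using pivot_table, by first grouping on multiple columns and then passing the result to pivot. I'm not sure how to pass the result along after grouping.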
   year    category state_name      type  is_state gender  age_16_18  age_18_30
0  2001  Foreigners         CA  Convicts         0      M          8          5
1  2001     Indians         NY  Convicts         0      F          5          2
2  2005  Foreigners         NY    Others         1      M          0          9
3  2009     Indians         NJ   Detenus         0      F          7          0
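Answer 0 (score: 1)
It's not clear what you are trying to do, but see whether the following approach gives you some ideas. Which columns do you want to group by?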
import pandas
my_df = pandas.DataFrame(
    {
        'year': [2001, 2001, 2005, 2009],
        'category': ['Foreigners', 'Indians', 'Foreigners', 'Indians'],
        'state_name': ['CA', 'NY', 'NY', 'NJ'],
        'type': ['Convicts', 'Convicts', 'Others', 'Detenus'],
        'is_state': [0, 0, 1, 0],
        'gender': ['M', 'F', 'M', 'F'],
        'age_16_18': [8, 5, 0, 7],
        'age_18_30': [5, 2, 9, 0],
    },
    columns=['year', 'category', 'state_name', 'type', 'is_state', 'gender', 'age_16_18', 'age_18_30'],
)
>>> my_df.pivot(columns='type', values='age_16_18')
type  Convicts  Detenus  Others
0          8.0      NaN     NaN
1          5.0      NaN     NaN
2          NaN      NaN     0.0
3          NaN      7.0     NaN
>>> my_df['key'] = my_df.category.str.cat(my_df.gender)
>>> my_df.pivot(index='key', columns='type', values='age_16_18')
type         Convicts  Detenus  Others
key
ForeignersM       8.0      NaN     0.0
IndiansF          5.0      7.0     NaN
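If the key/type combinations are not unique, pivot raises a ValueError. One possible way to do the "group first, then pivot" step from the question is sketched below. It makes some assumptions that are not in the original post: the grouping columns are category, gender and type, duplicates are combined with sum, and a fifth row is added to the sample data purely to create a duplicate for illustration.

```python
import pandas as pd

# Sample data from the question (trimmed to the relevant columns), plus one
# hypothetical extra row that duplicates the (Foreigners, M, Convicts) combination.
my_df = pd.DataFrame({
    'category': ['Foreigners', 'Indians', 'Foreigners', 'Indians', 'Foreigners'],
    'gender': ['M', 'F', 'M', 'F', 'M'],
    'type': ['Convicts', 'Convicts', 'Others', 'Detenus', 'Convicts'],
    'age_16_18': [8, 5, 0, 7, 3],
})

# Assumption: these are the columns to group by, and duplicates should be summed.
group_cols = ['category', 'gender', 'type']
grouped = my_df.groupby(group_cols, as_index=False)['age_16_18'].sum()

# After grouping, each (category, gender, type) combination appears exactly once,
# so a combined key is unique per type and pivot no longer complains.
grouped['key'] = grouped['category'].str.cat(grouped['gender'])
df_pivot = (
    grouped.pivot(index='key', columns='type', values='age_16_18')
    .fillna(0)
    .astype(int)
)
print(df_pivot)
```

Because groupby is called with as_index=False, the result is an ordinary DataFrame with the grouping columns as regular columns, so it can be handed straight to pivot. If a different aggregate is wanted (mean, max, first), swap it in for sum.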