Question

My data looks like this:

user  region  attribute   reading
Jon   Europe  fathername  peter
Jon   Europe  age         50
Jon   Europe  mothername  mary
Jon   Europe  age         44
Jon   Europe  brothername duke
Jon   Europe  age         25

This is how it is stored in the SQL database. I am reading it into a DataFrame and trying to produce data like the following:

attribute             fathername age mothername age brothername age     
User      region
Don       Europe      peter      50   mary      44  duke         25

However, I cannot get this result.

The age column is not repeated; it appears only once and takes one of the values.

This is what I tried:

pd.pivot_table(df_mysql , index=['User'],columns=['attribute'],values=['reading'], aggfunc=lambda x: x,dropna = 'False')

The duplicated attributes (columns) must appear in the result. Any ideas?

Answer 1

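First, it is best to avoid duplicate column names in pandas, so one possible solution is to make the duplicated attribute values unique before pivoting: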

print (df)
    user  region    attribute reading
0    Jon  Europe   fathername   peter
1    Jon  Europe          age      50
2    Jon  Europe   mothername    mary
3    Jon  Europe          age      44
4    Jon  Europe  brothername    duke
5    Jon  Europe          age      25
6   Jon1  Europe   fathername   peter
7   Jon1  Europe          age      50
8   Jon1  Europe   mothername    mary
9   Jon1  Europe          age      44
10  Jon1  Europe  brothername    duke
11  Jon1  Europe          age      25

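# mark rows whose (user, region, attribute) combination occurs more than once
m = df.duplicated(['user','region', 'attribute'], keep=False)
# append a running counter (0, 1, 2, ...) to the duplicated attribute names
df.loc[m, 'attribute'] += df[m].groupby(['user','region', 'attribute']).cumcount().astype(str)

# each cell now holds a single value, so the aggregation just passes it through;
# reindex restores the original column order, which pivot_table would otherwise sort
df = df.pivot_table(index=['user','region'],
                    columns='attribute',
                    values='reading',
                    aggfunc='sum').reindex(df['attribute'].unique(), axis=1)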
print (df)
attribute   fathername age0 mothername age1 brothername age2
user region                                                 
Jon  Europe      peter   50       mary   44        duke   25
Jon1 Europe      peter   50       mary   44        duke   25
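
For anyone who wants to run the approach above end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question's data, `aggfunc='first'` is swapped in for `'sum'` (an assumption on my part; either should work once each cell holds one value), and the names `keys`, `dup`, `counter`, `order`, `wide` and `restored` are only illustrative. The last step, which strips the numeric suffix to get truly repeated column labels, is not part of the answer above and assumes no real attribute name ends in a digit.

```python
import pandas as pd

# Sample data in the same long format as the question (repeated 'age' rows).
df = pd.DataFrame({
    "user": ["Jon"] * 6,
    "region": ["Europe"] * 6,
    "attribute": ["fathername", "age", "mothername", "age", "brothername", "age"],
    "reading": ["peter", "50", "mary", "44", "duke", "25"],
})

keys = ["user", "region", "attribute"]

# Mark every row whose (user, region, attribute) combination occurs more than once.
dup = df.duplicated(keys, keep=False)

# Append a running counter (0, 1, 2, ...) to the duplicated attribute names only.
counter = df[dup].groupby(keys).cumcount().astype(str)
df.loc[dup, "attribute"] = df.loc[dup, "attribute"] + counter

# Remember the original order, because pivot_table sorts the columns.
order = df["attribute"].unique()

wide = df.pivot_table(
    index=["user", "region"],
    columns="attribute",
    values="reading",
    aggfunc="first",  # assumption: one value per cell after deduplication
).reindex(order, axis=1)

# Optional: strip the trailing counter to get repeated column labels.
# Assumes no genuine attribute name ends in a digit.
restored = wide.copy()
restored.columns = restored.columns.str.replace(r"\d+$", "", regex=True)

print(wide)
print(restored)
```

Keep in mind that with repeated labels, `restored['age']` returns all three age columns as a DataFrame rather than a single Series, which is one reason the suffixed names are usually easier to work with.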

Pivot table with identical column names: duplicates must remain after pivoting
