My data looks like this:
user region attribute reading
Jon Europe fathername peter
Jon Europe age 50
Jon Europe mothername mary
Jon Europe age 44
Jon Europe brothername duke
Jon Europe age 25
This is how it is stored in the SQL database. I am reading it into a DataFrame and trying to produce output like this:
attribute fathername age mothername age brothername age
User region
Jon Europe peter 50 mary 44 duke 25
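However, I am unable to get this.
The age column is not repeated; it appears only once and takes just one of the values.
This is what I have tried: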
pd.pivot_table(df_mysql , index=['User'],columns=['attribute'],values=['reading'], aggfunc=lambda x: x,dropna = 'False')
The repeated attributes (columns) must appear in the result. Any ideas?
Answer 0: (score: 1)
First, in pandas it is best to avoid duplicate column names, so a possible solution is to
make the repeated attribute values unique before pivoting:
print (df)
user region attribute reading
0 Jon Europe fathername peter
1 Jon Europe age 50
2 Jon Europe mothername mary
3 Jon Europe age 44
4 Jon Europe brothername duke
5 Jon Europe age 25
6 Jon1 Europe fathername peter
7 Jon1 Europe age 50
8 Jon1 Europe mothername mary
9 Jon1 Europe age 44
10 Jon1 Europe brothername duke
11 Jon1 Europe age 25
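# Flag attributes that occur more than once for the same user and region
m = df.duplicated(['user','region', 'attribute'], keep=False)
# Append a running counter to the flagged names: age -> age0, age1, age2
df.loc[m, 'attribute'] += df[m].groupby(['user','region', 'attribute']).cumcount().astype(str)
# Every group now holds a single value, so the aggregation just passes it through;
# reindex restores the original attribute order
df = df.pivot_table(index=['user','region'],
                    columns='attribute',
                    values='reading',
                    aggfunc='sum').reindex(df['attribute'].unique(), axis=1)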
print (df)
attribute fathername age0 mothername age1 brothername age2
user region
Jon Europe peter 50 mary 44 duke 25
Jon1 Europe peter 50 mary 44 duke 25
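
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same idea. It assumes the `reading` column arrives as strings, uses `aggfunc='first'` in place of `'sum'` (the two behave the same here, since each group holds one value after the renaming), and the names `keys`, `dup`, `order`, `wide` and `repeated` are just illustrative. The last step is optional: it strips the numeric suffix to get literally repeated column names, as the question asks for. That is probably best kept for display only, because selecting `repeated['age']` would then return all three columns.

```python
import pandas as pd

# Sample data copied from the question (one user only, for brevity).
df = pd.DataFrame({
    "user": ["Jon"] * 6,
    "region": ["Europe"] * 6,
    "attribute": ["fathername", "age", "mothername", "age", "brothername", "age"],
    "reading": ["peter", "50", "mary", "44", "duke", "25"],
})

# Flag every attribute that occurs more than once per (user, region).
keys = ["user", "region", "attribute"]
dup = df.duplicated(keys, keep=False)

# Append a running counter to the flagged names: age -> age0, age1, age2.
df.loc[dup, "attribute"] += df[dup].groupby(keys).cumcount().astype(str)

# Remember the order in which the (now unique) attributes first appear.
order = df["attribute"].unique()

# Each (user, region, attribute) cell holds exactly one value,
# so "first" simply passes that value through.
wide = df.pivot_table(
    index=["user", "region"],
    columns="attribute",
    values="reading",
    aggfunc="first",
).reindex(order, axis=1)
print(wide)

# Optional, display only: drop the trailing counter to get repeated names.
# Caveat: this also strips digits from attribute names that naturally end in one.
repeated = wide.copy()
repeated.columns = repeated.columns.str.replace(r"\d+$", "", regex=True)
print(repeated)
```

If real attribute names can end in digits, a separator such as `'_'` between the name and the counter would make the suffix easier to remove safely.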