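Convert some row values into columns when an ID appears more than once
I have a df with some columns, such as ID and Phone Number. When an ID has more than one phone number, I want the additional phone numbers placed in extra columns.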
I have this:
ID  Phone Number
1   234444
1   989898
2   30909
I want this:
ID  Phone Number  Phone Number 2
1   234444        989898
2   30909
Answer 0 (score: 1)
You want to pivot the DataFrame. Here is one way, using pivot_table:
g = df.groupby('ID').cumcount().add(1)
df.pivot_table(index='ID', columns=g).droplevel(0, axis=1).add_prefix('Phone Number ')
    Phone Number 1  Phone Number 2
ID
1         234444.0        989898.0
2          30909.0             NaN
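The snippets in this answer rely on the helper g. A minimal sketch of what it holds, assuming the sample data from the question (the DataFrame construction below is added for illustration and is not part of the original answer):

```python
import pandas as pd

# Sample data from the question (constructed here for illustration)
df = pd.DataFrame({"ID": [1, 1, 2], "Phone Number": [234444, 989898, 30909]})

# cumcount() numbers the rows inside each ID group starting at 0;
# add(1) shifts that to 1-based, so the new columns are labelled 1, 2, ...
g = df.groupby("ID").cumcount().add(1)

print(g.tolist())  # [1, 2, 1]
```

Each row gets its position within its own ID group, and that position becomes the column label after pivoting.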
For pandas versions below 0.24.0:
g = df.groupby('ID').cumcount().add(1)
df_ = df.pivot_table(index = 'ID', columns=g)
df_.columns = df_.columns.droplevel(0)
df_.add_prefix('Phone Number ')
    Phone Number 1  Phone Number 2
ID
1         234444.0        989898.0
2          30909.0             NaN
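The results above show the numbers as floats (234444.0), because pivot_table averages by default and the missing cell is filled with NaN. If the phone numbers are stored as strings, averaging is not meaningful. A hedged sketch of one alternative, assuming string phone numbers and using DataFrame.pivot, which reshapes without aggregating (the helper column name n is made up for this example):

```python
import pandas as pd

# Assumption: phone numbers are kept as strings
df = pd.DataFrame({"ID": [1, 1, 2], "Phone Number": ["234444", "989898", "30909"]})

wide = (
    # "n" is a hypothetical helper column: 1-based position within each ID
    df.assign(n=df.groupby("ID").cumcount().add(1))
      # pivot only reshapes, so the string values are left untouched
      .pivot(index="ID", columns="n", values="Phone Number")
      .add_prefix("Phone Number ")
)
print(wide)
```

Cells with no value are still NaN, but the existing phone numbers stay as strings.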
Answer 1 (score: 0)
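import pandas as pd
df = pd.DataFrame([['1', '2345'], ['1', '7890'], ['2', '1580']], columns=['ID', 'Phone Number'])
d2 = df.groupby('ID')
pos = d2.cumcount()  # 0-based position of each row within its ID group
parts = []
# Loop up to the largest group size; len(d2) would be the number of groups instead
for i in range(d2.size().max()):
    # i-th row of every group, indexed by ID; this filter stands in for d2.nth(i),
    # whose result is indexed differently across pandas versions
    parts.append(df[pos == i].set_index('ID').add_suffix(str(i + 1)))
new_df = pd.concat(parts, axis=1).rename_axis('ID').reset_index()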
Output:
print(new_df)
ID Phone Number1 Phone Number2
0 1 2345 7890
1 2 1580 NaN
Answer 2 (score: 0)
A solution that pivots the single column Phone Number:
g = df.groupby('ID').cumcount().add(1)
df1 = df.set_index(['ID', g])['Phone Number'].unstack().add_prefix('Phone Number ')
print (df1)
    Phone Number 1  Phone Number 2
ID
1         234444.0        989898.0
2          30909.0             NaN
Or:
df['idx'] = df.groupby('ID').cumcount().add(1)
df1 = df.pivot(index='ID', columns='idx', values='Phone Number').add_prefix('Phone Number ')
print (df1)
idx  Phone Number 1  Phone Number 2
ID
1          234444.0        989898.0
2           30909.0             NaN
Or:
s = df.groupby('ID')['Phone Number'].apply(list)
df1 = pd.DataFrame(s.values.tolist(), index=s.index).add_prefix('Phone Number ')
print (df1)
Phone Number 0 Phone Number 1
ID
1 234444 989898.0
2 30909 NaN
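The list-based variant labels the new columns from 0. If 1-based labels are preferred, to match the other variants, the columns can be renamed after the frame is built. A minimal sketch, assuming the sample data from the question (the name wide is chosen for this example only):

```python
import pandas as pd

# Sample data from the question (constructed here for illustration)
df = pd.DataFrame({"ID": [1, 1, 2], "Phone Number": [234444, 989898, 30909]})

# One list of phone numbers per ID
s = df.groupby("ID")["Phone Number"].apply(list)

# Lists of unequal length are padded with missing values
wide = pd.DataFrame(s.tolist(), index=s.index)

# Default column labels are 0, 1, ...; shift them to 1, 2, ...
wide.columns = [f"Phone Number {i + 1}" for i in wide.columns]
print(wide)
```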
If you then need the index turned into a column in the solutions above:
df1 = df1.rename_axis(None, axis=1).rename_axis('ID').reset_index()
print (df1)
   ID  Phone Number 1  Phone Number 2
0   1        234444.0        989898.0
1   2         30909.0             NaN
A solution for multiple columns, all of which must be processed in the same way:
print (df)
ID Phone Number Name Val
0 1 234444 A 10
1 1 989898 B 4
2 2 30909 C 6
g = df.groupby('ID').cumcount().add(1)
df = df.set_index(['ID', g]).unstack()
df.columns = [f'{a}{b}' for a, b in df.columns]
df = df.rename_axis('ID').reset_index()
print (df)
   ID  Phone Number1  Phone Number2 Name1 Name2  Val1  Val2
0   1       234444.0       989898.0     A     B  10.0   4.0
1   2        30909.0            NaN     C   NaN   6.0   NaN
Or:
df1 = df.groupby('ID').agg(list)
comb = [pd.DataFrame(df1[x].values.tolist(), index=df1.index) for x in df1.columns]
df = pd.concat(comb, axis=1, keys=df1.columns)
df.columns = [f'{a}{b}' for a, b in df.columns]
df = df.rename_axis('ID').reset_index()
print (df)
ID Phone Number0 Phone Number1 Name0 Name1 Val0 Val1
0 1 234444 989898.0 A B 10 4.0
1 2 30909 NaN C None 6 NaN
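For reuse, the multi-column reshaping can be wrapped in a small function. The sketch below is one possible arrangement, not part of the original answer: the function name widen and its key parameter are hypothetical, and it assumes every non-key column should be spread out in the same way.

```python
import pandas as pd


def widen(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Spread repeated rows per `key` into numbered columns (hypothetical helper)."""
    # 1-based position of each row within its key group
    n = df.groupby(key).cumcount().add(1)
    # Index by (key, position), then move the position level into the columns
    wide = df.set_index([key, n]).unstack()
    # Flatten the (column, position) pairs into names such as "Name1", "Name2"
    wide.columns = [f"{col}{i}" for col, i in wide.columns]
    return wide.reset_index()


# Sample data from the multi-column example (constructed here for illustration)
df = pd.DataFrame({
    "ID": [1, 1, 2],
    "Phone Number": [234444, 989898, 30909],
    "Name": ["A", "B", "C"],
    "Val": [10, 4, 6],
})
out = widen(df, "ID")
print(out)
```

The function returns a new frame rather than overwriting df, so the original data stays available for trying the other variants.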