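I want to convert rows to columns

Asked: 2019-03-28 13:16:05

Tags: python pandas dataframe

I want to turn some row values into columns when they occur more than once for a given ID.

I have a df with columns such as ID and Phone Number. If an ID has more than one phone number, I want each additional phone number to go into its own column.

I have this: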

ID  Phone Number
1        234444
1        989898
2         30909

I want this:

ID    Phone Number   Phone Number 2                            
1         234444        989898
2          30909             

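3 answers:

Answer 0: (score: 1)

You want to pivot the DataFrame. Here is one way using pivot_table, where g numbers the rows within each ID (1, 2, ...) and those numbers become the new column labels: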

g = df.groupby('ID').cumcount().add(1)
df.pivot_table(index='ID', columns=g).droplevel(0, axis=1).add_prefix('Phone Number ')

      Phone Number 1  Phone Number 2
ID                                
1         234444.0        989898.0
2          30909.0             NaN

For pandas versions below 0.24.0 (where DataFrame.droplevel is not available):

g = df.groupby('ID').cumcount().add(1)
df_ = df.pivot_table(index = 'ID', columns=g)
df_.columns = df_.columns.droplevel(0)
df_.add_prefix('Phone Number ')

    Phone Number 1  Phone Number 2
ID                                
1         234444.0        989898.0
2          30909.0             NaN
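pivot_table aggregates with the mean by default, which is why the values above come back as floats and why this only works for numeric phone numbers. A minimal sketch, assuming the phone numbers are stored as strings (for example to keep leading zeros), passes aggfunc="first" instead. The helper column name n and the variable wide are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data from the question, with phone numbers stored as strings (assumption)
df = pd.DataFrame({
    "ID": [1, 1, 2],
    "Phone Number": ["234444", "989898", "30909"],
})

# Number each row within its ID: 1, 2, ...
g = df.groupby("ID").cumcount().add(1)

wide = (
    df.assign(n=g)  # "n" is a hypothetical helper column holding the counter
      .pivot_table(index="ID", columns="n", values="Phone Number", aggfunc="first")
      .add_prefix("Phone Number ")
      .rename_axis(columns=None)  # drop the "n" label from the column axis
      .reset_index()              # turn ID back into a regular column
)
print(wide)
```

Because each (ID, n) pair occurs only once, "first" just picks that single value and nothing is actually aggregated.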

Answer 1: (score: 0)

import pandas as pd

df = pd.DataFrame([['1','2345'],['1','7890'],['2','1580']], columns = ['ID','Phone Number'])


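d2 = df.groupby('ID')
new_df = pd.DataFrame()
# i is a position within each group, so loop up to the largest group size
# (len(d2) is the number of groups and only works here by coincidence).
# Note: this relies on nth() returning a frame indexed by ID (pandas < 2.0).
for i in range(d2.size().max()):
    new_df = pd.concat([new_df, d2.nth(i).add_suffix(str(i + 1))], axis=1)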

new_df = new_df.rename_axis('ID').reset_index()

Output:

print(new_df)
  ID Phone Number1 Phone Number2
0  1          2345          7890
1  2          1580           NaN
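The loop in this answer walks over positions within each ID group, so its bound has to be the size of the largest group rather than the number of groups. The following is a sketch of the same idea under that assumption. It selects rows with a cumcount mask instead of nth, and the sample data with three numbers for one ID is made up for illustration.

```python
import pandas as pd

# Hypothetical data: ID '1' has three numbers, but there are only two groups
df = pd.DataFrame(
    [["1", "2345"], ["1", "7890"], ["1", "4321"], ["2", "1580"]],
    columns=["ID", "Phone Number"],
)

grouped = df.groupby("ID")
n_groups = grouped.ngroups       # number of distinct IDs
max_size = grouped.size().max()  # most rows any single ID has

# Zero-based position of each row within its ID
pos = grouped.cumcount()

# One Series per position, indexed by ID and named after the target column
parts = [
    df.loc[pos == i].set_index("ID")["Phone Number"].rename(f"Phone Number{i + 1}")
    for i in range(max_size)
]
new_df = pd.concat(parts, axis=1).rename_axis("ID").reset_index()
print(new_df)
```

Looping only n_groups times would stop after two columns here and silently drop the third number of ID '1'.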

Answer 2: (score: 0)

A solution that pivots the single column Phone Number:

g = df.groupby('ID').cumcount().add(1)

df1 = df.set_index(['ID', g])['Phone Number'].unstack().add_prefix('Phone Number ')
print (df1)
    Phone Number 1  Phone Number 2
ID                                
1         234444.0        989898.0
2          30909.0             NaN

Or:

df['idx'] = df.groupby('ID').cumcount().add(1)
df1 = df.pivot(index='ID', columns='idx', values='Phone Number').add_prefix('Phone Number ')
print (df1)
idx  Phone Number 1  Phone Number 2
ID                                 
1          234444.0        989898.0
2           30909.0             NaN

Or (here the new columns are numbered from 0):

s = df.groupby('ID')['Phone Number'].apply(list)
df1 = pd.DataFrame(s.values.tolist(), index=s.index).add_prefix('Phone Number ')
print (df1)
    Phone Number 0  Phone Number 1
ID                                
1           234444        989898.0
2            30909             NaN

If the index should end up as a regular column in the solutions above:

df1 = df1.rename_axis(None, axis=1).rename_axis('ID').reset_index()
print (df1)
   ID  Phone Number 1  Phone Number 2
0   1        234444.0        989898.0
1   2         30909.0             NaN
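The printed results show values such as 234444.0 because the missing entries are NaN, which forces the whole column to float64. If integer phone numbers are preferred, one option is pandas' nullable Int64 dtype. This is a sketch that assumes a pandas version supporting that dtype and that every non-missing value is a whole number.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({"ID": [1, 1, 2], "Phone Number": [234444, 989898, 30909]})

# Number each row within its ID: 1, 2, ...
g = df.groupby("ID").cumcount().add(1)

wide = (
    df.set_index(["ID", g])["Phone Number"]
      .unstack()                    # counter level becomes the columns
      .add_prefix("Phone Number ")
      .astype("Int64")              # nullable integers; missing values become <NA>
      .reset_index()
)
print(wide)
```

The missing phone number for ID 2 is then shown as <NA> and the other values keep their integer form.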

A solution for multiple columns, all of which must be handled the same way:

print (df)
   ID  Phone Number Name  Val
0   1        234444    A   10
1   1        989898    B    4
2   2         30909    C    6

g = df.groupby('ID').cumcount().add(1)

# Keep ID as the row index and unstack the per-ID counter into the columns;
# the result goes into df2 so the original df is not overwritten
df2 = df.set_index(['ID', g]).unstack()
df2.columns = [f'{a}{b}' for a, b in df2.columns]
df2 = df2.reset_index()
print (df2)
   ID  Phone Number1  Phone Number2 Name1 Name2  Val1  Val2
0   1       234444.0       989898.0     A     B  10.0   4.0
1   2        30909.0            NaN     C   NaN   6.0   NaN

Or:

df1 = df.groupby('ID').agg(list)
comb = [pd.DataFrame(df1[x].values.tolist(), index=df1.index) for x in df1.columns]
df = pd.concat(comb, axis=1, keys=df1.columns)
df.columns = [f'{a}{b}' for a, b in df.columns]
df = df.rename_axis('ID').reset_index()
print (df)
   ID  Phone Number0  Phone Number1 Name0 Name1  Val0  Val1
0   1         234444       989898.0     A     B    10   4.0
1   2          30909            NaN     C  None     6   NaN
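The multi-column variants can be wrapped in a small helper so the input frame is left untouched. This is only a sketch: the function name widen and its key parameter are hypothetical, and it assumes every non-key column should be spread in the same way, using a counter that starts at 1.

```python
import pandas as pd


def widen(df, key="ID"):
    """Spread repeated rows per key into numbered columns (hypothetical helper)."""
    # Number each row within its key: 1, 2, ...
    n = df.groupby(key).cumcount().add(1)
    # The key stays on the rows, the counter moves to the columns
    wide = df.set_index([key, n]).unstack()
    # Flatten the (column, counter) pairs into names such as 'Name1'
    wide.columns = [f"{col}{i}" for col, i in wide.columns]
    return wide.reset_index()


df = pd.DataFrame({
    "ID": [1, 1, 2],
    "Phone Number": [234444, 989898, 30909],
    "Name": ["A", "B", "C"],
    "Val": [10, 4, 6],
})

result = widen(df)
print(result)
```

Since widen returns a new frame, df can still be passed to any of the other approaches afterwards.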