Reshape a pandas DataFrame with merged cells

Time: 2018-09-10 10:00:15

Tags: python pandas

I have:

import pandas as pd

df = pd.DataFrame({
    'key': ['value1','value2','value1','value2'],
    'domain': ['domain1.com','domain1.com','domain2.com','domain2.com'],
    'url' :['urlB','urlA','url1','url2'],
    'score' : [12,14,200,2001]
})

I want to get this result: [image: result]

I have tried transpose, stack, and so on, but could not get the same result.

I am new to Python pandas; please advise.

[Edit]

Thanks to @jezrael for the answer; it works using:
df = df.set_index(['key','domain']).unstack().swaplevel(0,1, axis=1).sort_index(axis=1)

Moving on to the next step, sorting. I added more rows, starting over:

df = pd.DataFrame({
    'key': ['value1','value2','value1','value2','value2','value3'],
    'domain': ['domain1.com','domain1.com','domain2.com','domain2.com','domain3.com','domain4.com'],
    'url' :['urlB','urlA','url1','url2','url3','url4'],
    'score' : [12,14,200,2001,10,5]
})

dfdomains = pd.DataFrame({
    'domain': ['domain1.com','domain2.com', 'domain3.com','domain4.com'],
    'order' : [3,1,2,4]
})

Using your answer, I got the DataFrame:

df1 = df.set_index(['key','domain']).unstack().swaplevel(0,1, axis=1).sort_index(axis=1, ascending=False)

That gave me the result:

domain domain4.com       domain3.com       domain2.com         domain1.com
               url score         url score         url   score         url score
key
value1         NaN   NaN         NaN   NaN        url1   200.0        urlB  12.0
value2         NaN   NaN        url3  10.0        url2  2001.0        urlA  14.0
value3        url4   5.0         NaN   NaN         NaN     NaN         NaN   NaN

I want to sort df1 by the order column of dfdomains: that means the first column of df1 should be domain2.com (order = 1).

Expected: [image]

Could you give me a suggestion? Thanks.

1 Answer:

Answer 0: (score: 3)

Use:

df = df.set_index(['key','domain']).unstack().swaplevel(0,1, axis=1).sort_index(axis=1)
print (df)

domain domain1.com       domain2.com      
             score   url       score   url
key                                       
value1          12  urlB         200  url1
value2          14  urlA        2001  url2

  1. First, set_index creates a MultiIndex from key and domain
  2. Reshape with unstack
  3. Then apply swaplevel to the MultiIndex columns
  4. Finally, sort the columns with sort_index
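
The four steps above can be traced one at a time. This is only a sketch on the question's original four-row frame; the names step1 to step4 are introduced here for illustration and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    'key': ['value1', 'value2', 'value1', 'value2'],
    'domain': ['domain1.com', 'domain1.com', 'domain2.com', 'domain2.com'],
    'url': ['urlB', 'urlA', 'url1', 'url2'],
    'score': [12, 14, 200, 2001],
})

# 1. key and domain become a two-level row MultiIndex
step1 = df.set_index(['key', 'domain'])
# 2. the innermost row level (domain) moves into the columns
step2 = step1.unstack()
# 3. swap the two column levels so that domain sits on top
step3 = step2.swaplevel(0, 1, axis=1)
# 4. sort the columns so each domain's columns end up side by side
step4 = step3.sort_index(axis=1)

# Print the column labels after each reshaping step
for name, frame in [('step2', step2), ('step3', step3), ('step4', step4)]:
    print(name, list(frame.columns))
```

Printing the column labels makes it easier to see that swaplevel only relabels the columns, while sort_index is what actually groups them by domain.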

Edit: First use sort_values to sort by the order column, then add DataFrame.reindex. All values of order must be present in df['domain'].

order = dfdomains.sort_values('order')['domain']
print (order)
1    domain2.com
2    domain3.com
0    domain1.com
3    domain4.com
Name: domain, dtype: object

df1 = (df.set_index(['key','domain'])
         .unstack()
         .swaplevel(0,1, axis=1)
         .sort_index(axis=1, ascending=False)
         .reindex(order, axis=1, level=0))
print (df1)
domain domain2.com         domain3.com       domain1.com       domain4.com  \
               url   score         url score         url score         url   
key                                                                          
value1        url1   200.0         NaN   NaN        urlB  12.0         NaN   
value2        url2  2001.0        url3  10.0        urlA  14.0         NaN   
value3         NaN     NaN         NaN   NaN         NaN   NaN        url4   

domain        
       score  
key           
value1   NaN  
value2   NaN  
value3   5.0
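
Because the printed frame wraps across lines, the column order can be hard to read. One way to check it, as a sketch that rebuilds the question's data and uses a hypothetical variable top_level, is to list the top-level column labels directly:

```python
import pandas as pd

df = pd.DataFrame({
    'key': ['value1', 'value2', 'value1', 'value2', 'value2', 'value3'],
    'domain': ['domain1.com', 'domain1.com', 'domain2.com',
               'domain2.com', 'domain3.com', 'domain4.com'],
    'url': ['urlB', 'urlA', 'url1', 'url2', 'url3', 'url4'],
    'score': [12, 14, 200, 2001, 10, 5],
})
dfdomains = pd.DataFrame({
    'domain': ['domain1.com', 'domain2.com', 'domain3.com', 'domain4.com'],
    'order': [3, 1, 2, 4],
})

# Domains arranged by their desired position
order = dfdomains.sort_values('order')['domain']

df1 = (df.set_index(['key', 'domain'])
         .unstack()
         .swaplevel(0, 1, axis=1)
         .sort_index(axis=1, ascending=False)
         .reindex(order, axis=1, level=0))

# Distinct labels of the top column level, in their current order
top_level = list(df1.columns.get_level_values(0).unique())
print(top_level)
```

The list should match the domains in order, with domain2.com first.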