Pandas: group multiple columns under one header

Time: 2017-04-20 11:14:39

Tags: python pandas dataframe header

Set-up

I have a pandas DataFrame df consisting of several columns, with headers such as

| id | x, single room | x, double room | y, single room | y, double room |
--------------------------------------------------------------------------
   ⋮          ⋮               ⋮                 ⋮                 ⋮

Problem

I would like to group the columns starting with x, and the columns starting with y, under a common header each, as follows:

     |             x             |             y             |
--------------------------------------------------------------
| id | single room | double room | single room | double room |
--------------------------------------------------------------
   ⋮        ⋮             ⋮              ⋮             ⋮          

How do I go about this?

1 Answer:

Answer 0: (score: 3)

You can use split, but the main problem is getting id into the last level:

import pandas as pd
col = ['id', 'x, single room', 'x, double room', 'y, single room', 'y, double room']
df = pd.DataFrame([[1, 1, 1, 1, 1]], columns=col)
print (df)
   id  x, single room  x, double room  y, single room  y, double room
0   1               1               1               1               1
#create tuples from MultiIndex
a = df.columns.str.split(', ', expand=True).values
print (a)
[('id', nan) ('x', 'single room') ('x', 'double room') ('y', 'single room')
 ('y', 'double room')]

#where the second element is NaN, move the name to the last level and use '' for the first level
df.columns = pd.MultiIndex.from_tuples([('', x[0]) if pd.isnull(x[1]) else x for x in a])
print (df)
               x                       y            
  id single room double room single room double room
0  1           1           1           1           1
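
The same header can also be built without the NaN swap by splitting each column name in plain Python. This is only a minimal sketch of an alternative, not part of the answer above: split_header is a hypothetical helper name, and it assumes every grouped column uses ', ' as the separator.

```python
import pandas as pd

col = ['id', 'x, single room', 'x, double room', 'y, single room', 'y, double room']
df = pd.DataFrame([[1, 1, 1, 1, 1]], columns=col)

def split_header(name, sep=', '):
    # split_header is a hypothetical helper, not a pandas function
    top, found, rest = name.partition(sep)
    # no separator found (e.g. 'id'): keep the name in the last level under ''
    return (top, rest) if found else ('', name)

# build the two-level header directly from the (top, sub) tuples
df.columns = pd.MultiIndex.from_tuples([split_header(c) for c in df.columns])
print(df)
```

Since str.partition splits only on the first occurrence of the separator, a name containing ', ' more than once would keep the remainder in the second level.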

Old solution:

a = pd.DataFrame(df.columns.str.rsplit(', ', expand=True).values.tolist())
mask = a[1].isnull()
a.loc[mask, [0,1]] = a.loc[mask, [1,0]].values
a[0] = a[0].fillna('')
df.columns = a.set_index([0,1]).index
df.columns.names = ('', '')
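
Once the columns are a two-level MultiIndex, one way to check that the grouping worked is to select by the top-level label. The sketch below rebuilds the header with the tuple-based approach from this answer; the variable names top_level, x_block and id_values are only illustrative.

```python
import pandas as pd

col = ['id', 'x, single room', 'x, double room', 'y, single room', 'y, double room']
df = pd.DataFrame([[1, 1, 1, 1, 1]], columns=col)

# rebuild the two-level header as in the answer
a = df.columns.str.split(', ', expand=True).values
df.columns = pd.MultiIndex.from_tuples([('', x[0]) if pd.isnull(x[1]) else x for x in a])

# labels of the first header level, one entry per column
top_level = df.columns.get_level_values(0).tolist()

# selecting a top-level label returns all columns grouped under it
x_block = df['x']

# the id column sits under the empty top-level label
id_values = df[('', 'id')]

print(top_level)
print(x_block)
```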