Question

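Wherever the column names `_0`, `_1`, and so on appear below, there should really be spaces. I could not build an example with blank column names here because pandas does not allow them, but they exist in the Excel file I am reading from.

The values 1, 2, 3 cannot be relied on in any solution; they are only there to fill out this example.

What I want to do is convert the additional header row into a column, so that the data has only one header row.

Some sample data: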

ef = pd.DataFrame({
  '_0' : ['loc', 1, 2, 3],
  'a' :  ['x', 1, 2, 3],
  '_1' : ['y', 1, 2, 3],
  '_2' : ['z', 1, 2, 3],
  'b' :  ['x', 1, 2, 3],
  '_3' : ['y', 1, 2, 3],
  '_4' : ['z', 1, 2, 3],
  'c' :  ['x', 1, 2, 3],
  '_5' : ['y', 1, 2, 3],
  '_6' : ['z', 1, 2, 3],
})

Which outputs

In [98]: ef
Out[98]:
    _0  a _1 _2  b _3 _4  c _5 _6
0  loc  x  y  z  x  y  z  x  y  z
1    1  1  1  1  1  1  1  1  1  1
2    2  2  2  2  2  2  2  2  2  2
3    3  3  3  3  3  3  3  3  3  3

Without the underscores it looks like this

        a        b        c      
0  loc  x  y  z  x  y  z  x  y  z
1    1  1  1  1  1  1  1  1  1  1
2    2  2  2  2  2  2  2  2  2  2
3    3  3  3  3  3  3  3  3  3  3

I would like to get it into this form

loc  type  x  y  z  
  1   a    1  1  1  
  1   b    1  1  1  
  1   c    1  1  1  
  2   a    2  2  2  
  2   b    2  2  2  
  2   c    2  2  2  
  3   a    3  3  3  
  3   b    3  3  3  
  3   c    3  3  3

How can I do this with pandas?

Answer 1

I think it is best to have read_excel parse the columns as a MultiIndex and use the first column, loc, as the index:

df = pd.read_excel(file, header=[0,1], index_col=0)

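The column names may then come back with `Unnamed` placeholder values, so they may need to be cleaned up afterwards.

Solution for the sample data: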
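As a rough sketch of that cleanup step: the frame below is built by hand to imitate what a two-row header read might look like, and the `Unnamed: N_level_0` labels are an assumption about the placeholder format rather than something taken from the real file. The idea is to blank out the placeholders in the top level and forward fill the real labels over them.

```python
import pandas as pd

# Hypothetical stand-in for the result of a two-row-header read;
# the 'Unnamed: ...' labels are assumed placeholders, not real file contents.
cols = pd.MultiIndex.from_tuples([
    ('a', 'x'), ('Unnamed: 2_level_0', 'y'), ('Unnamed: 3_level_0', 'z'),
    ('b', 'x'), ('Unnamed: 5_level_0', 'y'), ('Unnamed: 6_level_0', 'z'),
])
df = pd.DataFrame([[1] * 6, [2] * 6, [3] * 6],
                  index=pd.Index([1, 2, 3], name='loc'),
                  columns=cols)

# Turn placeholder labels into NaN, then forward fill the last real label
top = pd.Series(df.columns.get_level_values(0))
top = top.mask(top.str.startswith('Unnamed')).ffill()

# Rebuild the columns with the cleaned top level, named 'type'
df.columns = pd.MultiIndex.from_arrays(
    [top, df.columns.get_level_values(1)], names=['type', None])
```

After this, every column carries its group label (`a` or `b`) in the top level, which is the state the reshaping needs.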

import numpy as np
import pandas as pd

ef = pd.DataFrame({
  '_0' : ['loc', 1, 2, 3],
  'a' :  ['x', 1, 2, 3],
  '_1' : ['y', 1, 2, 3],
  '_2' : ['z', 1, 2, 3],
  'b' :  ['x', 1, 2, 3],
  '_3' : ['y', 1, 2, 3],
  '_4' : ['z', 1, 2, 3],
  'c' :  ['x', 1, 2, 3],
  '_5' : ['y', 1, 2, 3],
  '_6' : ['z', 1, 2, 3],
})

# replace every column name containing `_` with a space, as in the real file
ef.columns = np.where(ef.columns.str.contains('_'), ' ', ef.columns).tolist()

# build a MultiIndex from the current column names and the first row
ef.columns = [ef.columns, ef.iloc[0]]
# drop the first row and set the first column as index; the columns are a MultiIndex, so the key is a tuple
ef = ef.iloc[1:].set_index([(' ', 'loc')])
# the index now holds tuples, so keep only the first element of each
ef.index = ef.index.str[0]
# name the index; it becomes the `loc` column later
ef.index.name='loc'

# convert the column MultiIndex to a DataFrame, turn spaces into NaN and forward fill them
df = pd.DataFrame(ef.columns.tolist(), columns=['type', 'c2'])
df['type'] = df['type'].mask(df['type'] == ' ').ffill()
print (df)
  type c2
0    a  x
1    a  y
2    a  z
3    b  x
4    b  y
5    b  z
6    c  x
7    c  y
8    c  z

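# assign the cleaned MultiIndex back to ef (x/y/z level first, type level last)
ef.columns = [df['c2'], df['type']]

# finally stack the last level (type) into rows and flatten the index
ef = ef.stack().reset_index().rename_axis(None, axis=1)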
print (ef)
   loc type  x  y  z
0    1    a  1  1  1
1    1    b  1  1  1
2    1    c  1  1  1
3    2    a  2  2  2
4    2    b  2  2  2
5    2    c  2  2  2
6    3    a  3  3  3
7    3    b  3  3  3
8    3    c  3  3  3
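For comparison, here is a shorter variant on the same sample data. It is only a sketch and not part of the steps above: it assumes the first column always holds `loc` and that placeholder names always start with `_`, and the names `top`, `sub`, `body` and `out` are made up for illustration. It sets the index before building the MultiIndex, so no tuples end up in the index.

```python
import pandas as pd

ef = pd.DataFrame({
  '_0' : ['loc', 1, 2, 3],
  'a' :  ['x', 1, 2, 3],
  '_1' : ['y', 1, 2, 3],
  '_2' : ['z', 1, 2, 3],
  'b' :  ['x', 1, 2, 3],
  '_3' : ['y', 1, 2, 3],
  '_4' : ['z', 1, 2, 3],
  'c' :  ['x', 1, 2, 3],
  '_5' : ['y', 1, 2, 3],
  '_6' : ['z', 1, 2, 3],
})

# Top header: blank out the `_N` placeholders and forward fill a, b, c
top = pd.Series(ef.columns[1:])
top = top.mask(top.str.startswith('_')).ffill()

# Second header: the first data row (x, y, z, ...), skipping the loc column
sub = ef.iloc[0, 1:].values

# Data rows, indexed by the first column
body = ef.iloc[1:].set_index('_0')
body.index.name = 'loc'
body.columns = pd.MultiIndex.from_arrays([top, sub], names=['type', None])

# Move the type level from the columns into the rows
out = body.stack(level=0).reset_index()
print(out)
```

Depending on the pandas version, `stack` may emit a FutureWarning about its new implementation; the reshaped result for this data should be the same either way.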

Convert an additional header to a column using pandas