Question

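I am trying to stack the columns of a pandas DataFrame that was created by a join on the id column.

The df looks like the one shown below. The columns coming from the left and right tables have the same names (I don't know whether I should simply rename them, and whether that would solve the problem).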

I want to transform the above into an output where the rows with the same id are stacked one below the other,

for example for id = 1:

id  county cat brand month country cat brand month
1    GB    x1   xx    12    GB      x2  x1    08
2    GB    x2   xx1   12    GB      x2  x1    09

Is there a simpler way? I tried the following, but it does not work.

Create a new column holding the row index number:

df['row_number'] = df.reset_index().index
Then append and sort by row number:

new = df[['id', 'county', 'cat', 'brand', 'month', 'row_number']]

old = df[['id', 'county', 'cat', 'brand', 'month', 'row_number']]

full = new.append(old)

full = full.sort_values(by=['row_number'])

Answer 1

You can use cumcount to number the duplicated column names, then create a MultiIndex by assigning a nested list to the columns:

df = df.set_index('id') 

s = df.columns.to_series()
df.columns = [s.groupby(s).cumcount(), s]
print (df)
         0                       1                
   country cat brand month country cat brand month
id                                                
1       GB  x1    xx    12      GB  x2    x1     8
2       GB  x2   xx1    12      GB  x2    x1     9

print (df.columns)
MultiIndex(levels=[[0, 1], ['brand', 'cat', 'country', 'month']],
           labels=[[0, 0, 0, 0, 1, 1, 1, 1], [2, 1, 0, 3, 2, 1, 0, 3]])

Then call stack on the first level:

df1 = df.stack(0)
print (df1)
     brand cat country  month
id                           
1  0    xx  x1      GB     12
   1    x1  x2      GB      8
2  0   xx1  x2      GB     12
   1    x1  x2      GB      9

Then select each id with loc:

print (df1.loc[1])
  brand cat country  month
0    xx  x1      GB     12
1    x1  x2      GB      8

print (df1.loc[2])
  brand cat country  month
0   xx1  x2      GB     12
1    x1  x2      GB      9
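
For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the question's data, and the names `names`, `occurrence` and `stacked` are made up for illustration. It builds the MultiIndex explicitly with `pd.MultiIndex.from_arrays` rather than by assigning a nested list, which should give the same column index. Depending on the pandas version, `stack` may emit a FutureWarning and may order the resulting columns differently.

```python
import pandas as pd

# Hand-built sample mirroring the joined frame: the column names repeat.
df = pd.DataFrame(
    [[1, "GB", "x1", "xx", 12, "GB", "x2", "x1", 8],
     [2, "GB", "x2", "xx1", 12, "GB", "x2", "x1", 9]],
    columns=["id", "country", "cat", "brand", "month",
             "country", "cat", "brand", "month"],
)
df = df.set_index("id")

# Number each repeated column name: first occurrence -> 0, second -> 1.
names = pd.Series(df.columns)
occurrence = names.groupby(names).cumcount()

# Two-level columns: (occurrence, original name).
df.columns = pd.MultiIndex.from_arrays([occurrence, names])

# Move the occurrence level from the columns into the row index.
stacked = df.stack(level=0)

# All rows that belong to id 1, one per occurrence.
print(stacked.loc[1])
```

Each id then has one row per occurrence of the repeated column block, so `stacked.loc[1]` returns the two rows for id 1.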

Answer 2

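Renaming the columns before the join happens, in step 2 of the transformation, solved the problem.

I then added an append and sorted by row number: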

new = df[['id','county','cat','brand','month','row_number']]

old = df[['id','county_new','cat_new','brand_new','month_new','row_number']]

full = new.append(old)

full = full.sort_values(by = ['row_number'])
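
A runnable variant of this idea might look like the sketch below. It rests on several assumptions: the sample data is rebuilt by hand from the question, the right-hand columns carry a `_new` suffix as in the snippet above, and that suffix is stripped again before combining so that both halves line up under the same column names. It uses `pd.concat` in place of `DataFrame.append`, because `append` was removed in pandas 2.0. The helper list `base_cols` is hypothetical.

```python
import pandas as pd

# Hand-built sample: the right-hand columns were renamed with "_new" before the join.
df = pd.DataFrame({
    "id": [1, 2],
    "county": ["GB", "GB"],
    "cat": ["x1", "x2"],
    "brand": ["xx", "xx1"],
    "month": [12, 12],
    "county_new": ["GB", "GB"],
    "cat_new": ["x2", "x2"],
    "brand_new": ["x1", "x1"],
    "month_new": [8, 9],
})
df["row_number"] = range(len(df))

base_cols = ["county", "cat", "brand", "month"]  # hypothetical helper list

new = df[["id"] + base_cols + ["row_number"]]
old = df[["id"] + [c + "_new" for c in base_cols] + ["row_number"]]

# Strip the suffix so both halves share the same column names.
old = old.rename(
    columns=lambda c: c[: -len("_new")] if c.endswith("_new") else c
)

# Combine both halves, then bring rows from the same original row together.
full = pd.concat([new, old], ignore_index=True)
full = full.sort_values(by=["row_number"], kind="stable").reset_index(drop=True)
print(full)
```

The stable sort keeps the left-hand row ahead of the right-hand row within each `row_number`.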
