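Concatenate several rows into one row by column value, and split the resulting dataframe into several dataframes based on the number of concatenated rows (for several columns)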

Time: 2019-10-19 05:09:56

Tags: python pandas

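This is a follow-up to this question: Concatenate several rows into one row by column value, and split resulting dataframe into several dataframes based on number of concatinated rows

That question shows how to combine rows when there is one column to combine on and one additional column.

I am now looking for a solution for the case of several columns, while still combining rows based on a single column.

How I would like it handled: first list all columns of one type, then list the columns of the other type in the same order as the first.

Here is a minimal example: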

import pandas as pd

data = [['tom', 'ca', 2], ['ni2ck', 'ma', 2], ['j3uli', 'ny', 4],
        ['nic4k', 'ma', 4], ['jul5i', 'ny', 4], ['nic6k', 'ma', 7],
        ['ju7li', 'ny', 7], ['nic8k', 'ma', 7], ['ju9li', 'ny', 7],
        ['nic1k', 'ma', 8], ['car', 'ny', 8]]
df = pd.DataFrame(data, columns=['Name', 'Location', 'Age'])
df

which gives

     Name Location  Age
0     tom       ca    2
1   ni2ck       ma    2
2   j3uli       ny    4
3   nic4k       ma    4
4   jul5i       ny    4
5   nic6k       ma    7
6   ju7li       ny    7
7   nic8k       ma    7
8   ju9li       ny    7
9   nic1k       ma    8
10    car       ny    8

This would be the desired result:

    Name   Name Location Location  Age
0    tom  ni2ck       ca       ma    2
1  nic1k    car       ma       ny    8

    Name   Name   Name Location Location Location  Age
0  j3uli  nic4k  jul5i       ny       ma       ny    4

    Name   Name   Name   Name Location Location Location Location  Age
0  nic6k  ju7li  nic8k  ju9li       ma       ny       ma       ny    7

It is important that each location appears in the same order as its corresponding name.

1 answer:

Answer 0: (score: 1)

Building on @Wen's solution: use pivot_table instead of pivot.

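df['New'] = df.groupby('Age').cumcount()  # position of each row within its Age group
s = df.pivot_table(index='Age', columns='New',
                   values=['Name', 'Location'],
                   aggfunc='first').reindex(['Name', 'Location'], axis=1, level=0)
s.columns = s.columns.map('{0[0]}{0[1]}'.format)  # flatten ('Name', 0) to 'Name0'

# Group rows by their number of missing values, then drop the columns left empty.
l = [y.dropna(axis=1).reset_index() for _, y in s.groupby(s.isnull().sum(axis=1))]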

In [499]: l[0]
Out[499]:
   Age  Name0  Name1  Name2  Name3 Location0 Location1 Location2 Location3
0    7  nic6k  ju7li  nic8k  ju9li        ma        ny        ma        ny

In [500]: l[1]
Out[500]:
   Age  Name0  Name1  Name2 Location0 Location1 Location2
0    4  j3uli  nic4k  jul5i        ny        ma        ny

In [501]: l[2]
Out[501]:
   Age  Name0  Name1 Location0 Location1
0    2    tom  ni2ck        ca        ma
1    8  nic1k    car        ma        ny
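
The position of each frame in `l` depends on how many values are missing in its rows, so `l[0]` is the widest group rather than the smallest. As a possible alternative, here is a minimal sketch that keys the results by the number of rows per `Age` instead. The helper name `split_wide_by_group_size` and the temporary `_pos` column are hypothetical and not part of the answer above, and it swaps `pivot_table` for `set_index` plus `unstack`.

```python
import pandas as pd

data = [['tom', 'ca', 2], ['ni2ck', 'ma', 2], ['j3uli', 'ny', 4],
        ['nic4k', 'ma', 4], ['jul5i', 'ny', 4], ['nic6k', 'ma', 7],
        ['ju7li', 'ny', 7], ['nic8k', 'ma', 7], ['ju9li', 'ny', 7],
        ['nic1k', 'ma', 8], ['car', 'ny', 8]]
df = pd.DataFrame(data, columns=['Name', 'Location', 'Age'])


def split_wide_by_group_size(frame, key, value_cols):
    """Hypothetical helper: return {group size: wide frame} for one key column."""
    pos = frame.groupby(key).cumcount()                # 0, 1, 2, ... within each key
    size = frame.groupby(key)[key].transform('size')   # rows per key, aligned to frame
    result = {}
    for n, part in frame.assign(_pos=pos).groupby(size):
        # Every key in this part has exactly n rows, so unstack creates no NaN.
        wide = part.set_index([key, '_pos'])[value_cols].unstack('_pos')
        wide.columns = [f'{col}{i}' for col, i in wide.columns]
        result[int(n)] = wide.reset_index()
    return result


frames = split_wide_by_group_size(df, 'Age', ['Name', 'Location'])
for n, wide in frames.items():
    print(n)
    print(wide)
```

With this layout `frames[2]` holds the ages that have two rows, `frames[3]` those with three, and so on, which avoids relying on the ordering of the NaN counts.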

If you want to keep the MultiIndex columns, skip the map step on the columns.

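df['New'] = df.groupby('Age').cumcount()  # position of each row within its Age group
s = df.pivot_table(index='Age', columns='New',
                   values=['Name', 'Location'],
                   aggfunc='first').reindex(['Name', 'Location'], axis=1, level=0)

# Group rows by their number of missing values, then drop the columns left empty.
l = [y.dropna(axis=1).reset_index() for _, y in s.groupby(s.isnull().sum(axis=1))]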

In [544]: l[0]
Out[544]:
    Age   Name                      Location
New          0      1      2      3        0   1   2   3
0     7  nic6k  ju7li  nic8k  ju9li       ma  ny  ma  ny

In [545]: l[1]
Out[545]:
    Age   Name               Location
New          0      1      2        0   1   2
0     4  j3uli  nic4k  jul5i       ny  ma  ny

In [546]: l[2]
Out[546]:
    Age   Name        Location
New          0      1        0   1
0     2    tom  ni2ck       ca  ma
1     8  nic1k    car       ma  ny
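
Keeping the MultiIndex columns also makes it straightforward to check the requirement from the question that each location stays paired with its name. The following is only a sketch under that assumption: it rebuilds `s` as above, takes a cross-section for one position with `xs`, and compares every cell against the original long-format rows. The variable name `second` is illustrative.

```python
import pandas as pd

data = [['tom', 'ca', 2], ['ni2ck', 'ma', 2], ['j3uli', 'ny', 4],
        ['nic4k', 'ma', 4], ['jul5i', 'ny', 4], ['nic6k', 'ma', 7],
        ['ju7li', 'ny', 7], ['nic8k', 'ma', 7], ['ju9li', 'ny', 7],
        ['nic1k', 'ma', 8], ['car', 'ny', 8]]
df = pd.DataFrame(data, columns=['Name', 'Location', 'Age'])

df['New'] = df.groupby('Age').cumcount()
s = (df.pivot_table(index='Age', columns='New',
                    values=['Name', 'Location'], aggfunc='first')
       .reindex(['Name', 'Location'], axis=1, level=0))

# Cross-section of the column MultiIndex: the Name/Location pair at position 1.
second = s.xs(1, axis=1, level='New')
print(second)

# Every (Age, position) cell should match the row it came from.
for row in df.itertuples(index=False):
    assert s.loc[row.Age, ('Name', row.New)] == row.Name
    assert s.loc[row.Age, ('Location', row.New)] == row.Location
```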