Question

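I have a df in which some rows (not always) have empty cells in every column except one. What I am trying to accomplish is to merge each row whose columns "A" and "B" are empty into the previous row. The df looks like this: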

 |    A  |      B|    C|
0|  white|    one|    1|
1|       |       |    2|
2|  blue |    two|    3|
3|       |       |    4|
4|       |       |    5|
5|  black|  three|    6|
6|  brown|   four|    7|

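Possible combinations are:

no adjacent rows with empty cells (rows 5, 6)
one adjacent row with empty cells (rows 0, 1)
more than one adjacent row with empty cells (rows 2-4)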
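To experiment with the three cases listed above, the sample frame can be rebuilt roughly as follows. This is only a sketch: the question does not say how the empty cells are stored, so empty strings are assumed here, and the name `needs_merge` is made up for illustration.

```python
import pandas as pd

# Sample data from the question; empty cells are ASSUMED to be empty strings.
df = pd.DataFrame({
    "A": ["white", "", "blue", "", "", "black", "brown"],
    "B": ["one", "", "two", "", "", "three", "four"],
    "C": [1, 2, 3, 4, 5, 6, 7],
})

# True for rows whose "A" and "B" are both empty, i.e. rows that
# should be merged into the nearest non-empty row above them.
needs_merge = df["A"].eq("") & df["B"].eq("")
print(needs_merge.tolist())
```

Rows 1, 3 and 4 are flagged, which matches the three combinations: rows 5 and 6 stand alone, row 1 follows row 0, and rows 3-4 follow row 2.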

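The output should look like this:

 |    A  |      B|     C|
0|  white|    one|   1 2|
2|  blue |    two| 3 4 5|
5|  black|  three|     6|
6|  brown|   four|     7|

In the simple case where every second row has empty cells, I can manage with:

df.groupby(np.arange(len(df))//2).sum()

But I can't figure out the other, mixed cases.

Thanks for the help.

Update:

After trying the provided solutions, it turned out that the df can also contain cases like this:

 |      A|    B|    C
0|  white|  one|    1
1|       |     |    2
2|  white|  one|    3
3|       |     |    4
4|       |     |    5
5|  white|  one|    6
6|  white|  one|    7

On such data the provided solutions give the following result:

 |      A|    B|       C
0|       |     |   2 4 5
1|  white|  one| 1 3 6 7

Expected is that each non-empty row is still merged only with the empty rows directly below it.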

Answer 1

Use something like:

df.groupby(df.A.ffill()).agg({'B':'first','C':lambda x: ','.join(map(str,x))}).reset_index()

Better (thanks to @piRSquared):

df.astype({'C': str}).ffill().groupby(['A', 'B']).C.apply(' '.join).reset_index()

If you want to keep the row order of the original df, try:

m=df.groupby(df.A.ffill()).agg({'B':'first','C':lambda x: ','.join(map(str,x))}).\
                                                reindex(df.A.dropna().unique())
m=m.reset_index()
print(m)

       A      B      C
0  white    one    1,2
1   blue    two  3,4,5
2  black  three      6
3  brown   four      7

Note: replace the blank cells with np.nan before doing this.
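The reason for converting blanks to np.nan first is that `ffill` only fills missing values, and an empty string is an ordinary value. A minimal sketch, assuming the blank cells are empty or whitespace-only strings (the series `a` is made up for illustration):

```python
import numpy as np
import pandas as pd

a = pd.Series(["white", "", "blue", "", ""])

# Empty strings are not missing values, so ffill leaves them untouched.
unchanged = a.ffill()

# After converting blank cells to NaN, ffill propagates the last label down.
filled = a.replace(r"^\s*$", np.nan, regex=True).ffill()
print(filled.tolist())
```

Without the replacement, the groupby key would still contain the empty strings, and all blank rows would end up in one group of their own.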

EDIT:

Based on your update, you can do the following:

df=df.replace(r'^\s*$', np.nan, regex=True) # replace empty/whitespace-only cells with NaN (optional if they are already NaN)
new_df=(df.astype({'C': str}).groupby(df['A'].notnull().cumsum())
      .agg({'A':'first','B':'first','C':' '.join}).reset_index(drop=True))
print(new_df)

         A      B      C
0    white    one    1 2
1    white    one  3 4 5
2    white    one      6
3    white    one      7
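To see why the group key had to change for the updated data, it may help to compare the two keys side by side. The sketch below rebuilds the data from the update, assuming the blanks have already been converted to NaN; the names `by_label` and `by_block` are made up for illustration.

```python
import numpy as np
import pandas as pd

# Data from the update: every non-empty row carries the same labels.
df = pd.DataFrame({
    "A": ["white", np.nan, "white", np.nan, np.nan, "white", "white"],
    "B": ["one", np.nan, "one", np.nan, np.nan, "one", "one"],
    "C": [1, 2, 3, 4, 5, 6, 7],
})

# Key 1: forward-filled label. All rows get "white", so they form one group.
by_label = df["A"].ffill()

# Key 2: running count of non-empty rows. Each non-empty row starts a new id,
# and the empty rows below it inherit that id.
by_block = df["A"].notnull().cumsum()
print(by_block.tolist())

merged = (
    df.astype({"C": str})
      .groupby(by_block)
      .agg({"A": "first", "B": "first", "C": " ".join})
      .reset_index(drop=True)
)
print(merged)
```

Because the block id depends only on position, not on the label values, repeated labels such as "white" no longer collapse into a single row.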

Merge rows based on adjacent row cell values