Pandas: merging rows within a column

Date: 2020-09-30 16:03:00

Tags: python pandas

I have this DataFrame:

A      B                 C    D 
Train  Superfast         10   20 
NaN    Convernient       NaN NaN
NaN    Newest model      NaN NaN
NaN    Year 2002/099     NaN NaN
Car    Fastest           20   30
NaN    Can be more fast  NaN NaN
NaN    Year/2020/AYD     NaN NaN

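Is it possible to move the rows of column B up into the row that holds the values of the other columns, joining them into a single string?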

A      B                                                  C  D 
Train  Superfast Convernient Newest model Year 2002/099  10 20 
Car    Fastest Can be more fast Year/2020/AYD            20 30

2 answers:

Answer 0 (score: 3)

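Let's use cumsum to identify the blocks, then group by them. Each non-null value in C marks the start of a new block, so the cumulative sum of notna() gives every row its block number: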

blocks = df['C'].notna().cumsum()

agg_dict = {col:' '.join if col=='B' else 'first' for col in df}

df.groupby(blocks).agg(agg_dict).reset_index(drop=True)

Output:

       A                                                 B     C     D
0  Train  Superfast Convernient Newest model Year 2002/099  10.0  20.0
1    Car            Fastest Can be more fast Year/2020/AYD  20.0  30.0
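
To make the block-numbering step easier to follow, here is a minimal self-contained sketch that rebuilds the question's frame and prints the intermediate `blocks` values. It assumes, as in the sample data, that C is non-null exactly on the first row of each group and that B contains no NaN (`' '.join` raises a TypeError on a float). The name `result` is only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame(
    [
        ["Train", "Superfast", 10, 20],
        [np.nan, "Convernient", np.nan, np.nan],
        [np.nan, "Newest model", np.nan, np.nan],
        [np.nan, "Year 2002/099", np.nan, np.nan],
        ["Car", "Fastest", 20, 30],
        [np.nan, "Can be more fast", np.nan, np.nan],
        [np.nan, "Year/2020/AYD", np.nan, np.nan],
    ],
    columns=["A", "B", "C", "D"],
)

# notna() is True only on the row that opens a block,
# so the running sum assigns the same number to every row of that block
blocks = df["C"].notna().cumsum()
print(blocks.tolist())  # [1, 1, 1, 1, 2, 2, 2]

# Join the strings in B; keep the first non-null value of every other column
agg_dict = {col: " ".join if col == "B" else "first" for col in df}
result = df.groupby(blocks).agg(agg_dict).reset_index(drop=True)
print(result)
```

If the groups in your real data are marked by column A rather than C, `df['A'].notna().cumsum()` should work the same way.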

Answer 1 (score: 1)

A somewhat more complicated solution that does the merging with numpy only, but it should run very fast on large data:


import pandas as pd, numpy as np

df = pd.DataFrame([
    ['Train', 'Superfast', 10, 20],
    [np.nan, 'Convernient', np.nan, np.nan],
    [np.nan, 'Newest model', np.nan, np.nan],
    [np.nan, 'Year 2002/099', np.nan, np.nan],
    ['Car', 'Fastest', 20, 30],
    [np.nan, 'Can be more fast', np.nan, np.nan],
    [np.nan, 'Year/2020/AYD', np.nan, np.nan],
], columns = ['A', 'B', 'C', 'D'])

a = df.values
# Row indices where column A is not NaN (NaN != NaN), plus an end sentinel
i = np.append(np.flatnonzero(~(a[:, 0] != a[:, 0])), a.shape[0])
# First row of each block; fancy indexing returns a copy
b = a[i[:-1], :]
diffs = np.diff(i)
maxs = np.amax(diffs)

begs, ends = i[:-1], i[1:]
# Append the j-th continuation row to every block that has one
for j in range(1, maxs):
    chosen = begs + j < ends
    b[chosen, 1] += ' ' + a[begs[chosen] + j, 1]

df = pd.DataFrame(b, columns = df.columns.values.tolist())
print(df)

Code output:

       A                                                 B   C   D
0  Train  Superfast Convernient Newest model Year 2002/099  10  20
1    Car            Fastest Can be more fast Year/2020/AYD  20  30
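
The index bookkeeping in this answer can be looked at in isolation. The sketch below is only an illustration: it uses `pd.isna` instead of the answer's `x != x` comparison to find the NaN entries, and the names `col_a`, `starts`, `bounds` and `sizes` are made up for the example.

```python
import numpy as np
import pandas as pd

# Column A from the question, as an object array
col_a = np.array(
    ["Train", np.nan, np.nan, np.nan, "Car", np.nan, np.nan], dtype=object
)

# Positions of the rows that open a block (A is not NaN)
starts = np.flatnonzero(~pd.isna(col_a))

# Add the total row count as a sentinel so every block has an end
bounds = np.append(starts, len(col_a))
begs, ends = bounds[:-1], bounds[1:]

# Number of rows in each block
sizes = np.diff(bounds)

print(starts.tolist())  # [0, 4]
print(bounds.tolist())  # [0, 4, 7]
print(sizes.tolist())   # [4, 3]
```

The loop in the answer then runs at most `sizes.max() - 1` times, so the number of Python-level iterations depends on the longest block rather than on the number of rows.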