Given this DataFrame:
A B C D
Train Superfast 10 20
NaN Convernient NaN NaN
NaN Newest model NaN NaN
NaN Year 2002/099 NaN NaN
Car Fastest 20 30
NaN Can be more fast NaN NaN
NaN Year/2020/AYD NaN NaN
Is it possible to move the rows of column B up into the row that holds the remaining values in the other columns, like this?
A B C D
Train Superfast Convernient Newest model Year 2002/099 10 20
Car Fastest Can be more fast Year/2020/AYD 20 30
Answer 0 (score: 3)
Let's use cumsum to identify the blocks and group by them:
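# A row with a value in C starts a new block; cumsum turns those starts into block ids.
blocks = df['C'].notna().cumsum()
# Join the strings of B within each block; take the first non-null value of every other column.
agg_dict = {col:' '.join if col=='B' else 'first' for col in df}
df.groupby(blocks).agg(agg_dict).reset_index(drop=True)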
Output:
A B C D
0 Train Superfast Convernient Newest model Year 2002/099 10.0 20.0
1 Car Fastest Can be more fast Year/2020/AYD 20.0 30.0
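To see why the grouping works, it can help to inspect the intermediate blocks series. The sketch below is only an illustration: it rebuilds the example frame from the question (the construction of df and the names starts and result are assumptions, not part of the answer) and prints the block label assigned to each row.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question (assumed construction).
df = pd.DataFrame(
    [
        ["Train", "Superfast", 10, 20],
        [np.nan, "Convernient", np.nan, np.nan],
        [np.nan, "Newest model", np.nan, np.nan],
        [np.nan, "Year 2002/099", np.nan, np.nan],
        ["Car", "Fastest", 20, 30],
        [np.nan, "Can be more fast", np.nan, np.nan],
        [np.nan, "Year/2020/AYD", np.nan, np.nan],
    ],
    columns=["A", "B", "C", "D"],
)

# True only on rows that carry a value in C, i.e. the first row of each block.
starts = df["C"].notna()

# The running sum increases by one at every start, so all rows of a block share a label.
blocks = starts.cumsum()
print(blocks.tolist())  # [1, 1, 1, 1, 2, 2, 2]

# Same aggregation as in the answer: join B, keep the first non-null value elsewhere.
agg_dict = {col: " ".join if col == "B" else "first" for col in df}
result = df.groupby(blocks).agg(agg_dict).reset_index(drop=True)
print(result)
```

Note that this relies on C being filled on exactly the rows that start a block. If a different column marks the starts in your data, that column would be the one to pass to notna().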
Answer 1 (score: 1)
A somewhat more complicated solution that uses only numpy, but it runs very fast on large data:
import numpy as np
import pandas as pd
df = pd.DataFrame([
['Train', 'Superfast', 10, 20],
[np.nan, 'Convernient', np.nan, np.nan],
[np.nan, 'Newest model', np.nan, np.nan],
[np.nan, 'Year 2002/099', np.nan, np.nan],
['Car', 'Fastest', 20, 30],
[np.nan, 'Can be more fast', np.nan, np.nan],
[np.nan, 'Year/2020/AYD', np.nan, np.nan],
], columns = ['A', 'B', 'C', 'D'])
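# Work on the raw object array; NaN is the only value that is not equal to itself.
a = df.values
# Row indices where column A is not NaN (block starts), plus an end sentinel.
i = np.append(np.flatnonzero(~(a[:, 0] != a[:, 0])), a.shape[0])
# First row of each block (fancy indexing returns a copy).
b = a[i[:-1], :]
diffs = np.diff(i)  # number of rows in each block
maxs = np.amax(diffs)  # length of the longest block
begs, ends = i[:-1], i[1:]
# Append the j-th continuation row to every block that has one.
for j in range(1, maxs):
    chosen = begs + j < ends
    b[chosen, 1] += ' ' + a[begs[chosen] + j, 1]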
df = pd.DataFrame(b, columns = df.columns.values.tolist())
print(df)
Code output:
A B C D
0 Train Superfast Convernient Newest model Year 2002/099 10 20
1 Car Fastest Can be more fast Year/2020/AYD 20 30
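The index arithmetic in this answer is easier to follow in isolation. The sketch below is a simplified illustration, not the answer's code: it starts from a hand-written boolean mask has_value (an assumed stand-in for "column A is not NaN" on the example data) and records which source rows the loop visits at each offset j. The names has_value, visited and rows are hypothetical.

```python
import numpy as np

# Assumed mask for the example: True where column A holds a value (block start).
has_value = np.array([True, False, False, False, True, False, False])

# Block start indices, followed by the total row count as an end sentinel.
i = np.append(np.flatnonzero(has_value), has_value.size)
begs, ends = i[:-1], i[1:]
diffs = np.diff(i)  # rows per block

visited = []
for j in range(1, diffs.max()):
    # Only blocks that still have a j-th continuation row take part in this step.
    chosen = begs + j < ends
    rows = begs[chosen] + j
    visited.append(rows.tolist())

print(i.tolist())      # [0, 4, 7]
print(diffs.tolist())  # [4, 3]
print(visited)         # [[1, 5], [2, 6], [3]]
```

The loop runs once per offset rather than once per row, so the number of Python-level iterations is bounded by the longest block, which is where the speed on large inputs comes from.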