Table format (empty cells are blank; the columns are field and dimension):
field | dimension
-----------------
a |
b | abc
e | efg
| xyz
r | abc
| def
| xyz
Desired format:
field | dimension
-----------------
a | [nan]
b | [abc]
e | [efg, xyz]
r | [abc, def, xyz]
What I tried:
df.dimension = [df.dimension]
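I also tried finding the index of every empty cell in field and combining that row with the row above it. However, I get:
ValueError: Length of values does not match length of index
I also suspect there must be a better way than the one I am attempting. Thanks in advance.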
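Answer 0: (score: 2)
Use:
import numpy as np
import pandas as pd

# Forward-fill field so blank rows inherit the label above, then group on it.
df = (df.groupby(df['field'].ffill())['dimension']
        .apply(lambda x: np.nan if x.isnull().all() else list(x))
        .reset_index())
print(df)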
field dimension
0 a NaN
1 b [abc]
2 e [efg, xyz]
3 r [abc, def, xyz]
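# Alternative on the original df: drop the NaN rows first, then use reindex
# to bring back any field whose rows were all dropped.
df = (df[df['dimension'].notnull()].groupby(df['field'].ffill())['dimension']
        .apply(list)
        .reindex(pd.unique(df['field'].dropna()))
        .reset_index())
print(df)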
field dimension
0 a NaN
1 b [abc]
2 e [efg, xyz]
3 r [abc, def, xyz]
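To see why the reindex step matters in the second variant, here is a small sketch. It assumes the blank cells in the question are NaN; the question does not show how the frame was built, so the constructor below is a guess, and the names `kept`, `lists` and `restored` are only for illustration.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's table; blanks are taken to be NaN.
df = pd.DataFrame({
    "field":     ["a", "b", "e", np.nan, "r", np.nan, np.nan],
    "dimension": [np.nan, "abc", "efg", "xyz", "abc", "def", "xyz"],
})

# Filtering out NaN dimensions removes the only row that belongs to 'a'.
kept = df[df["dimension"].notnull()]
lists = kept.groupby(df["field"].ffill())["dimension"].apply(list)
print(lists.index.tolist())  # 'a' is no longer among the groups

# Reindexing on the original field labels puts 'a' back, with NaN as its value.
restored = lists.reindex(pd.unique(df["field"].dropna()))
print(restored)
```

Without the reindex, a field that has no dimension at all would silently disappear from the result.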
But if NaN values inside the lists are not a problem:
df = (df.groupby(df['field'].ffill())['dimension']
        .apply(list)
        .reset_index())
print(df)
field dimension
0 a [nan]
1 b [abc]
2 e [efg, xyz]
3 r [abc, def, xyz]
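For anyone who wants to run this end to end, here is a minimal self-contained sketch. It again assumes the blanks are NaN, and the helper `collapse` with its `nan_as_list` flag is a made-up name, not something from pandas.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's table; blanks are taken to be NaN.
df = pd.DataFrame({
    "field":     ["a", "b", "e", np.nan, "r", np.nan, np.nan],
    "dimension": [np.nan, "abc", "efg", "xyz", "abc", "def", "xyz"],
})

def collapse(frame, nan_as_list=True):
    """Group rows under the last non-blank field (hypothetical helper)."""
    # ffill() copies each field label down into the blank rows below it.
    keys = frame["field"].ffill()
    grouped = frame.groupby(keys)["dimension"]
    if nan_as_list:
        # Every group becomes a list, so an all-NaN group becomes [nan].
        out = grouped.apply(list)
    else:
        # An all-NaN group becomes a bare NaN instead of a list.
        out = grouped.apply(lambda x: np.nan if x.isnull().all() else list(x))
    return out.reset_index()

with_lists = collapse(df)
with_nan = collapse(df, nan_as_list=False)
print(with_lists)
print(with_nan)
```

Since `collapse` builds a new frame each time, the original `df` stays as it was and both variants can be compared side by side.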
Answer 1: (score: 1)
Let's try:
df['field'] = df['field'].ffill()
df_out = df.groupby('field')['dimension'].apply(list).reset_index()
Output:
field dimension
0 a [nan]
1 b [abc]
2 e [efg, xyz]
3 r [abc, def, xyz]
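One caveat: this relies on the blanks being real NaN values. If the table was loaded in a way that left them as empty strings (an assumption, since the question does not say how the data was read), ffill() will leave them untouched. A rough sketch of normalising them first, using made-up sample data:

```python
import numpy as np
import pandas as pd

# Hypothetical input where blanks arrived as empty strings rather than NaN.
raw = pd.DataFrame({
    "field":     ["a", "b", "e", "", "r", "", ""],
    "dimension": ["", "abc", "efg", "xyz", "abc", "def", "xyz"],
})

# Turn empty strings into NaN so that ffill() has something to fill.
clean = raw.replace("", np.nan)

# assign() returns a new frame, so `raw` and `clean` are left unmodified.
result = (clean.assign(field=clean["field"].ffill())
               .groupby("field")["dimension"]
               .apply(list)
               .reset_index())
print(result)
```

Using assign() instead of overwriting df['field'] keeps the original column intact, which can help if the ungrouped frame is still needed afterwards.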