Input DataFrame:
import numpy as np
import pandas as pd
data = {
    'id': [70, 70, 1148, 557, 557, 104, 581, 69],
    'r_id': [[70, 34, 44, 23, 11, 71], [70, 53, 33, 73, 41],
             np.nan, np.nan, np.nan, np.nan, np.nan, [69, 68, 7]],
}
df = pd.DataFrame.from_dict(data)
print(df)
     id                      r_id
0    70  [70, 34, 44, 23, 11, 71]
1    70      [70, 53, 33, 73, 41]
2  1148                       NaN
3   557                       NaN
4   557                       NaN
5   104                       NaN
6   581                       NaN
7    69               [69, 68, 7]
Desired output DataFrame:
data = {
    'id': [70, 70, 1148, 557, 557, 104, 581, 69],
    'r_id': [[70, 34, 44, 23, 11, 71], [70, 53, 33, 73, 41],
             [1148], [557], [557], [104], [581], [69, 68, 7]],
}
df = pd.DataFrame.from_dict(data)
print(df)
     id                      r_id
0    70  [70, 34, 44, 23, 11, 71]
1    70      [70, 53, 33, 73, 41]
2  1148                    [1148]
3   557                     [557]
4   557                     [557]
5   104                     [104]
6   581                     [581]
7    69               [69, 68, 7]
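I want the target column r_id to be a column of lists, while the source column id is not a list: wherever r_id is NaN, it should be replaced by a one-element list holding that row's id. I looked at the following Stack Overflow question, python-pandas-replace-nan-in-one-column, and also tried this on my real data, without success:
data_merge_rel.RELATED_DEVICE.fillna(data_merge_rel.DF0_Desc_Label_i.to_list(), inplace=True)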
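For context, here is a minimal sketch of why an attempt like the one above is likely to fail: as far as I know, Series.fillna does not accept a plain Python list as its fill value. The small frame below is made up for illustration and stands in for the real data_merge_rel data; the exception types caught are an assumption about the installed pandas version.

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the real data (hypothetical values).
df = pd.DataFrame({
    "id": [70, 1148, 69],
    "r_id": [[70, 34], np.nan, [69, 68, 7]],
})

# Passing a plain list as the fill value: pandas is expected to reject it.
try:
    df["r_id"].fillna(df["id"].to_list())
    list_fill_rejected = False
except (TypeError, ValueError) as exc:
    list_fill_rejected = True
    print(f"fillna rejected the list: {exc}")
```

The fill value has to be a scalar, a dict, or an index-aligned Series, which is what the answers rely on.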
Answer 0 (score: 2)
(df.explode('r_id').ffill(axis=1).reset_index().groupby(['index','id'],sort=False).agg(list)
.reset_index(1))
         id                      r_id
index
0        70  [70, 34, 44, 23, 11, 71]
1        70      [70, 53, 33, 73, 41]
2      1148                    [1148]
3       557                     [557]
4       557                     [557]
5       104                     [104]
6       581                     [581]
7        69               [69, 68, 7]
Answer 1 (score: 2)
We can use a list comprehension + Series.fillna.
First, we wrap every id value in its own single-element list.
Then we replace each NaN with the corresponding list value:
df['temp'] = [[x] for x in df['id']]
df['r_id'] = df['r_id'].fillna(df['temp'])
df = df.drop(columns='temp')
Or as a one-liner using apply (thanks to r.ook):
df['r_id'] = df['r_id'].fillna(df['id'].apply(lambda x: [x]))
     id                      r_id
0    70  [70, 34, 44, 23, 11, 71]
1    70      [70, 53, 33, 73, 41]
2  1148                    [1148]
3   557                     [557]
4   557                     [557]
5   104                     [104]
6   581                     [581]
7    69               [69, 68, 7]
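The list-comprehension approach in this answer can be wrapped in a small helper so the temporary column is never added to the frame. This is only a sketch: the function name fill_nan_with_own_id and its parameters are hypothetical, and the non-default index is there just to show that the fill Series is matched to the target column by index label, not by position.

```python
import numpy as np
import pandas as pd


def fill_nan_with_own_id(frame, source="id", target="r_id"):
    """Return a copy where missing `target` cells hold [value of `source`]."""
    out = frame.copy()
    # One single-element list per row, sharing the frame's index so that
    # fillna can align it with the target column.
    wrapped = pd.Series([[x] for x in out[source]], index=out.index)
    out[target] = out[target].fillna(wrapped)
    return out


df = pd.DataFrame(
    {
        "id": [70, 1148, 557, 69],
        "r_id": [[70, 34, 44], np.nan, np.nan, [69, 68, 7]],
    },
    index=[10, 20, 30, 40],  # arbitrary labels, chosen for illustration
)

result = fill_nan_with_own_id(df)
print(result)
```

Working on a copy leaves the original frame untouched, which makes it easy to compare before and after.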
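Answer 2 (score: 1)
You can convert the id column to an array, add a dimension, turn it into a list of lists, and build a Series from that to pass to fillna, like this: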
df['r_id'] = df['r_id'].fillna(pd.Series(df.id.to_numpy()[:,None].tolist(), index=df.index))
print (df)
     id                      r_id
0    70  [70, 34, 44, 23, 11, 71]
1    70      [70, 53, 33, 73, 41]
2  1148                    [1148]
3   557                     [557]
4   557                     [557]
5   104                     [104]
6   581                     [581]
7    69               [69, 68, 7]
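The "add a dimension" step in this answer may be easier to follow in isolation. The sketch below uses a made-up three-element Series and shows how indexing with [:, None] turns a flat array into a column, so that tolist() yields one single-element list per row.

```python
import numpy as np
import pandas as pd

# Illustrative values only.
ids = pd.Series([70, 1148, 557], name="id")

flat = ids.to_numpy()     # 1-D array, shape (3,)
column = flat[:, None]    # 2-D array, shape (3, 1); None acts as np.newaxis
nested = column.tolist()  # one single-element list per row

print(flat.shape, column.shape)
print(nested)
```

Because the whole conversion happens in NumPy, no Python-level function is called per row, unlike the apply variant.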
Or, if you do not have many NaN values, it may be worth selecting only those rows before doing anything else:
mask_na = df.r_id.isna()
df.loc[mask_na, 'r_id'] = pd.Series(df.loc[mask_na,'id'].to_numpy()[:,None].tolist(),
index=df[mask_na].index)
Answer 3 (score: 1)
I think anky_91's answer will be faster, but you can also try the following:
df['r_id'] = np.where(df['r_id'].isnull(),
                      df['id'].apply(lambda x: [x]),
                      df['r_id'])