pandas DataFrame: prepend a column's value to another column whose elements are lists

Time: 2019-12-13 11:53:35

Tags: python pandas

The input DataFrame is as follows:

import pandas as pd

data = {
    's_id' : [5,7,26,70.0,55,71.0,8.0,'nan','nan',4],
    'r_id' : [[34, 44, 23, 11, 71], [53, 33, 73, 41], [17], [10, 31], [17], [75, 8],[7],[68],[50],[]]
}

df = pd.DataFrame.from_dict(data)
df
Out[240]: 
  s_id                  r_id
0    5  [34, 44, 23, 11, 71]
1    7      [53, 33, 73, 41]
2   26                  [17]
3   70              [10, 31]
4   55                  [17]
5   71               [75, 8]
6    8                   [7]
7  nan                  [68]
8  nan                  [50]
9    4                    []

The expected DataFrame:

data = {
    's_id' : [5,7,26,70.0,55,71.0,8.0,'nan','nan',4],
    'r_id' : [[5,34, 44, 23, 11, 71], [7,53, 33, 73, 41], [26,17], [70,10, 31], [55,17], [71,75, 8],[8,7],[68],[50],[4]]
}

df = pd.DataFrame.from_dict(data)
df
Out[241]: 
  s_id                     r_id
0    5  [5, 34, 44, 23, 11, 71]
1    7      [7, 53, 33, 73, 41]
2   26                 [26, 17]
3   70             [70, 10, 31]
4   55                 [55, 17]
5   71              [71, 75, 8]
6    8                   [8, 7]
7  nan                     [68]
8  nan                     [50]
9    4                      [4]

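I need to insert each value from s_id as the first element of the list in the same row of the r_id column. The s_id column also contains nan values, and some of its values show up as floats. Thanks.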

I tried the following:

df['r_id'] = df["s_id"].apply(lambda x : x.append(df['r_id']) )

df['r_id'] = df["s_id"].apply(lambda x : [x].append(df['r_id'].values.tolist()))

1 answer:

Answer 0: (score: 1)

If the nan entries are missing values, use apply: convert each s_id value to an integer, wrap it in a one-element list, prepend it to r_id, and skip the rows where s_id is NaN:

import numpy as np
import pandas as pd

data = {
    's_id' : [5,7,26,70.0,55,71.0,8.0,np.nan,np.nan,4],
    'r_id' : [[34, 44, 23, 11, 71], [53, 33, 73, 41], [17], [10, 31], [17], [75, 8],[7],[68],[50],[]]
}
df = pd.DataFrame.from_dict(data)
print (df)

# Prepend int(s_id) to the row's list; leave the list unchanged when s_id is NaN
f = lambda x : [int(x["s_id"])] + x['r_id'] if pd.notna(x["s_id"]) else x['r_id']
df['r_id'] = df.apply(f, axis=1)
print (df)
   s_id                     r_id
0   5.0  [5, 34, 44, 23, 11, 71]
1   7.0      [7, 53, 33, 73, 41]
2  26.0                 [26, 17]
3  70.0             [70, 10, 31]
4  55.0                 [55, 17]
5  71.0              [71, 75, 8]
6   8.0                   [8, 7]
7   NaN                     [68]
8   NaN                     [50]
9   4.0                      [4]

Another idea is to build a boolean mask and apply the function only to the rows where s_id is not NaN:

# True for rows that have an s_id value
m = df["s_id"].notna()
f = lambda x : [int(x["s_id"])] + x['r_id']
df.loc[m, 'r_id'] = df[m].apply(f, axis=1)
print (df)
   s_id                     r_id
0   5.0  [5, 34, 44, 23, 11, 71]
1   7.0      [7, 53, 33, 73, 41]
2  26.0                 [26, 17]
3  70.0             [70, 10, 31]
4  55.0                 [55, 17]
5  71.0              [71, 75, 8]
6   8.0                   [8, 7]
7   NaN                     [68]
8   NaN                     [50]
9   4.0                      [4]
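The question's data stores the missing values as the literal string 'nan' rather than np.nan, so the answer's pd.notna check would treat them as present. The following is a minimal sketch of one way to handle that case, not part of the original answer: it assumes the strings should be coerced to real NaN with pd.to_numeric, and it uses a plain list comprehension instead of apply. The name s_num is introduced here for illustration only.

```python
import pandas as pd

data = {
    "s_id": [5, 7, 26, 70.0, 55, 71.0, 8.0, "nan", "nan", 4],
    "r_id": [[34, 44, 23, 11, 71], [53, 33, 73, 41], [17], [10, 31],
             [17], [75, 8], [7], [68], [50], []],
}
df = pd.DataFrame(data)

# Assumption: the string 'nan' marks a missing value, so coerce the
# column to numeric and let anything unparseable become NaN.
s_num = pd.to_numeric(df["s_id"], errors="coerce")

# Build each new list in plain Python: prepend int(s_id) when it is
# present, otherwise keep the original list unchanged.
df["r_id"] = pd.Series(
    [[int(s)] + r if pd.notna(s) else r for s, r in zip(s_num, df["r_id"])],
    index=df.index,
)
print(df)
```

The comprehension creates a new list for every row with a non-missing s_id, so the original lists are not mutated. It also sidesteps the problem in the question's attempts, where list.append returns None and therefore cannot be used as the lambda's return value.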
