Question

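I have a data frame in which one column (colD) was created from keywords extracted from another column (colC). I used a list containing all the keywords ("abc", "xyz", "efg", "rst"), but a keyword is not recorded in colD if it does not appear in colC. The keywords may or may not also be present in two other columns (colA and colB). Is there a way to add the values from colA and/or colB to the corresponding list in colD if they are not already there?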

Current state:

  colA colB           colC   colD
0  abc  NaN   hi there:abc  [abc]
1  xyz  NaN   blahblahblah     []
2  efg  rst  text rst text  [rst]

Desired output:

  colA colB           colC        colD
0  abc  NaN   hi there:abc       [abc]
1  xyz  NaN   blahblahblah       [xyz]
2  efg  rst  text rst text  [rst, efg]
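
For anyone who wants to reproduce the current state, here is a minimal sketch. The question does not show how colD was built, so the substring-based extraction step below is an assumption, as is the variable name `keywords`.

```python
import numpy as np
import pandas as pd

# Keyword list taken from the question.
keywords = ["abc", "xyz", "efg", "rst"]

df = pd.DataFrame({
    "colA": ["abc", "xyz", "efg"],
    "colB": [np.nan, np.nan, "rst"],
    "colC": ["hi there:abc", "blahblahblah", "text rst text"],
})

# Assumed extraction step: keep every keyword that occurs as a substring of colC.
df["colD"] = [[k for k in keywords if k in text] for text in df["colC"]]
print(df)
```

Row 1 ends up with an empty list because none of the keywords occurs in its colC text, which is the gap the question asks about.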

Answer 1

IIUC, first stack the columns whose values you want to add to the lists, then group by the index level and collect each group into a list.

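# Stack colA and colB into one Series, drop NaN, and collect each row's values into a list.
s = df[['colA', 'colB']].stack().dropna().groupby(level=0).apply(list)
# Use a set difference to find the values missing from colD and append them to the existing list.
# Note: zip pairs rows by position, so this assumes every row has at least one non-NaN value in colA/colB.
df['colD'] = [y + list(set(x) - set(y)) for x, y in zip(s, df['colD'])]
df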
Out[118]: 
  colA colB        colD
0  abc  NaN       [abc]
1  xyz  NaN       [xyz]
2  efg  rst  [rst, efg]
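
As a variation on the answer above, the following is a rough row-wise sketch rather than the answerer's method. It keeps the order in which values appear and also copes with a row where both colA and colB are NaN, which the stacked Series would leave out. The helper name `append_missing` and the extra all-NaN row are made up for illustration.

```python
import numpy as np
import pandas as pd


def append_missing(df, source_cols, target_col):
    """Return target_col's lists extended with unseen non-null values from source_cols."""
    merged = []
    for extras, current in zip(df[source_cols].to_numpy(), df[target_col]):
        result = list(current)  # copy so the original lists are not mutated
        for value in extras:
            # Skip NaN and anything already recorded in the list.
            if pd.notna(value) and value not in result:
                result.append(value)
        merged.append(result)
    return pd.Series(merged, index=df.index)


# Sample data from the question, plus a hypothetical row with no values in colA/colB.
df = pd.DataFrame({
    "colA": ["abc", "xyz", "efg", np.nan],
    "colB": [np.nan, np.nan, "rst", np.nan],
    "colD": [["abc"], [], ["rst"], []],
})
df["colD"] = append_missing(df, ["colA", "colB"], "colD")
print(df)
```

This trades the vectorised stack/groupby for a plain Python loop, so it is likely slower on large frames, but the result does not depend on set ordering.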

Pandas - Append values from one column to a list in a new column if the values are not already in that list
