Same as "Get column value if it matches another column value in the same table", but using Python / pandas instead of SQL, because the query takes too long to run.
I have a df:
Id | replyId | commentID_parentID | usernameChannelId | channelId
1 | NULL | NULL | a | g
2 | NULL | NULL | b | k
NULL | 1.k | 1 | a | p
NULL | 1.p | 1 | c | i
3 | NULL | NULL | d | h
NULL | 2.k | 2 | g | g
and a table with the following channels:
I want to know which user (userChannelId) replied to which user.
So I take a row containing a comment and check whether:
Id == NULL? Then it's a reply -> get userChannelId where commentID_parentID == Id
Id != NULL? Then it's a main comment -> userChannelId replied to channelId
The result should be:
userChannelId_Source | userChannelId_Target
a | g
b | k
a | a
c | a
g | b
Comment "d" has no entry where commentID_parentID == Id, so it is ignored.
My code so far:
cm["usernameChannelId_reply"] = None
for row in cm.itertuples():
    if cm.commentID_parentID is None: # comment is a main comment
        cm.at[row.Index, 'usernameChannelId_reply'] = cm.channelId
    else: # comment is a reply comment
        temp = cm.loc[cm.Id == row.commentID_parentID]["usernameChannelId"][0]
        #temp = cm.query("Id == commentID_parentID").head(1).loc[:, 'usernameChannelId']
        print(temp)
        if len(set(temp)) == 0:
            print(0, row.Index)
            #cm.at[row.Index, 'usernameChannelId_reply'] = temp
        else:
            cm.at[row.Index, 'usernameChannelId_reply'] = temp
But I get:
KeyError: 0
Removing the [0] prints, for example:
997 UCOYb6iKhuCHKDwvd_iBnIBw
Name: usernameChannelId, dtype: object
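A note on the error: the printed Series carries the index label 997, which suggests that `[0]` is being treated as a label lookup rather than "first element". The sketch below reproduces that on a hypothetical two-row frame (the Ids 41 and 42, the value `UC_example` and the index labels are made up for illustration) and shows positional access with `.iloc[0]`, guarded for the case where nothing matches.

```python
import pandas as pd

# Hypothetical two-row frame; the index labels 996 and 997 mimic a slice of a larger table
cm = pd.DataFrame(
    {"Id": [41, 42], "usernameChannelId": ["UC_example", "UCOYb6iKhuCHKDwvd_iBnIBw"]},
    index=[996, 997],
)

# Boolean filtering keeps the original index, so this Series has the single label 997
matches = cm.loc[cm.Id == 42, "usernameChannelId"]

try:
    matches[0]  # label-based lookup on an integer index: there is no label 0
    raised = False
except KeyError:
    raised = True

# Positional lookup returns the first element regardless of its label
first = matches.iloc[0]

# When no row matches, .iloc[0] would raise IndexError, so check .empty first
no_match = cm.loc[cm.Id == 99, "usernameChannelId"]
first_or_none = no_match.iloc[0] if not no_match.empty else None
```

Here `raised` ends up `True`, `first` holds the channel id string, and `first_or_none` is `None`.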
Answer 0 (score: 0)
IIUC, you want to map the values in commentID_parentID to the usernameChannelId associated with the same Id. You could try:
import numpy as np

# create the mapper: Id -> usernameChannelId for main comments
# (this assumes missing values are stored as the literal string 'NULL')
s_map = df.loc[df.Id.ne('NULL'), :].set_index(['Id'])['usernameChannelId']
# create the column by mapping the values where commentID_parentID is not NULL, otherwise channelId
df['userChannelId_Target'] = np.where(df['commentID_parentID'].ne('NULL'),
                                      df['commentID_parentID'].map(s_map),
                                      df['channelId'])
# see result
print(df[['usernameChannelId', 'userChannelId_Target']])
  usernameChannelId userChannelId_Target
0                 a                    g
1                 b                    k
2                 a                    a
3                 c                    a
4                 d                    h
5                 g                    b
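The output above still contains the row for "d", which the question wanted to leave out. Below is a minimal end-to-end sketch of the same mapping idea on the question's sample data, under two assumptions that are not from the original post: missing values are real NaN/None rather than the string 'NULL', and "ignored" means dropping main comments that never received a reply. The names `is_reply`, `has_reply` and `edges` are illustrative only.

```python
import numpy as np
import pandas as pd

# Sample data from the question, with NaN/None in place of NULL (assumption)
df = pd.DataFrame({
    "Id": [1, 2, np.nan, np.nan, 3, np.nan],
    "replyId": [None, None, "1.k", "1.p", None, "2.k"],
    "commentID_parentID": [np.nan, np.nan, 1, 1, np.nan, 2],
    "usernameChannelId": ["a", "b", "a", "c", "d", "g"],
    "channelId": ["g", "k", "p", "i", "h", "g"],
})

# Mapper: Id of a main comment -> user who wrote it
s_map = df.dropna(subset=["Id"]).set_index("Id")["usernameChannelId"]

# Replies target the author of the parent comment; main comments target the channel
is_reply = df["commentID_parentID"].notna()
df["userChannelId_Target"] = np.where(
    is_reply,
    df["commentID_parentID"].map(s_map),
    df["channelId"],
)

# Keep replies plus main comments that have at least one reply (assumed reading of "ignored")
has_reply = df["Id"].isin(df["commentID_parentID"].dropna())
edges = df.loc[is_reply | has_reply, ["usernameChannelId", "userChannelId_Target"]]
print(edges)
```

Because this avoids the row-by-row loop, it should also scale better than `itertuples` on a large table; if the real data does use the string 'NULL', converting it first (for example with `df.replace('NULL', np.nan)`) would be needed before `notna()` and `dropna()` behave as shown.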