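I have a DataFrame with four columns: user_id, comment_id, reply_to_id, and comment_text. My problem is grouping the rows so that each reply (identified by reply_to_id) sits with the comment_id it replies to. The main purpose is to identify the branch comments of every root comment, so that I can tell root comments and their branches apart. (Any suggestions are welcome, including alternative ways to approach this kind of problem.)
Table: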
user_id | comment_id | reply_to_id | comment_text
123 | 8 | | How are you?
456 | 9 | | May I help you?
1256 | 10 | 8 | I am good. What about you?
6543 | 11 | | Weather is not good today
234 | 12 | 9 | Thank you, I will manage
I want every comment_id grouped with the rows whose reply_to_id points to it. The output should look like this:
user_id | comment_id | reply_to_id | comment_text
123 | 8 | | How are you?
1256 | 10 | 8 | I am good. What about you?
456 | 9 | | May I help you?
234 | 12 | 9 | Thank you, I will manage
6543 | 11 | | Weather is not good today
Answer 0 (score: 2)
Setup
import pandas as pd

df = pd.DataFrame({'user_id': {0: 123, 1: 456, 2: 1256, 3: 6543, 4: 234},
'comment_id': {0: 8, 1: 9, 2: 10, 3: 11, 4: 12},
'reply_to_id': {0: '', 1: '', 2: '8', 3: '', 4: '9'},
'comment_text': {0: ' How are you?',
1: ' May I help you?',
2: ' I am good. What about you? ',
3: ' Weather is not good today',
4: ' Thank you, I will manage'}})
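You could try something like this with a temporary column, where a root comment uses its own comment_id as the sort key and a reply uses its reply_to_id:
(
# sort_key: own comment_id for a root comment, reply_to_id for a reply
df.assign(sort_key=df.apply(lambda x: int(x.comment_id) if x.reply_to_id=='' else int(x.reply_to_id), axis=1))
# comment_id breaks ties so each root comes before its replies
.sort_values(by=['sort_key', 'comment_id'])
.drop(columns='sort_key')
)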
   user_id  comment_id reply_to_id                  comment_text
0      123           8                              How are you?
2     1256          10           8    I am good. What about you?
1      456           9                           May I help you?
4      234          12           9      Thank you, I will manage
3     6543          11                 Weather is not good today
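The same idea can be sketched without a row-wise apply. This is only a sketch under a few assumptions that are not in the question: replies are a single level deep, a reply always has a larger comment_id than the comment it answers, and the column names `root_id` and `is_root` are made up here for illustration. It keeps the key as a column, so root and branch comments stay identifiable after sorting.

```python
import pandas as pd

df = pd.DataFrame({
    'user_id': [123, 456, 1256, 6543, 234],
    'comment_id': [8, 9, 10, 11, 12],
    'reply_to_id': ['', '', '8', '', '9'],
    'comment_text': ['How are you?', 'May I help you?',
                     'I am good. What about you?',
                     'Weather is not good today',
                     'Thank you, I will manage'],
})

# Empty strings become NaN; real parent ids become numbers.
parent = pd.to_numeric(df['reply_to_id'], errors='coerce')

# root_id (hypothetical name): a root comment's own id, or the id a reply points to.
# Assumes only one level of replies.
root_id = parent.fillna(df['comment_id']).astype(int)

result = (
    df.assign(root_id=root_id, is_root=parent.isna())
      # comment_id as tie-breaker assumes replies have larger ids than their root
      .sort_values(by=['root_id', 'comment_id'])
)

print(result[['comment_id', 'reply_to_id', 'root_id', 'is_root']])
```

With `root_id` kept in the frame, something like `result.groupby('root_id')` gives one group per thread. If replies can themselves be replied to, this key would need to be resolved up to the top-level comment first.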