Grouping rows by a column that links to another column in pandas

Time: 2019-08-15 04:27:45

Tags: python pandas pandas-groupby

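I have a DataFrame with 4 columns: user_id, comment_id, reply_to_id and comment_text. My problem is grouping rows by reply_to_id, which links to comment_id. The main goal is to identify the branch comments (replies) that belong to each root comment, so that I can tell root comments and their branch comments apart. (Any suggestions are welcome, including alternative approaches to this kind of problem.)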

Table:

user_id |  comment_id | reply_to_id | comment_text
123     | 8           |             | How are you?
456     | 9           |             | May I help you?
1256    | 10          | 8           | I am good. What about you? 
6543    | 11          |             | Weather is not good today
234     | 12          | 9           | Thank you, I will manage

I want each reply_to_id row to be grouped with the comment_id it refers to. The output should look like this:

user_id |  comment_id | reply_to_id | comment_text
123     | 8           |             | How are you?
1256    | 10          | 8           | I am good. What about you? 
456     | 9           |             | May I help you?
234     | 12          | 9           | Thank you, I will manage
6543    | 11          |             | Weather is not good today

1 answer:

Answer 0 (score: 2)

Setup

import pandas as pd

# reply_to_id is stored as a string; '' marks a root comment.
df = pd.DataFrame({'user_id': {0: 123, 1: 456, 2: 1256, 3: 6543, 4: 234},
 'comment_id': {0: 8, 1: 9, 2: 10, 3: 11, 4: 12},
 'reply_to_id': {0: '', 1: '', 2: '8', 3: '', 4: '9'},
 'comment_text': {0: ' How are you?',
  1: ' May I help you?',
  2: ' I am good. What about you? ',
  3: ' Weather is not good today',
  4: ' Thank you, I will manage'}})

You could try something like this with a temporary column:

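(
    # Root comments sort under their own id, replies under their parent's id.
    df.assign(sort_key=df.apply(lambda x: int(x.comment_id) if x.reply_to_id == '' else int(x.reply_to_id), axis=1))
    # A stable sort keeps the original row order within a thread when keys tie.
    .sort_values(by='sort_key', kind='mergesort')
    .drop(columns='sort_key')
)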

    user_id comment_id  reply_to_id comment_text
0   123     8                       How are you?
2   1256    10          8           I am good. What about you?
1   456     9                       May I help you?
4   234     12          9           Thank you, I will manage
3   6543    11                      Weather is not good today
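
As a possible alternative, the same temporary-key idea can be sketched without a row-wise apply. This is only a sketch under a few assumptions: replies are one level deep (a reply never points at another reply), empty strings mark root comments as in the setup above, and a root comment always has a smaller comment_id than its replies. The names root_id, threaded and thread_sizes are made up for illustration and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "user_id": [123, 456, 1256, 6543, 234],
    "comment_id": [8, 9, 10, 11, 12],
    "reply_to_id": ["", "", "8", "", "9"],
    "comment_text": [
        "How are you?",
        "May I help you?",
        "I am good. What about you?",
        "Weather is not good today",
        "Thank you, I will manage",
    ],
})

# Empty strings become NaN; replies keep their parent's id.
parent = pd.to_numeric(df["reply_to_id"], errors="coerce")

# Root comments fall back to their own comment_id (hypothetical column name).
root_id = parent.fillna(df["comment_id"]).astype(int)

# Sort by thread first, then by comment_id so the root precedes its replies
# (assumes a root always has a smaller id than its replies).
threaded = df.assign(root_id=root_id).sort_values(["root_id", "comment_id"])

# Keeping root_id also allows a real groupby, e.g. comments per thread.
thread_sizes = threaded.groupby("root_id")["comment_id"].count()

print(threaded.drop(columns="root_id"))
print(thread_sizes)
```

Keeping root_id as a column, rather than dropping it right away, is what makes groupby usable here: every row in a thread shares one key. Deeper reply chains would need the parent links to be followed up to the root before sorting.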