Find the number of occurrences of each word in the sentences for each target category

Time: 2018-02-13 05:23:11

Tags: python pandas dataframe

I have something like this:

Sentence                                         Target
We regret to inform you about the result.        1
We are glad to inform you about the result.      2
We would like to inform you about the result.    3
We are surprised to see the result.              4

I want a word count that looks like this:

Word        Target 1    Target 2    Target 3    Target 4
Result      1           1           1           1
Inform      1           1           1           0
Surprised   0           0           0           1

...and so on. How can I do this?

Answer (score: 1):

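You need to:

  1. Remove punctuation and lowercase the data
  2. Split on whitespace
  3. stack the result into a Series
  4. groupby Target
  5. Take the value_counts of the words within each target
  6. unstack the result to get the desired output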
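
    import pandas as pd

    df = pd.DataFrame({'Sentence': ['We regret to inform you about the result.',
                                    'We are glad to inform you about the result.',
                                    'We would like to inform you about the result.',
                                    'We are surprised to see the result.'],
                       'Target': [1, 2, 3, 4]})

    out = (df.Sentence.str.replace(r'[^\w\s]', '', regex=True)  # 1. drop punctuation
             .str.lower()                     # 1. lowercase
             .str.split(expand=True)          # 2. one column per word
             .set_index(df.Target)            # label each row with its Target
             .stack()                         # 3. long Series of words
             .groupby(level=0)                # 4. group by Target
             .value_counts()                  # 5. count words within each target
             .unstack(0, fill_value=0)        # 6. one column per target
             .add_prefix('Target '))
    print(out)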
    
    
    Target     Target 1  Target 2  Target 3  Target 4
    about             1         1         1         0
    are               0         1         0         1
    glad              0         1         0         0
    inform            1         1         1         0
    like              0         0         1         0
    regret            1         0         0         0
    result            1         1         1         1
    see               0         0         0         1
    surprised         0         0         0         1
    the               1         1         1         1
    to                1         1         1         1
    we                1         1         1         1
    would             0         0         1         0
    you               1         1         1         0
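The following is a possible alternative and is not part of the original answer. It assumes pandas 0.25 or later, which provides explode. The idea is to avoid padding the split words into columns and stacking them. Instead, the sketch keeps one list of words per sentence, explodes it to one row per word, and lets pd.crosstab count the words per target. The function name word_counts_by_target is hypothetical. The Sentence and Target column names come from the question.

```python
import pandas as pd


def word_counts_by_target(df: pd.DataFrame) -> pd.DataFrame:
    """Count each lowercased word per Target, with one column per target.

    Hypothetical helper; assumes 'Sentence' and 'Target' columns as in the question.
    """
    # Lowercase, then pull out runs of word characters; punctuation is discarded.
    words = df['Sentence'].str.lower().str.findall(r'\w+')
    # One row per (word, target) pair; explode repeats the Target for each word.
    pairs = (pd.DataFrame({'Word': words, 'Target': df['Target']})
               .explode('Word')
               .reset_index(drop=True))  # unique index so both Series align 1:1
    # Rows are words, columns are targets, cells are occurrence counts.
    return pd.crosstab(pairs['Word'], pairs['Target']).add_prefix('Target ')


df = pd.DataFrame({
    'Sentence': ['We regret to inform you about the result.',
                 'We are glad to inform you about the result.',
                 'We would like to inform you about the result.',
                 'We are surprised to see the result.'],
    'Target': [1, 2, 3, 4],
})
print(word_counts_by_target(df))
```

Using str.findall(r'\w+') extracts the words and discards punctuation in one step, so a separate replace call is not needed. Note that \w also matches digits and underscores.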