Replace category-like strings in a column, mapped to a new column, in Python

Time: 2018-03-31 18:11:47

Tags: python pandas dataframe statistics apply

I have an existing dataframe (coffee_directions_df) that looks like the following:
coffee_directions_df

Utterance                         Frequency
Directions to Starbucks           1045
Directions to Tullys              1034
Give me directions to Tullys      986
Directions to Seattles Best       875
Show me directions to Dunkin      812
Directions to Daily Dozen         789
Show me directions to Starbucks   754
Give me directions to Dunkin      612
Navigate me to Seattles Best      498
Display navigation to Starbucks   376
Direct me to Starbucks            201

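The dataframe shows the utterances people made and how often each one occurred.

For example, "Directions to Starbucks" was uttered 1045 times.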

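I'm trying to figure out how to replace similar words in the coffee_directions_df.Utterance column, such as "Starbucks", "Tullys", and "Seattles Best", with a single string such as "Coffee". I've seen similar answers that suggest a dictionary like the ones below, but I haven't had any success with it.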

{'Utterance':['Starbucks','Tullys','Seattles Best'],
      'Combi_Utterance':['Coffee','Coffee','Coffee']}

{'Utterance':['Dunkin','Daily Dozen'],
      'Combi_Utterance':['Donut','Donut']}

{'Utterance':['Give me','Show me','Navigate me','Direct me'],
      'Combi_Utterance':['V_me','V_me','V_me','V_me']}
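
Dictionaries of parallel lists like these can be collapsed into a single substring-to-label mapping, which is usually easier to apply. The following is a minimal sketch, not the asker's code; the names `groups` and `flat_map` are made up for illustration.

```python
# Each group pairs the raw substrings with the label that should replace them.
groups = [
    {'Utterance': ['Starbucks', 'Tullys', 'Seattles Best'],
     'Combi_Utterance': ['Coffee', 'Coffee', 'Coffee']},
    {'Utterance': ['Dunkin', 'Daily Dozen'],
     'Combi_Utterance': ['Donut', 'Donut']},
    {'Utterance': ['Give me', 'Show me', 'Navigate me', 'Direct me'],
     'Combi_Utterance': ['V_me', 'V_me', 'V_me', 'V_me']},
]

# Flatten into one dict: raw substring -> replacement label.
flat_map = {}
for group in groups:
    flat_map.update(zip(group['Utterance'], group['Combi_Utterance']))

print(flat_map['Seattles Best'])  # Coffee
```

Because `zip` stops at the shorter list, a label list that is longer than its substring list does not raise an error; the extra labels are silently ignored.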

The desired output is as follows:

coffee_directions_df

Utterance                         Frequency  Combi_Utterance
Directions to Starbucks           1045       Directions to Coffee
Directions to Tullys              1034       Directions to Coffee
Give me directions to Tullys      986        V_me to Coffee
Directions to Seattles Best       875        Directions to Coffee
Show me directions to Dunkin      812        V_me to Donut
Directions to Daily Dozen         789        Directions to Donut
Show me directions to Starbucks   754        V_me to Coffee
Give me directions to Dunkin      612        V_me to Donut
Navigate me to Seattles Best      498        V_me to Coffee
Display navigation to Starbucks   376        Display navigation to Coffee
Direct me to Starbucks            201        V_me to Coffee

Ultimately, I'd like to be able to use the code I already have, shown below, to generate the final output.

df = (df.set_index('Frequency')['Utterance']
        .str.split(expand=True)
        .stack()
        .reset_index(name='Words')
        .groupby('Words', as_index=False)['Frequency'].sum()
        )

print (df)
         Words  Frequency
0   Directions       6907
1         V_me       3863
2        Donut       2213
3       Coffee       5769
4        Other        376

Thanks!

1 answer:

Answer 0 (score: 1)

Here is one way. Following on from your previous question, I chose to use collections.Counter for your counting logic.

The required input is a mapping dictionary, rep_dict. We apply it to substrings of the strings in a pandas Series.
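
A minimal sketch of how this approach might look, using the question's data. It assumes `rep_dict` is built from the groups listed in the question; the regex-based replacement and the names `pattern` and `counts` are illustrative choices, not necessarily the code this answer originally contained.

```python
import re
from collections import Counter

import pandas as pd

df = pd.DataFrame({
    'Utterance': [
        'Directions to Starbucks', 'Directions to Tullys',
        'Give me directions to Tullys', 'Directions to Seattles Best',
        'Show me directions to Dunkin', 'Directions to Daily Dozen',
        'Show me directions to Starbucks', 'Give me directions to Dunkin',
        'Navigate me to Seattles Best', 'Display navigation to Starbucks',
        'Direct me to Starbucks',
    ],
    'Frequency': [1045, 1034, 986, 875, 812, 789, 754, 612, 498, 376, 201],
})

# Mapping of raw substrings to their category label (assumed from the question).
rep_dict = {
    'Starbucks': 'Coffee', 'Tullys': 'Coffee', 'Seattles Best': 'Coffee',
    'Dunkin': 'Donut', 'Daily Dozen': 'Donut',
    'Give me': 'V_me', 'Show me': 'V_me',
    'Navigate me': 'V_me', 'Direct me': 'V_me',
}

# One alternation pattern over all keys, longest first so that a longer key
# is preferred if a shorter key happens to be its prefix.
pattern = re.compile('|'.join(
    re.escape(key) for key in sorted(rep_dict, key=len, reverse=True)))

# Replace every matched substring with its label, writing to a new column.
df['Combi_Utterance'] = df['Utterance'].str.replace(
    pattern, lambda m: rep_dict[m.group(0)], regex=True)

# Frequency-weighted word counts over the mapped utterances.
counts = Counter()
for text, freq in zip(df['Combi_Utterance'], df['Frequency']):
    for word in text.split():
        counts[word] += int(freq)

print(df[['Utterance', 'Combi_Utterance']])
print(counts['Coffee'], counts['Donut'], counts['V_me'])  # 5769 2213 3863
```

This sketch only swaps the listed substrings, so "Give me directions to Tullys" becomes "V_me directions to Coffee" rather than the shorter "V_me to Coffee" shown in the question. The counts are case-sensitive, so "Directions" and "directions" are tallied separately.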
