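How can I combine ifelse statements with grep?

Date: 2017-09-13 06:48:03

Tags: r

I have a list of food sources for which I need to create an overall category column. An example of my food sources is as follows: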

FruitSources <- c("Apple Juice", "Apple Puree", "Apple Pieces", "Orange Juice", "Orange Pieces", "Banana Smoothie", "Banana Pieces", 
                  "Apple & Blackcurrant Juice", "Mango & Banana Smoothie", "Watermelon, Apple & Orange Juice")

I want to create this category using only the first word of each entry in FruitSources, not the whole string. For example, my expected output is:

Categories <- c("Apple", "Apple", "Apple", "Orange", "Orange", "Banana", "Banana", "Apple", "Other", "Other")

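Although the & in some entries might suggest "Other", I would prefer a solution that uses only the first word. In the example above, any first word other than Apple, Orange or Banana should yield "Other". A crude approach would be: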

Output <- ifelse(FruitSources=='Apple', 'Apple',
                 ifelse(FruitSources=='Banana', 'Banana',
                        ifelse(FruitSources=='Orange', 'Orange', 'Other')))

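However, the above does not test only the first word; == compares each whole string against the keyword, so nothing matches. This results in: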

Output
 [1] "Other" "Other" "Other" "Other" "Other" "Other" "Other" "Other" "Other" "Other"

I have used nested ifelse statements before, but can they be combined with grep to accomplish the above?

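2 answers:

Answer 0: (score: 3)

Assuming that every string containing & or , should become "Other" and every other string should yield its first word, use grepl inside ifelse to build a logical vector based on [&,], and word (from stringr) to extract the first word when neither character is present; otherwise return "Other".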

library(stringr)
ifelse(grepl("[&,]", FruitSources), "Other", word(FruitSources, 1))
#[1] "Apple"  "Apple"  "Apple"  "Orange" "Orange" "Banana" 
#[7] "Banana" "Other"  "Other"  "Other" 
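
For readers who would rather not load stringr, the same idea can be sketched in base R. This is only an illustration, assuming words are separated by a single space; first_word and result are names introduced here, not part of the original answer.

```r
FruitSources <- c("Apple Juice", "Apple Puree", "Apple Pieces", "Orange Juice",
                  "Orange Pieces", "Banana Smoothie", "Banana Pieces",
                  "Apple & Blackcurrant Juice", "Mango & Banana Smoothie",
                  "Watermelon, Apple & Orange Juice")

# Drop everything from the first space onward, keeping only the first word
first_word <- sub(" .*", "", FruitSources)

# Entries containing "&" or "," become "Other"; the rest keep their first word
result <- ifelse(grepl("[&,]", FruitSources), "Other", first_word)
print(result)
```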

If the distinction is instead between a single fruit and multiple fruits, then one option is to use str_count to generate the logical index:

ifelse(str_count(FruitSources, "\\b(Apple|Orange|Banana|Mango|Blackcurrant)\\b")==1, 
                     word(FruitSources, 1), "Other")
#[1] "Apple"  "Apple"  "Apple"  "Orange" "Orange" "Banana" 
#[7] "Banana" "Other"  "Other"  "Other" 
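
The counting step can also be approximated without stringr. The sketch below assumes that lengths(regmatches(..., gregexpr(...))) is an acceptable stand-in for str_count; n_fruits and fruit_pattern are illustrative names, and perl = TRUE is used here only to make the \\b word-boundary handling explicit.

```r
FruitSources <- c("Apple Juice", "Apple Puree", "Apple Pieces", "Orange Juice",
                  "Orange Pieces", "Banana Smoothie", "Banana Pieces",
                  "Apple & Blackcurrant Juice", "Mango & Banana Smoothie",
                  "Watermelon, Apple & Orange Juice")

fruit_pattern <- "\\b(Apple|Orange|Banana|Mango|Blackcurrant)\\b"

# gregexpr finds every match per string; regmatches extracts them;
# lengths counts how many fruit names each entry contains
n_fruits <- lengths(regmatches(FruitSources,
                               gregexpr(fruit_pattern, FruitSources, perl = TRUE)))

# Keep the first word only when exactly one known fruit is mentioned
result <- ifelse(n_fruits == 1, sub(" .*", "", FruitSources), "Other")
print(n_fruits)
print(result)
```

Note that a fruit missing from the pattern (Watermelon here) is simply not counted, so the list of names has to cover every fruit that should trigger "Other".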

Update

If the category should be based on whether the first word is 'Apple', 'Orange' or 'Banana':

ifelse(grepl("^(Apple|Orange|Banana)", FruitSources),  word(FruitSources, 1), "Other")
#[1] "Apple"  "Apple"  "Apple"  "Orange" "Orange" "Banana" 
#[7] "Banana" "Apple"  "Other"  "Other" 
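
One caveat: the pattern ^(Apple|Orange|Banana) tests a prefix, so a hypothetical entry such as "Applesauce Cup" would pass the test and be labelled "Applesauce". If that matters, a possible variant is to compare the whole first word with %in%. This is a sketch under the assumption that words are separated by a single space; first_word and keywords are names introduced for illustration.

```r
FruitSources <- c("Apple Juice", "Apple Puree", "Apple Pieces", "Orange Juice",
                  "Orange Pieces", "Banana Smoothie", "Banana Pieces",
                  "Apple & Blackcurrant Juice", "Mango & Banana Smoothie",
                  "Watermelon, Apple & Orange Juice")

keywords <- c("Apple", "Orange", "Banana")

# Keep only the text before the first space
first_word <- sub(" .*", "", FruitSources)

# Exact comparison of the first word against the allowed categories
result <- ifelse(first_word %in% keywords, first_word, "Other")
print(result)
```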

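Answer 1: (score: 0)

Here is a solution using regular expressions in base R.

It is based on two steps. First, extract the keyword when it appears at the start of the string, and replace every other string with an empty string.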

tmp <- sub("^(?:(Apple|Orange|Banana)|.?).*", "\\1", FruitSources)
# [1] "Apple"  "Apple"  "Apple"  "Orange" "Orange" "Banana" "Banana" "Apple"  ""       ""

Second, replace the empty strings with "Other".

sub("^$", "Other", tmp)
# [1] "Apple"  "Apple"  "Apple"  "Orange" "Orange" "Banana" "Banana" "Apple"  "Other"  "Other" 

In one line:

sub("^$", "Other", sub("^(?:(Apple|Orange|Banana)|.?).*", "\\1", FruitSources))
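
If the keyword list is likely to change, the two steps could be wrapped in a small helper. The function below is a hypothetical sketch, not part of the original answer: categorise_first_word, keywords and other are made-up names, it uses regexpr and regmatches rather than nested sub calls, and it assumes the keywords contain no regex metacharacters.

```r
# Hypothetical helper: label each string by a leading keyword, else `other`
categorise_first_word <- function(x, keywords, other = "Other") {
  # Build e.g. "^(Apple|Orange|Banana)\\b" from the keyword vector
  pattern <- paste0("^(", paste(keywords, collapse = "|"), ")\\b")
  hit <- regexpr(pattern, x, perl = TRUE)   # -1 where there is no match
  out <- rep(other, length(x))
  out[hit > 0] <- regmatches(x, hit)        # matched keyword for the hits only
  out
}

FruitSources <- c("Apple Juice", "Apple Puree", "Apple Pieces", "Orange Juice",
                  "Orange Pieces", "Banana Smoothie", "Banana Pieces",
                  "Apple & Blackcurrant Juice", "Mango & Banana Smoothie",
                  "Watermelon, Apple & Orange Juice")

result <- categorise_first_word(FruitSources, c("Apple", "Orange", "Banana"))
print(result)
```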