R: Replace strings with the most common variant

Time: 2019-06-18 14:25:59

Tags: r tidyverse recode

I am looking to standardize a set of manually entered strings, so that:

index   fruit
1   Apple Pie
2   Apple Pie.
3   Apple. Pie
4   Apple Pie
5   Pear

should look like this:

index   fruit
1   Apple Pie
2   Apple Pie
3   Apple Pie
4   Apple Pie
5   Pear

In my use case, grouping by phonetic sound is fine, but I am missing the piece that replaces the less common strings in each group with the most common one.

library(tidyverse)  
library(stringdist)

index <- seq(1,5,1)
fruit <- c("Apple Pie", "Apple Pie.", "Apple. Pie", "Apple Pie", "Pear")

df <- data.frame(index, fruit) %>%
  mutate(grouping = phonetic(fruit)) %>%
  add_count(fruit) %>%
  # Missing Code
  select(index, fruit)

3 answers:

Answer 0: (score: 2)

We can use str_remove to remove the "." character:

library(dplyr)
library(stringr)

data.frame(index, fruit) %>%
  mutate(fruit = str_remove(fruit, "\\."))
#  index     fruit
#1     1 Apple Pie
#2     2 Apple Pie
#3     3 Apple Pie
#4     4 Apple Pie
#5     5      Pear

If we need to use phonetic and find the most frequent value, we can group by the phonetic code and keep the most common string in each group.
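A minimal sketch of that second step (grouping by the phonetic code and keeping the most frequent spelling in each group). This is an illustration, not the answerer's original code; it assumes dplyr and stringdist are installed and rebuilds the question's data so it runs on its own. The variable name `result` is chosen here for illustration.

```r
library(dplyr)
library(stringdist)

index <- 1:5
fruit <- c("Apple Pie", "Apple Pie.", "Apple. Pie", "Apple Pie", "Pear")

result <- data.frame(index, fruit, stringsAsFactors = FALSE) %>%
  # Soundex code: spellings that sound alike share one code
  mutate(grouping = phonetic(fruit)) %>%
  # n = how often each exact spelling occurs within its phonetic group
  add_count(grouping, fruit, name = "n") %>%
  group_by(grouping) %>%
  # keep the spelling with the highest count (the first one on ties)
  mutate(fruit = fruit[which.max(n)]) %>%
  ungroup() %>%
  select(index, fruit)

print(result)
```

Unlike str_remove, this does not depend on knowing which stray character to strip, but it will also merge any distinct strings that happen to share a Soundex code.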

Answer 1: (score: 2)

It sounds like you need to group_by the grouping column and then pick the most frequent item (the mode) in each group:

df %>%
  mutate(grouping = phonetic(fruit)) %>%
  group_by(grouping) %>%
  mutate(fruit = names(which.max(table(fruit))))

# A tibble: 5 x 3
# Groups:   grouping [2]
  index     fruit grouping
  <dbl>    <fctr>    <chr>
1     1 Apple Pie     A141
2     2 Apple Pie     A141
3     3 Apple Pie     A141
4     4 Apple Pie     A141
5     5      Pear     P600
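The same "most frequent item per group" idea can be sketched in base R, which also makes the tie-breaking rule explicit. The helper name `most_common` is hypothetical, and the grouping codes are copied from the output above rather than computed, so no packages are needed.

```r
# Hypothetical helper: most frequent value of x.
# Levels follow first appearance, so ties go to the value seen first.
most_common <- function(x) {
  counts <- table(factor(x, levels = unique(x)))
  names(counts)[which.max(counts)]
}

fruit <- c("Apple Pie", "Apple Pie.", "Apple. Pie", "Apple Pie", "Pear")
# Soundex codes copied from the output above (assumed, not recomputed here)
grouping <- c("A141", "A141", "A141", "A141", "P600")

# ave() applies the helper within each group and recycles the result
standardized <- ave(fruit, grouping, FUN = most_common)
print(standardized)
```

With a plain table(fruit) on a character vector, the levels are sorted alphabetically, so a tie is resolved in favour of the alphabetically first string instead of the first one entered.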

Answer 2: (score: 1)

Another way could be:

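fruit %>%
 enframe() %>%                                # columns: name (position), value (string)
 mutate(grouping = phonetic(value)) %>%       # use the value column, not the global fruit vector
 add_count(value, grouping) %>%
 group_by(grouping) %>%
 mutate(value = value[match(max(n), n)]) %>%  # first string with the highest count
 select(-n) %>%
 ungroup()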

   name value     grouping
  <int> <chr>     <chr>   
1     1 Apple Pie A141    
2     2 Apple Pie A141    
3     3 Apple Pie A141    
4     4 Apple Pie A141    
5     5 Pear      P600