I am looking to standardize a set of manually entered strings, so that:
index fruit
1 Apple Pie
2 Apple Pie.
3 Apple. Pie
4 Apple Pie
5 Pear
should look like this:
index fruit
1 Apple Pie
2 Apple Pie
3 Apple Pie
4 Apple Pie
5 Pear
In my use case, grouping by phonetic sound is fine, but I am missing the piece on how to replace the least common strings with the most common one.
library(tidyverse)
library(stringdist)
index <- seq(1,5,1)
fruit <- c("Apple Pie", "Apple Pie.", "Apple. Pie", "Apple Pie", "Pear")
df <- data.frame(index, fruit) %>%
mutate(grouping = phonetic(fruit)) %>%
add_count(fruit) %>%
# Missing Code
select(index, fruit)
Answer 0 (score: 2)
We can use str_remove to remove the ".":
library(dplyr)
library(stringr)
data.frame(index, fruit) %>%
mutate(fruit = str_remove(fruit, "\\."))
# index fruit
#1 1 Apple Pie
#2 2 Apple Pie
#3 3 Apple Pie
#4 4 Apple Pie
#5 5 Pear
If we need to use phonetic, group by its code and find the most frequent value in each group.
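A minimal sketch of that phonetic-plus-mode step, using the data from the question. The helper most_frequent is a hypothetical name introduced here for illustration, and the tie-breaking rule (first value seen wins) is an assumption rather than something the question specifies.

```r
library(dplyr)
library(stringdist)

# Hypothetical helper (not part of any package): returns the most frequent
# value of x; on ties, the value that appears first in x wins.
most_frequent <- function(x) {
  x <- as.character(x)
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

index <- 1:5
fruit <- c("Apple Pie", "Apple Pie.", "Apple. Pie", "Apple Pie", "Pear")

result <- data.frame(index, fruit, stringsAsFactors = FALSE) %>%
  group_by(grouping = phonetic(fruit)) %>%  # Soundex code for each string
  mutate(fruit = most_frequent(fruit)) %>%  # overwrite with the group's mode
  ungroup() %>%
  select(index, fruit)

print(result)
```

Because the replacement happens per phonetic group, this also covers variants that differ by more than a stray ".", as long as they share the same Soundex code.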
Answer 1 (score: 2)
It sounds like you need to group_by the grouping column, then select the most frequent (mode) item:
df %>%
mutate(grouping = phonetic(fruit)) %>%
group_by(grouping) %>%
mutate(fruit = names(which.max(table(fruit))))
# A tibble: 5 x 3
# Groups: grouping [2]
index fruit grouping
<dbl> <fctr> <chr>
1 1 Apple Pie A141
2 2 Apple Pie A141
3 3 Apple Pie A141
4 4 Apple Pie A141
5 5 Pear P600
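To see what the names(which.max(table(fruit))) step does inside a single group, here is a small base-R sketch on the values of the A141 group. The variable names group_values, counts and mode_value are made up for illustration.

```r
# Values that share the phonetic code A141 in the example above
group_values <- c("Apple Pie", "Apple Pie.", "Apple. Pie", "Apple Pie")

# table() counts each distinct string; its names are the distinct strings
counts <- table(group_values)

# which.max() returns the position of the first maximum count,
# and names() recovers the string at that position
mode_value <- names(which.max(counts))

print(counts)
print(mode_value)
```

Note that table() sorts its names, so when two strings are tied for the highest count, which.max picks the one that sorts first, not the one that appears first in the data.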
Answer 2 (score: 1)
Another way could be:
fruit %>%
enframe() %>%
mutate(grouping = phonetic(value)) %>%
add_count(value, grouping) %>%
group_by(grouping) %>%
mutate(value = value[match(max(n), n)]) %>%
select(-n) %>%
ungroup()
name value grouping
<int> <chr> <chr>
1 1 Apple Pie A141
2 2 Apple Pie A141
3 3 Apple Pie A141
4 4 Apple Pie A141
5 5 Pear P600