I have two data frames.
One contains pairs of correct and incorrect place names:
place <- data.frame(
place_correct = c("London", "Birmingham", "Newcastle", "Brighton"),
place_incorrect = c("Lundn", "Birmgham", "Nexcassle", "Briton"), stringsAsFactors = F)
The other contains a column that mixes these correct and incorrect place names:
set.seed(123)
df <- data.frame(town = sample(c("London", "Birmingham", "Newcastle", "Brighton",
"Lundn", "Birmgham", "Nexcassle", "Briton"), 20, replace = T), stringsAsFactors = F)
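What I want to do is match the incorrect place names in df against the incorrect place names in place, and replace them with the correct ones.
Edit:
I can do this in base R using ifelse and %in%: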
df$town_correct <- ifelse(df$town %in% place$place_incorrect,
place$place_correct[match(df$town, place$place_incorrect)],
df$town)
df
town town_correct
1 Newcastle Newcastle
2 Nexcassle Newcastle
3 Brighton Brighton
4 Briton Brighton
5 Briton Brighton
6 London London
7 Lundn London
8 Briton Brighton
9 Lundn London
10 Brighton Brighton
11 Briton Brighton
12 Brighton Brighton
13 Birmgham Birmingham
14 Lundn London
15 London London
16 Briton Brighton
17 Birmingham Birmingham
18 London London
19 Newcastle Newcastle
20 Briton Brighton
But how can this be done in dplyr?
Answer 0 (score: 2)
I would use this multisub function:
place <- data.frame(
place_correct = c("London", "Birmingham", "Newcastle", "Brighton"),
place_incorrect = c("Lundn", "Birmgham", "Nexcassle", "Briton"), stringsAsFactors = F)
set.seed(123)
df <- data.frame(town = sample(c("London", "Birmingham", "Newcastle", "Brighton",
"Lundn", "Birmgham", "Nexcassle", "Briton"), 20, replace = T), stringsAsFactors = F)
multisub <- function(target, output, string) {
replacement.list <- apply(cbind(target, output), 1, as.list)
mygsub <- function(l, x) gsub(pattern = l[1], replacement = l[2], x, perl=TRUE)
Reduce(mygsub, replacement.list, init = string, right = TRUE)
}
df$town_correct <- with(place, multisub(place_incorrect, place_correct, df$town))
df
#> town town_correct
#> 1 Nexcassle Newcastle
#> 2 Nexcassle Newcastle
#> 3 Newcastle Newcastle
#> 4 Birmgham Birmingham
#> 5 Newcastle Newcastle
#> 6 Birmingham Birmingham
#> 7 Birmingham Birmingham
#> 8 Birmgham Birmingham
#> 9 Newcastle Newcastle
#> 10 Lundn London
#> 11 Brighton Brighton
#> 12 Birmgham Birmingham
#> 13 Birmgham Birmingham
#> 14 London London
#> 15 Birmingham Birmingham
#> 16 Newcastle Newcastle
#> 17 Briton Brighton
#> 18 Lundn London
#> 19 Newcastle Newcastle
#> 20 Newcastle Newcastle
Created on 2020-05-17 by the reprex package (v0.3.0)
Edit:
This may not be the most efficient solution, but here is an ifelse solution that checks for a match first:
df$town_correct <- vapply(df$town, function(x) ifelse(x %in% place$place_incorrect,
place[match(x, place$place_incorrect, nomatch=0), "place_correct"], x),
FUN.VALUE = NA_character_, USE.NAMES = FALSE)
df
#> town town_correct
#> 1 Nexcassle Newcastle
#> 2 Nexcassle Newcastle
#> 3 Newcastle Newcastle
#> 4 Birmgham Birmingham
#> 5 Newcastle Newcastle
#> 6 Birmingham Birmingham
#> 7 Birmingham Birmingham
#> 8 Birmgham Birmingham
#> 9 Newcastle Newcastle
#> 10 Lundn London
#> 11 Brighton Brighton
#> 12 Birmgham Birmingham
#> 13 Birmgham Birmingham
#> 14 London London
#> 15 Birmingham Birmingham
#> 16 Newcastle Newcastle
#> 17 Briton Brighton
#> 18 Lundn London
#> 19 Newcastle Newcastle
#> 20 Newcastle Newcastle
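If efficiency matters, a vectorised lookup avoids calling a function once per row. This is only a minimal sketch: it assumes the same place table as above, the small df is hand-written for illustration rather than sampled, and lookup and fixed are names I made up.

```r
place <- data.frame(
  place_correct = c("London", "Birmingham", "Newcastle", "Brighton"),
  place_incorrect = c("Lundn", "Birmgham", "Nexcassle", "Briton"),
  stringsAsFactors = FALSE)

# Hand-written example data (not the sampled df from the question)
df <- data.frame(town = c("Lundn", "London", "Briton", "Newcastle"),
                 stringsAsFactors = FALSE)

# Named vector: names are the misspellings, values are the corrections
lookup <- setNames(place$place_correct, place$place_incorrect)

# Indexing by name returns NA for towns that are not known misspellings
fixed <- unname(lookup[df$town])

# Keep the original town wherever no correction was found
df$town_correct <- ifelse(is.na(fixed), df$town, fixed)
df
```

The whole column is handled in one indexing step, so there is no per-element function call.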
Answer 1 (score: 2)
The same ifelse() logic you used in base R also works in dplyr, here written with if_else():
library(dplyr)
df %>%
mutate(correct_town = if_else(town %in% place$place_incorrect,
place$place_correct[match(town, place$place_incorrect)],
town))
town correct_town
1 Nexcassle Newcastle
2 Nexcassle Newcastle
3 Newcastle Newcastle
4 Birmgham Birmingham
5 Newcastle Newcastle
6 Birmingham Birmingham
7 Birmingham Birmingham
8 Birmgham Birmingham
9 Newcastle Newcastle
10 Lundn London
11 Brighton Brighton
12 Birmgham Birmingham
13 Birmgham Birmingham
14 London London
15 Birmingham Birmingham
16 Newcastle Newcastle
17 Briton Brighton
18 Lundn London
19 Newcastle Newcastle
20 Newcastle Newcastle
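Since match() already returns NA for towns that are not in place_incorrect, the %in% test can probably be dropped by letting dplyr::coalesce() fall back to the original value. A minimal sketch of that variant, assuming the place table from the question; the df here is a small hand-written example and result is just a name I picked:

```r
library(dplyr)

place <- data.frame(
  place_correct = c("London", "Birmingham", "Newcastle", "Brighton"),
  place_incorrect = c("Lundn", "Birmgham", "Nexcassle", "Briton"),
  stringsAsFactors = FALSE)

# Hand-written example data (not the sampled df from the question)
df <- data.frame(town = c("Lundn", "London", "Briton", "Newcastle"),
                 stringsAsFactors = FALSE)

result <- df %>%
  mutate(correct_town = coalesce(
    # NA wherever town is not a known misspelling
    place$place_correct[match(town, place$place_incorrect)],
    # fallback: keep the original spelling
    town))
result
```

coalesce() takes the first non-missing value at each position, so correctly spelled towns pass through unchanged.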
Alternatively, using stringr::str_replace_all():
df %>%
mutate(correct_town = stringr::str_replace_all(town, setNames(place$place_correct, place$place_incorrect)))
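One thing to watch with str_replace_all(): the names of the vector are treated as regular expressions and match anywhere in the string, so a misspelling that happens to be part of a longer, valid name would also be rewritten. A rough sketch of anchoring the patterns with ^ and $ so that only whole values are replaced; the "Briton Ferry" value is an example I added to show the difference, and patterns_loose / patterns_exact are made-up names:

```r
library(stringr)

place <- data.frame(
  place_correct = c("London", "Birmingham", "Newcastle", "Brighton"),
  place_incorrect = c("Lundn", "Birmgham", "Nexcassle", "Briton"),
  stringsAsFactors = FALSE)

# Illustrative input: the second value only contains a misspelling as a substring
town <- c("Lundn", "Briton Ferry")

# Unanchored patterns replace the substring wherever it occurs
patterns_loose <- setNames(place$place_correct, place$place_incorrect)
unanchored <- str_replace_all(town, patterns_loose)

# Anchored patterns only match when the whole string is the misspelling
patterns_exact <- setNames(place$place_correct,
                           paste0("^", place$place_incorrect, "$"))
anchored <- str_replace_all(town, patterns_exact)

unanchored
anchored
```

With the data in the question this makes no difference, because every value is either an exact misspelling or an exact correct name.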
Answer 2 (score: 1)
In this case you can use left_join from the dplyr package. You can use the following code:
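library(dplyr)
# Attach the correct spelling to every row whose town is a known misspelling
df <- left_join(df, place, by = c("town" = "place_incorrect"))
# Rows without a match get NA in place_correct, so keep the original town there
df$Town_correct <- ifelse(is.na(df$place_correct), df$town, df$place_correct)
df$place_correct <- NULL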
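The same join can also be written as a single pipeline, with coalesce() standing in for the ifelse(is.na(...)) step. This is just a sketch under the assumption that each misspelling appears only once in place (otherwise left_join would duplicate rows); the df is a small hand-written example and result is a name I chose:

```r
library(dplyr)

place <- data.frame(
  place_correct = c("London", "Birmingham", "Newcastle", "Brighton"),
  place_incorrect = c("Lundn", "Birmgham", "Nexcassle", "Briton"),
  stringsAsFactors = FALSE)

# Hand-written example data (not the sampled df from the question)
df <- data.frame(town = c("Lundn", "London", "Briton", "Newcastle"),
                 stringsAsFactors = FALSE)

result <- df %>%
  # place_correct is NA for towns that are not known misspellings
  left_join(place, by = c("town" = "place_incorrect")) %>%
  # take the correction when there is one, otherwise the original town
  mutate(town_correct = coalesce(place_correct, town)) %>%
  # drop the helper column brought in by the join
  select(-place_correct)
result
```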