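I have two tibbles that I want to merge on the "Batsman" column. However, the values in the two columns don't match exactly, e.g. "V Kohli (INDIA)" vs "Virat Kohli". How can I combine the tibbles based on these inexact matches?
Thanks!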
library(tibble)
x1 <- tibble(Batsman=c("V Kohli (INDIA)","RG Sharma (INDIA)","Babar Azam (PAK)","GJ Maxwell (AUS)"),
Runs=c(500,400,300,200),
Matches=c(67,54,47,23))
x2 <- tibble(Rank=c(1,2,3,4),
Batsman=c("Virat Kohli", "Rohit Sharma", "Glenn Maxwell","Babar Azam"),
Rating=c(853,820,640,500))
Answer (score: 0)
So you want to join on two columns of text strings that don't match exactly:
> x1$Batsman
[1] "V Kohli (INDIA)" "RG Sharma (INDIA)" "Babar Azam (PAK)" "GJ Maxwell (AUS)"
> x2$Batsman
[1] "Virat Kohli" "Rohit Sharma" "Glenn Maxwell" "Babar Azam"
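I assume you have many more names than these four? This is definitely a tricky task, and computers are notoriously bad at it (there are famous examples of very long functions written just to parse phone numbers). From the strings you provided, the surname always appears in both versions of the name, so that is what I would match on.
I would use stringr to pull the surname out of each string and then match on it.
Full code: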
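One way to do that extraction with a single regular expression is sketched here, assuming every name ends in its surname, optionally followed by a "(COUNTRY)" tag. The helper `surname_key()` is hypothetical and not part of the answer: it strips the tag with `str_remove()`, takes the last word with stringr's `word()`, and lower-cases the result.

```r
library(stringr)

# Hypothetical helper: lower-cased surname used as a join key.
# Assumes the surname is the last word once any trailing "(COUNTRY)" is removed.
surname_key <- function(name) {
  # Drop a trailing " (INDIA)", " (PAK)", etc., if present
  no_country <- str_remove(name, "\\s*\\([^)]*\\)\\s*$")
  # Last space-separated word, lower-cased
  str_to_lower(word(no_country, -1))
}

surname_key(c("V Kohli (INDIA)", "GJ Maxwell (AUS)"))  # "kohli" "maxwell"
surname_key(c("Virat Kohli", "Glenn Maxwell"))         # "kohli" "maxwell"
```

Like any surname-only key, this collides when two players share a surname (for example two different Sharmas), so check for duplicated keys before trusting the match.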
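library(tibble)
library(stringr)  # stringr also re-exports the %>% pipe

x1 <- tibble(Batsman=c("V Kohli (INDIA)","RG Sharma (INDIA)","Babar Azam (PAK)","GJ Maxwell (AUS)"),
             Runs=c(500,400,300,200),
             Matches=c(67,54,47,23))
x2 <- tibble(Rank=c(1,2,3,4),
             Batsman=c("Virat Kohli", "Rohit Sharma", "Glenn Maxwell","Babar Azam"),
             Rating=c(853,820,640,500))

# x1: drop everything up to and including the first space ("V ", "RG ", ...)
AA <- str_sub(x1$Batsman, start = str_locate(x1$Batsman, " ")[,1]+1)
# ...then keep only the text before the next space, i.e. drop " (INDIA)" etc.
AA <- str_sub(AA, start = 1, end = str_locate(AA, " ")[,1]-1) %>%
  str_to_lower()
# x2: keep everything after the first space (the surname), lower-cased
BB <- str_sub(x2$Batsman, start = str_locate(x2$Batsman, " ")[,1]+1) %>%
  str_to_lower()
# Row of x2 matching each row of x1: 1 2 4 3
match(AA, BB)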
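The code above stops at `match()`, which only gives the position in x2 of each x1 row. To actually merge the tibbles, one option is to store the surname as a key column and join on it. This is a sketch, assuming dplyr is installed; the `key` column and the `add_key()` helper are hypothetical names, and the key is built the same way as in the surname idea above (last word after removing the country tag).

```r
library(tibble)
library(stringr)
library(dplyr)

x1 <- tibble(Batsman = c("V Kohli (INDIA)", "RG Sharma (INDIA)", "Babar Azam (PAK)", "GJ Maxwell (AUS)"),
             Runs = c(500, 400, 300, 200),
             Matches = c(67, 54, 47, 23))
x2 <- tibble(Rank = c(1, 2, 3, 4),
             Batsman = c("Virat Kohli", "Rohit Sharma", "Glenn Maxwell", "Babar Azam"),
             Rating = c(853, 820, 640, 500))

# Hypothetical helper: add a lower-cased surname column to join on
add_key <- function(df) {
  df %>%
    mutate(key = str_to_lower(word(str_remove(Batsman, "\\s*\\([^)]*\\)\\s*$"), -1)))
}

# Keep every x1 row; the two Batsman columns get distinct suffixes
merged <- left_join(add_key(x1), add_key(x2), by = "key", suffix = c(".x1", ".x2"))
merged
```

`left_join()` keeps all rows of x1 in their original order and fills x2's columns with `NA` where no surname matches; if a surname occurs twice in x2, the x1 row is duplicated, which is a quick way to spot ambiguous keys.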