Combining tibbles with inexact values

Time: 2019-12-06 09:45:49

Tags: r

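I have two tibbles that I want to merge on the "Batsman" column. However, the values in the two columns are not exactly the same, e.g. "V Kohli (INDIA)" versus "Virat Kohli". How can I combine the tibbles based on these inexact matches?

Thanks!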

x1 <- tibble(Batsman=c("V Kohli (INDIA)","RG Sharma (INDIA)","Babar Azam (PAK)","GJ Maxwell (AUS)"),
             Runs=c(500,400,300,200),
             Matches=c(67,54,47,23))

x2 <- tibble(Rank=c(1,2,3,4),
             Batsman=c("Virat Kohli", "Rohit Sharma", "Glenn Maxwell","Babar Azam"),
             Rating=c(853,820,640,500))

1 answer:

Answer 0: (score: 0)

So you want to match up two vectors of text strings:

> x1$Batsman
[1] "V Kohli (INDIA)"   "RG Sharma (INDIA)" "Babar Azam (PAK)"  "GJ Maxwell (AUS)" 
> x2$Batsman
[1] "Virat Kohli"   "Rohit Sharma"  "Glenn Maxwell" "Babar Azam"  

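I assume you have many more names than these four? This is definitely a tricky task, and computers are notoriously bad at it. (There are well-known examples where simply parsing phone numbers takes a very long function.) In the strings you have provided, I can see that the two versions of each name always share the same last name.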

I would use stringr, whose functions take regular-expression patterns, to extract the last names.
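To illustrate the idea, here is a minimal sketch of a regex-based extraction. The helper name `surname_key` is hypothetical (it is not part of stringr or of this answer), and it assumes the last name is always the final word before an optional country code in parentheses.

```r
library(stringr)

# Hypothetical helper: returns the lower-cased last name.
# Assumes the last name is the final word before an optional "(COUNTRY)" suffix.
surname_key <- function(x) {
  x <- str_remove(x, "\\s*\\(.*\\)\\s*$")  # drop a trailing "(INDIA)", "(PAK)", ...
  str_to_lower(str_extract(x, "\\S+$"))     # keep the last whitespace-delimited word
}

surname_key(c("V Kohli (INDIA)", "Glenn Maxwell"))
```

Applying the same helper to both columns would give comparable keys, as long as no two players share a last name.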

Full code:

library(tibble)
library(stringr)

x1 <- tibble(Batsman=c("V Kohli (INDIA)","RG Sharma (INDIA)","Babar Azam (PAK)","GJ Maxwell (AUS)"),
             Runs=c(500,400,300,200),
             Matches=c(67,54,47,23))

x2 <- tibble(Rank=c(1,2,3,4),
             Batsman=c("Virat Kohli", "Rohit Sharma", "Glenn Maxwell","Babar Azam"),
             Rating=c(853,820,640,500))

# Last name in x1: drop everything up to the first space,
# then keep only the text before the next space (removes the country code)
AA <- str_sub(x1$Batsman, start = str_locate(x1$Batsman, " ")[,1]+1, end = -1)
AA <- str_sub(AA, start = 1, end = str_locate(AA, " ")[,1]-1) %>%
  str_to_lower()

# Last name in x2: everything after the first space
BB <- str_sub(x2$Batsman, start = str_locate(x2$Batsman, " ")[,1]+1, end = -1) %>%
  str_to_lower()

# For each row of x1, the position of the matching row in x2
match(AA, BB)
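The indices returned by `match()` can then be used to pull the `x2` columns into `x1`. The following is a minimal sketch under the assumption that last names are unique in both tibbles; the names `idx` and `combined` are introduced here for illustration, and the keys are written out literally so that the block runs on its own.

```r
library(tibble)

x1 <- tibble(Batsman = c("V Kohli (INDIA)", "RG Sharma (INDIA)", "Babar Azam (PAK)", "GJ Maxwell (AUS)"),
             Runs = c(500, 400, 300, 200),
             Matches = c(67, 54, 47, 23))

x2 <- tibble(Rank = c(1, 2, 3, 4),
             Batsman = c("Virat Kohli", "Rohit Sharma", "Glenn Maxwell", "Babar Azam"),
             Rating = c(853, 820, 640, 500))

# Last-name keys as produced by the code above (written out to keep this self-contained)
AA <- c("kohli", "sharma", "azam", "maxwell")
BB <- c("kohli", "sharma", "maxwell", "azam")

# Row of x2 that corresponds to each row of x1 (NA where no match is found)
idx <- match(AA, BB)

# Copy the x2 columns across in x1's row order
combined <- x1
combined$Rank <- x2$Rank[idx]
combined$Rating <- x2$Rating[idx]
combined
```

Rows of `x1` without a counterpart in `x2` would get `NA` in `Rank` and `Rating`, which makes unmatched names easy to spot and fix by hand.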