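Multiple pattern/string matching in R

Date: 2018-07-03 21:21:54

Tags: r dataframe

I have two data frames: one is a lookup map with more than 20,000 possible entries, and the other has 30,000 rows of data in 3 columns. I need to use the map to find the correct name for each row. Here is a simple example of what I need:

For example,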

data <- data.frame(
  V1 = c('baa','bb','aa','cc','dd','ee','caa'),
  V2 = c('ff','gg','hh','yy','jj','kk','hh')
)
# V1 V2
# baa ff
# bb gg
# aa hh
# cc yy
# dd jj
# ee kk
# caa hh

map <- data.frame(
  V1 = c('aa','gg','cc','jj','kk'), 
  V2  = c(1:5)
) 
# V1 V2 
# aa 1
# gg 2
# cc 3
# jj 4
# kk 5

>what.I.need
V1 V2 V3
baa ff 1
bb gg 2
aa hh 1
cc yy 3
dd jj 4
ee kk 5
caa hh 1

I tried using grep, but I can't figure out how to make it work with a map of 20,000 possibilities and have it fill in the third column of "what.I.need". Thanks in advance.

3 Answers:

Answer 0 (score: 1)

I feel like it could be more concise than this. :)

Answer 1 (score: 0)

library(dplyr)
library(tidyr)

df1 <- data.frame(V1 = c("aa", "bb", "aa", "cc", "dd", "ee", "aa"), V2 = c("ff", "gg", "hh", "yy", "jj", "kk", "hh"), stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("aa", "gg", "cc", "jj", "kk"), V2 = c(1,2,3,4,5), stringsAsFactors = FALSE)

left_join(df1, df2, by = c("V2" = "V1")) %>% 
  left_join(., df2, by = "V1") %>% 
  mutate(V3 = ifelse(is.na(V2.y), V2.y.y, V2.y)) %>% 
  select(-V2.y, -V2.y.y)

This creates the following table, after which V2.y and V2.y.y are dropped:

  V1 V2.x V2.y V2.y.y V3
1 aa   ff   NA      1  1
2 bb   gg    2     NA  2
3 aa   hh   NA      1  1
4 cc   yy   NA      3  3
5 dd   jj    4     NA  4
6 ee   kk    5     NA  5
7 aa   hh   NA      1  1

Which gives you:

  V1 V2.x V3
1 aa   ff  1
2 bb   gg  2
3 aa   hh  1
4 cc   yy  3
5 dd   jj  4
6 ee   kk  5
7 aa   hh  1
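
The pipeline above refers to V2.y and V2.y.y, names that come from the suffixes left_join() adds automatically when columns clash. A minimal sketch of the same two-join idea that gives the lookup columns explicit names instead, so nothing depends on those suffixes. The names by_v1, by_v2, id_from_V1 and id_from_V2 are made up for this illustration, and coalesce() stands in for the ifelse() call:

```r
library(dplyr)

df1 <- data.frame(V1 = c("aa", "bb", "aa", "cc", "dd", "ee", "aa"),
                  V2 = c("ff", "gg", "hh", "yy", "jj", "kk", "hh"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(V1 = c("aa", "gg", "cc", "jj", "kk"),
                  V2 = c(1, 2, 3, 4, 5),
                  stringsAsFactors = FALSE)

# Two copies of the map, each keyed on the column it will be joined against.
# (Illustrative names, not part of the original answer.)
by_v1 <- data.frame(V1 = df2$V1, id_from_V1 = df2$V2, stringsAsFactors = FALSE)
by_v2 <- data.frame(V2 = df2$V1, id_from_V2 = df2$V2, stringsAsFactors = FALSE)

result <- df1 %>%
  left_join(by_v1, by = "V1") %>%   # id found via column V1, NA if absent
  left_join(by_v2, by = "V2") %>%   # id found via column V2, NA if absent
  mutate(V3 = coalesce(id_from_V2, id_from_V1)) %>%  # first non-NA of the two
  select(V1, V2, V3)

print(result)
```

As in the original pipeline, a match on V2 takes priority over a match on V1 if a row happens to match on both.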

Answer 2 (score: 0)

You can try the following:

data <- data.frame(
  V1 = c('aa','bb','aa','cc','dd','ee','aa'),
  V2 = c('ff','gg','hh','yy','jj','kk','hh'), stringsAsFactors = FALSE
)

map <- data.frame(
  V1 = c('aa','gg','cc','jj','kk'), 
  V2  = c(1:5), stringsAsFactors = FALSE
)

data$V3.1 <- map$V2[match(data$V1, map$V1)]
data$V3.2 <- map$V2[match(data$V2,map$V1)]
data$V3 <- ifelse(!is.na(data$V3.1), data$V3.1, data$V3.2)
data
# V1 V2 V3.1 V3.2 V3
# 1 aa ff    1   NA  1
# 2 bb gg   NA    2  2
# 3 aa hh    1   NA  1
# 4 cc yy    3   NA  3
# 5 dd jj   NA    4  4
# 6 ee kk   NA    5  5
# 7 aa hh    1   NA  1
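
Note that match() only finds exact matches, while the question's sample data contains 'baa' and 'caa', which the expected output maps to 1 (the id of 'aa'). A rough base R sketch of a substring variant, under the assumption that a row should take the id of the first map key that occurs anywhere inside the string. lookup_substring is a hypothetical helper written for this example, not an existing function:

```r
data <- data.frame(
  V1 = c("baa", "bb", "aa", "cc", "dd", "ee", "caa"),
  V2 = c("ff", "gg", "hh", "yy", "jj", "kk", "hh"),
  stringsAsFactors = FALSE
)

map <- data.frame(
  V1 = c("aa", "gg", "cc", "jj", "kk"),
  V2 = 1:5,
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each element of x, return the value belonging to
# the first key that occurs as a literal substring, or NA if no key occurs.
lookup_substring <- function(x, keys, values) {
  out <- rep(NA_integer_, length(x))
  for (i in seq_along(keys)) {
    # fixed = TRUE treats the key as plain text rather than a regex
    hit <- is.na(out) & grepl(keys[i], x, fixed = TRUE)
    out[hit] <- values[i]
  }
  out
}

from_v1 <- lookup_substring(data$V1, map$V1, map$V2)
from_v2 <- lookup_substring(data$V2, map$V1, map$V2)

# Prefer the id found via V1, fall back to the one found via V2
data$V3 <- ifelse(!is.na(from_v1), from_v1, from_v2)
print(data)
```

This runs one vectorised grepl() call per map entry. If a string contains several keys, the key that appears earlier in the map wins, which is an assumption the question does not settle.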