Question

(I have been stuck on this problem for the past two days, so please bear with me if an answer already exists somewhere.)

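I have two data frames, A and B, and I want to merge them on the Name column. Suppose A has two columns, Name and Numbers. The Name column of A has values such as ".tony.x.rds", ".tom.x.rds", and so on.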

Name     Numbers
.tony.x.rds 15.6
.tom.x.rds 14.5

B has two columns, Name and ChaR. The Name column of B has values such as "tony.x", "tom.x", and so on.

Name  ChaR
tony.x   ENG
tom.x   US

The key elements in the Name columns of both data frames are "tony", "tom", and so on.

So ".tony.x.rds" corresponds to "tony.x", and ".tom.x.rds" corresponds to "tom.x".

I tried gsub with various options to reduce the Name column of both A and B to "tony", "tom", and so on. But when I run

StoRe<-merge(A,B, all=T)

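I get all the rows of A and all the rows of B instead of a single row per name. That is, each name ("tony", "tom", etc.) appears in two rows, one carrying its Numbers value and the other its ChaR value. For example: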

Name Numbers ChaR
tony    15.6    NA
tony    NULL    ENG
tom    14.5    NA
tom    NULL    US

This has been giving me a splitting headache. I would appreciate your help.

Answer 1

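One possible solution. I am not entirely sure what you want to do with the 'x' in the strings; I kept it in the linking key, but by changing \\1\\2 to \\1 you keep only the first captured group (the name, e.g. 'tony').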

a <- data.frame(
  Name = paste0(".", c("tony", "tom", "foo", "bar", "foobar"), ".x.rds"),
  Numbers = rnorm(5)
)

b <- data.frame(
  Name = paste0(c("tony", "tom", "bar", "foobar", "company"), ".x"),
  ChaR = LETTERS[11:15]
)

# String consists of 'point letter1 point letter2 point rds'; replace by
# 'letter1 letter2' 
a$Name_stand <- gsub("^\\.([a-z]+)\\.([a-z]+)\\.rds$", "\\1\\2", a$Name)

# String consists of 'letter1 point letter2'; replace by 'letter1 letter2' 
b$Name_stand <- gsub("^([a-z]+)\\.([a-z]+)$", "\\1\\2", b$Name)

result <- merge(a, b, all = TRUE, by = "Name_stand")

Output:

#> result
#  Name_stand        Name.x     Numbers    Name.y ChaR
#1       barx    .bar.x.rds  1.38072696     bar.x    M
#2   companyx          <NA>          NA company.x    O
#3    foobarx .foobar.x.rds -1.53076596  foobar.x    N
#4       foox    .foo.x.rds  1.40829287      <NA> <NA>
#5       tomx    .tom.x.rds -0.01204651     tom.x    L
#6      tonyx   .tony.x.rds  0.34159406    tony.x    K
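
As a sketch of the variant mentioned in the answer's opening paragraph, replacing \\1\\2 with \\1 keeps only the first captured group, so the key becomes the bare name. The two small data frames reuse the values from the question, the column name Name_key is made up for illustration, and sub is used instead of gsub on the assumption that the anchored pattern can match at most once per string.

```r
# Illustrative data, using the values shown in the question.
a <- data.frame(
  Name = c(".tony.x.rds", ".tom.x.rds"),
  Numbers = c(15.6, 14.5),
  stringsAsFactors = FALSE
)
b <- data.frame(
  Name = c("tony.x", "tom.x"),
  ChaR = c("ENG", "US"),
  stringsAsFactors = FALSE
)

# Keep only the first captured group (the part before ".x").
# Name_key is a hypothetical column name, not one from the original answer.
a$Name_key <- sub("^\\.([a-z]+)\\.([a-z]+)\\.rds$", "\\1", a$Name)
b$Name_key <- sub("^([a-z]+)\\.([a-z]+)$", "\\1", b$Name)

# Merge on the new key only; the original Name columns differ between
# the two data frames and are kept as Name.x and Name.y.
result_key <- merge(a, b, by = "Name_key", all = TRUE)
print(result_key)
```

Passing by explicitly matters here: without it, merge joins on every column name the two data frames share, which includes the original, non-matching Name column.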

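Another option, perhaps somewhat more robust to variations in the strings (for example, 'tom.rds' and 'tom' would still be linked; this is of course also a drawback):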

# Remove the rds from a$Name
a$Name_stand <- gsub("rds$" , "", a$Name)
# Remove all non alpha numeric characters from the strings
a$Name_stand <- gsub("[^[:alnum:]]", "", a$Name_stand)
b$Name_stand <- gsub("[^[:alnum:]]", "", b$Name)

result2 <- merge(a, b, all = TRUE, by = "Name_stand")
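
To make the trade-off of this looser normalization concrete, here is a minimal sketch that applies the same two substitutions to a few sample strings. The helper normalize_name and the sample inputs are hypothetical, and unlike the code above it strips the trailing "rds" from every input, not only from a$Name.

```r
# Hypothetical helper wrapping the two substitutions used above.
normalize_name <- function(x) {
  x <- sub("rds$", "", x)          # drop a trailing "rds"
  gsub("[^[:alnum:]]", "", x)      # drop dots and other non-alphanumerics
}

# Made-up sample strings, including variants the strict regex would reject.
samples <- c(".tom.x.rds", "tom.x", "tom.rds", "tom", "to.mx")
keys <- normalize_name(samples)
print(data.frame(samples, keys))
```

The first two strings share one key and the next two share another, which is the intended robustness. The last string, "to.mx", collapses to the same key as "tom.x" because all dots are discarded, which illustrates the drawback.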

Merging two data frames on a column whose strings follow a specific pattern
