Matching strings in a column of one data frame against strings in a column of another data frame using R or Python

Posted: 2018-01-31 06:45:47

Tags: python r string string-matching

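I'm trying to match the strings in a column of one data frame against the strings in a column of another data frame, and map the corresponding values. The two data frames have different numbers of rows.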

df1 = data.frame(name = c("(CKMB)Creatinine Kinase Muscle & Brain", "24 Hours Urine for Sodium", "Antistreptolysin O Titer", "Blood group O"), lonic_code = c("27816-8-O", "27816-8-B", "1869-7", "33914-3"))
df2 = data.frame(Testcomponents = c("creatinine", "blood", "potassium"))

Expected output:

Testcomponents   lonic_code
creatinine       27816-8-O
blood            1869-7
potassium        NA
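Since the question also allows Python, here is a minimal pandas sketch of the same task. It assumes case-insensitive literal substring matching and keeps only the first match per component; `first_code` is a hypothetical helper name. With plain substring matching, "blood" maps to the "Blood group O" row (33914-3).

```python
import pandas as pd

# Same sample data as in the question (column names kept as-is)
df1 = pd.DataFrame({
    "name": ["(CKMB)Creatinine Kinase Muscle & Brain", "24 Hours Urine for Sodium",
             "Antistreptolysin O Titer", "Blood group O"],
    "lonic_code": ["27816-8-O", "27816-8-B", "1869-7", "33914-3"],
})
df2 = pd.DataFrame({"Testcomponents": ["creatinine", "blood", "potassium"]})


def first_code(component):
    """Return the code of the first df1 row whose name contains `component`, else None."""
    # Case-insensitive literal substring test over the whole name column
    mask = df1["name"].str.contains(component, case=False, regex=False)
    hits = df1.loc[mask, "lonic_code"]
    return hits.iloc[0] if len(hits) > 0 else None


# Apply the lookup to every test component
df2["lonic_code"] = df2["Testcomponents"].map(first_code)
print(df2)
```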

4 answers:

Answer 0 (score: 2)

Here is one possible solution. It's probably not the prettiest, so I'd love to see other approaches.

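df1 = data.frame(name = c("(CKMB)Creatinine Kinase Muscle & Brain", "24 Hours Urine for Sodium", "Antistreptolysin O Titer", "Blood group O"),
                 lonic_code = c("27816-8-O", "27816-8-B", "1869-7", "33914-3"),
                 stringsAsFactors = FALSE)  # keep codes as text, not factor level numbers
df2 = data.frame(Testcomponents = c("creatinine", "blood", "potassium"), stringsAsFactors = FALSE)

# For each test component, find the indices of the df1 names that contain it (case-insensitive),
# then look up the codes at those indices; the result is a list with one element per component
result = lapply(sapply(df2$Testcomponents, function(x) {
  which(sapply(df1$name, function(y) {grepl(x, y, ignore.case = TRUE)}))}), function(z) {df1$lonic_code[z]})

df2$lonic_code = result

Output:

  Testcomponents lonic_code
1     creatinine  27816-8-O
2          blood    33914-3
3      potassium           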
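The nested `sapply`/`which` logic above collects every matching row per component, which is why the result is a list. Below is a plain-Python sketch of the same idea, with the two data-frame columns replaced by ordinary lists (`names`, `codes`, `components` and `all_codes` are hypothetical names). Unlike `grepl`, which treats each component as a regular expression, this sketch assumes a literal, case-insensitive substring test.

```python
# Hypothetical plain-Python stand-ins for the two data-frame columns
names = ["(CKMB)Creatinine Kinase Muscle & Brain", "24 Hours Urine for Sodium",
         "Antistreptolysin O Titer", "Blood group O"]
codes = ["27816-8-O", "27816-8-B", "1869-7", "33914-3"]
components = ["creatinine", "blood", "potassium"]


def all_codes(component):
    """Codes of every name that contains `component`, ignoring case (may be empty)."""
    needle = component.lower()
    return [code for name, code in zip(names, codes) if needle in name.lower()]


# One list per component, like the list column produced by lapply() in the R answer
result = {c: all_codes(c) for c in components}
print(result)
# {'creatinine': ['27816-8-O'], 'blood': ['33914-3'], 'potassium': []}
```

An empty list plays the role of the blank `potassium` cell in the R output.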

Answer 1 (score: 2)

In this case, regex_right_join from the fuzzyjoin package may come in handy.

library(fuzzyjoin)
library(dplyr)

df1 %>%
  mutate(name = as.character(name)) %>%
  regex_right_join(df2 %>%
                     mutate(Testcomponents = as.character(Testcomponents)), 
                   by = c(name = "Testcomponents"), ignore_case = T) %>%
  select(Testcomponents, lonic_code)

The output is:

  Testcomponents lonic_code
1     creatinine  27816-8-O
2          blood    33914-3
3      potassium       <NA>

Sample data:

df1 <- structure(list(name = structure(1:4, .Label = c("(CKMB)Creatinine Kinase Muscle & Brain", 
"24 Hours Urine for Sodium", "Antistreptolysin O Titer", "Blood group O"
), class = "factor"), lonic_code = structure(c(3L, 2L, 1L, 4L
), .Label = c("1869-7", "27816-8-B", "27816-8-O", "33914-3"), class = "factor")), .Names = c("name", 
"lonic_code"), row.names = c(NA, -4L), class = "data.frame")

df2 <- structure(list(Testcomponents = structure(c(2L, 1L, 3L), .Label = c("blood", 
"creatinine", "potassium"), class = "factor")), .Names = "Testcomponents", row.names = c(NA, 
-3L), class = "data.frame")
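For comparison, a rough pandas analogue of `regex_right_join` can be sketched as a cross join followed by a regex filter. This is not a pandas API: `regex_right_join` here is a hypothetical helper, `merge(how="cross")` assumes pandas 1.2 or later, and the sketch keeps every row of the right-hand table, filling unmatched rows with NaN.

```python
import re

import pandas as pd


def regex_right_join(left, right, left_col, pattern_col):
    """Keep every row of `right`; attach the `left` rows whose `left_col` matches
    the regex in `pattern_col` (case-insensitive). Unmatched rows get NaN."""
    # Give each right-hand row an id so unmatched rows can be recovered
    right = right.reset_index(drop=True).rename_axis("_rid").reset_index()
    # Cross join: every right row paired with every left row
    pairs = right.merge(left, how="cross")
    # Keep only the pairs where the pattern is found somewhere in the left string
    mask = [re.search(p, s, flags=re.IGNORECASE) is not None
            for p, s in zip(pairs[pattern_col], pairs[left_col])]
    matched = pairs.loc[mask, ["_rid"] + list(left.columns)]
    # Left-merging from the id'd right table gives right-join semantics
    out = right.merge(matched, on="_rid", how="left")
    return out.drop(columns="_rid")


df1 = pd.DataFrame({
    "name": ["(CKMB)Creatinine Kinase Muscle & Brain", "24 Hours Urine for Sodium",
             "Antistreptolysin O Titer", "Blood group O"],
    "lonic_code": ["27816-8-O", "27816-8-B", "1869-7", "33914-3"],
})
df2 = pd.DataFrame({"Testcomponents": ["creatinine", "blood", "potassium"]})

out = regex_right_join(df1, df2, left_col="name", pattern_col="Testcomponents")
print(out[["Testcomponents", "lonic_code"]])
```

The cross join builds `len(left) * len(right)` candidate pairs, which is fine for small lookup tables but worth keeping in mind for large ones.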

Answer 2 (score: 1)

This is a bit longer than Florian's answer, but I think it makes up for that by being easier to read.


Answer 3 (score: 1)

You can use sapply to iterate over Testcomponents:

df2$lonic_code <- sapply(tolower(df2$Testcomponents), function(x) 
                     df1$lonic_code[grep(x, tolower(df1$name), fixed = TRUE)[1L]])

df2
#  Testcomponents lonic_code
#1     creatinine  27816-8-O
#2          blood    33914-3
#3      potassium       <NA>

If there are multiple matches, this always returns only the first one.

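This should be reasonably fast: it uses only a single loop, and because we pass fixed = TRUE to grep, each component is matched as a literal string rather than as a regular expression. To speed it up further, you could use the matching functions from the stringi package.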
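Besides speed, `fixed = TRUE` changes what a pattern means. The component is matched as a literal string, so characters such as `(` or `.` lose their regex meaning. Below is a small Python sketch of the difference. It assumes the names are lower-cased once up front, and `first_index` is a hypothetical helper that mimics `grep(...)[1L]`, returning `None` instead of `NA`.

```python
import re

# Hypothetical sample: the df1 names, lower-cased once up front
names = [n.lower() for n in ["(CKMB)Creatinine Kinase Muscle & Brain",
                             "24 Hours Urine for Sodium",
                             "Antistreptolysin O Titer", "Blood group O"]]


def first_index(pattern, fixed):
    """Index of the first name matching `pattern`, or None (like grep(...)[1L])."""
    for i, name in enumerate(names):
        # fixed=True: plain substring test; fixed=False: pattern is a regular expression
        hit = pattern in name if fixed else re.search(pattern, name) is not None
        if hit:
            return i
    return None


print(first_index("(ckmb)creatinine", fixed=True))   # 0: parentheses taken literally
print(first_index("(ckmb)creatinine", fixed=False))  # None: "(ckmb)" is a regex group
```

As a regex, `(ckmb)creatinine` looks for the text `ckmbcreatinine`, which does not occur in any name. The literal test finds the parenthesised string directly.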