Suppose I have a dataset A, for example:
Disease_name
(J189)Pneumonia, unspecified
(R51)Headache
(M4806)Spinal stenosis, lumbar region
(M512)Other specified intervertebral disc displacement
(C187)Sigmoid colon
(N201)Calculus of ureter
(C189)Colon, unspecified
(S0600)Concussion, without open intracranial wound
(C73)Malignant neoplasm of thyroid gland
(C509)Breast, unspecified
(K746)Other and unspecified cirrhosis of liver
(B181)Chronic viral hepatitis B without delta- agent
(R42)Dizziness and giddiness
and another dataset B, like this:
parts key
Chest pneumonia
Head headache
Abdominal spinal
Abdominal intervertebral
Abdominal colon
Abdominal ureter
Abdominal colon
Head concussion
Neck thyroid
Chest breast
Abdominal liver
Abdominal hepatitis
Head giddiness
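I want to find the words from B$key that occur in A$Disease_name, and then merge A with B on those matched keywords, so that B$parts is assigned to each A$Disease_name.
How can I do this in R?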
Answer 0 (score: 1)
Welcome! The question is clear to me. Here is a tidyverse solution.
First, read in some data:
library(dplyr)
tmp <- data.table::fread(
"Disease_name
(J189)Pneumonia, unspecified
(R51)Headache
(M4806)Spinal stenosis, lumbar region
(M512)Other specified intervertebral disc displacement
(C187)Sigmoid colon
(N201)Calculus of ureter
(C189)Colon, unspecified
(S0600)Concussion, without open intracranial wound
(C73)Malignant neoplasm of thyroid gland
(C509)Breast, unspecified
(K746)Other and unspecified cirrhosis of liver
(B181)Chronic viral hepatitis B without delta- agent
(R42)Dizziness and giddiness",
sep = ""
)
tmp2 <- data.table::fread(
"parts key
Chest pneumonia
Head headache
Abdominal spinal
Abdominal intervertebral
Abdominal colon
Abdominal ureter
Abdominal colon
Head concussion
Neck thyroid
Chest breast
Abdominal liver
Abdominal hepatitis
Head giddiness"
)
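Then we do the join:
# Build one regex of the form ".*(pneumonia|headache|...).*" from the keys in tmp2,
# replace each lower-cased disease name with the key it contains,
# and join on that new key column.
result <-
  tmp %>%
  mutate(key = gsub(paste0(".*(", paste(tmp2$key, collapse = "|"), ").*"),
                    "\\1",
                    tolower(Disease_name))) %>%
  left_join(tmp2)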
#> Joining, by = "key"
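To make the gsub() step in the join easier to follow, here is a minimal base-R sketch on toy data. The vectors keys and diseases are made up for illustration and are not part of the original answer. It shows that the pattern is a single alternation of all keys, that the whole string is replaced by the captured key, and that a name containing none of the keys comes back unchanged (so it would end up with NA for parts after the join).

```r
# Hypothetical toy inputs, for illustration only
keys <- c("pneumonia", "colon")
diseases <- c("(J189)Pneumonia, unspecified",
              "(C187)Sigmoid colon",
              "(R51)Headache")

# One regex with every key as an alternative inside a capture group
pattern <- paste0(".*(", paste(keys, collapse = "|"), ").*")
# pattern is now ".*(pneumonia|colon).*"

# Replace the entire lower-cased string with whatever the group captured;
# strings that match no key are returned unchanged by gsub()
extracted <- gsub(pattern, "\\1", tolower(diseases))
# extracted holds "pneumonia", "colon", "(r51)headache"
```

Because the leading `.*` is greedy, a name that contains several keys would be reduced to the last one that appears in it, which may or may not be what you want.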
The result:
result
#> Disease_name key
#> 1 (J189)Pneumonia, unspecified pneumonia
#> 2 (R51)Headache headache
#> 3 (M4806)Spinal stenosis, lumbar region spinal
#> 4 (M512)Other specified intervertebral disc displacement intervertebral
#> 5 (C187)Sigmoid colon colon
#> 6 (C187)Sigmoid colon colon
#> 7 (N201)Calculus of ureter ureter
#> 8 (C189)Colon, unspecified colon
#> 9 (C189)Colon, unspecified colon
#> 10 (S0600)Concussion, without open intracranial wound concussion
#> 11 (C73)Malignant neoplasm of thyroid gland thyroid
#> 12 (C509)Breast, unspecified breast
#> 13 (K746)Other and unspecified cirrhosis of liver liver
#> 14 (B181)Chronic viral hepatitis B without delta- agent hepatitis
#> 15 (R42)Dizziness and giddiness giddiness
#> parts
#> 1 Chest
#> 2 Head
#> 3 Abdominal
#> 4 Abdominal
#> 5 Abdominal
#> 6 Abdominal
#> 7 Abdominal
#> 8 Abdominal
#> 9 Abdominal
#> 10 Head
#> 11 Neck
#> 12 Chest
#> 13 Abdominal
#> 14 Abdominal
#> 15 Head
Created on 2018-09-28 by the reprex package (v0.2.1)
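Note that the result has 15 rows for 13 diseases: the two colon entries each appear twice, because B lists "Abdominal colon" twice. If that is not intended, one option is to drop duplicated lookup rows before joining. The following is a minimal base-R sketch of that idea on made-up toy data frames (A and B here are hypothetical stand-ins with the key column already extracted); in the dplyr pipeline, calling distinct() on tmp2 before left_join() should play the same role.

```r
# Hypothetical toy data: the lookup table lists "colon" twice
A <- data.frame(Disease_name = c("(C187)Sigmoid colon", "(R51)Headache"),
                key = c("colon", "headache"),
                stringsAsFactors = FALSE)
B <- data.frame(parts = c("Abdominal", "Abdominal", "Head"),
                key = c("colon", "colon", "headache"),
                stringsAsFactors = FALSE)

# Joining against B as-is repeats the colon row once per duplicated key
with_dups <- merge(A, B, by = "key", all.x = TRUE)
nrow(with_dups)  # 3 rows for 2 diseases

# Dropping duplicated lookup rows first keeps one row per disease
deduped <- merge(A, unique(B), by = "key", all.x = TRUE)
nrow(deduped)    # 2 rows
```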
Answer 1 (score: 1)
Assuming you have two data frames A and B, you can use sqldf to combine them into a new data frame C, as follows:
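library(sqldf)
# sqldf runs on SQLite by default, which provides INSTR(string, substring)
# rather than MySQL's LOCATE; LOWER() is needed because the keys are lower-case.
C <- sqldf("SELECT B.parts, A.Disease_name
FROM A, B
WHERE INSTR(LOWER(A.Disease_name), B.key) > 0")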
I don't currently have access to a working R/sqldf environment, so you may still need to adjust the SQL statement a little.
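For readers who want to check the logic without sqldf, the same idea (pair every row of A with every row of B, then keep the pairs where the key occurs in the disease name) can be sketched in base R. This is an illustrative equivalent under the assumption of small inputs, not the answer's own method; the toy data frames and the names all_pairs and hit are made up here.

```r
# Hypothetical toy versions of A and B
A <- data.frame(Disease_name = c("(J189)Pneumonia, unspecified", "(R51)Headache"),
                stringsAsFactors = FALSE)
B <- data.frame(parts = c("Chest", "Head"),
                key = c("pneumonia", "headache"),
                stringsAsFactors = FALSE)

# Cross join: every row of A paired with every row of B (like "FROM A, B")
all_pairs <- merge(A, B, by = NULL)

# Keep pairs where the key occurs literally in the lower-cased disease name
# (the counterpart of the WHERE clause)
hit <- mapply(grepl, all_pairs$key, tolower(all_pairs$Disease_name),
              MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
C <- all_pairs[hit, c("parts", "Disease_name")]
C
```

The cross join grows as nrow(A) * nrow(B), so this is only reasonable for lookup tables of modest size.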