Searching for and matching character codes within a range

Time: 2014-04-04 15:47:22

Tags: r

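I have 2 data frames.

In df1, I have a column of International Classification of Diseases (ICD) diagnosis codes (df1$PriDiag), along with other information.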

#df1
PriDiag = c("A051","A067","A161","A242","A459") 
Admissions = c("106","79","67","50","41") 
Pts = c("97","27","45","30","20") 
df1 = data.frame(PriDiag,Admissions,Pts) 
df1
  PriDiag Admissions Pts
1    A051        106  97
2    A067         79  27
3    A161         67  45
4    A242         50  30
5    A459         41  20

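In the other data frame (df2), I have the start (df2$Start) and end (df2$End) limits of the ICD subcategories, along with the associated description (df2$Description).

#df2
Start = c("A00","A15","A20","A30")
End = c("A09","A19","A28","A49")
Description = c("Intestinal infectious diseases","Tuberculosis","Certain zoonotic bacterial","Other bacterial diseases")
df2 = data.frame(Start,End,Description)
df2
  Start End                    Description
1   A00 A09 Intestinal infectious diseases
2   A15 A19                   Tuberculosis
3   A20 A28     Certain zoonotic bacterial
4   A30 A49       Other bacterial diseases

What I want to do is give df1 a new column containing the subcategory description (df2$Description) for each code (df1$PriDiag). I would be able to do this if the codes were numeric rather than character, but I am struggling to find a quick solution. Is there a way to search within ranges of character values?

The result I want is a new data frame, df3, which is df1 with the matching Description added as a column.

How can I do this?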

2 answers:

Answer 0 (score: 0)

Try this:

library(sqldf)

sqldf("select df1.*, df2.Description 
       from df1 
       left join df2
       on PriDiag between Start and End"
)

which gives:

  PriDiag Admissions Pts                    Description
1    A051        106  97 Intestinal infectious diseases
2    A067         79  27 Intestinal infectious diseases
3    A161         67  45                   Tuberculosis
4    A242         50  30     Certain zoonotic bacterial
5    A459         41  20       Other bacterial diseases
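
The join above relies on text comparison: "A051" sorts between "A00" and "A09". One caveat is that a four-character code at the top of a range, such as "A091", sorts after "A09" and would not match. Below is a minimal base R sketch of the same idea that compares only the three-character category instead. The names cat3 and idx are illustrative, and it assumes every code starts with its three-character ICD category and that the ranges do not overlap.

```r
# Rebuild the example data from the question; keep the codes as plain text
df1 <- data.frame(
  PriDiag = c("A051", "A067", "A161", "A242", "A459"),
  Admissions = c("106", "79", "67", "50", "41"),
  Pts = c("97", "27", "45", "30", "20"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Start = c("A00", "A15", "A20", "A30"),
  End = c("A09", "A19", "A28", "A49"),
  Description = c("Intestinal infectious diseases", "Tuberculosis",
                  "Certain zoonotic bacterial", "Other bacterial diseases"),
  stringsAsFactors = FALSE
)

# Assumption: the first three characters are the ICD category to compare
cat3 <- substr(df1$PriDiag, 1, 3)

# For each code, take the first range with Start <= code <= End (NA if none)
idx <- vapply(cat3, function(code) {
  hit <- which(df2$Start <= code & code <= df2$End)
  if (length(hit) > 0) hit[1] else NA_integer_
}, integer(1), USE.NAMES = FALSE)

# df1 plus the matched description
df3 <- cbind(df1, Description = df2$Description[idx])
df3
```

Codes that fall outside every range get NA here, which is comparable to what the left join does for unmatched rows.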

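Answer 1 (score: 0)

This makes some assumptions about your data that may not be correct. It can be adjusted if your data is not as straightforward as it looks, but the path of least resistance is my favorite.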

library(qdap)

## Create a list key based on ranges
key <- setNames(lapply(1:nrow(df2), function(i) {
    paste0(strtrim(df2[i, 1], 1), 
        pad(substring(df2[i, 1], 2):substring(df2[i, 2], 2), 2))
}), df2[, 3])

## Assuming that last digit isn't important use qdap's lookup function (%l%)
df1[, "Description"] <- strtrim(df1[, 1], 3) %l% key

##   PriDiag Admissions Pts                    Description
## 1    A051        106  97 Intestinal infectious diseases
## 2    A067         79  27 Intestinal infectious diseases
## 3    A161         67  45                   Tuberculosis
## 4    A242         50  30     Certain zoonotic bacterial
## 5    A459         41  20       Other bacterial diseases
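
If you would rather avoid the qdap dependency, the same expand-the-ranges idea can be sketched in base R. This is not the code above, only an approximation of it: expand_range and lookup are made-up names, and it assumes each Start/End pair shares one leading letter followed by a two-digit number.

```r
# Rebuild the example data from the question; keep the codes as plain text
df1 <- data.frame(
  PriDiag = c("A051", "A067", "A161", "A242", "A459"),
  Admissions = c("106", "79", "67", "50", "41"),
  Pts = c("97", "27", "45", "30", "20"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Start = c("A00", "A15", "A20", "A30"),
  End = c("A09", "A19", "A28", "A49"),
  Description = c("Intestinal infectious diseases", "Tuberculosis",
                  "Certain zoonotic bacterial", "Other bacterial diseases"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: expand one range into its category codes,
# e.g. "A15".."A19" becomes A15, A16, A17, A18, A19
expand_range <- function(start, end) {
  letter <- substr(start, 1, 1)
  nums <- as.integer(substr(start, 2, 3)):as.integer(substr(end, 2, 3))
  sprintf("%s%02d", letter, nums)
}
codes <- Map(expand_range, df2$Start, df2$End)

# One row per three-character code, paired with its description
lookup <- data.frame(
  Code = unlist(codes, use.names = FALSE),
  Description = rep(df2$Description, lengths(codes)),
  stringsAsFactors = FALSE
)

# As in the answer, ignore the last digit and look up the first three characters
df1$Description <- lookup$Description[match(substr(df1$PriDiag, 1, 3), lookup$Code)]
df1
```

An explicit lookup table is easy to inspect, and codes that fall in the gaps between ranges (A10 to A14 here) simply come back as NA.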