Question

I have two data frames.

In df1, I have a column of International Classification of Diseases (ICD) diagnosis codes (df1$PriDiag), along with other information.

#df1
PriDiag = c("A051","A067","A161","A242","A459") 
Admissions = c("106","79","67","50","41") 
Pts = c("97","27","45","30","20") 
df1 = data.frame(PriDiag,Admissions,Pts) 
df1
  PriDiag Admissions Pts
1    A051        106  97
2    A067         79  27
3    A161         67  45
4    A242         50  30
5    A459         41  20

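In the other data frame (df2), I have the start (df2$Start) and end (df2$End) limits of the ICD subcategories, along with the associated description (df2$Description).

#df2
Start = c("A00","A15","A20","A30")
End = c("A09","A19","A28","A49")
Description = c("Intestinal infectious diseases","Tuberculosis","Certain zoonotic bacterial","Other bacterial diseases")
df2 = data.frame(Start,End,Description)
df2
  Start End                    Description
1   A00 A09 Intestinal infectious diseases
2   A15 A19                   Tuberculosis
3   A20 A28     Certain zoonotic bacterial
4   A30 A49       Other bacterial diseases

What I want to do is add a new column to df1 containing the subcategory description (df2$Description) for each code (df1$PriDiag). If the codes were numeric rather than character I would be able to do this, but I am struggling to find a quick solution. Is there a way to search between character values?

The result I want is a new data frame, df3, consisting of df1 plus the matching df2$Description for each row.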

How can I do this?

Answer 1

Try this:

library(sqldf)

sqldf("select df1.*, df2.Description 
       from df1 
       left join df2
       on PriDiag between Start and End"
)

which gives:

  PriDiag Admissions Pts                    Description
1    A051        106  97 Intestinal infectious diseases
2    A067         79  27 Intestinal infectious diseases
3    A161         67  45                   Tuberculosis
4    A242         50  30     Certain zoonotic bacterial
5    A459         41  20       Other bacterial diseases
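
The join above works because character values are compared lexicographically. One thing to watch for: under that ordering a four-character code such as "A099" sorts after "A09", so it would fall outside the A00 to A09 range. As a rough base R sketch of the same range test (not part of the sqldf answer), the version below first truncates each code to its three-character category. The names cat3 and idx are made up for illustration, and it assumes the comparison follows plain letter-then-digit ordering in your locale.

```r
# Example data copied from the question
df1 <- data.frame(
  PriDiag    = c("A051", "A067", "A161", "A242", "A459"),
  Admissions = c("106", "79", "67", "50", "41"),
  Pts        = c("97", "27", "45", "30", "20"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Start       = c("A00", "A15", "A20", "A30"),
  End         = c("A09", "A19", "A28", "A49"),
  Description = c("Intestinal infectious diseases", "Tuberculosis",
                  "Certain zoonotic bacterial", "Other bacterial diseases"),
  stringsAsFactors = FALSE
)

# Assumption: only the first three characters decide the subcategory
cat3 <- substr(df1$PriDiag, 1, 3)

# For each code, the index of the first df2 row whose range contains it
# (NA when no range matches, mirroring the left join)
idx <- vapply(cat3, function(code) {
  hit <- which(df2$Start <= code & code <= df2$End)
  if (length(hit) > 0) hit[1] else NA_integer_
}, integer(1), USE.NAMES = FALSE)

df3 <- df1
df3$Description <- df2$Description[idx]
df3
```

Codes that fall in a gap between ranges (for example A10) end up with NA in Description rather than being dropped.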

Answer 2

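This makes some assumptions about your data that may not be correct. It can be adjusted if your data is not as straightforward as it looks, but the path of least resistance is my favorite.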

library(qdap)

## Create a list key based on ranges
key <- setNames(lapply(1:nrow(df2), function(i) {
    paste0(strtrim(df2[i, 1], 1), 
        pad(substring(df2[i, 1], 2):substring(df2[i, 2], 2), 2))
}), df2[, 3])

## Assuming that last digit isn't important use qdap's lookup function (%l%)
df1[, "Description"] <- strtrim(df1[, 1], 3) %l% key

##   PriDiag Admissions Pts                    Description
## 1    A051        106  97 Intestinal infectious diseases
## 2    A067         79  27 Intestinal infectious diseases
## 3    A161         67  45                   Tuberculosis
## 4    A242         50  30     Certain zoonotic bacterial
## 5    A459         41  20       Other bacterial diseases
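
For readers without qdap, the same expand-then-look-up idea can be sketched in base R. This is only an illustration under the same assumptions as above: each range stays within a single letter, the numeric part is two digits, and the fourth character of the diagnosis code can be ignored. The helper expand_range and the lookup table are hypothetical names, not part of qdap.

```r
# Example data copied from the question
df1 <- data.frame(
  PriDiag    = c("A051", "A067", "A161", "A242", "A459"),
  Admissions = c("106", "79", "67", "50", "41"),
  Pts        = c("97", "27", "45", "30", "20"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  Start       = c("A00", "A15", "A20", "A30"),
  End         = c("A09", "A19", "A28", "A49"),
  Description = c("Intestinal infectious diseases", "Tuberculosis",
                  "Certain zoonotic bacterial", "Other bacterial diseases"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: expand "A00".."A09" into c("A00", "A01", ..., "A09")
# Assumes Start and End share the same leading letter
expand_range <- function(start, end) {
  letter <- substr(start, 1, 1)
  nums <- as.integer(substr(start, 2, 3)):as.integer(substr(end, 2, 3))
  sprintf("%s%02d", letter, nums)
}

# One row per three-character code, paired with its description
codes <- Map(expand_range, df2$Start, df2$End)
lookup <- data.frame(
  Code        = unlist(codes, use.names = FALSE),
  Description = rep(df2$Description, lengths(codes)),
  stringsAsFactors = FALSE
)

# Look up each diagnosis by its first three characters
df1$Description <- lookup$Description[match(substr(df1$PriDiag, 1, 3), lookup$Code)]
df1
```

Building an explicit lookup table costs a little memory but makes the matching an exact match() call, which sidesteps any question about how strings are ordered.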

Searching and handling characters within a range
