Question

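My dataset has a large number of columns whose names start with "dis_".

The values in these columns are 0 (no disease) or 1 (disease present). I want to create a data frame of observations with 1 for one specific disease and 0 for all the others.

I tried the following: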

istroke <- filter(onlyCRP, dis_ep0009 == 1 & grep("dis_" == 0))

and, in combination with select:

istroke1 <- filter(onlyCRP, dis_ep0009 == 1 & select(contains("dis_") == 0))

As you might have guessed, neither of them works.

I have looked at these posts:

filtering columns by regex in dataframe

Subset data based on partial match of column names

But they do not answer my question.

Please let me know if you need further clarification.

Edit: I realized I need to clarify what I want. Consider this table:

dis_ep0009  dis_epxxx   dis_epxxx
 0            0             0
 0            1             0  
 0            0             1
 1            0             1
 0            0             0
 0            0             0
 1            1             1

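I need another column (called IS below) based on conditions across these 3 columns (I actually have 29 "dis_" columns):

If dis_ep0009 == 1, then IS == 1 (regardless of whether any other "dis_" column is 0 or 1).
If dis_ep0009 == 0 and dis_epxxx == 1, I want to drop these observations.
If dis_ep0009 == 0 and dis_epxxx == 0, I want to code IS == 0.

So the resulting table should look like this: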

dis_ep0009  dis_epxxx   dis_epxxx    IS
 0            0             0        0
 0            1             0        drop
 0            0             1        drop
 1            0             1        1
 0            0             0        0
 0            0             0        0
 1            1             1        1

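I have tried pairing filter (dplyr) with grep and ifelse statements, but I can't make heads or tails of it. Essentially, it should be as simple as this (which does not work):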

istroke <- filter(df, ifelse(dis_ep0009 == 1, 1, ifelse(dis_ep0009 == 0 & grep("dis_", names(df)) == 0, 0, ifelse(dis_ep0009 == 0 & grep("dis_", names(df)) == 1, drop())))

Thanks in advance!

Answer 1

See the comments in the code, and let me know if this is what you want.

specific_disease <- "dis_ep0009"
disease_cols <- grep("^dis_", names(onlyCRP), value = TRUE) # all columns whose names start with "dis_"
disease_cols <- setdiff(disease_cols, specific_disease) # all these columns except your specific disease
# a Boolean column saying whether there is another disease besides the possible specific one
# (drop = FALSE keeps a data frame even if only one column is left)
onlyCRP$any_other_disease <- apply(onlyCRP[, disease_cols, drop = FALSE] == 1, 1, any)
# the subset where you have only your specific disease and no other;
# [[specific_disease]] looks the column up by the name stored in the variable
onlyCRP[onlyCRP[[specific_disease]] == 1 & !onlyCRP$any_other_disease, ]
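
For the rules in the edited question (keep every row with the specific disease, drop rows that only have some other disease, code the rest as 0), a minimal base R sketch could look like the following. The toy data frame df reproduces the table from the question; the column names dis_a and dis_b are hypothetical stand-ins for the elided "dis_epxxx" names, and the sketch assumes the disease columns contain no missing values.

```r
# Toy data reproducing the question's table.
# dis_a and dis_b are hypothetical names standing in for "dis_epxxx".
df <- data.frame(
  dis_ep0009 = c(0, 0, 0, 1, 0, 0, 1),
  dis_a      = c(0, 1, 0, 0, 0, 0, 1),
  dis_b      = c(0, 0, 1, 1, 0, 0, 1)
)

target <- "dis_ep0009"
# every "dis_" column except the target disease
others <- setdiff(grep("^dis_", names(df), value = TRUE), target)

# TRUE where at least one of the other disease columns equals 1
# (assumes no NA values in these columns)
any_other <- rowSums(df[, others, drop = FALSE] == 1) > 0

# keep rows with the target disease, or with no disease at all;
# rows with only some other disease are dropped
keep <- df[[target]] == 1 | !any_other
result <- df[keep, , drop = FALSE]

# IS is 1 when the target disease is present, 0 otherwise
result$IS <- as.integer(result[[target]] == 1)
```

With the real data, df would be replaced by onlyCRP, and others would then pick up the remaining 28 "dis_" columns automatically.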
