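A more compact version of nested "ifelse grepl" calls

Time: 2018-09-24 12:54:43

Tags: r

I have a data.frame with several columns, and I want to add another column at the end of df whose value depends on a specific string found in one of the existing columns.

For example, I have: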

df <- data.frame(
  "Therapeutic.Use" = c("Epilepsy", "Cancer", "Angina"),
  "Compound" = c("XXX", "YYY", "KKK"))

I am using the following statement to derive the new value from the contents of the Therapeutic.Use column.

df$Target.Organ <- NA
df$Target.Organ <- ifelse(
  grepl("Epilepsy", df$Therapeutic.Use), "Brain",
    ifelse(grepl("Cancer", df$Therapeutic.Use), "Cancer",
      ifelse(grepl("Angina", df$Therapeutic.Use), "Heart", "Other")))

And so on. I have a table with 500 different uses, so I would like to avoid writing 500 ifelse statements. Is this possible?

Thanks in advance for your help :)

4 answers:

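Answer 0 (score: 3)

Create a lookup table and left_join it to df. Remember to use stringsAsFactors = FALSE, otherwise (in R versions before 4.0.0, where TRUE was the default) you will get factor levels instead of characters...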

df <- data.frame("Therapeutic.Use" = c("Epilepsy", "Cancer", "Angina"), 
                 "Compound" = c("XXX", "YYY", "KKK"),
                 stringsAsFactors = FALSE)


library(dplyr)

# Create the lookup table (or read it in from a CSV/Excel file)
lookup <- data.frame(Therapeutic.Use = unique(df$Therapeutic.Use),
                     Target.organ = c("Brain", "Cancer", "Heart"),
                     stringsAsFactors = FALSE)

df %>%
  # Perform the left join on the shared column
  left_join(lookup, by = "Therapeutic.Use") %>%
  # Replace NA in Target.organ with "Other"
  mutate(Target.organ = ifelse(is.na(Target.organ), "Other", Target.organ))


#   Therapeutic.Use Compound Target.organ
# 1        Epilepsy      XXX        Brain
# 2          Cancer      YYY       Cancer
# 3          Angina      KKK        Heart                                   
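
For comparison, the same exact-match lookup can be sketched in base R with a named character vector, which avoids the dplyr dependency. This is only an illustration: the vector name organ_by_use and the extra "Migraine"/"ZZZ" row are hypothetical and were added to show the "Other" fallback. Like the join above, it matches whole values rather than substrings.

```r
# Hypothetical lookup: names are therapeutic uses, values are target organs.
organ_by_use <- c(Epilepsy = "Brain", Cancer = "Cancer", Angina = "Heart")

# Example data; the "Migraine" row is an assumption used to exercise the fallback.
df <- data.frame(
  Therapeutic.Use = c("Epilepsy", "Cancer", "Angina", "Migraine"),
  Compound = c("XXX", "YYY", "KKK", "ZZZ"),
  stringsAsFactors = FALSE)

# Index the named vector by use; uses missing from the lookup come back as NA.
df$Target.Organ <- unname(organ_by_use[df$Therapeutic.Use])

# Fall back to "Other" for anything the lookup does not cover.
df$Target.Organ[is.na(df$Target.Organ)] <- "Other"
```

With 500 uses, the named vector could be built from a two-column table read from a file instead of being typed inline.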

Answer 1 (score: 0)

Try case_when:

library(dplyr)

df %>%
  mutate(Target.Organ = case_when(
    grepl("Epilepsy", Therapeutic.Use) ~ "Brain",
    grepl("Cancer", Therapeutic.Use) ~ "Cancer",
    grepl("Angina", Therapeutic.Use) ~ "Heart",
    # Use TRUE rather than T, since T can be reassigned
    TRUE ~ "Other"
  ))

#   Therapeutic.Use Compound Target.Organ
# 1        Epilepsy      XXX        Brain
# 2          Cancer      YYY       Cancer
# 3          Angina      KKK        Heart
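
case_when still needs one line per use. If substring matching with grepl is really required (the joins in the other answers match whole values only), one possible sketch in base R is to keep the patterns in a table and loop over its rows. The table name patterns, its column names, and the sample values such as "Focal Epilepsy" are assumptions made for illustration.

```r
# Example data with uses that merely contain the keyword (hypothetical values).
df <- data.frame(
  Therapeutic.Use = c("Focal Epilepsy", "Lung Cancer", "Angina", "Migraine"),
  stringsAsFactors = FALSE)

# Hypothetical pattern table; with 500 uses this could be read from a file.
patterns <- data.frame(
  pattern = c("Epilepsy", "Cancer", "Angina"),
  organ = c("Brain", "Cancer", "Heart"),
  stringsAsFactors = FALSE)

# Start with the fallback value, then overwrite wherever a pattern matches.
df$Target.Organ <- "Other"

# Walk the table in reverse so that, when several patterns match the same row,
# the earliest pattern wins, mirroring the order of nested ifelse() calls.
for (i in rev(seq_len(nrow(patterns)))) {
  hit <- grepl(patterns$pattern[i], df$Therapeutic.Use)
  df$Target.Organ[hit] <- patterns$organ[i]
}
```

Each pattern is treated as a regular expression, as in the original grepl calls; passing fixed = TRUE to grepl would match the strings literally instead.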

Answer 2 (score: 0)

Simply create a data frame that maps therapeutic uses to organs, then do a left_join.

# Your dataframe
df <- data.frame("Therapeutic.Use" = c("Epilepsy", "Cancer", "Angina"), "Compound" = c("XXX", "YYY", "KKK"))

# Organ dataframe
df2 <- data.frame(Therapeutic.Use = c("Epilepsy", "Cancer", "Angina"),
                  organ = c("Brain", "Cancer", "Heart"))

# Joining dataframes
library(dplyr)
df_done <- left_join(df, df2)

> df_done
  Therapeutic.Use Compound  organ
1        Epilepsy      XXX  Brain
2          Cancer      YYY Cancer
3          Angina      KKK  Heart
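
Note that a plain left_join leaves NA for any use that is missing from the organ table, whereas the question asks for "Other" in that case. A minimal sketch of one way to handle this, assuming dplyr's coalesce() and a hypothetical unmatched "Migraine" row added for illustration:

```r
library(dplyr)

# Example data; the "Migraine" row is an assumption with no entry in df2.
df <- data.frame(
  Therapeutic.Use = c("Epilepsy", "Cancer", "Angina", "Migraine"),
  Compound = c("XXX", "YYY", "KKK", "ZZZ"),
  stringsAsFactors = FALSE)

# Organ lookup table
df2 <- data.frame(
  Therapeutic.Use = c("Epilepsy", "Cancer", "Angina"),
  organ = c("Brain", "Cancer", "Heart"),
  stringsAsFactors = FALSE)

df_done <- df %>%
  # Name the join column explicitly instead of relying on the default.
  left_join(df2, by = "Therapeutic.Use") %>%
  # Replace the NA produced for unmatched uses with "Other".
  mutate(organ = coalesce(organ, "Other"))
```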

Answer 3 (score: 0)

Use data.table::setkey() on a lookup data.table() of key-value pairs, then dplyr::mutate() to add the new column to df:

library(data.table)
library(dplyr)

df <- data.frame(
  "Therapeutic.Use" = c("Epilepsy", "Cancer", "Angina"),
  "Compound" = c("XXX", "YYY", "KKK"),
  stringsAsFactors = FALSE)

hash <- data.table(
  "Therapeutic.Use" = c("Epilepsy", "Cancer", "Angina"),
  "Organ" = c("Brain", "Cancer", "Heart"))
setkey(hash, Therapeutic.Use)

df2 <-  mutate(df, Organ = hash[df$Therapeutic.Use]$Organ)

if(any(is.na(df2$Organ)))
  df2[is.na(df2$Organ), ]$Organ <- "Other"

df2

#   Therapeutic.Use Compound  Organ
# 1        Epilepsy      XXX  Brain
# 2          Cancer      YYY Cancer
# 3          Angina      KKK  Heart