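I have a data.frame with several columns, and I want to add another column at the end of df whose value depends on a specific string found in another column.
For example, I have: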
df <- data.frame(
"Therapeutic.Use" = c("Epilepsy", "Cancer", "Angina"),
"Compound" = c("XXX", "YYY", "KKK"))
I am using the following statements to derive the new value from the contents of the "Therapeutic.Use" column.
df$Target.Organ <- NA
df$Target.Organ <- ifelse(
grepl("Epilepsy", df$Therapeutic.Use), "Brain",
ifelse(grepl("Cancer", df$Therapeutic.Use), "Cancer",
ifelse(grepl("Angina", df$Therapeutic.Use), "Heart", "Other")))
And so on. I have a table with 500 different uses, so I would much rather avoid writing 500 ifelse
statements. Is this possible?
Thanks in advance for your help :)
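Answer 0 (score: 3)
Create a lookup table and left_join it onto your data.
Remember to use stringsAsFactors = FALSE, otherwise (in R versions before 4.0.0) you will get factor levels instead of characters...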
df <- data.frame("Therapeutic.Use" = c("Epilepsy", "Cancer", "Angina"),
"Compound" = c("XXX", "YYY", "KKK"),
stringsAsFactors = FALSE)
library ( dplyr )
#create lookup-table (or read in from a csv/excel)
lookup <- data.frame( Therapeutic.Use = unique( df$Therapeutic.Use ),
Target.organ = c("Brain", "Cancer", "Heart" ),
stringsAsFactors = FALSE )
df %>%
#perform left join
left_join( lookup, by = "Therapeutic.Use" ) %>%
#replace NA in Target.organ with "Other"
mutate( Target.organ = ifelse( is.na( Target.organ ), "Other", Target.organ ) )
# Therapeutic.Use Compound Target.organ
# 1 Epilepsy XXX Brain
# 2 Cancer YYY Cancer
# 3 Angina KKK Heart
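If pulling in dplyr is not desirable, the same lookup idea can be sketched in base R with match(). This is only a minimal sketch under assumptions: the "Migraine" row is made up here to show the "Other" fallback, and the lookup table is typed inline rather than read from a file.

```r
# Example data; the "Migraine" row is hypothetical, added to show the fallback
df <- data.frame(
  Therapeutic.Use = c("Epilepsy", "Cancer", "Angina", "Migraine"),
  Compound = c("XXX", "YYY", "KKK", "ZZZ"),
  stringsAsFactors = FALSE
)

# Lookup table: one row per therapeutic use (could equally come from a CSV)
lookup <- data.frame(
  Therapeutic.Use = c("Epilepsy", "Cancer", "Angina"),
  Target.Organ = c("Brain", "Cancer", "Heart"),
  stringsAsFactors = FALSE
)

# match() gives the matching row of lookup for each use, or NA if there is none
idx <- match(df$Therapeutic.Use, lookup$Therapeutic.Use)
df$Target.Organ <- lookup$Target.Organ[idx]

# Uses that are missing from the lookup table fall back to "Other"
df$Target.Organ[is.na(df$Target.Organ)] <- "Other"
df
```

Like the join, this matches whole strings exactly, so it only fits when the column holds the use name and nothing else.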
Answer 1 (score: 0)
Try case_when:
library(dplyr)
df %>%
mutate(Target.Organ = case_when(
grepl("Epilepsy", Therapeutic.Use) ~ "Brain",
grepl("Cancer", Therapeutic.Use) ~ "Cancer",
grepl("Angina", Therapeutic.Use) ~ "Heart",
# fallback for rows that match none of the patterns above
TRUE ~ "Other"
))
Therapeutic.Use Compound Target.Organ
1 Epilepsy XXX Brain
2 Cancer YYY Cancer
3 Angina KKK Heart
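With 500 uses, writing one grepl() line per pattern is still a lot of typing. If substring matching is really needed (as in the original grepl calls), one possible sketch is to keep the patterns in a named vector and loop over it. The data below, including the longer use strings and the "Migraine" row, are invented for illustration, and the patterns are treated as fixed strings rather than regular expressions.

```r
# Hypothetical data where the use is only part of a longer string
df <- data.frame(
  Therapeutic.Use = c("Refractory Epilepsy", "Lung Cancer", "Angina", "Migraine"),
  Compound = c("XXX", "YYY", "KKK", "ZZZ"),
  stringsAsFactors = FALSE
)

# Names are the strings to search for, values are the target organs
patterns <- c(Epilepsy = "Brain", Cancer = "Cancer", Angina = "Heart")

# Start with the fallback, then overwrite wherever a pattern is found
df$Target.Organ <- "Other"

# Loop in reverse so that, if several patterns hit the same row, the one
# listed first wins, which mimics the order of nested ifelse() calls
for (p in rev(names(patterns))) {
  hit <- grepl(p, df$Therapeutic.Use, fixed = TRUE)
  df$Target.Organ[hit] <- patterns[[p]]
}
df
```

The patterns vector could be built from two columns of a spreadsheet, so the 500 pairs only have to live in one place.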
Answer 2 (score: 0)
Just create a data frame that pairs each therapeutic use with its organ, then left_join it.
# Your dataframe
df <- data.frame("Therapeutic.Use" = c("Epilepsy", "Cancer", "Angina"), "Compound" = c("XXX", "YYY", "KKK"))
# Organ dataframe
df2 <- data.frame(Therapeutic.Use = c("Epilepsy", "Cancer", "Angina"),
organ = c("Brain", "Cancer", "Heart"))
# Joining dataframes
library(dplyr)
df_done <- left_join(df, df2, by = "Therapeutic.Use")
> df_done
Therapeutic.Use Compound organ
1 Epilepsy XXX Brain
2 Cancer YYY Cancer
3 Angina KKK Heart
Answer 3 (score: 0)
Use data.table::setkey() on a key-value lookup data.table(), then dplyr::mutate() to add the new column to df:
library(data.table)
library(dplyr)
df <- data.frame(
"Therapeutic.Use" = c("Epilepsy", "Cancer", "Angina"),
"Compound" = c("XXX", "YYY", "KKK"),
stringsAsFactors = FALSE)
hash <- data.table(
"Therapeutic.Use" = c("Epilepsy", "Cancer", "Angina"),
"Organ" = c("Brain", "Cancer", "Heart"))
setkey(hash, Therapeutic.Use)
df2 <- mutate(df, Organ = hash[df$Therapeutic.Use]$Organ)
# uses not found in the lookup come back as NA; label them "Other"
df2$Organ[is.na(df2$Organ)] <- "Other"
df2
# Therapeutic.Use Compound Organ
# 1 Epilepsy XXX Brain
# 2 Cancer YYY Cancer
# 3 Angina KKK Heart