I know similar questions have been asked many times, but I could not find one that covers my case.
Here is my problem. I have a data frame like this:
Sample Condition
RN001 1_healthy
RN002 14_healthy
RN008 20_disease
RN009 21_disease
RN0010 10_healthy
What I need is to split the values in the Condition column to get this:
Sample Condition
RN001 healthy
RN002 healthy
RN008 disease
RN009 disease
RN0010 healthy
I have already tried this:
data$Condition <- lapply(strsplit(as.character(data$Condition), "_"), '[', 2)
But that gives me a list structure like this:
[[1]]
[1] "healthy"
[[2]]
[1] "healthy"
[[3]]
[1] "disease"
[[4]]
[1] "disease"
What I need is a structure of class factor, like this:
[1] healthy healthy disease disease healthy ...
2 Levels: healthy disease
Thanks for your comments.
Answer 0 (score: 2)
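We can use sub to remove the prefix: match one or more digits (\\d+) at the start of the string (^) followed by an underscore (_), and replace the match with an empty string ("").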
data$Condition <- sub("^\\d+_", "", data$Condition)
data$Condition
#[1] "healthy" "healthy" "disease" "disease" "healthy"
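The question asks for a factor, while sub returns a character vector. A minimal sketch of the extra conversion step follows, assuming the data frame is rebuilt by hand from the sample shown in the question (the construction of data here is illustrative, not the asker's actual code) and that the desired level order is healthy, then disease:

```r
# Hypothetical reconstruction of the data frame from the question
data <- data.frame(
  Sample = c("RN001", "RN002", "RN008", "RN009", "RN0010"),
  Condition = c("1_healthy", "14_healthy", "20_disease", "21_disease", "10_healthy"),
  stringsAsFactors = FALSE
)

# Strip the leading digits and the underscore, leaving only the label
stripped <- sub("^\\d+_", "", data$Condition)

# Convert to a factor; the level order is an assumption taken from the question
data$Condition <- factor(stripped, levels = c("healthy", "disease"))

data$Condition
# [1] healthy healthy disease disease healthy
# Levels: healthy disease
```

Passing levels explicitly keeps healthy first; without it, factor sorts the levels alphabetically and disease would come first.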
The output of lapply is always a list. So if we need a vector with the strsplit approach, use sapply instead:
data$Condition <- sapply(strsplit(as.character(data$Condition), "_"), '[', 2)
or unlist the list that lapply returns:
data$Condition <- unlist(lapply(strsplit(as.character(data$Condition), "_"), '[', 2))
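To see the difference between these return types side by side, here is a small sketch. It assumes a standalone vector named cond that mirrors the Condition column (a hypothetical name), and it also shows vapply, which is not part of the answer above but is a base R alternative that checks the type and length of every result:

```r
# Hypothetical input mirroring the Condition column from the question
cond <- c("1_healthy", "14_healthy", "20_disease", "21_disease", "10_healthy")
parts <- strsplit(cond, "_", fixed = TRUE)

as_list   <- lapply(parts, `[`, 2)                # list of length-1 character vectors
as_vector <- sapply(parts, `[`, 2)                # simplified to a character vector
as_strict <- vapply(parts, `[`, character(1), 2)  # errors unless each result is one string

class(as_list)    # "list"
class(as_vector)  # "character"

# All three routes give the same values once the list is flattened
identical(unlist(as_list), as_vector)
identical(as_vector, as_strict)
```

Any of the character results can then be wrapped in factor() to get the structure the question asks for.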