R: split values in a column and keep the result as a factor rather than a list

Time: 2018-03-28 14:45:59

Tags: r string data-manipulation

I know similar questions have been asked many times, but I could not find one that covers my case.

Here is my problem. I have a data frame that looks like this:

Sample        Condition
RN001         1_healthy
RN002         14_healthy
RN008         20_disease
RN009         21_disease
RN0010        10_healthy

What I need is to split the values in the Condition column to get this:

Sample        Condition
RN001         healthy
RN002         healthy
RN008         disease
RN009         disease
RN0010        healthy

I have tried this:

data$Condition <- lapply(strsplit(as.character(data$Condition), "_"), '[', 2)

But I end up with a list structure like this:

[[1]]
[1] "healthy"

[[2]]
[1] "healthy"

[[3]]
[1] "disease"

[[4]]
[1] "disease"

What I need is a structure of class factor, like this:

 [1] healthy healthy disease disease healthy ...
 2 Levels:  healthy disease

Thanks for your comments.

1 answer:

Answer 0 (score: 2)

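We can use sub to strip the prefix: the pattern matches one or more digits (\\d+) at the start of the string (^) followed by an underscore (_), and the match is replaced with an empty string (""). The result is a character vector, so wrap it in factor() if a factor is needed.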

data$Condition <- sub("^\\d+_", "", data$Condition)
data$Condition
#[1] "healthy" "healthy" "disease" "disease" "healthy"
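
Since the question asks for a factor, here is a minimal sketch of that last step. It rebuilds the example data from the question; the stringsAsFactors = TRUE setting and the explicit level order are assumptions made for illustration, not part of the original answer.

```r
# Rebuild the example data from the question.
data <- data.frame(
  Sample = c("RN001", "RN002", "RN008", "RN009", "RN0010"),
  Condition = c("1_healthy", "14_healthy", "20_disease", "21_disease", "10_healthy"),
  stringsAsFactors = TRUE  # assumption: Condition starts out as a factor
)

# sub() coerces its input to character, so as.character() is not needed here;
# factor() then turns the stripped labels into a factor.
data$Condition <- factor(sub("^\\d+_", "", data$Condition))

class(data$Condition)
levels(data$Condition)  # levels are sorted alphabetically by default

# To control the level order, pass levels explicitly (this order is an assumption).
data$Condition <- factor(data$Condition, levels = c("healthy", "disease"))
levels(data$Condition)
```

Setting the levels explicitly only matters if the order is used later, for example as the reference level in a model or the ordering in a plot.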

The output of lapply is always a list. So if we need a vector, use sapply:

data$Condition <- sapply(strsplit(as.character(data$Condition), "_"), '[', 2)

Or unlist the list output from lapply:

data$Condition <- unlist(lapply(strsplit(as.character(data$Condition), "_"), '[', 2))
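
The three variants can be compared side by side in a small sketch. It works on a standalone vector named cond (a hypothetical name) that holds the original, unmodified values, because splitting on "_" after the prefix has already been removed would yield NA. The vapply call is an additional, stricter alternative that the answer does not mention.

```r
# Original, unmodified values from the question.
cond <- c("1_healthy", "14_healthy", "20_disease", "21_disease", "10_healthy")
parts <- strsplit(cond, "_")

as_list   <- lapply(parts, `[`, 2)          # list of length-1 character vectors
as_vector <- sapply(parts, `[`, 2)          # simplified to a character vector
unlisted  <- unlist(lapply(parts, `[`, 2))  # same vector, flattened explicitly

class(as_list)
class(as_vector)
identical(as_vector, unlisted)

# vapply() is a stricter variant of sapply(): it raises an error unless
# every result is a single character string.
checked <- vapply(parts, `[`, character(1), 2)

# Any of the character vectors can then be wrapped in factor().
factor(as_vector)
```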