I have the following data frame:
df <- structure(list(gene_id = c("RNA18S5", "RNA18S5", "RNA18S5", "RNA18S5",
"RNA18S5"), samplename = c("XX_135_S14.Adipose", "XX_133_S12.Adipose",
"XX_128_S7.Umbilical", "XX_117_S11.Liver", "XX_124_S3.Pulmonary"
), gene_expr = c(6533029L, 5494889L, 5491158L, 5232914L, 5151004L
)), .Names = c("gene_id", "samplename", "gene_expr"), row.names = c(NA,
-5L), class = c("tbl_df", "tbl", "data.frame"))
df
#> gene_id samplename gene_expr
#> 1 RNA18S5 XX_135_S14.Adipose 6533029
#> 2 RNA18S5 XX_133_S12.Adipose 5494889
#> 3 RNA18S5 XX_128_S7.Umbilical 5491158
#> 4 RNA18S5 XX_117_S11.Liver 5232914
#> 5 RNA18S5 XX_124_S3.Pulmonary 5151004
What I want to do is split samplename and create a new column from the result.
I tried:
library(tidyverse)
df <- df %>%
mutate(subtype=stringr::str_split(samplename,"\\.")[[1]][2])
df
This gives:
# A tibble: 5 x 4
gene_id samplename gene_expr subtype
<chr> <chr> <int> <chr>
1 RNA18S5 XX_135_S14.Adipose 6533029 Adipose
2 RNA18S5 XX_133_S12.Adipose 5494889 Adipose
3 RNA18S5 XX_128_S7.Umbilical 5491158 Adipose
4 RNA18S5 XX_117_S11.Liver 5232914 Adipose
5 RNA18S5 XX_124_S3.Pulmonary 5151004 Adipose
Note that the subtype column is incorrect: every row gets the value from the first row. I want the output to be:
gene_id samplename gene_expr subtype
1 RNA18S5 XX_135_S14.Adipose 6533029 Adipose
2 RNA18S5 XX_133_S12.Adipose 5494889 Adipose
3 RNA18S5 XX_128_S7.Umbilical 5491158 Umbilical
4 RNA18S5 XX_117_S11.Liver 5232914 Liver
5 RNA18S5 XX_124_S3.Pulmonary 5151004 Pulmonary
What is the right way to do this?
Answer 0 (score: 2)
Here is an option with extract from tidyr, which captures the text after the last dot into a new column:
library(tidyverse)
df %>%
extract(samplename, into = 'subtype', '.*\\.([^.]+)', remove = FALSE)
# A tibble: 5 x 4
# gene_id samplename subtype gene_expr
#* <chr> <chr> <chr> <int>
#1 RNA18S5 XX_135_S14.Adipose Adipose 6533029
#2 RNA18S5 XX_133_S12.Adipose Adipose 5494889
#3 RNA18S5 XX_128_S7.Umbilical Umbilical 5491158
#4 RNA18S5 XX_117_S11.Liver Liver 5232914
#5 RNA18S5 XX_124_S3.Pulmonary Pulmonary 5151004
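As for why the original attempt fails: str_split() returns a list with one element per row, so [[1]][2] picks the second piece of the first row only, and mutate() then recycles that single value ("Adipose") down the whole column. A minimal sketch of a vectorized alternative, assuming the subtype is always the text after the last dot. The data frame is rebuilt with tibble() so the snippet runs on its own, and the name res is only for illustration:

```r
library(dplyr)
library(stringr)

# Same data as in the question, rebuilt so the example is self-contained
df <- tibble::tibble(
  gene_id = rep("RNA18S5", 5),
  samplename = c("XX_135_S14.Adipose", "XX_133_S12.Adipose",
                 "XX_128_S7.Umbilical", "XX_117_S11.Liver",
                 "XX_124_S3.Pulmonary"),
  gene_expr = c(6533029L, 5494889L, 5491158L, 5232914L, 5151004L)
)

# str_extract() is vectorized: it returns one match per element of samplename.
# "[^.]+$" matches the run of non-dot characters at the end of each string.
res <- df %>%
  mutate(subtype = str_extract(samplename, "[^.]+$"))

res
```

If you would rather keep the str_split() approach, the list it returns has to be indexed per row, for example with purrr::map_chr(str_split(samplename, "\\."), 2).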