How to split column names in R, drop part of each name, and reshape the data from wide to long format

Time: 2020-11-11 13:33:57

Tags: r reshape transpose strsplit delimiter-separated-values

My data is in the following format:

dataset <- data.frame(taxa = c("taxa1", "taxa2", "taxa3"),
                      "11908.MM.0008.Inf.6m.Stool" = c(0, 1760, 0),
                      "11908.MM.01115.Inf.6m.Stool" = c(0, 1517, 0),
                      "11908.MM.0044.Inf.6m.Stool" = c(0, 10815, 0),
                      "11908.MM.0125.Mom.6m.Stool" = c(0, 4719, 0),
                      check.names = FALSE)  # keep names that start with a digit as-is
View(dataset)

I want to convert it to the following format:

fix_dataset <- data.frame(study_id = c("0008", "0115", "0044", "0125"),  # quoted so the leading zeros are kept
                          individual = c("Inf", "Inf", "Inf", "Mom"),
                          taxa1 = c(0, 0, 0, 0),
                          taxa2 = c(1760, 1517, 10815, 4719),
                          taxa3 = c(0, 0, 0, 0),
                          timept1 = c("6m", "6m", "6m", "6m"))

View(fix_dataset)

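I am trying to strip the leading number sequence 11908 and the trailing "Stool" from each column name, split the remaining parts of the name into separate columns, and then reshape the data from wide to long format.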

1 answer:

Answer 0: (score: 0)

You can do this with the following code:

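library(tidyverse)
dataset %>%
  # one row per taxa/sample pair; the old column names land in `name`
  pivot_longer(cols = -taxa) %>%
  # split each sample name on literal dots into its six parts
  separate(col = name, into = c("info1", "info2", "study_id", "individual", "timept1", "info3"), sep = "[.]") %>%
  # spread the taxa back out into one column each
  pivot_wider(names_from = taxa,
              values_from = value) %>%
  # drop the unused parts (info1, info2, info3) and order the columns
  select(study_id, individual, starts_with("taxa"), timept1)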

This gives:

# A tibble: 4 x 6
  study_id individual taxa1 taxa2 taxa3 timept1
  <chr>    <chr>      <dbl> <dbl> <dbl> <chr>  
1 0008     Inf            0  1760     0 6m     
2 01115    Inf            0  1517     0 6m     
3 0044     Inf            0 10815     0 6m     
4 0125     Mom            0  4719     0 6m 

Note that your study IDs are inconsistent: one of the IDs in the original dataset is "01115", whereas in your desired output it is "0115".
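
If you would rather avoid the tidyverse dependency, the same reshaping can probably be done in base R with strsplit() and t(). The following is only a sketch under a few assumptions: the column names keep their leading digits (hence check.names = FALSE), every sample name has exactly six dot-separated parts, and the study ID, individual and time point sit at positions 3, 4 and 5. The helper names sample_cols, parts and counts are made up for illustration.

```r
# Example data from the question; check.names = FALSE keeps the original names
dataset <- data.frame(taxa = c("taxa1", "taxa2", "taxa3"),
                      "11908.MM.0008.Inf.6m.Stool"  = c(0, 1760, 0),
                      "11908.MM.01115.Inf.6m.Stool" = c(0, 1517, 0),
                      "11908.MM.0044.Inf.6m.Stool"  = c(0, 10815, 0),
                      "11908.MM.0125.Mom.6m.Stool"  = c(0, 4719, 0),
                      check.names = FALSE)

# Every column except `taxa` is a sample
sample_cols <- setdiff(names(dataset), "taxa")

# Split each sample name on literal dots; each row of `parts` is one sample
# (assumes every name has the same number of parts)
parts <- do.call(rbind, strsplit(sample_cols, ".", fixed = TRUE))

# Transpose the counts so that samples become rows and taxa become columns
counts <- t(as.matrix(dataset[sample_cols]))
colnames(counts) <- dataset$taxa
rownames(counts) <- NULL

# Assemble the result; positions 3, 4 and 5 are assumed to hold
# the study ID, the individual and the time point
fix_dataset <- data.frame(study_id   = parts[, 3],
                          individual = parts[, 4],
                          counts,
                          timept1    = parts[, 5],
                          stringsAsFactors = FALSE)
```

Because study_id is kept as a character column, the leading zeros survive, and "01115" is carried over exactly as it appears in the column name rather than being corrected.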