My data is in the following format:
dataset <- data.frame(taxa = c("taxa1", "taxa2", "taxa3"),
                      "11908.MM.0008.Inf.6m.Stool" = c(0, 1760, 0),
                      "11908.MM.01115.Inf.6m.Stool" = c(0, 1517, 0),
                      "11908.MM.0044.Inf.6m.Stool" = c(0, 10815, 0),
                      "11908.MM.0125.Mom.6m.Stool" = c(0, 4719, 0))
View(dataset)
I would like to convert it to the following format:
# study_id is quoted so that the leading zeros are kept (0008 as a number is just 8)
fix_dataset <- data.frame(study_id = c("0008", "0115", "0044", "0125"),
                          individual = c("Inf", "Inf", "Inf", "Mom"),
                          taxa1 = c(0, 0, 0, 0),
                          taxa2 = c(1760, 1517, 10815, 4719),
                          taxa3 = c(0, 0, 0, 0),
                          timept1 = c("6m", "6m", "6m", "6m"))
View(fix_dataset)
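I am trying to strip the leading number sequence 11908 and the trailing "Stool" from each column name, split the remaining parts of the column name into separate columns, and then convert from wide to long format.
Answer 0: (score: 0)
You can do this with the following code: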
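library(tidyverse)
dataset %>%
  # wide to long: one row per taxon/sample pair, sample names land in `name`
  pivot_longer(cols = -taxa) %>%
  # split each sample name on the literal dots
  separate(col = name,
           into = c("info1", "info2", "study_id", "individual", "timept1", "info3"),
           sep = "[.]") %>%
  # back to wide: one column per taxon
  pivot_wider(names_from = taxa,
              values_from = value) %>%
  # keep only the requested columns (drops info1, info2 and info3)
  select(study_id, individual, starts_with("taxa"), timept1)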
This gives:
# A tibble: 4 x 6
  study_id individual taxa1 taxa2 taxa3 timept1
  <chr>    <chr>      <dbl> <dbl> <dbl> <chr>
1 0008     Inf            0  1760     0 6m
2 01115    Inf            0  1517     0 6m
3 0044     Inf            0 10815     0 6m
4 0125     Mom            0  4719     0 6m
Note that your study IDs are inconsistent: one of the IDs in the original dataset is "01115", whereas in your desired output it is "0115".
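
As a possible variation on the code above, the split can also be done inside `pivot_longer()` itself, which avoids the throwaway `info1`, `info2` and `info3` columns. This is only a sketch under a few assumptions: it needs a tidyr version (1.0.0 or later) in which `names_to` accepts `NA` to discard a piece of the column name, and it builds the data with `check.names = FALSE`, an addition not in the original question, so that `data.frame()` does not prepend an `X` to names that start with a digit. `study_id` stays a character column, which is what preserves the leading zeros.

```r
library(tidyr)
library(dplyr)

# Same data as in the question; check.names = FALSE (an assumption added here)
# keeps the column names exactly as typed instead of prefixing them with "X".
dataset <- data.frame(
  taxa = c("taxa1", "taxa2", "taxa3"),
  "11908.MM.0008.Inf.6m.Stool"  = c(0, 1760, 0),
  "11908.MM.01115.Inf.6m.Stool" = c(0, 1517, 0),
  "11908.MM.0044.Inf.6m.Stool"  = c(0, 10815, 0),
  "11908.MM.0125.Mom.6m.Stool"  = c(0, 4719, 0),
  check.names = FALSE
)

fix_dataset <- dataset %>%
  # Split each column name on literal dots while pivoting;
  # NA entries in names_to discard that piece of the name.
  pivot_longer(
    cols      = -taxa,
    names_to  = c(NA, NA, "study_id", "individual", "timept1", NA),
    names_sep = "[.]"
  ) %>%
  # One column per taxon again
  pivot_wider(names_from = taxa, values_from = value) %>%
  # Put the columns in the requested order
  select(study_id, individual, starts_with("taxa"), timept1)

print(fix_dataset)
```

The result should match the tibble shown above, including the "01115" ID, since the values are taken from the column names as they are.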