I wasn't sure how to word the question title, so I did my best. I'll illustrate with an example of my dataset, which we can call my_data:
my_data <- tibble::tribble(
~Pathway, ~log_value, ~ratio, ~z_score, ~molecules,
"GHR", "N/A", "N/A", "N/A", "CD40LG,TGFBR1,MYH9,MMP1",
"TGFB", "N/A", "N/A", "N/A", "ADAMTS8,PIK3R1,HRAS,SEM",
"PKA", "N/A", "N/A", "N/A", "PIK3CA,PDGFA,PIK3R1,SPH",
"PKB", "N/A", "N/A", "N/A", "MAST2,PIK3CA,TGFBR1,BAD",
"PKC", "N/A", "N/A", "N/A", "TGFBR1,AKAP9,CAMK2A,PHK"
)
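What I want to do is turn column 1 into a single row and use it as the name for each column. I'd also like to split column 5 into multiple rows. This is what I have in mind: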
GHR TGFB PKA PKB PKC
CD40LG ADAMTS8 PIK3CA MAST2 TGFBR1
TGFBR1 PIK3R1 PDGFA PIK3CA AKAP9
MYH9 HRAS PIK3R1 TGFBR1 CAMK2A
MMP1 SEM SPH BAD PHK
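I don't really need columns 2, 3, or 4, so I dropped them with my_data <- my_data[c(1,5)]
and I removed the commas between the names with my_data$molecules <- as.character(gsub(","," ",my_data$molecules))
which gave me problems, but maybe you won't need it. So I just want to make column 1 the names and then split column 5 into multiple rows, but I'm struggling with it. Does anyone have a suggestion? Thanks in advance.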
Answer 0: (score: 1)
You could use this:
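## Keep only the Pathway and molecules columns
df = df[, c(1, 5)]
## Split on comma and add to dataframe
tmp = strsplit(df$molecules, ",")
df = cbind(df[, -2], do.call(rbind, tmp))
## Transpose the dataframe; the first row of the result holds the pathway names
df = t(df)
rownames(df) = NULL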
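The transposed result above keeps the pathway names in its first row rather than as column names. As a minimal sketch of one way to get them as column names in base R, assuming every pathway lists the same number of molecules, the split pieces can be named and converted directly. The names `parts` and `wide` are illustrative only, and the example data is retyped from the question with only the two relevant columns.

```r
# Example data from the question, reduced to the two columns that matter here
df <- data.frame(
  Pathway = c("GHR", "TGFB", "PKA", "PKB", "PKC"),
  molecules = c("CD40LG,TGFBR1,MYH9,MMP1",
                "ADAMTS8,PIK3R1,HRAS,SEM",
                "PIK3CA,PDGFA,PIK3R1,SPH",
                "MAST2,PIK3CA,TGFBR1,BAD",
                "TGFBR1,AKAP9,CAMK2A,PHK"),
  stringsAsFactors = FALSE
)

# Split each string on commas: a list with one character vector per pathway
parts <- strsplit(df$molecules, ",")

# Name the list elements by pathway, so each one becomes a named column.
# Assumption: all vectors have equal length; otherwise as.data.frame() fails.
names(parts) <- df$Pathway
wide <- as.data.frame(parts, stringsAsFactors = FALSE)

print(wide)
```

Here the columns stay in the order in which the pathways appear in the input.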
Answer 1: (score: 1)
Your data, parsed:
df <- tibble::tribble(
~Pathway, ~log_value, ~ratio, ~z_score, ~molecules,
"GHR", "N/A", "N/A", "N/A", "CD40LG,TGFBR1,MYH9,MMP1",
"TGFB", "N/A", "N/A", "N/A", "ADAMTS8,PIK3R1,HRAS,SEM",
"PKA", "N/A", "N/A", "N/A", "PIK3CA,PDGFA,PIK3R1,SPH",
"PKB", "N/A", "N/A", "N/A", "MAST2,PIK3CA,TGFBR1,BAD",
"PKC", "N/A", "N/A", "N/A", "TGFBR1,AKAP9,CAMK2A,PHK"
)
Here is a solution using dplyr and tidyr:
library(dplyr)
library(tidyr)

df %>% select(Pathway, molecules) %>%
  separate_rows(molecules, sep = ",") %>%
  group_by(Pathway) %>%
  mutate(id = 1:n()) %>%
  spread(key = "Pathway", value = "molecules") %>%
  select(-id)
#> # A tibble: 4 x 5
#> GHR PKA PKB PKC TGFB
#> <chr> <chr> <chr> <chr> <chr>
#> 1 CD40LG PIK3CA MAST2 TGFBR1 ADAMTS8
#> 2 TGFBR1 PDGFA PIK3CA AKAP9 PIK3R1
#> 3 MYH9 PIK3R1 TGFBR1 CAMK2A HRAS
#> 4 MMP1 SPH BAD PHK SEM
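Here we first select the columns of interest, then separate the rows at the commas. The next task is to recast the data from long to wide format. For that you need a unique ID to match up the rows. Once you have spread the columns, you can drop the id.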
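Note that spread() sorts the new columns alphabetically, which is why the output above reads GHR, PKA, PKB, PKC, TGFB rather than the order in the question. As a hedged variant, assuming tidyr 1.0.0 or later is installed, the same long-to-wide step can be sketched with pivot_wider(), which creates columns in order of first appearance. The name `wide` is illustrative, and the data is retyped with only the two relevant columns.

```r
library(dplyr)
library(tidyr)

# Example data from the question, reduced to the two columns that matter here
df <- tibble::tribble(
  ~Pathway, ~molecules,
  "GHR",  "CD40LG,TGFBR1,MYH9,MMP1",
  "TGFB", "ADAMTS8,PIK3R1,HRAS,SEM",
  "PKA",  "PIK3CA,PDGFA,PIK3R1,SPH",
  "PKB",  "MAST2,PIK3CA,TGFBR1,BAD",
  "PKC",  "TGFBR1,AKAP9,CAMK2A,PHK"
)

wide <- df %>%
  separate_rows(molecules, sep = ",") %>%  # one row per molecule
  group_by(Pathway) %>%
  mutate(id = row_number()) %>%            # position of each molecule within its pathway
  ungroup() %>%
  pivot_wider(names_from = Pathway, values_from = molecules) %>%
  select(-id)                              # the helper id is no longer needed

print(wide)
```

The id column plays the same role as in the spread() version: it tells the reshaping step which molecules belong on the same output row.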
Answer 2: (score: 1)
dat = read.table(stringsAsFactors = FALSE, text = "Pathway log_value ratio z_score molecules
GHR N/A N/A N/A CD40LG,TGFBR1,MYH9,MMP1…
TGFB N/A N/A N/A ADAMTS8,PIK3R1,HRAS,SEM…
PKA N/A N/A N/A PIK3CA,PDGFA,PIK3R1,SPH…
PKB N/A N/A N/A MAST2,PIK3CA,TGFBR1,BAD…
PKC N/A N/A N/A TGFBR1,AKAP9,CAMK2A,PHK…", na.strings = "N/A", header = TRUE)
# Read the molecules strings as comma-separated rows, then transpose so each pathway is a column
a = data.frame(t(read.table(text = dat$molecules, sep = ",")), stringsAsFactors = FALSE)
# Use the pathway names as column names
setNames(a, dat$Pathway)
GHR TGFB PKA PKB PKC
V1 CD40LG ADAMTS8 PIK3CA MAST2 TGFBR1
V2 TGFBR1 PIK3R1 PDGFA PIK3CA AKAP9
V3 MYH9 HRAS PIK3R1 TGFBR1 CAMK2A
V4 MMP1… SEM… SPH… BAD… PHK…