I'd like to ask the community for help with reshaping a text file. The file looks like this:
TRINITY_GG_17866_c6_g1_i1
TRINITY_GG_17866_c3_g1_i1
TRINITY_GG_17866_c1_g1_i7
GO:0000226
GO:0006139
GO:0006259
TRINITY_GG_17866_c5_g1_i1
GO:0003674
GO:0005488
What I ultimately want is this (tab-separated):
TRINITY_GG_17866_c1_g1_i7 GO:0000226
TRINITY_GG_17866_c1_g1_i7 GO:0006139
TRINITY_GG_17866_c1_g1_i7 GO:0006259
TRINITY_GG_17866_c5_g1_i1 GO:0003674
TRINITY_GG_17866_c5_g1_i1 GO:0005488
So far I haven't come up with a solution. I would greatly appreciate any help with this problem.
Best regards, Ferenc
Answer 0 (score: 1)
One dplyr option could be:
df %>%
  group_by(grp = cumsum(!startsWith(V1, "GO:"))) %>%
  filter(n() > 1) %>%
  mutate(V2 = lead(V1),
         V1 = first(V1)) %>%
  na.omit() %>%
  ungroup() %>%
  select(-grp)
V1 V2
<chr> <chr>
1 TRINITY_GG_17866_c1_g1_i7 GO:0000226
2 TRINITY_GG_17866_c1_g1_i7 GO:0006139
3 TRINITY_GG_17866_c1_g1_i7 GO:0006259
4 TRINITY_GG_17866_c5_g1_i1 GO:0003674
5 TRINITY_GG_17866_c5_g1_i1 GO:0005488
Or as a single column:
df %>%
  group_by(grp = cumsum(!startsWith(V1, "GO:"))) %>%
  filter(n() > 1) %>%
  mutate(V2 = lead(V1),
         V1 = first(V1)) %>%
  na.omit() %>%
  ungroup() %>%
  select(-grp) %>%
  transmute(V1 = paste(V1, V2))
V1
<chr>
1 TRINITY_GG_17866_c1_g1_i7 GO:0000226
2 TRINITY_GG_17866_c1_g1_i7 GO:0006139
3 TRINITY_GG_17866_c1_g1_i7 GO:0006259
4 TRINITY_GG_17866_c5_g1_i1 GO:0003674
5 TRINITY_GG_17866_c5_g1_i1 GO:0005488
Sample data:
df <- read.table(text = "TRINITY_GG_17866_c6_g1_i1
TRINITY_GG_17866_c3_g1_i1
TRINITY_GG_17866_c1_g1_i7
GO:0000226
GO:0006139
GO:0006259
TRINITY_GG_17866_c5_g1_i1
GO:0003674
GO:0005488",
header = FALSE,
stringsAsFactors = FALSE)
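For comparison, the same grouping idea (each non-`GO:` line opens a new group) can be expressed in base R without dplyr. This is a minimal sketch, not part of the original answer. It assumes the first line of the file is a transcript ID rather than a GO term; otherwise the group index would be 0 for those leading lines and the pairing would misalign. The names `lines`, `is_go`, and `res` are illustrative.

```r
# Sample input as a character vector; for a real file you could use
# lines <- readLines("your_file.txt")  (hypothetical path)
lines <- c("TRINITY_GG_17866_c6_g1_i1",
           "TRINITY_GG_17866_c3_g1_i1",
           "TRINITY_GG_17866_c1_g1_i7",
           "GO:0000226",
           "GO:0006139",
           "GO:0006259",
           "TRINITY_GG_17866_c5_g1_i1",
           "GO:0003674",
           "GO:0005488")

is_go <- startsWith(lines, "GO:")
grp <- cumsum(!is_go)              # group number for every line
header <- lines[!is_go][grp]       # the ID that opened each line's group

# Keep only the GO lines, each paired with its group's ID;
# IDs without any GO lines simply never appear.
res <- data.frame(V1 = header[is_go], V2 = lines[is_go],
                  stringsAsFactors = FALSE)
```

`res` can then be written out with `write.table(res, ..., sep = "\t", quote = FALSE, row.names = FALSE, col.names = FALSE)`.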