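I'm using the tidyverse, but base solutions are welcome too.
Is there a way to gather a data frame without transposing it, where the key is stored in a row rather than in the column names? For example, suppose I have a tibble called df.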
df <- tibble(a = c(5,3,5,6,2,"G1"),
             b = c(5,3,5,6,2,"G1"),
             c = c(8,2,6,4,1,"G2"),
             d = c(8,2,6,4,1,"G2"),
             e = c(9,3,7,8,4,"G3"),
             f = c(9,3,7,8,4,"G3"),
             g = c(6,5,2,1,8,"G4"),
             h = c(6,5,2,1,8,"G4"))
df
# A tibble: 6 x 8
a b c d e f g h
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 5 5 8 8 9 9 6 6
2 3 3 2 2 3 3 5 5
3 5 5 6 6 7 7 2 2
4 6 6 4 4 8 8 1 1
5 2 2 1 1 4 4 8 8
6 G1 G1 G2 G2 G3 G3 G4 G4
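The groups to gather by are in the bottom row. Is there a way to make df have only three columns, so that columns c, e, and g are gathered into column a, columns d, f, and h are gathered into column b, and row 6 becomes column c? The result would look like this: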
tibble(a = c(5,3,5,6,2,8,2,6,4,1,9,3,7,8,4,6,5,2,1,8),
b = c(5,3,5,6,2,8,2,6,4,1,9,3,7,8,4,6,5,2,1,8),
c = c("G1","G1","G1","G1","G1","G2","G2","G2","G2","G2",
"G3","G3","G3","G3","G3","G4","G4","G4","G4","G4"))
# A tibble: 20 x 3
a b c
<dbl> <dbl> <chr>
1 5 5 G1
2 3 3 G1
3 5 5 G1
4 6 6 G1
5 2 2 G1
6 8 8 G2
7 2 2 G2
8 6 6 G2
9 4 4 G2
10 1 1 G2
11 9 9 G3
12 3 3 G3
13 7 7 G3
14 8 8 G3
15 4 4 G3
16 6 6 G4
17 5 5 G4
18 2 2 G4
19 1 1 G4
20 8 8 G4
I'd like to avoid transposing, because I need to preserve the row and column order until everything has been labeled correctly.
Answer 0 (score: 3)
Here is one idea.
library(tidyverse)
df2 <- df %>%
  t() %>%
  as.data.frame(stringsAsFactors = FALSE) %>%
  split(f = .$V6) %>%
  map_dfr(~.x %>%
            select(-V6) %>%
            t() %>%
            as.data.frame(stringsAsFactors = FALSE) %>%
            setNames(c("a", "b")),
          .id = "c") %>%
  select(a, b, c) %>%
  mutate_at(vars(-c), list(~as.numeric(.)))
df2
# a b c
# 1 5 5 G1
# 2 3 3 G1
# 3 5 5 G1
# 4 6 6 G1
# 5 2 2 G1
# 6 8 8 G2
# 7 2 2 G2
# 8 6 6 G2
# 9 4 4 G2
# 10 1 1 G2
# 11 9 9 G3
# 12 3 3 G3
# 13 7 7 G3
# 14 8 8 G3
# 15 4 4 G3
# 16 6 6 G4
# 17 5 5 G4
# 18 2 2 G4
# 19 1 1 G4
# 20 8 8 G4
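Answer 1 (score: 2)
Here is one approach. We can split the tibble into a list of tibbles based on the last row, loop over the list with imap, rename the columns of each element to the same names ('a', 'b'), mutate to create a column 'c' holding the name of the list element, and bind the rows.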
library(tidyverse)
df %>%
  slice(-n()) %>%
  split.default(df %>%
                  slice(n()) %>%
                  flatten_chr) %>%
  imap_dfr(~ .x %>%
             rename_all(~ c('a', 'b')) %>%
             mutate(c = .y))
# A tibble: 20 x 3
# a b c
# <chr> <chr> <chr>
# 1 5 5 G1
# 2 3 3 G1
# 3 5 5 G1
# 4 6 6 G1
# 5 2 2 G1
# 6 8 8 G2
# 7 2 2 G2
# 8 6 6 G2
# 9 4 4 G2
#10 1 1 G2
#11 9 9 G3
#12 3 3 G3
#13 7 7 G3
#14 8 8 G3
#15 4 4 G3
#16 6 6 G4
#17 5 5 G4
#18 2 2 G4
#19 1 1 G4
#20 8 8 G4
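Note that the output above still has a and b as <chr>, whereas the result requested in the question has them as <dbl>. Below is a minimal sketch of the same split-and-bind idea with a conversion step added at the end. It assumes dplyr 1.0 or later (for across()), and it swaps in set_names() and a base unlist() call for rename_all() and flatten_chr; the names labels and res are only for illustration.

```r
library(dplyr)
library(purrr)
library(tibble)

df <- tibble(a = c(5,3,5,6,2,"G1"),
             b = c(5,3,5,6,2,"G1"),
             c = c(8,2,6,4,1,"G2"),
             d = c(8,2,6,4,1,"G2"),
             e = c(9,3,7,8,4,"G3"),
             f = c(9,3,7,8,4,"G3"),
             g = c(6,5,2,1,8,"G4"),
             h = c(6,5,2,1,8,"G4"))

# Group label of every column, taken from the last row
labels <- as.character(unlist(df[nrow(df), ]))

res <- df %>%
  slice(-n()) %>%                      # drop the label row
  split.default(labels) %>%            # split the columns by label
  imap_dfr(~ .x %>%
             set_names(c("a", "b")) %>% # same names in every piece
             mutate(c = .y)) %>%        # list name becomes column c
  mutate(across(c(a, b), as.numeric))  # <chr> to <dbl>

res
```

No transposition is involved here, so the row order within each group is the row order of the original tibble.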
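Answer 2 (score: 1)
Transposing need not do any harm if it is done step by step. In this base R solution, the row and column information is preserved until the last line.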
d <- data.frame(t(as.matrix(df)))
l <- lapply(split(d[-6], d$X6), t)
res <- do.call(rbind, Map(cbind, l, c=names(l)))
res <- setNames(data.frame(res, row.names=NULL), letters[1:3])
res
# a b c
# 1 5 5 G1
# 2 3 3 G1
# 3 5 5 G1
# 4 6 6 G1
# 5 2 2 G1
# 6 8 8 G2
# 7 2 2 G2
# 8 6 6 G2
# 9 4 4 G2
# 10 1 1 G2
# 11 9 9 G3
# 12 3 3 G3
# 13 7 7 G3
# 14 8 8 G3
# 15 4 4 G3
# 16 6 6 G4
# 17 5 5 G4
# 18 2 2 G4
# 19 1 1 G4
# 20 8 8 G4
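If transposing is still a concern, a base R variant that never calls t() might look like the sketch below. It is not the code from this answer: it splits the column indices by the label row instead of splitting a transposed data frame, and it assumes every label occurs on exactly two columns. The names labels, body, and pieces are made up for the example.

```r
df <- data.frame(a = c(5,3,5,6,2,"G1"),
                 b = c(5,3,5,6,2,"G1"),
                 c = c(8,2,6,4,1,"G2"),
                 d = c(8,2,6,4,1,"G2"),
                 e = c(9,3,7,8,4,"G3"),
                 f = c(9,3,7,8,4,"G3"),
                 g = c(6,5,2,1,8,"G4"),
                 h = c(6,5,2,1,8,"G4"),
                 stringsAsFactors = FALSE)

# Group label of every column, taken from the last row
labels <- unlist(df[nrow(df), ], use.names = FALSE)
body   <- df[-nrow(df), ]

# For each label, pick its two columns (by position) and convert to numeric
pieces <- lapply(split(seq_along(labels), labels), function(idx) {
  data.frame(a = as.numeric(body[[idx[1]]]),
             b = as.numeric(body[[idx[2]]]))
})

# Attach the label as column c and stack the pieces
res <- do.call(rbind, Map(cbind, pieces, c = names(pieces)))
rownames(res) <- NULL
res
```

Because the columns are selected by position, the original column order decides which column of each pair ends up in a and which in b.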
Answer 3 (score: 0)
An option with data.table
First, since we aren't using the original names, replace them. Also drop the last row and convert everything to integer.
library(data.table)
setDT(df)
df <- df[-.N]
df[, names(df) := lapply(.SD, as.integer)]
setnames(df, rep_len(c('a', 'b'), ncol(df)))
# a b a b a b a b
# 1: 5 5 8 8 9 9 6 6
# 2: 3 3 2 2 3 3 5 5
# 3: 5 5 6 6 7 7 2 2
# 4: 6 6 4 4 8 8 1 1
# 5: 2 2 1 1 4 4 8 8
Now melt on the row number, add the G[1-4] column, and dcast the melted df back to wide format.
df[, rid := 1:.N]
df2 <- melt(df, 'rid')
df2[, c := paste0('G', rowid(rid, variable))]
dcast(df2, rid + c ~ variable)[order(c), -'rid']
# c a b
# 1: G1 5 5
# 2: G1 3 3
# 3: G1 5 5
# 4: G1 6 6
# 5: G1 2 2
# 6: G2 8 8
# 7: G2 2 2
# 8: G2 6 6
# 9: G2 4 4
# 10: G2 1 1
# 11: G3 9 9
# 12: G3 3 3
# 13: G3 7 7
# 14: G3 8 8
# 15: G3 4 4
# 16: G4 6 6
# 17: G4 5 5
# 18: G4 2 2
# 19: G4 1 1
# 20: G4 8 8
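The code above rebuilds the labels with paste0('G', ...), which relies on the groups being named G1 to G4 in column order. A possible variant that reads the labels from the last row instead is sketched below. It replaces the melt/dcast step with rbindlist(), so it is an alternative and not the method of this answer; dt, labels, vals, and res are illustrative names, and each label is assumed to cover exactly two columns.

```r
library(data.table)

dt <- data.table(a = c(5,3,5,6,2,"G1"),
                 b = c(5,3,5,6,2,"G1"),
                 c = c(8,2,6,4,1,"G2"),
                 d = c(8,2,6,4,1,"G2"),
                 e = c(9,3,7,8,4,"G3"),
                 f = c(9,3,7,8,4,"G3"),
                 g = c(6,5,2,1,8,"G4"),
                 h = c(6,5,2,1,8,"G4"))

# Group label of every column, taken from the last row
labels <- unlist(dt[.N], use.names = FALSE)

# Drop the label row and convert the remaining values to integer
vals <- dt[-.N][, lapply(.SD, as.integer)]

# For each label, take its columns, rename them to a/b, and stack;
# idcol stores the list names (the labels) in column c
res <- rbindlist(
  lapply(split(names(vals), labels), function(cols) {
    setnames(vals[, cols, with = FALSE], c("a", "b"))
  }),
  idcol = "c"
)
setcolorder(res, c("a", "b", "c"))
res
```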