我有一个宽格式的数据框
setseed(1)
df = data.frame(item=letters[1:6], field1a=sample(6,6),field1b=sample(60,6),
field1c=sample(200,6),field2a=sample(6,6),field2b=sample(60,6),
field2c=sample(200,6))
将所有列堆叠在一起并将所有b组合在一起并将所有c组合在一起的最佳方法是什么
items fielda fieldb fieldc
a 2 52 121
a 1 44 57
答案 0 :(得分:4)
使用基数R:
cbind(item=df$item,unstack(transform(stack(df,-1),ind=sub("\\d+","",ind))))
item fielda fieldb fieldc
1 a 2 57 138
2 b 6 39 77
3 c 3 37 153
4 d 4 4 99
5 e 1 12 141
6 f 5 10 194
7 a 3 17 97
8 b 4 23 120
9 c 5 1 98
10 d 1 22 37
11 e 2 49 163
12 f 6 19 131
或者您可以使用Base R中的reshape
函数:
reshape(df,varying = split(names(df)[-1],rep(1:3,2)),idvar = "item",direction = "long")
item time field1a field1b field1c
a.1 a 1 2 57 138
b.1 b 1 6 39 77
c.1 c 1 3 37 153
d.1 d 1 4 4 99
e.1 e 1 1 12 141
f.1 f 1 5 10 194
a.2 a 2 3 17 97
b.2 b 2 4 23 120
c.2 c 2 5 1 98
d.2 d 2 1 22 37
e.2 e 2 2 49 163
f.2 f 2 6 19 131
您还可以决定自己分隔数据框的名称,然后对其进行格式化:
names(df)=sub("(\\d)(.)","\\2.\\1",names(df))
reshape(df,varying= -1,idvar = "item",direction = "long")
答案 1 :(得分:3)
如果我们使用的是tidyverse
,那么gather
就会使用spread
长期'格式,使用列名称和library(tidyverse)
out <- df %>%
gather(key, val, -item) %>%
mutate(key1 = gsub("\\d+", "", key),
key2 = gsub("\\D+", "", key)) %>%
select(-key) %>%
spread(key1, val) %>%
select(-key2)
head(out, 2)
# item fielda fieldb fieldc
#1 a 2 57 138
#2 a 3 17 97
melt/dcast
或类似选项data.table
melt
,dcast
我们library(data.table)
dcast(melt(setDT(df), id.var = "item")[, variable := sub("\\d+", "", variable)
], item + rowid(variable) ~ variable, value.var = 'value')[
, variable := NULL][]
# item fielda fieldb fieldc
# 1: a 2 57 138
# 2: a 3 17 97
# 3: b 6 39 77
# 4: b 4 23 120
# 5: c 3 37 153
# 6: c 5 1 98
# 7: d 4 4 99
# 8: d 1 22 37
# 9: e 1 12 141
#10: e 2 49 163
#11: f 5 10 194
#12: f 6 19 131
进入&#39; long&#39;格式,子串的变量&#39;然后set.seed(1)
df = data.frame(item = letters[1:6],
field1a=sample(6,6),
field1b=sample(60,6),
field1c=sample(200,6),
field2a=sample(6,6),
field2b=sample(60,6),
field2c=sample(200,6))
到&#39;宽&#39;格式
{{1}}
注意:对于每种情况,当长度不均衡时也应该起作用
{{1}}