I have data in this form:
id year facname class_code line_no value
1 1 A County 1 county1
1 1 A County 2 county2
1 1 A source1 1 9
1 1 A source1 2 4
1 1 A source2 1 7
1 1 A source2 2 2
1 1 A source3 1 8...
2 1 B County 1 county1
2 1 B County 2 county1
2 1 B source1 1 21
2 1 B source1 2 9
2 1 B source2 1 4
2 1 B source2 2 7 ....
I am trying to convert it to the following (note that the last 3 columns would have their values spread accordingly):
id year facname line_no County source1 source2 source3
1 1 A 1 county1 9 7 8
1 2 A 2 county2 4 2 NA
1 3 A 3 county3
1 4 A 4 county4
2 1 B 1 county1
2 2 B 2 county2
2 3 B 3 county3
2 4 B 4 county4
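This would show the number of different payers (source1, source2, source3) per county, together with the county names (county1, county2). I know it takes some combination of spread (and possibly gather), but I can't wrap my head around it.
Any help is appreciated, thanks! (PS: I know this may be a duplicate question, but I am really new to tidying data.)
Edit: The counties (county1, 2, etc.) are actually numbers in the original dataset but are categorical in nature, so I refer to them as county1 and so on. The other values (the sources) are the number of people from that county who attended an event (source1, source2, etc.). Each facility has 40 line_no values in total.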
Answer 0: (score: 2)
One option is to use tidyr::spread twice, as shown below.
Updated:
library(dplyr)
library(tidyr)
# A single spread is enough for the sample data provided by the OP
df %>% spread(class_code, value)
# The more complicated version below was initially written to handle
# different line numbers for rows with "County" and rows without "County"
filter(df, class_code == "County") %>% spread(class_code, value) %>%
  left_join(filter(df, class_code != "County") %>% spread(class_code, value),
            by = c("id", "line_no", "facname"))
# id facname line_no County source1 source2 source3
# 1 1 A 1 county1 9 7 8
# 2 1 A 2 county2 4 2 <NA>
# 3 2 B 1 county1 21 4 <NA>
# 4 2 B 2 county1 9 7 <NA>
Data:
df <- read.table(text =
"id facname class_code line_no value
1 A County 1 county1
1 A County 2 county2
1 A source1 1 9
1 A source1 2 4
1 A source2 1 7
1 A source2 2 2
1 A source3 1 8
2 B County 1 county1
2 B County 2 county1
2 B source1 1 21
2 B source1 2 9
2 B source2 1 4
2 B source2 2 7",
header = TRUE, stringsAsFactors = FALSE)
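As a hedged aside, the same reshaping can be sketched with tidyr::pivot_wider, the newer function that supersedes spread. This assumes tidyr >= 1.0.0 and dplyr >= 1.0.0 (for across); the name `wide` is only illustrative. Because `value` mixes county names and counts it is read as character, so the sketch also converts the source columns back to integers.

```r
library(dplyr)
library(tidyr)

# Same sample data as above, rebuilt so the snippet runs on its own
df <- read.table(text =
"id facname class_code line_no value
1 A County 1 county1
1 A County 2 county2
1 A source1 1 9
1 A source1 2 4
1 A source2 1 7
1 A source2 2 2
1 A source3 1 8
2 B County 1 county1
2 B County 2 county1
2 B source1 1 21
2 B source1 2 9
2 B source2 1 4
2 B source2 2 7",
header = TRUE, stringsAsFactors = FALSE)

wide <- df %>%
  # one column per class_code; id, facname and line_no identify each row
  pivot_wider(names_from = class_code, values_from = value) %>%
  # the counts arrive as character, so turn them back into integers
  mutate(across(starts_with("source"), as.integer))

wide
```

Combinations that never occur in the input, such as source3 for facility B, come out as NA.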
Answer 1: (score: 2)
We can use dcast from data.table:
library(data.table)
dcast(setDT(df1), id + facname + rowid(class_code) ~ class_code, value.var = 'value')
# id facname class_code County source1 source2 source3
#1: 1 A 1 county1 9 7 8
#2: 1 A 2 county2 4 2 NA
#3: 2 B 3 county1 21 4 NA
#4: 2 B 4 county1 9 7 NA
If we need the 8 rows shown in the expected output:
dcast(setDT(df1), id + facname + rowid(class_code) ~ class_code,
value.var = 'value', drop = FALSE)[ ,.SD[!all(is.na(County))], .(id, facname)]
# id facname class_code County source1 source2 source3
#1: 1 A 1 county1 9 7 8
#2: 1 A 2 county2 4 2 NA
#3: 1 A 3 NA NA NA NA
#4: 1 A 4 NA NA NA NA
#5: 2 B 1 NA NA NA NA
#6: 2 B 2 NA NA NA NA
#7: 2 B 3 county1 21 4 NA
#8: 2 B 4 county1 9 7 NA
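If the line numbers should restart within each facility, as in the question's expected output, one possible variation is to cast on the existing line_no column and then pad every facility to a fixed set of line numbers with a join. This is only a sketch: `wide`, `grid` and `padded` are illustrative names, and the upper bound of 4 is an assumption taken from the expected output (the question's edit mentions 40 in the real data).

```r
library(data.table)

# Sample data, rebuilt so the snippet runs on its own
df1 <- read.table(text =
"id facname class_code line_no value
1 A County 1 county1
1 A County 2 county2
1 A source1 1 9
1 A source1 2 4
1 A source2 1 7
1 A source2 2 2
1 A source3 1 8
2 B County 1 county1
2 B County 2 county1
2 B source1 1 21
2 B source1 2 9
2 B source2 1 4
2 B source2 2 7",
header = TRUE, stringsAsFactors = FALSE)
setDT(df1)

# Cast on the line_no column that is already in the data
wide <- dcast(df1, id + facname + line_no ~ class_code, value.var = "value")

# Assumed target: line numbers 1..4 for every facility
grid <- unique(df1[, .(id, facname)])[, .(line_no = 1:4), by = .(id, facname)]

# Keep every row of grid; line numbers missing from wide become NA rows
padded <- wide[grid, on = .(id, facname, line_no)]
padded
```

The value columns stay character here because `value` mixes county names and counts.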