I want to reshape a large dataset from long format to wide format. Currently, my dataset is structured as follows:
df <- structure(list(Politician = c("1", "2", "3", "k", "1", "2", "3",
"k"), country = c("uk", "nl", "ro", "z", "uk", "nl", "ro", "z"
), variables = c(NA, NA, NA, NA, NA, NA, NA, NA), voteid = c(12,
12, 12, 12, 13, 13, 13, 13), votedecision = c(1, 9, 9, 1, 3,
2, 0, 9)), row.names = c(NA, -8L), class = c("tbl_df", "tbl",
"data.frame"))
Now I would like to reshape this voting matrix as follows:
# A tibble: 3 x 8
  Politician country variables vote12 vote13 vote14 vote15 ...
       <int> <chr>   <lgl>      <dbl>  <dbl>  <dbl>  <dbl> <chr>
1          1 uk      NA             1      3      1      9 ...
2          2 nl      NA             9      2      2      0 ...
3          3 ro      NA             9      0      1      2 ...
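The dataset contains 8 variables and more than 9 million observations. I am fairly new to RStudio, so up to now I have only tried some code that I found on the internet. For example: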
ep.new = cast(ep, mepid~voteid, value = "votedecision")
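The command took a very long time to run, and then I got a warning: Aggregation requires fun.aggregate: length used as default
Does anyone have any tips or suggestions for solving my problem?
*There are more variables, which contain information about the specific politician.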
Answer 0 (score: 0)
You can use the tidyr package, specifically spread, to reshape tidy data from long to wide:
library(tidyr)
spread(df, key = voteid, value = votedecision, sep = "")
# A tibble: 4 x 5
  Politician country variables voteid12 voteid13
  <chr>      <chr>   <lgl>        <dbl>    <dbl>
1 1          uk      NA               1        3
2 2          nl      NA               9        2
3 3          ro      NA               9        0
4 k          z       NA               1        9
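As a side note, spread has been superseded by pivot_wider since tidyr 1.0.0. The warning quoted in the question usually means that some id/key combinations occur more than once, so it may be worth checking for duplicates before reshaping. The following is only a minimal sketch on the sample data: the names n_dup and wide are made up for illustration, and names_prefix = "vote" is an assumption chosen to match the column names requested in the question.

```r
library(tidyr)

# Sample data from the question, rebuilt as a plain data frame
df <- data.frame(
  Politician   = c("1", "2", "3", "k", "1", "2", "3", "k"),
  country      = c("uk", "nl", "ro", "z", "uk", "nl", "ro", "z"),
  variables    = NA,
  voteid       = c(12, 12, 12, 12, 13, 13, 13, 13),
  votedecision = c(1, 9, 9, 1, 3, 2, 0, 9),
  stringsAsFactors = FALSE
)

# Count rows that repeat an earlier Politician/voteid pair.
# A non-zero count would explain why cast() fell back to aggregating.
n_dup <- sum(duplicated(df[c("Politician", "voteid")]))

# One row per politician, one column per vote (vote12, vote13, ...)
wide <- pivot_wider(
  df,
  names_from   = voteid,
  values_from  = votedecision,
  names_prefix = "vote"
)
```

If n_dup is greater than zero on the full dataset, the duplicates probably need to be resolved first (for example by removing repeated rows), since otherwise pivot_wider returns list-columns instead of plain numeric columns.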