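Reshaping a large dataset from long to wide

Date: 2019-04-14 16:50:31

Tags: r reshape

I want to convert a large dataset from long format to wide format. At the moment, my dataset is structured as follows: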

df <- structure(list(Politician = c("1", "2", "3", "k", "1", "2", "3", 
"k"), country = c("uk", "nl", "ro", "z", "uk", "nl", "ro", "z"
), variables = c(NA, NA, NA, NA, NA, NA, NA, NA), voteid = c(12, 
12, 12, 12, 13, 13, 13, 13), votedecision = c(1, 9, 9, 1, 3, 
2, 0, 9)), row.names = c(NA, -8L), class = c("tbl_df", "tbl", 
"data.frame"))

Now I would like to reshape it into a vote matrix like this:

# A tibble: 3 x 8
  Politician country variables vote12 vote13 vote14 vote15 ...  
       <int> <chr>   <lgl>      <dbl>  <dbl>  <dbl>  <dbl> <chr>
1          1 uk      NA             1      3      1      9 ...  
2          2 nl      NA             9      2      2      0 ...  
3          3 ro      NA             9      0      1      2 ...  

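The dataset contains 8 variables and more than 9 million observations. I am quite new to RStudio, so up to now I have only tried pieces of code I found on the internet. For example: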

ep.new = cast(ep, mepid~voteid, value = "votedecision")

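Running that command took a very long time, and then I got a warning: Aggregation requires fun.aggregate: length used as default

Does anyone have any tips or suggestions for solving my problem?

*There are more variables, which contain information about the individual politicians.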

1 answer:

Answer 0: (score: 0)

You can use the tidyr package, specifically spread, to reshape the tidy data into wide format:

library(tidyr)

spread(df, key = voteid, value = votedecision, sep = "")

# A tibble: 4 x 5
  Politician country variables voteid12 voteid13
  <chr>      <chr>   <lgl>        <dbl>    <dbl>
1 1          uk      NA               1        3
2 2          nl      NA               9        2
3 3          ro      NA               9        0
4 k          z       NA               1        9
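
As a hedged follow-up to the call above: in tidyr 1.0.0 and later, spread is superseded by pivot_wider, which can also produce the vote12/vote13 column names from the question through names_prefix. The sketch below rebuilds the sample data as a plain data.frame (an assumption made to keep it self-contained) and first counts repeated Politician/voteid pairs. The cast warning about fun.aggregate usually indicates that some id/vote combinations occur more than once, so it is worth checking before reshaping. The names n_dup and wide are illustrative only.

```r
library(tidyr)

# Same sample data as in the question, rebuilt as a plain data.frame
df <- data.frame(
  Politician   = c("1", "2", "3", "k", "1", "2", "3", "k"),
  country      = c("uk", "nl", "ro", "z", "uk", "nl", "ro", "z"),
  variables    = NA,
  voteid       = c(12, 12, 12, 12, 13, 13, 13, 13),
  votedecision = c(1, 9, 9, 1, 3, 2, 0, 9),
  stringsAsFactors = FALSE
)

# Count repeated Politician/voteid pairs. A value above 0 means the
# reshape would need to aggregate several values into one cell.
n_dup <- sum(duplicated(df[, c("Politician", "voteid")]))

# Assumes tidyr >= 1.0.0. All columns not named below act as row identifiers.
wide <- pivot_wider(
  df,
  names_from   = voteid,
  values_from  = votedecision,
  names_prefix = "vote"
)
```

If n_dup is greater than 0 on the full dataset, the duplicated rows need to be removed or summarised before a one-value-per-cell wide table is possible.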