Converting columns into a single row with dplyr

Date: 2018-06-15 04:00:44

Tags: r dplyr tidyr

I'm trying to reshape my dataset so that several rows are collapsed into a single row.

Original dataset (screenshot not reproduced):

What I get (screenshot not reproduced):

What I want (screenshot not reproduced):

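library(dplyr)
library(tidyr)

# test is my original dataset: average shares and price
# for every cusip/year/typecode combination
df <- test %>%
  group_by(cusip, year, typecode, ticker, stkname, indcode) %>%
  summarise(mean_shares = mean(shares), mean_prc = mean(prc))

# one column per typecode, holding mean_shares (0 where missing)
df_2 <- df %>%
  spread(typecode, mean_shares, fill = 0)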

Answer (score: 1):

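The problem you're running into is that you get a separate row for each value of mean_prc, because mean_prc differs for every typecode. spread() treats every column other than the key and the value as an identifier, so rows with different mean_prc values cannot be merged into one. It also isn't clear what should fill the cells in your desired output, since each row already has the mean_shares columns.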

To illustrate the point:

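# Create a reproducible data frame: expand.grid() yields every
# cusip/year/typecode/ticker combination (4 * 4 * 5 * 4 = 320 rows)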
test <- data.frame(expand.grid(cusip = c(36020, 78549, 22102, 87354), 
                               year = 2000:2003, typecode = 1:5, 
                               ticker = c("ABC", "BDF", "ASFK", "JERG")),
                   shares = rnorm(320, 100, 60),
                   prc = rnorm(320, 60, 5))

# mean_prc is kept, so spread() uses it (with cusip, year, ticker)
# to identify rows
df <- test %>%
  group_by(cusip, year, typecode, ticker) %>%
  summarise(mean_shares = mean(shares), mean_prc = mean(prc)) %>%
  spread(typecode, mean_shares, fill = 0)
head(df)
# A tibble: 6 x 9
# Groups:   cusip, year [1]
  cusip  year ticker mean_prc   `1`   `2`   `3`   `4`   `5`
  <dbl> <int> <fct>     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 22102  2000 ABC        59.3   0      0   136.    0    0  
2 22102  2000 ABC        60.1   0      0     0     0   27.4
3 22102  2000 ABC        60.6  53.8    0     0     0    0  
4 22102  2000 ABC        61.7   0      0     0   268.   0  
5 22102  2000 ABC        65.5   0    168.    0     0    0  
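A quick way to confirm that mean_prc is what keeps the rows apart is to count how many rows each cusip/year/ticker combination still has after spread(). The sketch below rebuilds the same test data; set.seed(42) and the name rows_per_key are additions for illustration, so the values will not match the printout above, but each combination should still have 5 rows (one per typecode). It calls ungroup() before spread() so the result does not depend on how spread() handles grouped input.

```r
library(dplyr)
library(tidyr)

# Same example data as above; set.seed() is added only so the check is repeatable
set.seed(42)
test <- data.frame(expand.grid(cusip = c(36020, 78549, 22102, 87354),
                               year = 2000:2003, typecode = 1:5,
                               ticker = c("ABC", "BDF", "ASFK", "JERG")),
                   shares = rnorm(320, 100, 60),
                   prc = rnorm(320, 60, 5))

df <- test %>%
  group_by(cusip, year, typecode, ticker) %>%
  summarise(mean_shares = mean(shares), mean_prc = mean(prc)) %>%
  ungroup() %>%
  spread(typecode, mean_shares, fill = 0)

# Rows per cusip/year/ticker: one per typecode, because every row
# carries a different mean_prc and spread() treats it as an identifier
rows_per_key <- df %>%
  count(cusip, year, ticker)
print(rows_per_key)
```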
6 22102  2000 BDF        54.7   0      0   141.    0    0  

Now, if we drop mean_prc, every row has all of its values filled in:

df_2 <- test %>%
  group_by(cusip, year, typecode, ticker) %>%
  summarise(mean_shares = mean(shares), mean_prc = mean(prc)) %>%
  # drop mean_prc so that cusip, year and ticker alone identify each row
  select(-mean_prc) %>%
  spread(typecode, mean_shares, fill = 0)

head(df_2)

# A tibble: 6 x 8
# Groups:   cusip, year [2]
  cusip  year ticker   `1`    `2`    `3`    `4`      `5`
  <dbl> <int> <fct>  <dbl>  <dbl>  <dbl>  <dbl>    <dbl>
1 22102  2000 ABC     53.8  168.  136.    268.    27.4  
2 22102  2000 BDF     57.6   73.9 141.     70.4   52.3  
3 22102  2000 ASFK   212.   113.    4.77  -13.7   -0.240
4 22102  2000 JERG    36.7   42.9  63.7   165.   215.   
5 22102  2001 ABC     19.6   13.4  10.5   -23.8   97.5  
6 22102  2001 BDF    110.   -11.6 127.     62.4  110.  
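If you still want a single price column next to the typecode columns, one option is to average prc at the cusip/year/ticker level (the level of one output row) and join it back onto the spread share means. This is a sketch that assumes a price averaged over all typecodes is what belongs in that column; set.seed(42) and the names prc_by_key, shares_wide and df_3 are illustrative.

```r
library(dplyr)
library(tidyr)

set.seed(42)
test <- data.frame(expand.grid(cusip = c(36020, 78549, 22102, 87354),
                               year = 2000:2003, typecode = 1:5,
                               ticker = c("ABC", "BDF", "ASFK", "JERG")),
                   shares = rnorm(320, 100, 60),
                   prc = rnorm(320, 60, 5))

# One mean price per output row (averaged over typecode: an assumption)
prc_by_key <- test %>%
  group_by(cusip, year, ticker) %>%
  summarise(mean_prc = mean(prc)) %>%
  ungroup()

# Spread the share means without mean_prc, as in df_2 above
shares_wide <- test %>%
  group_by(cusip, year, typecode, ticker) %>%
  summarise(mean_shares = mean(shares)) %>%
  ungroup() %>%
  spread(typecode, mean_shares, fill = 0)

# Attach the price; still one row per cusip/year/ticker
df_3 <- left_join(shares_wide, prc_by_key, by = c("cusip", "year", "ticker"))
print(head(df_3))
```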

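In short, you need to make sure you aren't spreading the data while also carrying another variable that takes a different value for each level of the factor you are spreading.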
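Alternatively, if the price should stay separate for each typecode, you can spread both value columns at once instead of dropping one. A minimal sketch, assuming tidyr 1.0.0 or later (where pivot_wider() supersedes spread()): with values_from = c(mean_shares, mean_prc) it creates columns such as mean_shares_1 and mean_prc_1, so each cusip/year/ticker ends up on a single row. set.seed(42) and the name df_wide are illustrative.

```r
library(dplyr)
library(tidyr)

set.seed(42)
test <- data.frame(expand.grid(cusip = c(36020, 78549, 22102, 87354),
                               year = 2000:2003, typecode = 1:5,
                               ticker = c("ABC", "BDF", "ASFK", "JERG")),
                   shares = rnorm(320, 100, 60),
                   prc = rnorm(320, 60, 5))

df_wide <- test %>%
  group_by(cusip, year, typecode, ticker) %>%
  summarise(mean_shares = mean(shares), mean_prc = mean(prc)) %>%
  ungroup() %>%
  # spread both measures: one column per measure and typecode
  pivot_wider(names_from = typecode,
              values_from = c(mean_shares, mean_prc))

print(head(df_wide))
```

Since this test data contains every typecode for every cusip/year/ticker, no fill value is needed; with gaps in real data, the values_fill argument of pivot_wider() plays the role of fill = 0.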