I am trying to reshape my dataset so that several rows are collapsed into a single row.
Original dataset:
What I get:
What I want:
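library(dplyr)
library(tidyr)
# Average shares and price within each group (`test` is the original dataset)
df <- test %>%
  group_by(cusip, year, typecode, ticker, stkname, indcode) %>%
  summarise(mean_shares = mean(shares), mean_prc = mean(prc))
# One column per typecode holding mean_shares; missing combinations become 0
df_2 <- df %>%
  spread(typecode, mean_shares, fill = 0)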
Answer (score: 1)
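The problem you are running into is that you get a separate row for each value of mean_prc, because mean_prc is different for every typecode. It is also not clear which mean_prc value should go into each row of your desired output, since after spreading each row holds the mean_shares of several typecodes.
To illustrate the point: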
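A deterministic miniature of the same effect may help. This is a sketch with hypothetical toy values (one cusip/year/ticker, two typecodes, and a mean_prc that differs between them); because spread() treats every column other than the key and value as an identifier, the two distinct mean_prc values end up on two separate rows.

```r
library(tidyr)

# Hypothetical toy data: a single cusip/year/ticker with two typecodes,
# where mean_prc differs between the typecodes
toy <- data.frame(
  cusip = 1, year = 2000, ticker = "ABC",
  typecode = c(1, 2),
  mean_shares = c(10, 20),
  mean_prc = c(50, 55),
  stringsAsFactors = FALSE
)

# mean_prc is not the key or the value, so spread() keeps it as an id column;
# each distinct mean_prc therefore produces its own row, padded with 0
wide_toy <- spread(toy, typecode, mean_shares, fill = 0)
print(wide_toy)
```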
# Create a reproducible data frame (no seed is set, so the numbers vary between runs)
test <- data.frame(expand.grid(cusip = c(36020, 78549, 22102, 87354),
                               year = 2000:2003, typecode = 1:5,
                               ticker = c("ABC", "BDF", "ASFK", "JERG")),
                   shares = rnorm(320, 100, 60),
                   prc = rnorm(320, 60, 5))
df <- test %>%
  group_by(cusip, year, typecode, ticker) %>%
  summarise(mean_shares = mean(shares), mean_prc = mean(prc)) %>%
  spread(typecode, mean_shares, fill = 0)
head(df)
# A tibble: 6 x 9
# Groups: cusip, year [1]
cusip year ticker mean_prc `1` `2` `3` `4` `5`
<dbl> <int> <fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 22102 2000 ABC 59.3 0 0 136. 0 0
2 22102 2000 ABC 60.1 0 0 0 0 27.4
3 22102 2000 ABC 60.6 53.8 0 0 0 0
4 22102 2000 ABC 61.7 0 0 0 268. 0
5 22102 2000 ABC 65.5 0 168. 0 0 0
6 22102 2000 BDF 54.7 0 0 141. 0 0
Now, if we drop mean_prc, every row is filled in with all of its values:
df_2 <- test %>%
  group_by(cusip, year, typecode, ticker) %>%
  summarise(mean_shares = mean(shares), mean_prc = mean(prc)) %>%
  # Drop mean_prc so it no longer acts as an id column
  select(-mean_prc) %>%
  spread(typecode, mean_shares, fill = 0)
head(df_2)
# A tibble: 6 x 8
# Groups: cusip, year [2]
cusip year ticker `1` `2` `3` `4` `5`
<dbl> <int> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
1 22102 2000 ABC 53.8 168. 136. 268. 27.4
2 22102 2000 BDF 57.6 73.9 141. 70.4 52.3
3 22102 2000 ASFK 212. 113. 4.77 -13.7 -0.240
4 22102 2000 JERG 36.7 42.9 63.7 165. 215.
5 22102 2001 ABC 19.6 13.4 10.5 -23.8 97.5
6 22102 2001 BDF 110. -11.6 127. 62.4 110.
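In short, make sure you are not spreading the data while also carrying along another variable whose value is different for each level of the factor you are spreading.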
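If you want to keep the price information instead of dropping it, one option (not part of the original answer) is to spread mean_prc as well, so it becomes value columns rather than an id column. This is a sketch assuming tidyr 1.1 or later, where pivot_wider() supersedes spread() and accepts several value columns plus a scalar values_fill, and dplyr 1.0 or later for the `.groups` argument; the seed is added only to make the sketch reproducible.

```r
library(dplyr)
library(tidyr)

set.seed(42)  # added only so the sketch is reproducible
# Same simulated structure as the `test` data frame in the answer
test <- data.frame(expand.grid(cusip = c(36020, 78549, 22102, 87354),
                               year = 2000:2003, typecode = 1:5,
                               ticker = c("ABC", "BDF", "ASFK", "JERG")),
                   shares = rnorm(320, 100, 60),
                   prc = rnorm(320, 60, 5))

wide <- test %>%
  group_by(cusip, year, typecode, ticker) %>%
  summarise(mean_shares = mean(shares), mean_prc = mean(prc),
            .groups = "drop") %>%
  # Spread both value columns: produces mean_shares_1..5 and mean_prc_1..5,
  # leaving cusip/year/ticker as the only id columns (one row per combination)
  pivot_wider(names_from = typecode,
              values_from = c(mean_shares, mean_prc),
              values_fill = 0)

head(wide)
```

Each cusip/year/ticker combination then occupies exactly one row, with one share column and one price column per typecode.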