Organizing and using count data in R

Date: 2017-05-11 17:23:52

Tags: r dataframe

I've been searching Stack Overflow and YouTube for a way to do the following.

I have data in this format:

structure(list(year = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L), ID = c(222L, 
222L, 333L, 333L, 222L, 222L, 333L, 333L), sport = c(" baseball", 
" football", " baseball", " football", " baseball", " football", 
" baseball", " football"), money_raised = c(5L, 6L, 4L, 5L, 5L, 
6L, 4L, 5L), money_used = c(3L, 4L, 2L, 3L, 3L, 4L, 2L, 3L), 
    money_total = c(7L, 6L, 7L, 8L, 7L, 6L, 7L, 8L)), .Names = c("year", 
"ID", "sport", "money_raised", "money_used", "money_total"), class = "data.frame", row.names = c(NA, 
-8L))
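One detail worth noting: in the dput() output above, every sport value starts with a space (" baseball", " football"). If the data is reshaped as-is, that space ends up inside the generated column names (for example "money_raised_ baseball"). A minimal base-R cleanup, rebuilding the same sample with data.frame() and assuming the data frame is called df (a name chosen here for illustration):

```r
# Rebuild the question's sample data (same values as the dput() output above)
df <- data.frame(
  year = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  ID = c(222L, 222L, 333L, 333L, 222L, 222L, 333L, 333L),
  sport = c(" baseball", " football", " baseball", " football",
            " baseball", " football", " baseball", " football"),
  money_raised = c(5L, 6L, 4L, 5L, 5L, 6L, 4L, 5L),
  money_used = c(3L, 4L, 2L, 3L, 3L, 4L, 2L, 3L),
  money_total = c(7L, 6L, 7L, 8L, 7L, 6L, 7L, 8L)
)

# Strip the leading space from sport so that column names built
# from it later do not contain spaces
df$sport <- trimws(df$sport)

unique(df$sport)
# [1] "baseball" "football"
```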

This is just a sample of the data; in reality I have 5 sports per ID rather than 2.

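I'd like to reorganize the data into columns so that there is only one row per ID and year, with separate columns for the money raised, used, and total for each sport, so that my data looks like this: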

structure(list(year = c(1L, 1L), ID = c(222L, 333L), money_raised_baseball = c(5L, 
4L), money_used_baseball = c(3L, 2L), money_total_baseball = c(7L, 
7L), money_raised_football = c(6L, 5L), money_used_football = c(4L, 
3L), money_total_football = c(6L, 8L)), .Names = c("year", "ID", 
"money_raised_baseball", "money_used_baseball", "money_total_baseball", 
"money_raised_football", "money_used_football", "money_total_football"
), class = "data.frame", row.names = c(NA, -2L))
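The transformation described above is a long-to-wide pivot keyed on year and ID. With a recent tidyr (1.2.0 or later is assumed here, since that release added the names_vary argument), pivot_wider() can do it in one call. This is a sketch on the sample data with the leading spaces in sport already removed; df is a hypothetical name:

```r
library(tidyr)

# Sample data in long format (sport values without leading spaces)
df <- data.frame(
  year = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  ID = c(222L, 222L, 333L, 333L, 222L, 222L, 333L, 333L),
  sport = rep(c("baseball", "football"), 4),
  money_raised = c(5L, 6L, 4L, 5L, 5L, 6L, 4L, 5L),
  money_used = c(3L, 4L, 2L, 3L, 3L, 4L, 2L, 3L),
  money_total = c(7L, 6L, 7L, 8L, 7L, 6L, 7L, 8L)
)

# One row per (year, ID); one column per money variable and sport.
# names_sep = "_" builds names such as "money_raised_baseball".
# names_vary = "slowest" groups the columns by sport (all baseball
# columns first, then all football columns), as in the requested layout.
wide <- pivot_wider(
  df,
  id_cols = c(year, ID),
  names_from = sport,
  values_from = c(money_raised, money_used, money_total),
  names_sep = "_",
  names_vary = "slowest"
)

wide
```

Because the new column names are taken from whatever values sport holds, the same call handles 5 sports without listing any columns by hand. It also keeps the year 2 rows, which the expected output above leaves out.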

Answer (score: 0):

# Load package
library(tidyverse)

# Create the example data frame
dt <- read.csv(text = "year,ID,sport,money_raised,money_used,money_total
1,222,baseball,5,3,7
1,222,football,6,4,6
1,333,baseball,4,2,7
1,333,football,5,3,8
2,222,baseball,5,3,7
2,222,football,6,4,6
2,333,baseball,4,2,7
2,333,football,5,3,8",
               stringsAsFactors = FALSE)

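# Reshape to one row per year/ID
# (gather()/spread() still work but are superseded by pivot_longer()/pivot_wider() since tidyr 1.0.0)
dt2 <- dt %>%
  # Stack the three money_* columns into key/value pairs (money, value)
  gather(money, value, contains("money")) %>%
  # Combine the key with the sport, e.g. "money_raised_baseball"
  unite(money_sport, money, sport, sep = "_") %>%
  # Spread back out: one column per money/sport combination
  spread(money_sport, value) %>%
  # Put the columns in the requested order (with 5 sports, list the extra columns here too)
  select(year, ID, money_raised_baseball, money_used_baseball, money_total_baseball,
         money_raised_football, money_used_football, money_total_football)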
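If you would rather avoid the tidyverse dependency, base R's stats::reshape() can do the same long-to-wide conversion. A minimal sketch on the same sample data (sport values without leading spaces; df and wide are hypothetical names):

```r
# Sample data in long format
df <- data.frame(
  year = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  ID = c(222L, 222L, 333L, 333L, 222L, 222L, 333L, 333L),
  sport = rep(c("baseball", "football"), 4),
  money_raised = c(5L, 6L, 4L, 5L, 5L, 6L, 4L, 5L),
  money_used = c(3L, 4L, 2L, 3L, 3L, 4L, 2L, 3L),
  money_total = c(7L, 6L, 7L, 8L, 7L, 6L, 7L, 8L)
)

# idvar identifies a row of the result, timevar supplies the column suffix;
# every other column (the money_* variables) is spread out per sport,
# producing names like "money_raised_baseball" because sep = "_"
wide <- reshape(
  df,
  idvar = c("year", "ID"),
  timevar = "sport",
  direction = "wide",
  sep = "_"
)

# Tidy up: reset row names and drop the bookkeeping attribute reshape() adds
rownames(wide) <- NULL
attr(wide, "reshapeWide") <- NULL

wide
```

reshape() adds the columns sport by sport, so they already come out grouped as in the requested layout, and, unlike the hard-coded select() above, nothing needs to be edited when there are 5 sports.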