Collapse a data frame to unique rows

Time: 2018-01-23 19:10:04

Tags: r

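What I am trying to do is collapse my data frame so that each unique date has its corresponding variables on a single row. Below is a sample of my current data (FYI: the full data set I am working with covers about 300 different dates):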

date <- c("10/30/17", "10/30/17", "10/30/17", "10/30/17")
eventcode <- c("14", "14", "14", "14")
eoi145 <- c(1, 0, 0, 0)
eoi140 <- c(0, 1, 0, 0)
eoi141 <- c(0, 0, 0, 1)
eoi143 <- c(0, 0, 1, 0)
df <- data.frame(date, eventcode, eoi145, eoi140, eoi141, eoi143)
View(df)

I would like to get it into this format:

date <- c("10/30/17")
eventcode <- c("14")
eoi145 <- c(1)
eoi140 <- c(1)
eoi141 <- c(1)
eoi143 <- c(1)
df <- data.frame(date, eventcode, eoi145, eoi140, eoi141, eoi143)

I have tried using cast, melt, and reshape. Can anyone give me a hint about a package or technique that would accomplish this?

Thanks!

2 answers:

Answer 0: (score: 1)

One approach from the dplyr package:

library(dplyr)
reduced_df <- df %>%
  group_by(date, eventcode) %>%
  summarise_all(funs(as.integer(sum(.)))) %>%
  ungroup()

With output:

# A tibble: 1 x 6
#  date     eventcode eoi145 eoi140 eoi141 eoi143
#  <fct>    <fct>      <int>  <int>  <int>  <int>
#  10/30/17 14             1      1      1      1
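
Note that funs() has been deprecated since dplyr 0.8.0, so the call above may warn or fail on recent versions. The following is a minimal sketch of the same grouping and summing written with across(), assuming dplyr 1.0.0 or later and the sample data from the question. The starts_with("eoi") selector is an assumption about which columns should be summed.

```r
library(dplyr)

# Sample data from the question.
df <- data.frame(
  date      = rep("10/30/17", 4),
  eventcode = rep("14", 4),
  eoi145    = c(1, 0, 0, 0),
  eoi140    = c(0, 1, 0, 0),
  eoi141    = c(0, 0, 0, 1),
  eoi143    = c(0, 0, 1, 0)
)

# One row per (date, eventcode); every eoi* column is summed within the group.
reduced_df <- df %>%
  group_by(date, eventcode) %>%
  summarise(
    across(starts_with("eoi"), ~ as.integer(sum(.x))),
    .groups = "drop"  # return an ungrouped tibble
  )
```

Replacing sum with max would keep a 0/1 indicator even if a date had several rows with a 1 in the same column.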

Answer 1: (score: 1)

Another way, with data.table (DT is assumed to be the question's data frame converted with as.data.table(df)):

dcast(melt(DT,measure.vars = patterns("^eoi"))[value != 0],date + eventcode ~variable)

       date eventcode eoi145 eoi140 eoi141 eoi143
1: 10/30/17        14      1      1      1      1
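
The one-liner above does not show where DT comes from. As a hedged, self-contained sketch, the same pipeline can be split into named steps. It assumes the data.table package is installed; the intermediate names long and wide are only illustrative.

```r
library(data.table)

# Sample data from the question, built directly as a data.table.
DT <- data.table(
  date      = rep("10/30/17", 4),
  eventcode = rep("14", 4),
  eoi145    = c(1, 0, 0, 0),
  eoi140    = c(0, 1, 0, 0),
  eoi141    = c(0, 0, 0, 1),
  eoi143    = c(0, 0, 1, 0)
)

# Step 1: wide -> long, one row per (date, eventcode, eoi column).
long <- melt(DT, measure.vars = patterns("^eoi"))

# Step 2: keep only the non-zero entries.
long <- long[value != 0]

# Step 3: long -> wide, one row per (date, eventcode).
wide <- dcast(long, date + eventcode ~ variable, value.var = "value")
```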

First melt to long format and keep only the rows where value != 0:

melt(DT,measure.vars = patterns("^eoi"))[value != 0]
       date eventcode variable value
1: 10/30/17        14   eoi145     1
2: 10/30/17        14   eoi140     1
3: 10/30/17        14   eoi141     1
4: 10/30/17        14   eoi143     1

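Then cast back to wide format with dcast. The advantage is that this works in almost any situation: when a group has more than one non-zero row in a column, dcast falls back to its default aggregation function, length, so the cell holds the number of non-zero rows. For example, if you have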

eoi143 <- c(0, 0, 1, 1)

       date eventcode eoi145 eoi140 eoi141 eoi143
1: 10/30/17        14      1      1      1      2
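
The count of 2 in that last output differs from the 0/1 format requested in the question. If an indicator is wanted instead, one option (a sketch, not part of the original answer) is to skip the filter and pass an explicit fun.aggregate to dcast. The anonymous function below is an illustrative assumption: it returns 1 when any row in the cell is non-zero and 0 otherwise.

```r
library(data.table)

# Sample data with two non-zero rows in eoi143.
DT <- data.table(
  date      = rep("10/30/17", 4),
  eventcode = rep("14", 4),
  eoi145    = c(1, 0, 0, 0),
  eoi140    = c(0, 1, 0, 0),
  eoi141    = c(0, 0, 0, 1),
  eoi143    = c(0, 0, 1, 1)
)

# Wide -> long, keeping zeros so every cell has values to aggregate.
long <- melt(DT, measure.vars = patterns("^eoi"))

# Long -> wide, collapsing each cell to a 0/1 flag instead of a count.
flags <- dcast(
  long,
  date + eventcode ~ variable,
  value.var     = "value",
  fun.aggregate = function(x) as.integer(any(x != 0))
)
```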