Merging in R

Posted: 2016-11-01 19:09:58

Tags: r dataframe merge

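I want to merge two data frames, but keep only one row per date. df1 holds the months from 2013-01-01 through 2016-10-01. df2 holds how often an event occurred; if there was no event in a month, df2 has no row for that month.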

df1 <- data.frame(date = seq(as.Date("2013-01-01"), as.Date("2016-10-01"), by = "month"))

    df1
    date            Freq 
    1  2013-01-01    0
    2  2013-02-01    0
    3  2013-03-01    0
    4  2013-04-01    0
    5  2013-05-01    0
    ...

    df2
    date            Freq
    1  2013-03-01    1
    2  2013-08-01    2
    3  2014-04-01    5
    4  2014-05-01    2
    5  2014-06-01    5
    ...

I want the new data frame to look like this:

    date            Freq 
    1  2013-01-01    0
    2  2013-02-01    0
    3  2013-03-01    1
    4  2013-04-01    0
    5  2013-05-01    0
    6  2013-06-01    0
    7  2013-07-01    0
    8  2013-08-01    2
    9  2013-09-01    0
    ...

3 answers:

Answer 0 (score: 0)

You can use merge with all.x=TRUE (a left join that keeps every month of df1), then set the NAs produced by the merge to zero:

out <- merge(df1,df2,all.x=TRUE)
out[is.na(out)] <- 0
head(out,10)
##         date Freq
##1  2013-01-01    0
##2  2013-02-01    0
##3  2013-03-01    1
##4  2013-04-01    0
##5  2013-05-01    0
##6  2013-06-01    0
##7  2013-07-01    0
##8  2013-08-01    2
##9  2013-09-01    0
##10 2013-10-01    0
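A self-contained sketch of the same left join, rebuilding the data from the dput() values below. One assumption differs from the answer: it zero-fills only the Freq column rather than the whole data frame, so NAs in any other column you might add later are left alone.

```r
# Monthly calendar from the question: 2013-01-01 to 2016-10-01 (46 months)
df1 <- data.frame(date = seq(as.Date("2013-01-01"), as.Date("2016-10-01"), by = "month"))

# Event counts (same values as the dput() data in this answer)
df2 <- data.frame(
  date = as.Date(c("2013-03-01", "2013-08-01", "2014-04-01", "2014-05-01", "2014-06-01")),
  Freq = c(1L, 2L, 5L, 2L, 5L)
)

# Left join on date: every month of df1 is kept; months missing from df2 get Freq = NA
out <- merge(df1, df2, by = "date", all.x = TRUE)

# Replace NAs only in Freq (0L keeps the column integer)
out$Freq[is.na(out$Freq)] <- 0L

head(out, 10)
```

merge() sorts by the key by default, so the result comes back in date order.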

Data: df1 is created as in the OP

df1 <- data.frame(date=seq(as.Date("2013-01-01"), as.Date("2016-10-01"), by="month"))

# equivalently, as dput() output:
df1 <- structure(list(date = structure(c(15706, 15737, 15765, 15796, 
15826, 15857, 15887, 15918, 15949, 15979, 16010, 16040, 16071, 
16102, 16130, 16161, 16191, 16222, 16252, 16283, 16314, 16344, 
16375, 16405, 16436, 16467, 16495, 16526, 16556, 16587, 16617, 
16648, 16679, 16709, 16740, 16770, 16801, 16832, 16861, 16892, 
16922, 16953, 16983, 17014, 17045, 17075), class = "Date")), .Names = "date", row.names = c(NA, 
-46L), class = "data.frame")
##         date
##1  2013-01-01
##2  2013-02-01
##3  2013-03-01
##4  2013-04-01
##5  2013-05-01
## ...
##42 2016-06-01
##43 2016-07-01
##44 2016-08-01
##45 2016-09-01
##46 2016-10-01

df2 <- structure(list(date = structure(c(15765, 15918, 16161, 16191, 
16222), class = "Date"), Freq = c(1L, 2L, 5L, 2L, 5L)), .Names = c("date", 
"Freq"), row.names = c(NA, -5L), class = "data.frame")
##        date Freq
##1 2013-03-01    1
##2 2013-08-01    2
##3 2014-04-01    5
##4 2014-05-01    2
##5 2014-06-01    5

Answer 1 (score: 0)

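Using dplyr for the join (this assumes df1 already has a Freq column of zeros, as in the question's printout; the output below was produced from only the first five months of df1):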

library(dplyr)

full_join(df1, df2) %>% 
    group_by(date) %>% 
    summarise(Freq = sum(Freq))

## # A tibble: 9 × 2
##         date  Freq
##       <date> <int>
## 1 2013-01-01     0
## 2 2013-02-01     0
## 3 2013-03-01     1
## 4 2013-04-01     0
## 5 2013-05-01     0
## 6 2013-08-01     2
## 7 2014-04-01     5
## 8 2014-05-01     2
## 9 2014-06-01     5

Or the base R equivalent:

aggregate(Freq ~ date, merge(df1, df2, all = TRUE), sum)

##         date Freq
## 1 2013-01-01    0
## 2 2013-02-01    0
## 3 2013-03-01    1
## 4 2013-04-01    0
## 5 2013-05-01    0
## 6 2013-08-01    2
## 7 2014-04-01    5
## 8 2014-05-01    2
## 9 2014-06-01    5

Reorder the result afterwards if you need to.
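Both versions above sum over a Freq column that df1 already carries. If df1 has only the date column, as built by the question's seq() call, the merged Freq is NA for empty months and the sum does not produce zeros. A plain lookup with match() is one alternative (a different technique from this answer's) that needs neither a join nor aggregation and keeps df1's month order. It assumes each date appears at most once in df2, since match() returns only the first hit.

```r
# df1: one row per month, date column only (as built in the question)
df1 <- data.frame(date = seq(as.Date("2013-01-01"), as.Date("2016-10-01"), by = "month"))

# df2: months with at least one event (values from the dput() data in answer 0)
df2 <- data.frame(
  date = as.Date(c("2013-03-01", "2013-08-01", "2014-04-01", "2014-05-01", "2014-06-01")),
  Freq = c(1L, 2L, 5L, 2L, 5L)
)

# For each month in df1, find its row in df2 (NA when the month has no events)
idx <- match(df1$date, df2$date)

# Look up the counts; months without a match get 0
df1$Freq <- ifelse(is.na(idx), 0L, df2$Freq[idx])

head(df1, 10)
```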

Answer 2 (score: 0)

There is also a data.table way:

library(data.table)
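# Create example data: df1 holds every month, df2 has 10 random months with random counts
set.seed(1234)
df1 <- data.table(date = seq(as.Date("2013-01-01"), as.Date("2016-10-01"), by = "month"))
df2 <- data.table(date = sample(df1$date, size = 10), freq = sample(1:10, 10, replace = TRUE))

# Set keys so that df1[df2] joins on date
setkey(df1, date)
setkey(df2, date)

# data.table magic: update join for matched months, then zero for the rest
df1[df2, freq := i.freq]   # i.freq is the freq column of df2 (the i table)
df1[!df2, freq := 0L]      # anti-join: months absent from df2; 0L keeps freq integer
df1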

Result:

            date freq
 1: 2013-01-01    3
 2: 2013-02-01    0
 3: 2013-03-01    0
 4: 2013-04-01    0
 5: 2013-05-01    0
 6: 2013-06-01    7
 7: 2013-07-01    0
 ...
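A compact variant of the data.table answer, sketched under the assumption that your data.table version supports the on= argument (so no setkey() is needed). It initialises freq to 0L, then overwrites the matched months via an update join; the fixed df2 values (from the dput() data in answer 0) replace the random sample() data so the result is reproducible.

```r
library(data.table)

# Monthly calendar and event counts (same values as the dput() data in answer 0)
df1 <- data.table(date = seq(as.Date("2013-01-01"), as.Date("2016-10-01"), by = "month"))
df2 <- data.table(
  date = as.Date(c("2013-03-01", "2013-08-01", "2014-04-01", "2014-05-01", "2014-06-01")),
  freq = c(1L, 2L, 5L, 2L, 5L)
)

# Start every month at zero
df1[, freq := 0L]

# Update join on date: matched months take df2's freq (i.freq); others stay 0
df1[df2, on = "date", freq := i.freq]

df1
```

Initialising first and then joining avoids the separate anti-join step, and the update join modifies df1 in place without reordering its rows.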