Insert rows into a data frame based on a vector of dates

Time: 2016-02-24 22:47:01

Tags: r data.table dplyr zoo

This is my data frame:

df <- read.table(text='

    Name      ActivityType     ActivityDate              
     John       Email            2014-01-01                              
     John       Webinar          2014-01-05                            
     John       Webinar          2014-01-20                                                       
     John       Email            2014-04-20                            
     Tom        Email            2014-01-01                              
     Tom       Webinar           2014-01-05                           
     Tom       Webinar           2014-01-20                                                        
     Tom       Email             2014-04-20                              

    ', header=T, row.names = NULL)

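I also have a vector x of dates: x <- c("2014-01-03","2014-01-25","2015-05-27"). I want to insert rows into the original data frame so that every date in x appears for every Name, with ActivityType set to NA on the inserted rows. The output should look like this: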

    Name      ActivityType     ActivityDate              
     John       Email            2014-01-01
     John        NA              2014-01-03        
     John       Webinar          2014-01-05                            
     John       Webinar          2014-01-20
     John       NA               2014-01-25                                                       
     John       Email            2014-04-20
     John       NA               2015-05-27                            
     Tom        Email            2014-01-01
     Tom        NA               2014-01-03                              
     Tom       Webinar           2014-01-05                           
     Tom       Webinar           2014-01-20
     Tom       NA                2014-01-25                                                        
     Tom       Email             2014-04-20
     Tom       NA                2015-05-27  

Thank you very much for your help!

2 Answers:

Answer 0: (score: 4)

It looks like you want to add each "new" date for every person, right?

In that case, you can turn x into a data.frame and then merge/join:

## original dataframe
## (sample values as typed in this answer; the date strings differ from the question's)
df <- data.frame(Name = c(rep("John", 4), rep("Tom", 4)),
                 ActivityType = c("Email","Web","Web","Email","Email","Web","Web", "Email"),
                 ActivityDate = c("2014-01-01","2014-05-01","2014-20-01","2014-20-04","2014-01-01","2014-05-01","2014-20-01","2014-20-04"))

## Turning x into a dataframe: each = 3 pairs every name with all three dates.
x <- data.frame(ActivityDate = rep(c("2014-01-03","2014-01-25","2015-05-27"), 2),
                Name = rep(c("John","Tom"), each = 3))

## all = TRUE keeps rows from both sides (a full outer join)
merge(df, x, by=c("Name", "ActivityDate"), all=TRUE)

#    Name ActivityDate ActivityType
# 1  John   2014-01-01        Email
# 2  John   2014-05-01          Web
# 3  John   2014-20-01          Web
# 4  John   2014-20-04        Email
# 5  John   2014-01-03         <NA>
# 6  John   2014-01-25         <NA>
# 7  John   2015-05-27         <NA>
# 8   Tom   2014-01-01        Email
# 9   Tom   2014-05-01          Web
# 10  Tom   2014-20-01          Web
# 11  Tom   2014-20-04        Email
# 12  Tom   2014-01-03         <NA>
# 13  Tom   2014-01-25         <NA>
# 14  Tom   2015-05-27         <NA>
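
The same merge idea can be sketched on the question's own data, with ActivityDate stored as a Date so the result can be sorted chronologically. This is only an illustration under assumptions: the names new_dates, x_all and res are hypothetical, and the Name/date combinations are built with merge(..., by = NULL), which returns the Cartesian product of its two inputs.

```r
# Question's data, with ActivityDate as a real Date column.
df <- data.frame(
  Name = rep(c("John", "Tom"), each = 4),
  ActivityType = rep(c("Email", "Webinar", "Webinar", "Email"), times = 2),
  ActivityDate = as.Date(rep(c("2014-01-01", "2014-01-05",
                               "2014-01-20", "2014-04-20"), times = 2)),
  stringsAsFactors = FALSE
)

# Dates to insert (hypothetical name: new_dates).
new_dates <- data.frame(
  ActivityDate = as.Date(c("2014-01-03", "2014-01-25", "2015-05-27"))
)

# by = NULL makes merge() return every Name paired with every new date.
x_all <- merge(unique(df["Name"]), new_dates, by = NULL)

# Full outer join, then sort by person and date.
res <- merge(df, x_all, by = c("Name", "ActivityDate"), all = TRUE)
res <- res[order(res$Name, res$ActivityDate), ]
res
```

Building x_all from the data avoids typing the names by hand, which matters once there are more than a couple of people.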

Update

Since you are running into memory issues, you can use data.table:

library(data.table)
dt <- as.data.table(df)
x_dt <- as.data.table(x)

merge(dt, x_dt, by=c("Name","ActivityDate"), all=T)

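Alternatively, if you don't need a merge, you can rbind the two tables using data.table's rbindlist

rbindlist(list(dt, x_dt), fill=TRUE) ## fill sets the 'ActivityType' to NA in X

Update 2

To generate x with 16000 unique names (I've used numbers here, but the principle is the same) and 30 dates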
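
One possible way to build a table of that size is data.table's CJ() cross join, followed by rbindlist and setorder. This is a sketch under assumptions, not the answer's original code: the integer names, the 30 consecutive dates, and the one-row-per-name dt below are all made up for illustration.

```r
library(data.table)

# Assumption: names are stand-in integers and the 30 dates are consecutive days.
n_names <- 16000L
new_dates <- seq(as.Date("2015-01-01"), by = "day", length.out = 30)

# CJ() builds the cross join: one row per (Name, ActivityDate) pair.
x_dt <- CJ(Name = seq_len(n_names), ActivityDate = new_dates)

# Hypothetical existing data: one recorded activity per name.
dt <- data.table(Name = seq_len(n_names),
                 ActivityType = "Email",
                 ActivityDate = as.Date("2014-12-31"))

# Stack the two tables; fill = TRUE sets the missing ActivityType to NA.
out <- rbindlist(list(dt, x_dt), fill = TRUE)

# Sort in place by name, then date.
setorder(out, Name, ActivityDate)
```

Because rbindlist only stacks rows, it does not check whether a (Name, ActivityDate) pair already exists in dt; the merge shown earlier is the safer choice when overlaps are possible.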

Answer 1: (score: 3)

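1) expand.grid Use expand.grid to create a data frame adds containing the rows to be added, then use rbind to combine df and adds, converting the ActivityDate column to "Date" class. Then sort. No packages are used.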

# unique() works whether Name is a factor or a character column
adds <- expand.grid(Name = unique(df$Name), ActivityType = NA, ActivityDate = x)
both <- transform(rbind(df, adds), ActivityDate = as.Date(ActivityDate))

o <- with(both, order(Name, ActivityDate))
both[o, ]

giving:

   Name ActivityType ActivityDate
1  John        Email   2014-01-01
9  John         <NA>   2014-01-03
2  John      Webinar   2014-01-05
3  John      Webinar   2014-01-20
11 John         <NA>   2014-01-25
4  John        Email   2014-04-20
13 John         <NA>   2015-05-27
5   Tom        Email   2014-01-01
10  Tom         <NA>   2014-01-03
6   Tom      Webinar   2014-01-05
7   Tom      Webinar   2014-01-20
12  Tom         <NA>   2014-01-25
8   Tom        Email   2014-04-20
14  Tom         <NA>   2015-05-27
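
If some of the dates in x might already be present for a given person, the plain rbind would add a duplicate NA row for that pair. A minimal sketch of a reusable helper that skips such pairs follows; the function name insert_dates and the small demo data are hypothetical, not part of the answer above.

```r
# Hypothetical helper: add one NA-activity row per (Name, date) pair
# that is not already present in df, then sort by Name and date.
insert_dates <- function(df, dates) {
  df$ActivityDate <- as.Date(df$ActivityDate)
  adds <- expand.grid(Name = unique(df$Name),
                      ActivityType = NA,
                      ActivityDate = as.Date(dates),
                      stringsAsFactors = FALSE)
  # Build a text key per row so existing pairs can be detected.
  key <- function(d) paste(d$Name, d$ActivityDate)
  adds <- adds[!key(adds) %in% key(df), ]
  both <- rbind(df, adds)
  both <- both[order(both$Name, both$ActivityDate), ]
  rownames(both) <- NULL
  both
}

# Small made-up example: John already has an entry on 2014-01-05.
demo <- data.frame(Name = c("John", "John", "Tom"),
                   ActivityType = c("Email", "Webinar", "Email"),
                   ActivityDate = c("2014-01-01", "2014-01-05", "2014-01-01"),
                   stringsAsFactors = FALSE)
out <- insert_dates(demo, c("2014-01-03", "2014-01-05"))
out
```

Here John's existing 2014-01-05 Webinar row is kept as is, while Tom, who has no entry on that date, receives an NA row for it.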

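2) sqldf This uploads adds and df to an SQLite database that it creates on the fly, then runs the SQL query and downloads the result. The computation happens outside of R, so it may work for your large data.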

adds <- data.frame(Name = NA, ActivityDate = x)

library(sqldf)

sqldf("select * 
       from (select * 
             from df 
             union 
             select a.Name, NULL ActivityType, ActivityDate 
             from (select distinct Name from df) a 
             cross join adds b
            ) order by 1, 3"
      )

giving:

   Name ActivityType ActivityDate
1  John        Email   2014-01-01
2  John         <NA>   2014-01-03
3  John      Webinar   2014-01-05
4  John      Webinar   2014-01-20
5  John         <NA>   2014-01-25
6  John        Email   2014-04-20
7  John         <NA>   2015-05-27
8   Tom        Email   2014-01-01
9   Tom         <NA>   2014-01-03
10  Tom      Webinar   2014-01-05
11  Tom      Webinar   2014-01-20
12  Tom         <NA>   2014-01-25
13  Tom        Email   2014-04-20
14  Tom         <NA>   2015-05-27