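Converting a FOR loop to IFELSE containing is.na

Time: 2016-07-08 11:43:56

Tags: r if-statement for-loop na

I have written a set of if statements inside a FOR loop, but the loop takes over 10 minutes to run. After reading an article describing how to use IFELSE in place of FOR loops, I have been trying to speed this step up.

The head of the data set looks like this: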

Destination.City.Name Booking.ID Creation.Date Cancellation.Date Arrival.Date Status.Name Nights Room.nights DI.flag Star.rating
1             Abu Dhabi   14418661    2015-02-16        2015-02-16   2015-04-15   Cancelled     90          90       N           4
2             Abu Dhabi   14418661    2015-02-16        2015-02-16   2015-04-14   Cancelled     90          90       N           4
3             Abu Dhabi   14418661    2015-02-16        2015-02-16   2015-04-06   Cancelled     90          90       N           4
4             Abu Dhabi   14418661    2015-02-16        2015-02-16   2015-04-02   Cancelled     90          90       N           4
5             Abu Dhabi   14418661    2015-02-16        2015-02-16   2015-03-29   Cancelled     90          90       N           4
6             Abu Dhabi    9634541    2013-06-11        2013-06-13   2013-09-13   Cancelled     90          90       N           5
  Future.Arrival.Flag Future.Creation.Flag Future.Arrival.Day Status.On.Model.Date
1                   1                    1                469                   NA
2                   1                    1                468                   NA
3                   1                    1                460                   NA
4                   1                    1                456                   NA
5                   1                    1                452                   NA
6                  NA                   NA                 NA                   NA

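The FOR loop fills in the last column, Status.On.Model.Date, according to simple logic:

If the creation date is after the model date, then NA.

If the cancellation date is NA, then Confirmed.

If the cancellation date >= the model date, then Confirmed; otherwise Cancelled.

The original FOR loop is shown below. It works when executed, but takes over 10 minutes (the data set has 600K+ rows):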

i = 1
for (i in 1:length(bookingdata$Status.On.Model.Date)) {
  if (bookingdata$Creation.Date[i] > Model.Date) {
    bookingdata$Status.On.Model.Date[i] = NA
  } else {
    if (is.na(bookingdata$Cancellation.Date[i])) {
      bookingdata$Status.On.Model.Date[i] = 'Confirmed'
    } else {
      if (bookingdata$Cancellation.Date[i] >= Model.Date) {
        bookingdata$Status.On.Model.Date[i] = 'Confirmed'
      } else {
        if (bookingdata$Cancellation.Date[i] < Model.Date) {
          bookingdata$Status.On.Model.Date[i] = 'Cancelled'
        }
      }
    }
  }
}

The new IFELSE code I wrote to replace it is as follows:

bookingdata$Status.On.Model.Date = ifelse(bookingdata$Creation.Date > Model.Date, NA,
                                    ifelse(is.na(bookingdata$Cancellation.Date, 'Confirmed',
                                      ifelse(bookingdata$Cancellation.Date >= Model.Date, 'Confirmed', 'Cancelled'))))

But I get this error:

Error in is.na(bookingdata$Cancellation.Date, "Confirmed", ifelse(bookingdata$Cancellation.Date >=  : 
  3 arguments passed to 'is.na' which requires 1
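
The message suggests that the call to `is.na()` is never closed, so R reads everything up to the next closing parenthesis as extra arguments to `is.na()`. A minimal sketch that reproduces this, using a hypothetical toy vector `x` that is not part of the original data:

```r
# Hypothetical toy vector, used only to illustrate the error.
x <- as.Date(c("2015-02-16", NA))

# Misplaced parenthesis: 'Confirmed' and 'Cancelled' end up inside is.na().
msg <- tryCatch(
  ifelse(is.na(x, "Confirmed", "Cancelled")),
  error = function(e) conditionMessage(e)
)
print(msg)

# Closing is.na() right after its single argument leaves
# ifelse() with its own three arguments: test, yes, no.
res <- ifelse(is.na(x), "Confirmed", "Cancelled")
print(res)
```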

I am not sure how to correct the error, since I do not know how to rearrange these statements.

Thanks!

1 Answer:

Answer 0 (score: 0)

Please use the code below. You missed the closing parenthesis of is.na():

bookingdata$Status.On.Model.Date = ifelse(bookingdata$Creation.Date > Model.Date, NA,
                                        ifelse(is.na(bookingdata$Cancellation.Date), 'Confirmed',
                                          ifelse(bookingdata$Cancellation.Date >= Model.Date, 'Confirmed', 'Cancelled')))
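
As a quick check, the nested `ifelse()` can be run on a small made-up data frame and compared against the loop logic from the question. The rows below and the value of `Model.Date` are assumptions for illustration only; the question does not state the actual model date.

```r
# Hypothetical model date and toy bookings (not from the original data set).
Model.Date <- as.Date("2015-03-01")
bookingdata <- data.frame(
  Creation.Date     = as.Date(c("2015-04-01", "2015-02-16", "2015-02-16", "2015-02-16")),
  Cancellation.Date = as.Date(c(NA, NA, "2015-03-10", "2015-02-20"))
)

# Vectorised version, as in the answer above.
vectorised <- ifelse(bookingdata$Creation.Date > Model.Date, NA,
                ifelse(is.na(bookingdata$Cancellation.Date), "Confirmed",
                  ifelse(bookingdata$Cancellation.Date >= Model.Date, "Confirmed", "Cancelled")))

# Loop version following the rules in the question, for comparison.
looped <- rep(NA_character_, nrow(bookingdata))
for (i in seq_len(nrow(bookingdata))) {
  if (bookingdata$Creation.Date[i] > Model.Date) {
    looped[i] <- NA
  } else if (is.na(bookingdata$Cancellation.Date[i])) {
    looped[i] <- "Confirmed"
  } else if (bookingdata$Cancellation.Date[i] >= Model.Date) {
    looped[i] <- "Confirmed"
  } else {
    looped[i] <- "Cancelled"
  }
}

print(vectorised)
print(identical(vectorised, looped))
```

Since the NA branch and the `>=` branch both yield 'Confirmed', the two inner tests could also be merged into one condition, `is.na(bookingdata$Cancellation.Date) | bookingdata$Cancellation.Date >= Model.Date`, which would remove one level of nesting.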