Trying to apply a function row-wise to a data frame to create a new column

Asked: 2019-03-30 05:55:55

Tags: r lubridate mutate rowwise

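I have a data frame of service bookings. Each booking has a contract start and end date. For a given reporting date, I want to determine whether the contract is active and, if so, how much to bill based on the monthly rate. If a contract ends mid-month, I prorate the bill for the final month. Here is the data frame: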

    > bookings
       Account Service MonthlyRate ContractStart ContractEnd
     1 A       W                50 2018-01-01    2018-12-31
     2 A       X                75 2018-03-15    2019-03-14
     3 B       W                60 2018-02-28    2018-09-30
     4 B       X                90 2018-05-12    2019-08-11
     5 B       Y                45 2018-02-28    2018-09-30
     6 C       Y                50 2018-07-31    2019-04-30
     7 D       W                65 2019-01-01    2019-03-31
     8 D       Y                50 2018-09-01    2019-05-31
     9 D       Z               110 2018-08-22    2019-12-31
    10 E       Z               100 2018-10-01    2019-09-30

I wrote a function that uses lubridate to calculate the monthly bill.

    monthly_revenue <- function(reporting_date, monthly_rate, start, end) {
      contract_int <- interval(start, end) # Contract interval
      # Calculate interval ending the last day of the month of contract end
      end_of_month <- end
      day(end_of_month) <- days_in_month(end)
      end_of_month_int <- interval(start, end_of_month)
      # Check if reporting date is within contract interval
      if(reporting_date %within% contract_int) {
        val <- 1 # bill for entire month
        # If not within interval, check if contract is in its last month
      } else if (reporting_date %within% end_of_month_int) {
        val <- day(end) / days_in_month(end) # prorate monthly charges
      } else { # Not within contract
        val <- 0 # zero revenue
      }
      val * monthly_rate
    }

I then set a billing date and apply the function row-wise to the data frame:

    billing_date <- as.Date("2019-03-29")
    revenue_for_month <-bookings %>%
      rowwise() %>%
      mutate(Revenue = monthly_revenue(billing_date, MonthlyRate, ContractStart, ContractEnd))

This results in the following error:

    Error in mutate_impl(.data, dots) :
      Evaluation error: non-numeric argument to binary operator.

I can't tell whether the problem is in my function or in the iteration. Any help would be greatly appreciated.

[Follow-up in response to comments] I am using the following library calls:

    library(tidyverse)
    library(lubridate)

Here is the dput output of my data frame:

> dput(bookings)
structure(list(Account = c("A", "A", "B", "B", "B", "C", "D", 
"D", "D", "E"), Type = c("W", "X", "W", "X", "Y", "Y", "W", "Y", 
"Z", "Z"), MonthlyRate = c(50L, 75L, 60L, 90L, 45L, 50L, 65L, 
50L, 110L, 100L), ContractStart = structure(c(NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_), class = "Date"), ContractEnd = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_), class = "Date")), .Names = c("Account", 
"Type", "MonthlyRate", "ContractStart", "ContractEnd"), row.names = c(NA, 
-10L), spec = structure(list(cols = structure(list(Account = structure(list(), class = c("collector_character", 
"collector")), Type = structure(list(), class = c("collector_character", 
"collector")), MonthlyRate = structure(list(), class = c("collector_integer", 
"collector")), ContractStart = structure(list(), class = c("collector_character", 
"collector")), ContractEnd = structure(list(), class = c("collector_character", 
"collector"))), .Names = c("Account", "Type", "MonthlyRate", 
"ContractStart", "ContractEnd")), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"), class = c("tbl_df", 
"tbl", "data.frame"))
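
Note that the dput output above stores every ContractStart and ContractEnd value as NA_real_, so it cannot be used to rebuild the dates shown in the printed table. Below is a minimal sketch that reconstructs the data from the printed values instead. It assumes the second column is named Service as printed (the dput output calls it Type) and uses a plain data.frame rather than the original tibble:

```r
# Rebuild the bookings table from the values printed in the question.
# Assumption: the column is called Service (as printed), not Type (as in dput).
bookings <- data.frame(
  Account       = c("A", "A", "B", "B", "B", "C", "D", "D", "D", "E"),
  Service       = c("W", "X", "W", "X", "Y", "Y", "W", "Y", "Z", "Z"),
  MonthlyRate   = c(50L, 75L, 60L, 90L, 45L, 50L, 65L, 50L, 110L, 100L),
  ContractStart = as.Date(c("2018-01-01", "2018-03-15", "2018-02-28",
                            "2018-05-12", "2018-02-28", "2018-07-31",
                            "2019-01-01", "2018-09-01", "2018-08-22",
                            "2018-10-01")),
  ContractEnd   = as.Date(c("2018-12-31", "2019-03-14", "2018-09-30",
                            "2019-08-11", "2018-09-30", "2019-04-30",
                            "2019-03-31", "2019-05-31", "2019-12-31",
                            "2019-09-30")),
  stringsAsFactors = FALSE
)

# Confirm both date columns really are Date objects with no missing values.
str(bookings)
```

If str() reports the date columns as Date with no NA values, the data matches the printed table and can be passed to the dplyr pipeline in the question.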

1 answer:

Answer 0 (score: 0)

Because your function had quite a few problems, I changed it substantially. It now works for me:

    monthly_revenue <- function(reporting_date, monthly_rate, start, end) {
      contract_int <- interval(start, end) # Contract interval
      # Interval running through the last day of the month in which the contract ends
      EoM_int <- interval(start, ceiling_date(as_date(end), unit = "month") - 1)

      reporting_date <- as_datetime(reporting_date)

      if (reporting_date %within% contract_int) {
        val <- 1 # bill for entire month
        # If not within interval, check if contract is in its last month
      } else if (reporting_date %within% EoM_int) {
        val <- day(end) / day(ceiling_date(as_date(end), unit = "month") - 1) # prorate monthly charges
      } else { # Not within contract
        val <- 0 # zero revenue
      }
      return(val * monthly_rate)
    }

Your dplyr code is correct and runs fine.
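
As an aside, the same three-branch rule (full month, prorated final month, zero) can also be written with vectorized comparisons, so that rowwise() is not needed. The sketch below is only an illustration: monthly_revenue_vec is a hypothetical name, it assumes start and end are Date vectors with no missing values, and it mirrors the branch logic of the function above rather than changing the billing rule.

```r
library(lubridate)

# Hypothetical vectorized variant; assumes start/end are Date vectors without NA.
monthly_revenue_vec <- function(reporting_date, monthly_rate, start, end) {
  # Number of days in the month in which each contract ends
  days_in_end_month <- unname(days_in_month(end))
  # Last calendar day of that month
  month_end <- floor_date(end, unit = "month") + (days_in_end_month - 1)

  # Reporting date falls inside the contract (both ends inclusive)
  active <- reporting_date >= start & reporting_date <= end
  # Contract already ended, but within the same month as the reporting date
  in_final_month <- !active & reporting_date >= start & reporting_date <= month_end

  fraction <- ifelse(active, 1,
                     ifelse(in_final_month, day(end) / days_in_end_month, 0))
  fraction * monthly_rate
}

# Three contracts taken from the table in the question, billed on 2019-03-29:
#   ended 2019-03-14 -> prorated to 14/31 of the monthly rate of 75
#   ends  2019-03-31 -> still active, full rate of 65
#   ended 2018-12-31 -> no revenue
revenue <- monthly_revenue_vec(
  reporting_date = as.Date("2019-03-29"),
  monthly_rate   = c(75, 65, 50),
  start          = as.Date(c("2018-03-15", "2019-01-01", "2018-01-01")),
  end            = as.Date(c("2019-03-14", "2019-03-31", "2018-12-31"))
)
print(revenue)
```

With a function of this shape, the pipeline could call mutate(Revenue = monthly_revenue_vec(billing_date, MonthlyRate, ContractStart, ContractEnd)) directly, without the rowwise() step.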