Question

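I am currently facing the following problem.

I would like to come up with R code that creates a new column called reviews_last30days in my main data frame listings, which should count or accumulate the reviews for each unique listings$ID.

The individual reviews per ID are listed in another data frame, like this:

REVIEWS
   ID   review_date
   1    2015-12-30
   1    2015-12-31
   1    2016-10-27
   2    2014-05-10
   2    2016-10-19
   2    2016-10-22
   2    2016-10-23

I also need to add a date condition, so that only the last 30 days counting back from last_scrape are considered.

My result should therefore look something like the third column below (UPDATE: see the EDIT for a better description of the expected result):

LISTINGS
   ID   last_scrape   reviews_last30days
   1    2016-11-15    1
   2    2016-11-15    3

So, in the end, the reviews_last30days column should count the review_date entries for each ID that fall within 30 days of the specified last_scrape.

I have already formatted both date columns with as.Date using "%Y-%m-%d".

Sorry if my question is not clear enough. It is hard to explain or visualize, but hopefully it is not that complicated in terms of code.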

EDIT for clarification

Besides the input REVIEWS shown above, I have a second input data frame, OVERVIEW, which currently looks like this in simplified form:

OVERVIEW
   ID   last_scrape
   1    2016-11-15
   2    2016-11-15
   3    2016-11-15
   4    2017-01-15
   5    2017-01-15
   6    2017-01-15
   7    2017-01-15
etc

So what I actually need is code that counts all entries where the ID in OVERVIEW matches the ID in REVIEWS and the review_date in REVIEWS is at most 30 days from the last_scrape in OVERVIEW.

Ideally, the code should then assign this newly computed value as a new column in OVERVIEW, like this:

OVERVIEW
   ID   last_scrape   rev_last30days
   1    2016-11-15    1
   2    2016-11-15    3
   3    2016-11-15    ..
   4    2017-01-15    ..
   5    2017-01-15    ..
   6    2017-01-15    ..
   7    2017-01-15    ..
etc

#2 EDIT - hopefully my last ;)

Thanks for your help so far, @mfidino! Running your latest code still results in a small error, namely the following:

TOTALREV$review_date <- ymd(TOTALREV$review_date)
TOTALLISTINGS$last_scraped.calc <- ymd(TOTALLISTINGS$last_scraped.calc)

gen_listings <- function(review = NULL, overview = NULL){
  # tibble to return
  to_return <- review %>%
    inner_join(., overview, by = 'listing_id') %>%
    group_by(listing_id) %>%
    summarise(last_scraped.calc = unique(last_scraped.calc),
              reviews_last30days = sum(review_date >= (last_scraped.calc-30)))
  return(to_return)
}

REVIEWCOUNT <- gen_listings(TOTALREV, TOTALLISTINGS)

Error: Column `last_scraped.calc` must be length 1 (a summary value), not 2

Do you have any idea how to fix this error?

Note: I used the names from my original files; the code should otherwise be the same.

If it helps, here are some attributes of the vector last_scraped.calc:

$ last_scraped.calc   : Date, format: "2018-08-07" "2018-08-07" ...

typeof(TOTALLISTINGS$last_scraped.calc)
[1] "double"

and

length(TOTALLISTINGS$last_scraped.calc)
[1] 549281

Any further help is much appreciated. Thanks in advance!

Answer 1

You can do this easily with dplyr. I am using lubridate::ymd() here instead of as.Date().

library(lubridate)
library(dplyr)

REVIEWS <- data.frame(ID = c(1,1,1,2,2,2,2),
             review_date = c("2015-12-30",
                             "2015-12-31",
                             "2016-10-27",
                             "2014-05-10",
                             "2016-10-19",
                             "2016-10-22",
                             "2016-10-23"))

REVIEWS$review_date <- ymd(REVIEWS$review_date)

LISTINGS <- REVIEWS %>%
  group_by(ID) %>%
  summarise(last_scrape = max(review_date),
            reviews_last30days = sum(review_date >= (max(review_date) - 30)))

The output of LISTINGS is your expected output:

# A tibble: 2 x 3
     ID last_scrape reviews_last30days
  <dbl> <date>                   <int>
1     1 2016-10-27                   1
2     2 2016-10-23                   3
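The count itself comes from summing a logical vector, which is roughly what COUNTIFS does with a date condition. As a minimal base R sketch of that step alone, using the review dates of ID 2 from the sample data (the names cutoff, in_window and n_recent are only illustrative):

```r
# Review dates for ID 2, copied from the REVIEWS sample data
review_date <- as.Date(c("2014-05-10", "2016-10-19", "2016-10-22", "2016-10-23"))

# Start of the 30-day window, counted back from the latest review
cutoff <- max(review_date) - 30

# TRUE for every review on or after the cutoff
in_window <- review_date >= cutoff

# Summing a logical vector counts its TRUE values
n_recent <- sum(in_window)
print(n_recent)  # 3
```

Inside summarise(), the same expression is evaluated once per group, which is how each ID gets its own count.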

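EDIT:

If you would instead like to supply last_scrape as an input, rather than use the most recent review date in each group, you can modify the code as follows. This assumes each ID can have more than one last_scrape: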

library(lubridate)
library(dplyr)

REVIEWS <- data.frame(ID = c(1,1,1,2,2,2,2),
             review_date = c("2015-12-30",
                             "2015-12-31",
                             "2016-10-27",
                             "2014-05-10",
                             "2016-10-19",
                             "2016-10-22",
                             "2016-10-23"))

REVIEWS$review_date <- ymd(REVIEWS$review_date)

OVERVIEW <- data.frame(ID = rep(1:7, 2),
                       last_scrape = c("2016-11-15",
                                       "2016-11-15",
                                       "2016-11-15",
                                       "2017-01-15",
                                       "2017-01-15",
                                       "2017-01-15",
                                       "2017-01-15",
                                       "2016-11-20",
                                       "2016-11-20",
                                       "2016-11-20",
                                       "2017-01-20",
                                       "2017-01-20",
                                       "2017-01-20",
                                       "2017-01-20"))

OVERVIEW$last_scrape <- ymd(OVERVIEW$last_scrape)

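gen_listings <- function(review = NULL, overview = NULL){
  # Join each review to every scrape date recorded for its ID, then count,
  # per (ID, last_scrape) pair, the reviews dated no earlier than 30 days
  # before that scrape. Returns a tibble.
  to_return <- review %>% 
    inner_join(., overview, by = 'ID') %>% 
    group_by(ID, last_scrape) %>% 
    summarise(
      reviews_last30days = sum(review_date >= (last_scrape - 30)))
  return(to_return)
}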

LISTINGS <- gen_listings(REVIEWS, OVERVIEW)

The output of this LISTINGS object is:

     ID last_scrape reviews_last30days
  <dbl> <date>                   <int>
1     1 2016-11-15                   1
2     1 2016-11-20                   1
3     2 2016-11-15                   3
4     2 2016-11-20                   2

Answer 2

Similar to the answer above...

template <typename Container, typename UnaryOperation, typename U>
inline auto to_vec_from_vectors(Container& c, UnaryOperation&& op, U& ex)
    -> std::vector<decltype(*std::begin(op(*std::begin(c))))> {
  std::vector<decltype(*std::begin(op(*std::begin(c))))> v;
  for (auto& e : c) {
    std::vector<decltype(*std::begin(op(*std::begin(c))))> opv = op(e);
    concat(v, opv);
  }
  return v;  
}

Is there an R function that mirrors Excel's COUNTIFS with a date range as a condition?
