Efficiently add a new column to a df based on a date range

Time: 2018-01-22 15:02:26

Tags: r function loops memory-efficient

I have a dataset like this:

> head(featured_products)
   Dept Class     Sku                    Description Code Vehicle/Placement  StartDate    EndDate  Comments(Circulation,Location,etc)
1:  430  4318  401684          ++INDV RAMEKIN WP 9CM  OSM          Facebook 2017-01-01 2017-01-29                   Fancy Brunch Blog
2:  430  4318  401684          ++INDV RAMEKIN WP 9CM  OSM           Twitter 2017-01-01 2017-01-29                   Fancy Brunch Blog
3:  340  3411 1672605            ++ SPHERE WILLOW 4"  OP1         Editorial 2016-02-29 2016-03-27                Spruce up for Spring
4:  230  2311 2114074 ++BOX 30 ISLAND ORCHRD TLIGHTS   EM             Email 2016-02-17 2016-02-17 Island Orchard and Jeweled Lanterns
5:  895  8957 2118072            ++PAPASAN STL TAUPE  OSM         Instagram 2017-08-26 2017-10-01                    by @audriestorme
6:  895  8957 2118072            ++PAPASAN STL TAUPE   EM             Email 2017-11-23 2017-11-23               Day 2 Black Friday AM

And another like this:

      SKU ActivityDate OnlineSalesQuantity OnlineDiscountPercent InStoreSalesQuantity InStoreDiscountPercent
1: 401684   2015-12-01                 150                  0.00                  406                   2.72
2: 401684   2015-12-02                   0                  0.00                  556                   3.79
3: 401684   2015-12-03                   0                  0.00                  723                   3.44
4: 401684   2015-12-04                  16                  4.91                  781                   2.46
5: 401684   2015-12-05                  17                  0.00                  982                   3.18
6: 401684   2015-12-06                   0                  0.00                  851                   3.12

I added a column called featured to the second df. It is 1 if the product is listed in the first df on the given date, and 0 otherwise.

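Now, what I want to do is add the Vehicle/Placement column to the new, combined df (where featured == 1)... The problem is that different dates may have different vehicles, or several of them...

How can I scan the dates of the rows where featured == 1, compare them against df1, and then pull the matching Vehicle/Placement into the combined df?

This also has to be done efficiently, because df2 has 2.85 million rows...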

I am looking for something like the following:

# Add vehicle
if(combined$featured == 1) {
  for (n in 1:nrow(featured_products)) {
    for (m in 1:nrow(combined)) {
      combined$vehicle <- ifelse(combined$activitydate[m] %within% interval(featured_products$startdate[n],featured_products$enddate[n]), featured_products$`vehicle/placement`, NA)
    }
  }
}

But this produces a warning:

Warning message:
In if (combined$featured == 1) { :
  the condition has length > 1 and only the first element will be used

I think I found another solution, but it is very slow and takes hours to run.

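1 Answer:

Answer 0: (score: 0)

You want to break this into smaller, more efficient subtasks. We know that R can be very fast when we use vectorized code, but not when using for loops. We take advantage of that with the following commands: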

library(lubridate) # provides %within% and interval()

combined = merge(combined, featured_products) # merge/join both data frames on their shared column names
mismatch = !(combined$ActivityDate %within% interval(combined$StartDate, combined$EndDate) & combined$featured == 1) # rows that are not featured or whose date falls outside the promotion window
combined$Placement[mismatch] = NA # Remove Placement in mismatched rows
combined[,c("StartDate", "EndDate")] = NULL # Remove the helper date columns

Note that the column/object names may differ from yours, so you may need to adjust them.
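To make the vectorized idea concrete, here is a minimal sketch on made-up toy data. It is a variation on the commands above, not the exact same method: it uses plain Date comparisons in base R instead of lubridate, keeps only the date-matched pairs, and then left-joins them back so that every sales row survives. The names sales, pairs, hits, result and the shortened column name Vehicle are assumptions chosen for the illustration.

```r
# Toy data: all values below are invented for illustration only.
sales <- data.frame(
  SKU = c(401684L, 401684L, 2118072L, 2118072L),
  ActivityDate = as.Date(c("2017-01-10", "2017-02-15", "2017-09-01", "2017-11-23")),
  stringsAsFactors = FALSE
)
featured_products <- data.frame(
  SKU = c(401684L, 401684L, 2118072L, 2118072L),
  Vehicle = c("Facebook", "Twitter", "Instagram", "Email"),
  StartDate = as.Date(c("2017-01-01", "2017-01-01", "2017-08-26", "2017-11-23")),
  EndDate = as.Date(c("2017-01-29", "2017-01-29", "2017-10-01", "2017-11-23")),
  stringsAsFactors = FALSE
)

# Step 1: pair every sales row with every promotion of the same SKU.
pairs <- merge(sales, featured_products, by = "SKU")

# Step 2: one vectorized comparison over all pairs replaces the nested loops.
in_window <- pairs$ActivityDate >= pairs$StartDate &
  pairs$ActivityDate <= pairs$EndDate
hits <- pairs[in_window, c("SKU", "ActivityDate", "Vehicle")]

# Step 3: left-join the matches back so unpromoted sales rows keep Vehicle = NA.
result <- merge(sales, hits, by = c("SKU", "ActivityDate"), all.x = TRUE)
print(result)
```

A date with several active vehicles yields one row per vehicle, which is one way to handle the "multiple vehicles" case from the question. Step 1 materializes every SKU pairing, so on millions of rows it may be worth trying a non-equi join in data.table instead.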