Question

I have one dataset that looks like this:

> head(featured_products)
   Dept Class     Sku                    Description Code Vehicle/Placement  StartDate    EndDate  Comments(Circulation,Location,etc)
1:  430  4318  401684          ++INDV RAMEKIN WP 9CM  OSM          Facebook 2017-01-01 2017-01-29                   Fancy Brunch Blog
2:  430  4318  401684          ++INDV RAMEKIN WP 9CM  OSM           Twitter 2017-01-01 2017-01-29                   Fancy Brunch Blog
3:  340  3411 1672605            ++ SPHERE WILLOW 4"  OP1         Editorial 2016-02-29 2016-03-27                Spruce up for Spring
4:  230  2311 2114074 ++BOX 30 ISLAND ORCHRD TLIGHTS   EM             Email 2016-02-17 2016-02-17 Island Orchard and Jeweled Lanterns
5:  895  8957 2118072            ++PAPASAN STL TAUPE  OSM         Instagram 2017-08-26 2017-10-01                    by @audriestorme
6:  895  8957 2118072            ++PAPASAN STL TAUPE   EM             Email 2017-11-23 2017-11-23               Day 2 Black Friday AM

The second df looks like this:

      SKU ActivityDate OnlineSalesQuantity OnlineDiscountPercent InStoreSalesQuantity InStoreDiscountPercent
1: 401684   2015-12-01                 150                  0.00                  406                   2.72
2: 401684   2015-12-02                   0                  0.00                  556                   3.79
3: 401684   2015-12-03                   0                  0.00                  723                   3.44
4: 401684   2015-12-04                  16                  4.91                  781                   2.46
5: 401684   2015-12-05                  17                  0.00                  982                   3.18
6: 401684   2015-12-06                   0                  0.00                  851                   3.12

I have added a column called featured to this second df; it is 1 if the product is listed in the first df on the given date and 0 otherwise.

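Now, what I want to do is add the Vehicle/Placement column to the new, combined df (when featured == 1)... the problem is that different dates may have different vehicles, or several of them...

How can I scan the dates of the rows with featured == 1, compare them against df1, then pull the matching Vehicle/Placement and add it to the combined df?

This also has to be done efficiently, since df2 has 2.85 million rows...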

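I am looking for something along the lines of a test such as if (combined$featured == 1) that then copies over the matching Vehicle/Placement.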

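But this produces the following warning:

Warning message: In if (combined$featured == 1) { : the condition has length > 1 and only the first element will be used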
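
The message appears because if() expects a single TRUE or FALSE, while combined$featured == 1 is a logical vector with one element per row (in R 4.2.0 and later the same call is an error rather than a warning). A minimal sketch of the difference, using a made-up three-row data frame (toy and its values are hypothetical, not the data from the question):

```r
# Toy data, invented for illustration only
toy <- data.frame(featured  = c(1, 0, 1),
                  placement = c("Email", "Twitter", "Facebook"),
                  stringsAsFactors = FALSE)

# The comparison is evaluated for every row, so it yields a vector
is_featured <- toy$featured == 1
length(is_featured)

# ifelse() works element by element, which a single if() cannot do
toy$vehicle <- ifelse(is_featured, toy$placement, NA)
toy$vehicle
```

Here toy$vehicle keeps the placement for featured rows and is NA elsewhere, with no loop over rows.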

I think I found another solution, but it is very slow and takes hours to run:

# Add vehicle
if(combined$featured == 1) {
  for (n in 1:nrow(featured_products)) {

    for (m in 1:nrow(combined)) {

      combined$vehicle <- ifelse(combined$activitydate[m] %within% interval(featured_products$startdate[n],featured_products$enddate[n]), featured_products$`vehicle/placement`, NA)

    }
  } 
}

Answer 1

You want to break this into smaller, more efficient subtasks. R can be very fast when we use vectorized code, but not when using for loops. We can take advantage of that as follows:

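library(lubridate) # provides interval() and %within%
combined = merge(combined, featured_products, by = "SKU") # merge/join both data frames; assumes the key column is named SKU in both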
mismatch = !(combined$ActivityDate %within% interval(combined$StartDate, combined$EndDate) & combined$featured == 1) # Query rows
combined$Placement[mismatch] = NA # Remove Placement in mismatched rows
combined[,c("StartDate", "EndDate")] = NULL # Remove columns

Note that the column/object names may differ from yours, so you may need to adjust them.
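
To make the merge-then-mask idea concrete, here is a small base R sketch on invented data. The table contents, the shared key name SKU, and the short column name Placement are assumptions for illustration, and plain date comparisons stand in for lubridate's %within% so that the example runs without extra packages:

```r
# Toy stand-ins for the two tables (values invented for illustration)
featured_products <- data.frame(
  SKU       = c(401684, 401684, 2118072),
  Placement = c("Facebook", "Twitter", "Email"),
  StartDate = as.Date(c("2017-01-01", "2017-01-01", "2017-11-23")),
  EndDate   = as.Date(c("2017-01-29", "2017-01-29", "2017-11-23")),
  stringsAsFactors = FALSE
)
combined <- data.frame(
  SKU          = c(401684, 401684, 2118072),
  ActivityDate = as.Date(c("2017-01-15", "2015-12-01", "2017-11-23")),
  featured     = c(1, 0, 1)
)

# Join on SKU; a sales row is repeated once per promotion of that SKU
merged <- merge(combined, featured_products, by = "SKU")

# Vectorized range test: TRUE where the activity date falls in the promotion window
in_range <- merged$ActivityDate >= merged$StartDate &
            merged$ActivityDate <= merged$EndDate

# Blank out the placement wherever the row is not a featured, in-range match
merged$Placement[!(in_range & merged$featured == 1)] <- NA

# Drop the helper date columns
merged[c("StartDate", "EndDate")] <- NULL
merged
```

Note that the join repeats a sales row for every promotion of the same SKU, so rows that fail the date test remain in the result with an NA placement, and a date covered by two promotions appears twice. Depending on the goal, a follow-up step to drop or collapse those rows may be needed.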

Efficiently add a new column to a df based on a date range
