Creating lead values for each group with dbplyr

Time: 2019-02-08 13:08:19

Tags: r dbplyr

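My data contains mating records for cows. I need to create lead values of a date variable for each group, because I want to count the number of matings between two calving dates.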
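The counting step itself is not shown in the question. The following is a minimal sketch of one way to do it on local data frames, assuming a calvings table with `AnimalID` and `CalvingDate` and a matings table with `AnimalID` and `MatingDate` (the column names used in the question and in the error message). The toy data is hypothetical, and the `relationship` argument assumes dplyr 1.1.0 or later.

```r
library(dplyr)

# Hypothetical toy data using the column names from the question
calvings <- data.frame(
  AnimalID    = c(1, 1, 2),
  CalvingDate = as.Date(c("2017-03-01", "2018-04-10", "2017-05-20"))
)
matings <- data.frame(
  AnimalID   = c(1, 1, 1, 2, 2),
  MatingDate = as.Date(c("2017-05-01", "2017-06-15", "2018-07-01",
                         "2017-08-01", "2017-09-01"))
)

# Next calving date per animal (NA for the most recent calving)
intervals <- calvings %>%
  group_by(AnimalID) %>%
  mutate(leadcalf = lead(CalvingDate, order_by = CalvingDate)) %>%
  ungroup()

# Count matings in [CalvingDate, leadcalf); when leadcalf is NA the interval
# is open-ended. Calvings without any matings get a count of 0.
mating_counts <- intervals %>%
  left_join(matings, by = "AnimalID", relationship = "many-to-many") %>%
  group_by(AnimalID, CalvingDate) %>%
  summarise(
    n_matings = sum(MatingDate >= CalvingDate &
                      (is.na(leadcalf) | MatingDate < leadcalf),
                    na.rm = TRUE),
    .groups = "drop"
  )

print(mating_counts)
```

The interval is half-open so that a mating on the day of the next calving is attributed to the new interval rather than counted twice.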

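My code works perfectly with local data. However, the database keeps growing, and pulling all the data locally just to run the code makes no sense. I am trying to use dbplyr to push the computation to the server, but this gives me an error.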

Here is a snippet of my code:

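library(dplyr)
library(dbplyr)
library(magrittr)  # provides the %<>% assignment pipe

# getSQLdata() is presumably a custom helper (not shown) that returns a DBI connection
agrosql <- getSQLdata(id=27)

calf <- tbl(agrosql,"T_Animal_Calvings")

# Sort chronologically per animal
calf <- arrange(calf,AnimalID,CalvingDate)

# Create lead calving date
calf %<>% 
  group_by(AnimalID) %>%
  mutate(leadcalf=lead(CalvingDate)) 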

The error message is:

Error: <SQL> 'SELECT  TOP 11 "MatingID", "CalvingID", "AnimalID", "AnimalServerID", "AnimalNo", "SortAnimalNo", "MatingDate", "MatingDateDT", "MatingType", "MatingTypeName", "MatingTime", "MatingEndDate", "MatingEndDateDT", "BullAnimalID", "BullName", "BullRegistrationId", "EmbryoMotherAnimalID", "EmbryoMotherName", "EmbryoMotherRegistrationId", "IsSexed", "IsOwnStock", "ChargeNo", "DosisQuantity", "EventCommentAbbr", "StaffMatingRelationID", "StaffMatingShortName", "StaffMatingRelationType", "StaffMatingRelationTypeName", "HasAssumedFlush", "ConceptionDate", "ConceptionDateDT", "ServiceNo", "NextMatingID", "LastPregnantDate", "LastPregnantDateDT", "FirstNotPregnantDate", "FirstNotPregnantDateDT", "IsMatingWithFlushing", "IsMatingWithEmbryoImplant", "IsMatingWithFertilityAbortion", LEAD("MatingDate", 1, NULL) OVER (PARTITION BY "AnimalID" ORDER BY "AnimalID", "MatingDate") AS "leadai"
FROM (SELECT *
FROM "T_Animal_Matings"
ORDER BY "AnimalID", "MatingDate") "mztkpapjhp"'
 nanodbc/nan

1 answer:

Answer 0 (score: 0)

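The generated SQL query contains a subquery with an ORDER BY clause. SQL Server (the database here, judging by the TOP 11 in the query) does not accept ORDER BY in a subquery unless TOP, OFFSET or FOR XML is also specified.

The ORDER BY clause comes from your use of dplyr's arrange() function.

If you want to apply an ordering for lag/lead, do it inside the lag()/lead() call itself, through its order_by argument, instead of with arrange(). Try the following: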

# calf is tbl(agrosql, "T_Animal_Calvings") without the arrange() step
calf %>% 
  group_by(AnimalID) %>%
  mutate(leadcalf = lead(CalvingDate, order_by = CalvingDate))
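To check the SQL before sending it to the server, the pipeline can be rendered offline. This is a sketch assuming a recent dbplyr: `lazy_frame()` with `simulate_mssql()` stands in for the real `tbl(agrosql, "T_Animal_Calvings")` connection, so no database is needed. It also shows `window_order()`, dbplyr's way of setting the ordering for all subsequent window functions without adding an ORDER BY to the query itself.

```r
library(dplyr)
library(dbplyr)

# Offline stand-in for tbl(agrosql, "T_Animal_Calvings"): a lazy table with the
# two columns used here, translated with the SQL Server dialect.
calf <- lazy_frame(
  AnimalID = 1L,
  CalvingDate = as.Date("2018-01-01"),
  con = simulate_mssql()
)

# Option 1: ordering passed directly to lead()
q1 <- calf %>%
  group_by(AnimalID) %>%
  mutate(leadcalf = lead(CalvingDate, order_by = CalvingDate))

# Option 2: window_order() sets the ordering for the window functions that follow
q2 <- calf %>%
  group_by(AnimalID) %>%
  window_order(CalvingDate) %>%
  mutate(leadcalf = lead(CalvingDate))

# Both should translate to LEAD(...) OVER (PARTITION BY ... ORDER BY ...)
show_query(q1)
show_query(q2)

sql1 <- sql_render(q1)
sql2 <- sql_render(q2)
```

On the real connection, calling `show_query()` on the same pipeline shows what will be sent to the server; `collect()` is only needed once the result should be pulled into R.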