In my dataset the data is grouped (stratified) by SKU-acnumber-year. Here is a small example:
df=structure(list(SKU = c(11202L, 11202L, 11202L, 11202L, 11202L,
11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L,
11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L
), stuff = c(8.85947691, 9.450108704, 10.0407405, 10.0407405,
10.63137229, 11.22200409, 11.22200409, 11.81263588, 12.40326767,
12.40326767, 12.40326767, 12.99389947, 13.58453126, 14.17516306,
14.76579485, 15.94705844, 17.12832203, 17.71895382, 21.26274458,
25.98779894, 63.19760196), action = c(0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L),
acnumber = c(137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L,
137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L,
137L, 137L, 137L), year = c(2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L,
2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L)), .Names = c("SKU",
"stuff", "action", "acnumber", "year"), class = "data.frame", row.names = c(NA,
-21L))
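The action column takes only two values, 0 and 1. As you can see in this example, there are 3 observations of stuff with action category 1 and 18 observations with action category 0.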
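These counts can be checked per group. The following is only a sketch: it rebuilds just the grouping columns and the action column of the example, and the name counts is introduced here for illustration.

```r
# Rebuild the example group (stuff is not needed for counting)
df <- data.frame(
  SKU = 11202L, acnumber = 137L, year = 2018L,
  action = c(rep(0L, 10), 1L, rep(0L, 8), 1L, 1L)
)

# Number of action == 1 rows per SKU-acnumber-year group
counts <- aggregate(action ~ SKU + acnumber + year, data = df,
                    FUN = function(a) sum(a == 1))
print(counts)
```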
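I need to set up a logical condition: for groups that have 1 to 4 observations with action category 1, script1.r must run, and for groups that have >= 5 observations with action category 1, script2.r must run.
I imagine it like this: I create script3.r with the following content (the conditions), but I don't know how to set these logical conditions correctly.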
library(RODBC)

# Take the data from SQL Server
dbHandle <- odbcDriverConnect("driver={SQL Server};server=;database=;trusted_connection=true")
sql <- "select needed columns"  # placeholder for the actual query
df <- sqlQuery(dbHandle, sql)

# Pseudocode for the conditions I need:
# if a group has 1-4 observations of stuff with action == 1, run C:/path to/script1.r
# if a group has >= 5 observations of stuff with action == 1, run C:/path to/script2.r
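How can I implement this? script3.r runs on a schedule, and on each scheduled run it should launch the other two scripts. I just don't want to set up a separate schedule for each script by hand.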
Answer 0: (score: 2)
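Consider using by(), which slices a data frame by one or more factors, with the if logic placed inside the function it applies. The other scripts can be run by calling Rscript on the command line via system() (assuming the R bin directory is on the PATH environment variable):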
by_list <- by(df, df[,c("SKU", "acnumber", "year")], function(sub) {
  if (sum(sub$action == 1) %in% c(1:4)) system("Rscript /path/to/script1.r")
  if (sum(sub$action == 1) >= 5) system("Rscript /path/to/script2.r")
  return(sub)
})
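Before wiring this to a scheduler, it may help to do a dry run that only reports which script each group would trigger. The sketch below assumes the example group from the question; pick_script is a hypothetical helper, not part of any package, and the commands are printed rather than passed to system().

```r
# Rebuild the example group from the question (only the columns needed here)
df <- data.frame(
  SKU = 11202L, acnumber = 137L, year = 2018L,
  action = c(rep(0L, 10), 1L, rep(0L, 8), 1L, 1L)
)

# Hypothetical helper: map the number of action == 1 rows to a script path
pick_script <- function(n_action) {
  if (n_action >= 5) return("/path/to/script2.r")
  if (n_action >= 1) return("/path/to/script1.r")
  NA_character_  # no action == 1 rows: nothing to run
}

# One result per SKU-acnumber-year group
chosen <- by(df, df[, c("SKU", "acnumber", "year")], function(sub) {
  pick_script(sum(sub$action == 1))
})

# Dry run: print the commands instead of executing them
for (script in as.vector(chosen)) {
  if (!is.na(script)) cat("Rscript", script, "\n")
}
```

Replacing cat() with system(paste("Rscript", script)) would turn the dry run into the real call.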
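A better practice is to source() the external scripts in the main script, making sure that the entire process of each script is wrapped in a function() definition, possibly with parameters such as the specific SKU. Otherwise, source() will run those files immediately when they are loaded. With this approach, you can also return output.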
source("/path/to/script1.r")   # imports script1_function()
source("/path/to/script2.r")   # imports script2_function()

by_list <- by(df, df[,c("SKU", "acnumber", "year")], function(sub) {
  current_SKU <- max(sub$SKU)        # OR min(sub$SKU) OR sub$SKU[[1]]; pass it on if the functions take parameters
  n_action <- sum(sub$action == 1)   # observations with action == 1 in this group
  output <- NULL                     # stays NULL when the group has no action == 1 rows
  if (n_action %in% c(1:4)) output <- script1_function()
  if (n_action >= 5) output <- script2_function()
  return(output)
})
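To make the wrapping concrete, here is a minimal sketch in which temporary files stand in for script1.r and script2.r. The function names follow the answer, but the sub argument and the function bodies are assumptions made purely for illustration.

```r
# Stand-ins for script1.r and script2.r: each file only defines a function,
# so source() loads the definition without running any processing.
script1 <- tempfile(fileext = ".r")
script2 <- tempfile(fileext = ".r")
writeLines('script1_function <- function(sub) paste("script1 on SKU", sub$SKU[[1]])', script1)
writeLines('script2_function <- function(sub) paste("script2 on SKU", sub$SKU[[1]])', script2)

source(script1)  # defines script1_function()
source(script2)  # defines script2_function()

# Rebuild the example group from the question (only the columns needed here)
df <- data.frame(
  SKU = 11202L, acnumber = 137L, year = 2018L,
  action = c(rep(0L, 10), 1L, rep(0L, 8), 1L, 1L)
)

# Call the matching function per group and collect what it returns
results <- by(df, df[, c("SKU", "acnumber", "year")], function(sub) {
  n_action <- sum(sub$action == 1)
  if (n_action >= 5) {
    script2_function(sub)
  } else if (n_action >= 1) {
    script1_function(sub)
  } else {
    NA_character_  # no action == 1 rows: nothing to run
  }
})
print(as.vector(results))
```

Passing the group itself (or just its SKU) as an argument keeps each script's logic independent of global variables, which makes the scripts easier to test on their own.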