Run a script based on a logical condition in R

Date: 2018-06-24 12:27:01

Tags: r if-statement odbc rodbc

In my dataset I work with the hierarchical grouping SKU-acnumber-year. Here is a small example:

df=structure(list(SKU = c(11202L, 11202L, 11202L, 11202L, 11202L, 
11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 
11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L, 11202L
), stuff = c(8.85947691, 9.450108704, 10.0407405, 10.0407405, 
10.63137229, 11.22200409, 11.22200409, 11.81263588, 12.40326767, 
12.40326767, 12.40326767, 12.99389947, 13.58453126, 14.17516306, 
14.76579485, 15.94705844, 17.12832203, 17.71895382, 21.26274458, 
25.98779894, 63.19760196), action = c(0L, 0L, 0L, 0L, 0L, 0L, 
0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L), 
    acnumber = c(137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 
    137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 137L, 
    137L, 137L, 137L), year = c(2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 
    2018L, 2018L, 2018L, 2018L, 2018L, 2018L, 2018L)), .Names = c("SKU", 
"stuff", "action", "acnumber", "year"), class = "data.frame", row.names = c(NA, 
-21L))

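Very important:

The action column has only two values, 0 and 1. As you can see in this example, there are 3 observations of stuff with action category 1 and 18 observations with action category 0.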
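As a quick check of these counts, here is a minimal sketch. It rebuilds only the grouping columns and the action column from the example above (the stuff column is left out for brevity) and tabulates action, both overall and per SKU-acnumber-year group. The names df_small, action_table and counts are illustrative and not part of the original data.

```r
# Compact rebuild of the example: same 21 rows, without the stuff column
df_small <- data.frame(
  SKU      = rep(11202L, 21),
  action   = c(rep(0L, 10), 1L, rep(0L, 8), 1L, 1L),
  acnumber = rep(137L, 21),
  year     = rep(2018L, 21)
)

# Overall number of rows for each action value
action_table <- table(df_small$action)
print(action_table)

# Number of action == 1 rows per SKU-acnumber-year group
counts <- aggregate(action ~ SKU + acnumber + year, data = df_small,
                    FUN = function(a) sum(a == 1))
print(counts)
```

The per-group count in counts$action is the number that the condition described next would be tested against.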

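I need to set up a logical condition. For groups that have 1 to 4 observations with action category 1, script1.r must run; for groups that have >= 5 observations with action category 1, script2.r must run.

I imagine creating script3.r with the following content (conditions), but I don't know how to set up these logical conditions correctly.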

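# Fetch the data from SQL Server (requires the RODBC package)
library(RODBC)
dbHandle <- odbcDriverConnect("driver={SQL Server};server=;database=;trusted_connection=true")
sql <- "select needed columns"   # placeholder: put the actual SELECT statement here
df <- sqlQuery(dbHandle, sql)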



# if a group has 1-4 observations with action == 1, then run C:/path to/script1.r
# if a group has >= 5 observations with action == 1, then run C:/path to/script2.r

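How can I implement this? script3.r runs on a schedule, and it should in turn run one of the two scripts. I just don't want to set up a separate schedule for each script.

1 Answer:

Answer 0: (score: 2)

Consider if logic inside by(), which slices a data frame by one or more factors. To run the other scripts, call Rscript on the command line with system() (assuming the R bin directory is on the PATH environment variable):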

by_list <- by(df, df[,c("SKU", "acnumber", "year")], function(sub) {

  if (sum(sub$action == 1) %in% c(1:4))   system("Rscript /path/to/script1.r")
  if (sum(sub$action == 1) >= 5)          system("Rscript /path/to/script2.r")

  return(sub)
})
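If the PATH assumption is a concern, one possible variation is sketched below. It builds the Rscript path from R.home("bin") and uses system2(), which returns the exit status of the child process, so a failing script can be detected. The helper name run_script and the throwaway demo script are hypothetical stand-ins for script1.r and script2.r.

```r
# Hypothetical helper: run an R script in a separate process and fail loudly
run_script <- function(path) {
  # Full path to the Rscript of the running R installation, so PATH is not needed
  rscript <- file.path(R.home("bin"), "Rscript")
  status <- system2(rscript, args = shQuote(path))
  if (status != 0) stop("Script failed with exit status ", status, ": ", path)
  invisible(status)
}

# Demo with a throwaway script standing in for script1.r
demo_script <- tempfile(fileext = ".r")
writeLines('cat("script ran\\n")', demo_script)
demo_status <- run_script(demo_script)
```

Inside the by() call, run_script("/path/to/script1.r") could then take the place of the system() calls.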

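A better practice is to source() the external scripts in the main script, making sure that the entire process of each script is wrapped in a function() definition, possibly with parameters such as the specific SKU. Otherwise, source() simply executes those files at the point where it is called. With this approach, you can also return output.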

source("/path/to/script1.r")   # IMPORTS script1_function()
source("/path/to/script2.r")   # IMPORTS script2_function()

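by_list <- by(df, df[,c("SKU", "acnumber", "year")], function(sub) {

  current_SKU <- max(sub$SKU)        # OR min(sub$SKU) OR sub$SKU[[1]]; can be passed as an argument
  n_action <- sum(sub$action == 1)   # number of action == 1 rows in this group

  output <- NULL                     # groups with no action == 1 rows return NULL
  if (n_action %in% c(1:4))  output <- script1_function()
  if (n_action >= 5)         output <- script2_function()

  return(output)
})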
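To make the function-wrapping idea concrete, here is a minimal sketch of what the two script files might contain and how the choice between them could be made. The file contents, the sku parameter, the dispatch() helper and the SKU values 11203 and 11204 are all assumptions for illustration; the real script1.r and script2.r would wrap their own logic.

```r
# Stand-ins for script1.r and script2.r, written to temporary files for the demo
script1_path <- tempfile(fileext = ".r")
script2_path <- tempfile(fileext = ".r")
writeLines('script1_function <- function(sku) paste("script1 ran for SKU", sku)',
           script1_path)
writeLines('script2_function <- function(sku) paste("script2 ran for SKU", sku)',
           script2_path)

source(script1_path)   # defines script1_function(); nothing else is executed
source(script2_path)   # defines script2_function()

# Hypothetical dispatcher: choose the function from the number of action == 1 rows
dispatch <- function(sub) {
  n_action <- sum(sub$action == 1)
  sku <- sub$SKU[[1]]
  if (n_action >= 5) return(script2_function(sku))
  if (n_action >= 1) return(script1_function(sku))
  NULL   # no action == 1 rows: neither script applies
}

# Three small illustrative groups
few  <- data.frame(SKU = 11202L, action = c(0L, 1L, 1L, 1L))
many <- data.frame(SKU = 11203L, action = rep(1L, 6))
none <- data.frame(SKU = 11204L, action = c(0L, 0L))

result_few  <- dispatch(few)
result_many <- dispatch(many)
result_none <- dispatch(none)
```

A function like dispatch could be passed to by() in place of the anonymous function, which keeps the branching logic testable on its own.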