Looping over macro variables specified as a character list/vector in R

Asked: 2016-07-18 10:27:44

Tags: r loops vector macros

In an earlier question, I asked whether there was a solution for repeated processes similar to SAS macro variables. The link is as follows:

R macros to enable user defined input similar to %let in SAS

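However, I would like to go one step further by exploring the possibility of specifying a list of character values, rather than having the user enter the value of each macro variable (I call them macro variables for convenience).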

For example, here is a short excerpt of the code I am working on, which uses the macro variables through the paste0 function:

### Change metric between MSP, RSP, Val, Price MSP, Price RSP, Margin
### Change level between product and channel
### Change histyr, baseyr, futeyr according to year value
metric <- "RSP"
level <- "channel"
histyr <- "2009"
baseyr <- "2014"
futeyr <- "2019"
macro <- "gni"
macro1 <- "pce"

inputpath <- "C:/Projects/Consumption curves/UKPOV/excel files/"
outpath <- "C:/Projects/Consumption curves/UKPOV/output/gen_output/"


infile <- paste0(inputpath, metric, "_", level, "_CC_", histyr, "_salespercap.csv")
Category_sales <- read.csv(infile)
infile2 <- paste0(inputpath, "macro_", histyr, ".csv")
macroeco_data <- read.csv(infile2)

macroeco_data$Country <- str_trim(macroeco_data$Country)

sales_nd_macroeco <- sqldf("SELECT  L.*, R.gnippp_histyr as GNI_PPP, R.SR_histyr as SR
                           FROM Category_sales L LEFT JOIN macroeco_data R
                           ON (L.Country = R.Country) order by GNI_PPP DESC")

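Now, rather than specifying each metric by hand every time, I want to create a character list or character vector and use a loop that runs for each metric without manual intervention.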

I tried the following, but it did not seem to work. I am not sure whether I am doing this correctly:

metric <- c("MSP", "RSP", "Vol", "PriceMSP", "PriceRSP", "Margin")

for (i in metric) {
  level <- "channel"
  histyr <- "2009"
  baseyr <- "2014"
  futeyr <- "2019"
  macro <- "gni"
  macro1 <- "pce"
  inputpath <- "C:/Projects/Consumption curves/UKPOV/excel files/"
  outpath <- "C:/Projects/Consumption curves/UKPOV/output/gen_output/"

  infile <- paste0(inputpath, metric[i], "_", level, "_CC_", histyr, "_salespercap.csv")
  Category_sales <- read.csv(infile)


  infile2 <- paste0(inputpath,"macro_",histyr,".csv")
  macroeco_data<- read.csv(infile2)

  macroeco_data$Country<-str_trim(macroeco_data$Country)

  sales_nd_macroeco <- sqldf("SELECT  L.*, R.gnippp_histyr as GNI_PPP, R.SR_histyr as SR
                           FROM Category_sales L LEFT JOIN macroeco_data R
                           ON (L.Country = R.Country) order by GNI_PPP DESC")
}

The error is as follows:

> metric<-c("MSP","RSP", "Vol","PriceMSP" ,"PriceRSP", "Margin")
> metric
[1] "MSP"      "RSP"      "Vol"      "PriceMSP" "PriceRSP" "Margin"  
> for(i in metric){
+ level<-"channel"
+ histyr<-"2009"
+ baseyr<-"2014"
+ futeyr<-"2019"
+ macro<-"gni"
+ macro1<-"pce"
+ inputpath<-"C:/Projects/Consumption curves/UKPOV/excel files/"
+ outpath<-"C:/Projects/Consumption curves/UKPOV/output/gen_output/"
+ infile <- paste0(inputpath,metric[i],"_",level,"_CC_",histyr,"_salespercap.csv")
+ Category_sales <- read.csv(infile)
+ infile2 <- paste0(inputpath,"macro_",histyr,".csv")
+ macroeco_data<- read.csv(infile2)
+ macroeco_data$Country<-str_trim(macroeco_data$Country)
+ sales_nd_macroeco <- sqldf("SELECT  L.*, R.gnippp_histyr as GNI_PPP, R.SR_histyr as SR
+ FROM Category_sales L LEFT JOIN macroeco_data R
+ ON (L.Country = R.Country) order by GNI_PPP DESC")
+ }
Error in file(file, "rt") : cannot open the connection
In addition: Warning message:
In file(file, "rt") :
  cannot open file 'C:/Projects/Consumption curves/UKPOV/excel files/NA_channel_CC_2009_salespercap.csv': No such file or directory
> 

1 Answer:

Answer 0 (score: 2)

The following for loop:

for (x in y) {
  # do things
}

iterates over the elements of y, assigning each element in turn to the object x and executing the expressions contained in the loop.

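In your example, for (i in metric), metric is a character vector, and the object i takes on the value of each element of metric in turn. That is, the first time through the loop i is "MSP"; the second time, i is "RSP"; and so on. So later, when you refer to metric[i], on the first pass through the loop this is equivalent to metric["MSP"], which is of course NA (in the case of your unnamed vector). This in turn leads to the file name "C:/Projects/Consumption curves/UKPOV/excel files/NA_channel_CC_2009_salespercap.csv".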
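The behaviour described above is easy to check at the console. This is a minimal sketch using only base R and the same vector as in the question; nothing is read from disk, and the names by_name, bad_file and seen are introduced here purely for illustration.

```r
metric <- c("MSP", "RSP", "Vol", "PriceMSP", "PriceRSP", "Margin")

# Indexing an unnamed vector by a character string finds no matching name,
# so the result is a missing value.
by_name <- metric["MSP"]
is.na(by_name)

# paste0() converts that NA into the literal text "NA", which is how it
# ends up in the file name from the error message.
bad_file <- paste0(by_name, "_channel_CC_2009_salespercap.csv")
bad_file

# Inside for (i in metric), i already holds the element itself.
seen <- character(0)
for (i in metric) seen <- c(seen, i)
identical(seen, metric)
```

In other words, using i directly in paste0() (rather than metric[i]) would already give the intended file names.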

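The fact that you refer to metric[i] suggests that you expected the value of i to be an index into metric, i.e. a number. To get that behaviour, you would typically use a loop like this: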

for (i in 1:length(metric)) {
  # do things
}

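or, equivalently for a non-empty vector (and more safely, because it also behaves correctly when metric is empty):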

for (i in seq_along(metric)) {
  # do things
}
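The two loops visit the same indices for any non-empty vector; they differ only when the vector has length zero. A short base-R sketch, using a hypothetical empty vector named empty and two illustrative counters:

```r
empty <- character(0)   # hypothetical: no metrics to process

# 1:length(empty) is 1:0, i.e. the vector c(1, 0), so the body runs twice.
runs_colon <- 0
for (i in 1:length(empty)) runs_colon <- runs_colon + 1

# seq_along(empty) is integer(0), so the body never runs.
runs_seq <- 0
for (i in seq_along(empty)) runs_seq <- runs_seq + 1

c(colon = runs_colon, seq_along = runs_seq)
```

With the six-element metric vector from the question both forms behave identically, so this only matters if the vector is ever built programmatically and might come back empty.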

Something like the following should do the trick:

library(stringr)  # provides str_trim()
library(sqldf)    # provides sqldf()

metric <- c('MSP', 'RSP', 'Vol', 'PriceMSP', 'PriceRSP', 'Margin')
inputpath <- 'C:/Projects/Consumption curves/UKPOV/excel files/'
level <- 'channel'
histyr <- '2009'

macroeco_data <- read.csv(paste0(inputpath, 'macro_', histyr, '.csv'))
macroeco_data$Country <- str_trim(macroeco_data$Country)

for (x in metric) {
  f <- paste0(inputpath, x, '_', level, '_CC_', histyr, '_salespercap.csv')
  Category_sales <- read.csv(f)
  sales_nd_macroeco <- sqldf('SELECT  L.*, R.gnippp_histyr as GNI_PPP, R.SR_histyr as SR
                             FROM Category_sales L LEFT JOIN macroeco_data R
                             ON (L.Country = R.Country) order by GNI_PPP DESC')
}

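Note that I have pulled everything out of the loop that does not need to be there.

Also, note that the value of sales_nd_macroeco is overwritten on each pass through the loop, so the final value of that object will correspond to the metric "Margin". To return a list of objects instead, you can either iterate over the indices 1:6 with for (i in seq_along(metric)) and assign the result of sqldf to sales_nd_macroeco[[i]], where sales_nd_macroeco is now an empty list of length 6 that you define beforehand, or you can use lapply: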

sales_nd_macroeco <- lapply(metric, function(x) {
  f <- paste0(inputpath, x, '_', level, '_CC_', histyr, '_salespercap.csv')
  Category_sales <- read.csv(f)
  sqldf('SELECT  L.*, R.gnippp_histyr as GNI_PPP, R.SR_histyr as SR
         FROM Category_sales L LEFT JOIN macroeco_data R
         ON (L.Country = R.Country) order by GNI_PPP DESC')
})
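For comparison, the for-loop variant mentioned above could look roughly like this. To keep the sketch runnable without the CSV files, a hypothetical helper load_metric() stands in for the read.csv() and sqldf() steps; it is not part of the original code, and the data it returns is made up. Naming the list elements by metric is an optional extra, not something the original code does.

```r
metric <- c("MSP", "RSP", "Vol", "PriceMSP", "PriceRSP", "Margin")

# Hypothetical stand-in for the read.csv() + sqldf() steps, so that the
# sketch runs without any files on disk.
load_metric <- function(x) {
  data.frame(metric = x, Country = c("A", "B"), stringsAsFactors = FALSE)
}

# Pre-allocate an empty list with one slot per metric.
sales_nd_macroeco <- vector("list", length(metric))

# Iterate over indices so each result can be stored in its own slot.
for (i in seq_along(metric)) {
  sales_nd_macroeco[[i]] <- load_metric(metric[i])
}

# Optional: name the elements so a result can be looked up by metric.
names(sales_nd_macroeco) <- metric
sales_nd_macroeco[["Margin"]]
```

The lapply version is shorter and avoids the explicit pre-allocation; the loop form may be easier to extend if each iteration also needs to write output files or log progress.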