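I am reading in a large number of files that contain monthly price information per product.
I want to end up with one data table that combines all of these files.
The key of this table will be 2 columns, holding the product identifier and the date.
A third column then contains the retail price.
In the source files, each price column has a name of the form RETAILPRICE_[dd.mm.yyyy].
To avoid ending up with a large number of columns in the final data table, I need to rename that column to retail price and create a new column containing the date.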
The following code runs into an error because data.table cannot make sense of the external reference to one of its columns.
The resulting error message is shown after the code.
library(data.table)
library(readxl)

# this is how I obtain the list of files that have to be read in
# list the files
# files <- list.files(path = "path",
# pattern = "^Publications.*$",
# full.names = T)
# the data looks like this, although it is contained in an excel file.
# sample data
ProdID <- list(836187, 2398159, 2398165, 2398171, 2398188, 1800180, 2320105, 2320128, 2320140, 2320163, 1714888, 2516340)
RETAILPRICE_01.01.2003 <- c(12.50, 43.50, 65.50, 45.60, 69.45, 21.30, 81.15, 210.70, 405.00, 793.60, 116.50, 162.60)
Publications_per_2003.01.01 <- data.table(ProdID,RETAILPRICE_01.01.2003)
# uncomment if you want to write this to excel
# using .xls on purpose, because that's what they used back in the days
# xlsx::write.xlsx(Publications_per_2003.01.01,
# "Publications_per_2003.01.01.xls",
# row.names = F)
files <- list.files(path = "path",
                    pattern = "^Publications.*$",
                    full.names = TRUE)
# create data table
price_list <- data.table(
prodID = character(),
date = character(),
retail_price = numeric())
price_list <- lapply(files, function(x){
# obtain date from file name
# date in file name has the structure yyyy_mm_dd
# while in the column name date has the structure
# dd.mm.yyyy
date <- substr(sapply(strsplit(x,"_"),"[",3),1,10)
# obtain day, month and year separately
day <- substr(date,9,10)
month <- substr(date,6,7)
year <- substr(date,1,4)
# store the name of the column containing the retail price
priceVar <- as.name(paste0("RETAILPRICE_",day,".",month,".",year))
# read the xls file with the price info and in one go
# keep only the relevant columns
file <- data.table(read_excel(x))[
,.(prodID= as.character(ProdID),
retail_price = priceVar,
date = as.character(gsub("\\.","-",date)))#,with = F
]
# merge the new file with the existing data table
price_list <- merge(price_list,file,"ProdID")
})
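The error message is:
Error in rep(x[[i]], length.out = mn) :
  attempt to replicate an object of type 'symbol'
If I comment out this part
retail_price = priceVar,
there is no error.
So the problem lies in the column reference, which does not work properly.
I have also tried
priceVar <- as.name(paste0("RETAILPRICE_",day,".",month,".",year))
file <- data.table(read_excel(x))
setnames(file, priceVar, "retail_price")
but this also results in an error (column names were adapted to fit the example).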
If anyone could enlighten me, I would be eternally grateful.
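Answer 0 (score: 0)
It would help if you provided a sample of the data you are working with, so that we can try the code against it. I also read through your code, and on this line: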
price_list <- merge(prijslijst,file,"ProdID")
the variable "prijslijst" is never defined, so the problem may lie there.
Answer 1 (score: 0)
In this case, it is much easier to use plain data frames than data.table.
price_list <- lapply(files, function(x){
date <- substr(sapply(strsplit(x,"_"),"[",3),1,10)
day <- substr(date,9,10)
month <- substr(date,6,7)
year <- substr(date,1,4)
# make it a character, not a name
priceVar <- paste0("RETAILPRICE_",day,".",month,".",year)
one_df <- readxl::read_excel(x)[, c("ProdID", priceVar)]
colnames(one_df) <- c("prodID", "retail_price")
one_df$prodID = as.character(one_df$prodID) # NB: as.integer would be much more efficient, but be careful for values above 2.0e9
one_df$date = as.character(gsub("\\.","-",date))
one_df
})
# Watch out: this will pile up the records from all files
# In your initial code you were using merge(...) which computes the intersection
price_list <- do.call(rbind, price_list)
# Optional:
data.table::setDT(price_list)
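If you would rather stay in data.table throughout, the same idea (keep the column name as a character string rather than a symbol) can be sketched as follows. This is only a sketch under assumptions: the list monthly and its February prices are hypothetical in-memory stand-ins for the Excel files, and in the real workflow each table would come from readxl::read_excel(). setnames() expects character names, which is likely why passing the result of as.name() failed.

```r
library(data.table)

# Hypothetical stand-ins for two monthly files, keyed by their date.
# The February values are invented purely for illustration.
monthly <- list(
  "2003-01-01" = data.table(ProdID = c(836187, 2398159),
                            RETAILPRICE_01.01.2003 = c(12.50, 43.50)),
  "2003-02-01" = data.table(ProdID = c(836187, 2398159),
                            RETAILPRICE_01.02.2003 = c(12.75, 44.00))
)

price_list <- rbindlist(lapply(names(monthly), function(d) {
  dt <- copy(monthly[[d]])
  # Build the price column name as a character string (dd.mm.yyyy), not a symbol
  priceVar <- paste0("RETAILPRICE_", format(as.Date(d), "%d.%m.%Y"))
  # Rename by character name so every file ends up with the same columns
  setnames(dt, c("ProdID", priceVar), c("prodID", "retail_price"))
  # Convert the identifier and attach the date taken from the file key
  dt[, `:=`(prodID = as.character(prodID), date = d)]
  dt[, .(prodID, date, retail_price)]
}))

# Product identifier and date together form the key of the long table
setkey(price_list, prodID, date)
```

As in the data frame version, rbindlist() stacks the rows of all files instead of intersecting them the way merge() would. If you prefer to select inside j rather than rename, get(priceVar) with a character priceVar should also resolve the column.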