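I have 10 Excel (.xlsx) files, and each file contains 10 worksheets.
I need to read 3 worksheets from each workbook and append all of them to a single data frame in R.
Data:
Header: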
Country Jan-14 Feb-14 Mar-14 Apr-14 May-14 Jun-14 Jul-14 Aug-14 Sep-14 Oct-14 Nov-14 Dec-14 FY
Actual data:
Austria 43 52 64 82 60 61 57 36 110 96 66 64 791
Belgium 143 258 184 207 202 191 209 118 136 169 121 108 2,046
Bulgaria 0 0 0 0 0 0 0 0 0 0 0 0 0
Code:
library(XLConnect)
files = list.files("C:/Users/kushaa/Documents/Frost_casestudy/")
sheet.index <- c(3,6,9)
colname = c("Country","Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec","FY","truck_type")
data.1 <- data.frame(matrix(rep(NA,length(colname)),ncol = length(colname)))
for (i in 1:length(files)){
    wb = loadWorkbook(files[i])
    for (j in 1:length(sheet.index)){
        ss = readWorksheet(wb, sheet.index[j],startRow = 5, header = FALSE)
        truck_type = rep(sheet.names[j],nrow(ss))
        df = data.frame(ss,truck_type)
        names(df) <- colname
        data_merge <- rbind(data.1,df)
    }
}
However, I only get data from one sheet (truck_type = CV), not from the other sheets (truck_type = LCV, HCV).
Output:
Country Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec FY truck_type
1 <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA> <NA>
2 Austria 43 54 67 90 68 97 65 108 83 75 87 90 927 CV
3 Belgium 275 232 306 235 330 339 279 239 261 211 155 122 2,984 CV
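One likely cause, offered as a sketch rather than a confirmed diagnosis: inside the loop, data_merge is reassigned to rbind(data.1, df) on every pass, so each new sheet replaces the previous one and only the last sheet read survives next to the NA placeholder row. Collecting the pieces in a list and binding once at the end avoids that. The example below runs without any Excel files: read_sheet is a hypothetical stand-in for readWorksheet(wb, sheet.index[j], startRow = 5, header = FALSE), the file names are placeholders, and the sheet.names order is an assumption because sheet.names is not defined in the posted code.

```r
# Hypothetical stand-in for readWorksheet(); returns made-up rows
# so the accumulation pattern can be run without real workbooks.
read_sheet <- function(file, sheet) {
  data.frame(Country = c("Austria", "Belgium"),
             Jan = c(1, 2),
             stringsAsFactors = FALSE)
}

files <- c("a.xlsx", "b.xlsx")         # placeholder file names
sheet.index <- c(3, 6, 9)
sheet.names <- c("LCV", "HCV", "CV")   # assumed labels for sheets 3, 6, 9

pieces <- list()                       # one element per sheet read
for (f in files) {
  for (j in seq_along(sheet.index)) {
    ss <- read_sheet(f, sheet.index[j])
    ss$truck_type <- sheet.names[j]    # recycled to every row of ss
    pieces[[length(pieces) + 1]] <- ss # keep earlier sheets
  }
}

# Bind everything once; no NA placeholder row is needed.
data_merge <- do.call(rbind, pieces)
```

Two further points may matter with the real files: list.files() returns bare file names unless full.names = TRUE is passed, so loadWorkbook(files[i]) only finds them when the working directory is that folder, and one of the listed files is .xls rather than .xlsx.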
How can I extract the year from the file names?
[1] "2014_by_country_and_type_Enlarged_Europe.xlsx"
[2] "20140211_02_2012_vo_By_Country_Enlarged_Europe.xls"
[3] "20150219_2013_vo_By_Country_Enlarged_Europe.xlsx"
Query:
regmatches(files, regexpr("[0-9].*[0-9]", files))
But it gives:
[1] "2014"
[2] "20140211_02_2012"
[3] "20150219_2013"
I need the output to be:
2014
2012
2013
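The pattern [0-9].*[0-9] is greedy, so it spans from the first digit to the last digit in each name. One possible approach, assuming the year is always the only run of exactly four digits that is not part of a longer digit string, is to use lookarounds with perl = TRUE. Note that \\b would not help here, because the underscore counts as a word character.

```r
files <- c("2014_by_country_and_type_Enlarged_Europe.xlsx",
           "20140211_02_2012_vo_By_Country_Enlarged_Europe.xls",
           "20150219_2013_vo_By_Country_Enlarged_Europe.xlsx")

# Exactly four digits, with no digit immediately before or after.
# This skips 8-digit date stamps such as 20140211 and short runs such as 02.
pattern <- "(?<![0-9])[0-9]{4}(?![0-9])"

years <- regmatches(files, regexpr(pattern, files, perl = TRUE))
print(years)
# [1] "2014" "2012" "2013"
```

regexpr() returns only the first match per string, so a file name containing two standalone four-digit runs would need a stricter rule than this sketch applies.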