Question

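I have a large Excel dataset with 4260 rows that I want to process with R (and the XLConnect package). These 4260 rows cover 43 different districts, and each district consists of 98 rows. I would like to use a loop to go through all the rows and store specific data (columns) in vectors, instead of repeating similar code 43 times, once per district.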

Here is an excerpt of my original code:

wb = loadWorkbook("Bel_housing_prices_disctrict.xlsx") 

#--------------Malines (district 1)
df_Malines = readWorksheet(wb, sheet="Par arrondissement", startRow = 103, endRow = 201, startCol = 0, endCol = 0)
maisons_total_price_Malines <- df_Malines$Col8
maisons_price_mean_Malines <- df_Malines$Col10

villas_total_price_Malines <- df_Malines$Col18
villas_price_mean_Malines <- df_Malines$Col20

#--------------Turnhout (district 2)
df_Turnhout = readWorksheet(wb, sheet="Par arrondissement", startRow = 202, endRow = 300, startCol = 0, endCol = 0)
maisons_total_price_Turnhout <- df_Turnhout$Col8
maisons_price_mean_Turnhout <- df_Turnhout$Col10

villas_total_price_Turnhout <- df_Turnhout$Col18
villas_price_mean_Turnhout <- df_Turnhout$Col20

#-------------- (district 3)

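What changes for each district (Malines, Turnhout, etc.) is the "startRow" and "endRow" values, which increase by 99 until the last row, 4260, is reached. I would like to write a loop that looks something like this: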

i=103
for (n in c("Malines","Turnhout","district3",...))
{
df_"n" = readWorksheet(wb, sheet="Par arrondissement", startRow = i, endRow = i+98, startCol = 0, endCol = 0)
maisons_total_price_"n" <- df_"n"$Col8
...
i=i+99
}

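But of course this loop/function doesn't work, because I'm doing something wrong, and so far I haven't found a solution... it is only the "idea" of a function. The function (loop) would run through all 4260 rows and create new data "on the fly". If my idea of saving the data into separate variables is wrong, I would be glad to hear about any alternative solution (a list?).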

I hope I have explained the problem clearly, and I would be grateful for any hints toward a tidy solution!

Best regards

Answer 1

From your post, I assume you have a vector containing the names of all the districts you are interested in.

You can use an apply-type function to get the data for all the districts:

wb <- loadWorkbook("your_file.xlsx") 

#dummy vector of district names
districts<-c("district1","district2","district3")

#creates a vector of startRows
p<-seq(from=103,to=(length(districts)+1)*99,by=99)
p
#[1] 103 202 301

#for each value of p, get the rows, and rename the columns of the dataframes
data_list<-lapply(p,function(x){
        df<-readWorksheet(wb, sheet="Par arrondissement", startRow = x, endRow = x+98)[,c(8,10,18,20)];
        colnames(df)<-c("maisons_total_price","maisons_price_mean","villas_total_price","villas_price_mean")
        df$sum_price<-df$maisons_total_price+df$villas_total_price
        df})

#this will return a list of dataframes, one for each district.
#to easily access them, add the names of the districts.
names(data_list)<-districts

With this list, you can use data_list$district1 to access the data for each district.
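The same pattern can be tried without an Excel file. The following is only a minimal sketch under stated assumptions: `sheet`, `rows_per_block`, `starts` and `blocks` are hypothetical names, the data frame is made-up dummy data standing in for the worksheet, and the blocks start at row 1 rather than row 103.

```r
# Hypothetical stand-in for the worksheet: 3 districts x 99 rows each
districts <- c("district1", "district2", "district3")
rows_per_block <- 99
n_rows <- length(districts) * rows_per_block
sheet <- data.frame(row_id = seq_len(n_rows), price = seq_len(n_rows) * 10)

# One start row per district (counted from 1 here, not from 103)
starts <- seq(from = 1, by = rows_per_block, length.out = length(districts))

# Slice out the block for each start row; setNames() labels the list elements
blocks <- setNames(
  lapply(starts, function(s) sheet[s:(s + rows_per_block - 1), ]),
  districts
)

nrow(blocks$district2)            # 99 rows in each block
blocks[["district2"]]$row_id[1]   # 100, the first row of the second block

# Because everything lives in one list, per-district summaries are one call
sapply(blocks, function(b) mean(b$price))
```

Using `length.out` ties the number of start rows directly to the number of district names, so the two cannot drift apart when districts are added.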

If you want a single dataframe with all the data, you can do this, assuming no rows are missing:

data<-do.call(cbind,data_list)

You can then use, for example, data$district1.maisons_total_price to access each district's data.
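To see how the column names come out, here is a small sketch with made-up numbers in place of the real data_list. The `long` variant at the end is not part of the answer above; it is an alternative layout that may be convenient if the districts ever differ in row count, and `wide` and `long` are hypothetical names.

```r
# Hypothetical dummy data standing in for data_list
data_list <- list(
  district1 = data.frame(maisons_total_price = c(1, 2), villas_total_price = c(3, 4)),
  district2 = data.frame(maisons_total_price = c(5, 6), villas_total_price = c(7, 8))
)

# Wide layout: columns side by side, each prefixed with its list name
wide <- do.call(cbind, data_list)
names(wide)
wide$district1.maisons_total_price

# Long layout (an alternative): rows stacked, district kept in its own column
long <- do.call(rbind, lapply(names(data_list), function(d) {
  cbind(district = d, data_list[[d]])
}))
long[long$district == "district2", ]
```

The wide layout requires every district to have the same number of rows, which is why the answer states that assumption; the long layout does not.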

Answer 2

First of all, it is easier for others to help you if you provide a reproducible example.

One way to achieve this is a combination of assign() and paste0() or paste(). Run this simple example in a clean session to see what is going on:

for (i in 1:3) {
  assign(paste0("Variable_", i), i)
}

Extending this to your example, you should be able to do something like the following:

i=103
for (n in c("Malines","Turnhout","district3",...))
{
  assign(paste0("df_", n), readWorksheet(wb, sheet="Par arrondissement", startRow = i, endRow = i+98, startCol = 0, endCol = 0))
  # get() fetches the data frame by its generated name so that $Col8 can be extracted
  assign(paste0("maisons_total_price_", n), get(paste0("df_", n))$Col8)
  ...
  i=i+99
}
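Variables created with assign() have to be looked up by name again later, which is what get() and mget() are for. The following is a rough sketch with dummy data frames in place of the readWorksheet() results; `district_names` and `all_dfs` are hypothetical names, and only the two districts named in the question are used.

```r
district_names <- c("Malines", "Turnhout")  # only the two names given in the question

for (n in district_names) {
  # Hypothetical dummy data frame in place of the readWorksheet() result
  assign(paste0("df_", n), data.frame(Col8 = c(1, 2, 3), Col10 = c(4, 5, 6)))
  # get() looks the object up by its name, then $Col8 extracts the column
  assign(paste0("maisons_total_price_", n), get(paste0("df_", n))$Col8)
}

maisons_total_price_Malines

# mget() collects several objects by name into one named list
all_dfs <- mget(paste0("df_", district_names))
names(all_dfs)
```

Collecting the objects with mget() ends up with a named list anyway, which is one reason a list is often the simpler starting point.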

Looping through a large amount of data in R
