for loop: data frames of unequal size in R

Time: 2015-08-26 12:20:04

Tags: r loops csv for-loop

I am working with a data frame that looks like this:

shape id  day      hour  week id  footfall  category          area name
22496     22/3/14  3     12       634       Work cluster      CBD area 1
22670     22/3/14  3     12       220       Shopping cluster  Orchard Road 1
23287     22/3/14  3     12       723       Airport           Changi Airport 2
16430     22/3/14  4     12       947       Work cluster      CBD area 2
4697      22/3/14  3     12       220       Residential area  Ang Mo Kio 2
4911      22/3/14  3     12       1001      Shopping cluster  Orchard Rd 3
11126     22/3/14  3     12       220       Residential area  Ang Mo Kio 2

And so on, for 635 rows.

The other dataset I want to compare it with looks like this:

category          Foreigners  Locals
Work cluster      1600000     3623900
Shopping cluster  1800000     3646666.667
Airport           15095152    8902705
Residential area  527700      280000

There is also a last dataset, previousHour, that I want to compare with.

The first and second datasets share the attribute category, and the first and third datasets share the attribute hour.

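The previousHour values are based on category. For example, for the workcluster category, previousHour should look like this:

hour
0
3
4
4
4
5

And so on for 144 rows, for each category.

For the shopping category, previousHour should look like this:

hour
0
3
3
4
4
5

And so on for 144 rows.

The airport and residential categories follow the same layout, with 144 rows each.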
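One way such a per-category previousHour column might be derived directly from the first dataset is sketched below. It assumes the rows are already in time order within each category and that the first row of each category should get 0. The small tbl1 here is a hypothetical stand-in built from the sample rows shown earlier, not the full 635-row data.

```r
# Hypothetical miniature of the first dataset: only the columns needed here.
tbl1 <- data.frame(
  hour     = c(3, 3, 3, 4, 3, 3, 3),
  category = c("Work cluster", "Shopping cluster", "Airport", "Work cluster",
               "Residential area", "Shopping cluster", "Residential area"),
  stringsAsFactors = FALSE
)

# Shift hour down by one row within each category. The first row of each
# category has no predecessor, so it gets 0 (an assumption taken from the
# leading 0 in the previousHour samples).
tbl1$previousHour <- ave(
  tbl1$hour, tbl1$category,
  FUN = function(h) c(0, head(h, -1))
)

print(tbl1)
```

ave() keeps the result aligned with the original row order, so the new column can sit next to hour without any merging.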

The SumHour dataset:

    category          sumHour
1   Airport           2208
2   Residential area  1656
3   Shopping cluster  1656
4   Work cluster      1656

This is what I would ideally like to do in R:

#for n in 1: number of rows{
# calculate sumHours(in SumHours dataset) - previousHour = newHourSum and store it as newHourSum
# calculate hour/(newHourSum-previousHour) * Foreigners and store it as footfallHour
# add to the empty dataframe }

I am not sure how to do that. Here is what I tried; nothing happens to newtbl:

mergetbl <- function(tbl1, tbl2)
{
  newtbl = data.frame(hour=numeric(),forgHour=numeric())

  ntbl1rows<-nrow(tbl1) # get the number of rows

  for(n in 1:ntbl1rows)
  {
    #for n in 1: number of rows{
    # check the previous hour from IDA dataset !!!!
    # calculate sumDate - previousHour = newHourSum and store it as newHourSum
    # calculate hour/(newHourSum-previousHour) * Foreigners and store it as footfallHour
    # add to the empty dataframe }
    newHourSum <- 3588 - tbl1
    footfallHour <- (tbl1$hour/(newHourSum-previousHour)) * tbl2$Foreigners
    newtbl <- rbind(newtbl, footfallHour)
  }
}

And so on......

2 Answers:

Answer 0 (score: 2)

Think in terms of vectors.

Try this:

### This expands Foreigners/Locals to the same length as tbl1

Foreigners = ifelse(tbl1$category == "Work cluster", tbl2$Foreigners[1],
             ifelse(tbl1$category == "Shopping cluster", tbl2$Foreigners[2],
             ifelse(tbl1$category == "Airport", tbl2$Foreigners[3],
                    tbl2$Foreigners[4])))
Locals = ifelse(tbl1$category == "Work cluster", tbl2$Locals[1],
         ifelse(tbl1$category == "Shopping cluster", tbl2$Locals[2],
         ifelse(tbl1$category == "Airport", tbl2$Locals[3],
                tbl2$Locals[4])))
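As a sketch of an alternative to the nested ifelse() calls, the same expansion can be written as a lookup with match(), which does not depend on the row order of tbl2. The tbl1 and tbl2 below are small stand-ins built from the sample values in the question, and the column name category in tbl2 is assumed from the question's table header.

```r
# Small stand-ins for the question's data (values copied from the samples).
tbl1 <- data.frame(
  hour     = c(3, 3, 3, 4),
  category = c("Work cluster", "Shopping cluster", "Airport", "Work cluster"),
  stringsAsFactors = FALSE
)
tbl2 <- data.frame(
  category   = c("Work cluster", "Shopping cluster", "Airport", "Residential area"),
  Foreigners = c(1600000, 1800000, 15095152, 527700),
  Locals     = c(3623900, 3646666.667, 8902705, 280000),
  stringsAsFactors = FALSE
)

# For each row of tbl1, find the row of tbl2 with the same category.
idx <- match(tbl1$category, tbl2$category)

# Index tbl2 with that lookup so both vectors have nrow(tbl1) elements.
Foreigners <- tbl2$Foreigners[idx]
Locals     <- tbl2$Locals[idx]
```

A category in tbl1 that is missing from tbl2 would show up as NA here instead of silently falling into the last ifelse() branch.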

Now, the function:

resultHour = function(tbl1, tbl2, ForeOrLoca)
{
  # previousHour[i] is the hour of the row before row i (0 for the first row)
  previousHour = rep(0, nrow(tbl1))
  for (i in 2:nrow(tbl1))
  {
    previousHour[i] = tbl1$hour[i-1]
  }

  ### The conditional sum matching the category from tbl1
  newHourSum = ifelse(tbl1$category == "Work cluster",
                 sum(with(tbl1, hour * I(category == "Work cluster"))),
               ifelse(tbl1$category == "Shopping cluster",
                 sum(with(tbl1, hour * I(category == "Shopping cluster"))),
               ifelse(tbl1$category == "Airport",
                 sum(with(tbl1, hour * I(category == "Airport"))),
                 sum(with(tbl1, hour * I(category == "Residential area"))))))

  ## and finally, this
  hour = as.vector(tbl1$hour)

  footfallHour <- (hour / (newHourSum - previousHour)) * ForeOrLoca
  newtbl <- cbind(hour, footfallHour)
  return (newtbl)
}
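The conditional sum and the previous-hour loop inside the function can also be sketched with ave() and head(). This is only an illustration on a hypothetical stand-in for tbl1 (hours and categories copied from the question's sample rows), not something checked against the full data.

```r
# Hypothetical stand-in for tbl1, using the question's sample rows.
tbl1 <- data.frame(
  hour     = c(3, 3, 3, 4, 3, 3, 3),
  category = c("Work cluster", "Shopping cluster", "Airport", "Work cluster",
               "Residential area", "Shopping cluster", "Residential area"),
  stringsAsFactors = FALSE
)

# For every row, the sum of hour over all rows of the same category.
newHourSum <- ave(tbl1$hour, tbl1$category, FUN = sum)

# The hour of the preceding row, with 0 for the first row.
previousHour <- c(0, head(tbl1$hour, -1))

cbind(hour = tbl1$hour, previousHour, newHourSum)
```

Unlike the nested ifelse(), ave() needs no change if a fifth category is added to the data.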

This is the output I get:

> head(newtbl)
 hour footfallHour
[1,]    3    1337.7926
[2,]    3    1506.2762
[3,]    3   12631.9264
[4,]    4    1785.2162
[5,]    3     441.7132
[6,]    3    1506.2762
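As a rough check of the formula, the first two rows of this output can be reproduced by hand. The sketch assumes an hour total of 3588 (the constant used in the question's attempt) together with the Foreigners values for Work cluster and Shopping cluster; those inputs are inferred from the numbers, not stated in the answer.

```r
hour         <- c(3, 3)
previousHour <- c(0, 3)               # first row has no previous hour
foreigners   <- c(1600000, 1800000)   # Work cluster, Shopping cluster
hourSum      <- 3588                  # assumption: constant from the question's code

# Same formula as in the function: hour / (sum - previousHour) * Foreigners
footfallHour <- hour / (hourSum - previousHour) * foreigners
round(footfallHour, 4)
```

Under these assumptions the two values agree with rows [1,] and [2,] of the printed output.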

To use the function:

TheResultIWant = resultHour(tbl1, tbl2, Foreigners)

Answer 1 (score: 0)

For your new question:

If you split your data frame into data frames that each contain only one category, you can use this function:

new_function_hour_result = function (tbl1_categ, vec_categ, prevHour_Categ, sumHour_Categ)
{
  hour = as.vector(tbl1_categ$hour)

  footfallHour <- (hour / (sumHour_Categ - prevHour_Categ)) * vec_categ
  newtbl <- cbind(hour, footfallHour)
  return (newtbl)
}

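Here tbl1_categ is the data frame for a given category, vec_categ is your Foreigners or Locals data for that category, prevHour_Categ is the previousHour for that category, and sumHour_Categ is the sumHour for that category.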
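The splitting step can be sketched with split(), which returns one data frame per category in a named list. The tbl1 below is a hypothetical stand-in built from the question's sample rows, and by_categ is just an illustrative name.

```r
# Hypothetical stand-in for the first dataset.
tbl1 <- data.frame(
  hour     = c(3, 3, 3, 4, 3, 3, 3),
  category = c("Work cluster", "Shopping cluster", "Airport", "Work cluster",
               "Residential area", "Shopping cluster", "Residential area"),
  stringsAsFactors = FALSE
)

# One data frame per category, stored in a list named by category.
by_categ <- split(tbl1, tbl1$category)

# Pull out single-category data frames as needed.
tbl1_airport     <- by_categ[["Airport"]]
tbl1_workcluster <- by_categ[["Work cluster"]]

# Number of rows in each piece.
sapply(by_categ, nrow)
```

Each element of by_categ can then be passed as tbl1_categ, along with the matching per-category vectors.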

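To make your vectors the same length as the data frame, build them like this.

For example, vec_categ for Locals in the airport category:

locals_airport = rep(category[3,3], times = nrow(tbl1_airport))

Foreigners in the airport category:

foreig_airport = rep(category[3,2], times = nrow(tbl1_airport))

This repeats the value contained in category[3,2] nrow(tbl1_airport) times.

Locals in the work cluster category:

locals_workcluster = rep(category[1,3], times = nrow(tbl1_workcluster))

And so on for every vector of every category (for example prevHour_Categ and sumHour_Categ).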