Question

I am working on a data frame similar to this.

Here is what it looks like:

require: false

And so on ... for 635 rows in total.

The other dataset I want to compare it with can be found here.

Here is what it looks like:

shape id  day      hour  week id  footfall  category          area name
22496     22/3/14  3     12       634       Work cluster      CBD area 1
22670     22/3/14  3     12       220       Shopping cluster  Orchard Road 1
23287     22/3/14  3     12       723       Airport           Changi Airport 2
16430     22/3/14  4     12       947       Work cluster      CBD area 2
4697      22/3/14  3     12       220       Residential area  Ang Mo Kio 2
4911      22/3/14  3     12       1001      Shopping cluster  Orchard Rd 3
11126     22/3/14  3     12       220       Residential area  Ang Mo Kio 2

And this is the last dataset I want to compare it with:

category          Foreigners  Locals
Work cluster      1600000     3623900
Shopping cluster  1800000     3646666.667
Airport           15095152    8902705
Residential area  527700      280000

The first and second datasets share the same attribute, previousHour, and the first and third datasets share the attribute category.

previousHour is based on hour. For example, previousHour for the category workcluster should look like this:

hour 0 3 4 4 4 5

and so on, up to 144 rows for each category.

Click here for previousHour for the workcluster category.

For shopping, for example, it should look like this:

hour 0 3 3 4 4 5

and so on, up to 144 rows.

Click here for the shopping category.

Click here for the airport category.

Click here for the residential category.

Each of them has 144 rows.

The SumHour dataset:

category                sumHour
1   Airport             2208
2   Residential area    1656
3   Shopping cluster    1656
4   Work cluster        1656

This is what I would ideally like to do in R:

# for n in 1:number of rows {
#   calculate sumHours (in SumHours dataset) - previousHour = newHourSum and store it as newHourSum
#   calculate hour/(newHourSum-previousHour) * Foreigners and store it as footfallHour
#   add to the empty dataframe
# }

I don't know how to do it. This is what I tried, but nothing happens to newtbl:

   mergetbl <- function(tbl1, tbl2)
{

  newtbl = data.frame(hour=numeric(),forgHour=numeric())

  ntbl1rows<-nrow(tbl1) # get the number of rows

  for(n in 1:ntbl1rows)
  {
    #for n in 1: number of rows{
    # check the previous hour from IDA dataset !!!!
    # calculate sumDate - previousHour = newHourSum and store it as newHourSum
    # calculate hour/(newHourSum-previousHour) * Foreigners and store it as footfallHour
    # add to the empty dataframe }
    newHourSum <- 3588 - tbl1
    footfallHour <- (tbl1$hour/(newHourSum-previousHour)) * tbl2$Foreigners
    newtbl <- rbind(newtbl, footfallHour)



  }
}

Still nothing ...

Answer 1

Thinking in terms of vectors gives a solution.

Try this:

### this is to get your Foreigners/Locals to be at the same size as tbl1

Foreigners = ifelse(tbl1$category == "Work cluster", tbl2$Foreigners[1],
             ifelse(tbl1$category == "Shopping cluster", tbl2$Foreigners[2],
             ifelse(tbl1$category == "Airport", tbl2$Foreigners[3],
                    tbl2$Foreigners[4])))
Locals = ifelse(tbl1$category == "Work cluster", tbl2$Locals[1],
         ifelse(tbl1$category == "Shopping cluster", tbl2$Locals[2],
         ifelse(tbl1$category == "Airport", tbl2$Locals[3],
                tbl2$Locals[4])))
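The nested ifelse() calls depend on the rows of tbl2 being in a fixed order. As a hedged alternative, the same lookup can be sketched with match(), which finds the tbl2 row for each tbl1 row by category name. The sample rows below are copied from the question's tables, and idx is a name introduced here for illustration only.

```r
# Sample rows copied from the question (only the columns used here).
tbl1 <- data.frame(
  hour = c(3, 3, 3, 4, 3, 3, 3),
  category = c("Work cluster", "Shopping cluster", "Airport", "Work cluster",
               "Residential area", "Shopping cluster", "Residential area"),
  stringsAsFactors = FALSE
)
tbl2 <- data.frame(
  category = c("Work cluster", "Shopping cluster", "Airport", "Residential area"),
  Foreigners = c(1600000, 1800000, 15095152, 527700),
  Locals = c(3623900, 3646666.667, 8902705, 280000),
  stringsAsFactors = FALSE
)

# For every row of tbl1, the position of its category in tbl2.
# idx is a hypothetical helper name, not part of the original answer.
idx <- match(tbl1$category, tbl2$category)

# Vectors with one value per row of tbl1, whatever the row order of tbl2.
Foreigners <- tbl2$Foreigners[idx]
Locals <- tbl2$Locals[idx]
```

A category that is missing from tbl2 gives NA in idx, so a typo in a category name shows up as NA instead of silently falling into the last ifelse() branch.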

Now, the function:

resultHour = function(tbl1, tbl2, ForeOrLoca)
{
  ### previousHour is the hour of the row above; the first row gets 0
  previousHour = rep(0, nrow(tbl1))
  for (i in 2:nrow(tbl1))
  {
    previousHour[i] = tbl1$hour[i-1]
  }

  ### The conditional sum matching the category from tbl1
  newHourSum = ifelse(tbl1$category == "Work cluster",
                      sum(with(tbl1, hour * I(category == "Work cluster"))),
               ifelse(tbl1$category == "Shopping cluster",
                      sum(with(tbl1, hour * I(category == "Shopping cluster"))),
               ifelse(tbl1$category == "Airport",
                      sum(with(tbl1, hour * I(category == "Airport"))),
                      sum(with(tbl1, hour * I(category == "Residential area"))))))

  ### and finally, this
  hour = as.vector(tbl1$hour)

  footfallHour <- (hour / (newHourSum - previousHour)) * ForeOrLoca
  newtbl <- cbind(hour, footfallHour)
  return(newtbl)
}
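The two helper vectors inside the function can also be built without a loop or nested ifelse(). The following is a minimal sketch, assuming the same meaning as in the function above: previousHour is the hour of the row above (0 for the first row) and newHourSum is the sum of hour within the row's category. The sample rows are copied from the question.

```r
# Sample rows copied from the question (only the columns used here).
tbl1 <- data.frame(
  hour = c(3, 3, 3, 4, 3, 3, 3),
  category = c("Work cluster", "Shopping cluster", "Airport", "Work cluster",
               "Residential area", "Shopping cluster", "Residential area"),
  stringsAsFactors = FALSE
)

# Shift hour down by one row; the first row has no predecessor and gets 0.
previousHour <- c(0, head(tbl1$hour, -1))

# ave() applies sum() within each category and returns the result
# repeated on every row of that category.
newHourSum <- ave(tbl1$hour, tbl1$category, FUN = sum)

helpers <- cbind(hour = tbl1$hour, previousHour, newHourSum)
```

Both vectors have one entry per row of tbl1, so they can be used directly in the footfallHour formula. On a small sample like this one newHourSum - previousHour can be 0 for some rows, which would make the division return Inf.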

Here is the output I get:

> head(newtbl)
 hour footfallHour
[1,]    3    1337.7926
[2,]    3    1506.2762
[3,]    3   12631.9264
[4,]    4    1785.2162
[5,]    3     441.7132
[6,]    3    1506.2762

To use the function:

TheResultIWant = resultHour(tbl1, tbl2, Foreigners)

Answer 2

For your new question.

If you split your data frame into data frames that each contain only one category, you can use this function:

new_function_hour_result = function(tbl1_categ, vec_categ, prevHour_Categ, sumHour_Categ)
{
  hour = as.vector(tbl1_categ$hour)

  footfallHour <- (hour / (sumHour_Categ - prevHour_Categ)) * vec_categ
  newtbl <- cbind(hour, footfallHour)
  return(newtbl)
}

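Here tbl1_categ is the data frame for a given category, vec_categ is your Foreigners or Locals data for that category, prevHour_Categ is the previousHour for that category, and finally sumHour_Categ is the sumHour for that category.

To make your vectors the same length as the df, build them like this (category is the data frame with the Foreigners and Locals columns):

For example, vec_categ for Locals in the airport category:

locals_airport = rep(category[3,3], times = nrow(tbl1_airport))

Foreigners in the airport category:

foreig_airport = rep(category[3,2], times = nrow(tbl1_airport))

This repeats the value contained in category[3,2] nrow(tbl1_airport) times.

Locals in the work cluster category:

locals_workcluster = rep(category[1,3], times = nrow(tbl1_workcluster))

And so on for every vector of every category (for example vec_categ, prevHour_Categ, sumHour_Categ).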
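The per-category steps can be sketched in one pass with split() and lapply(). This is only an illustration under stated assumptions: the sample rows and the sumHour table are copied from the question, footfall_for_category is a hypothetical name for a function with the same four arguments as above, and previousHour is assumed to restart at 0 inside each category.

```r
tbl1 <- data.frame(
  hour = c(3, 3, 3, 4, 3, 3, 3),
  category = c("Work cluster", "Shopping cluster", "Airport", "Work cluster",
               "Residential area", "Shopping cluster", "Residential area"),
  stringsAsFactors = FALSE
)
tbl2 <- data.frame(
  category = c("Work cluster", "Shopping cluster", "Airport", "Residential area"),
  Foreigners = c(1600000, 1800000, 15095152, 527700),
  stringsAsFactors = FALSE
)
sumHour <- data.frame(
  category = c("Airport", "Residential area", "Shopping cluster", "Work cluster"),
  sumHour = c(2208, 1656, 1656, 1656),
  stringsAsFactors = FALSE
)

# Hypothetical name; same arguments as the per-category function above.
footfall_for_category <- function(tbl1_categ, vec_categ, prevHour_Categ, sumHour_Categ) {
  hour <- as.vector(tbl1_categ$hour)
  footfallHour <- (hour / (sumHour_Categ - prevHour_Categ)) * vec_categ
  cbind(hour, footfallHour)
}

# One data frame per category, named by category.
pieces <- split(tbl1, tbl1$category)

results <- lapply(names(pieces), function(categ) {
  piece <- pieces[[categ]]
  # Repeat the single Foreigners value once per row of this category.
  foreigners <- rep(tbl2$Foreigners[tbl2$category == categ], times = nrow(piece))
  # Assumption: previousHour restarts at 0 within each category.
  prev_hour <- c(0, head(piece$hour, -1))
  sum_hour <- sumHour$sumHour[sumHour$category == categ]
  footfall_for_category(piece, foreigners, prev_hour, sum_hour)
})
names(results) <- names(pieces)
```

results is then a named list with one hour/footfallHour matrix per category, for example results[["Airport"]].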

For loop: data frames of unequal size in R

2 answers: