Question

I'm trying to create a summary table that tells me how much each bike is used within each borough. The formula is

(number of times a particular bike was rented in a borough) / (total number of rentals in that borough).
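
As a quick check of the formula, here is a small R sketch that applies it to one bike/borough pair from the sample rentals listed further down in this question. Bike 1 accounts for 3 of the 5 K&C rentals, so its share should be 0.6. The vector names `bike` and `borough` are only for illustration.

```r
# Sample rentals from the question: one element per rental
bike    <- c(1, 1, 1, 7, 7, 1, 1, 7, 9, 9)
borough <- c("K&C", "K&C", "K&C", "K&C", "K&C", "Hammersmith",
             "Hammersmith", "Hammersmith", "Hammersmith", "Westminster")

# Numerator: rentals of bike 1 that started in K&C
n_bike_in_boro <- sum(bike == 1 & borough == "K&C")
# Denominator: all rentals that started in K&C
n_boro <- sum(borough == "K&C")

pct <- n_bike_in_boro / n_boro
pct
```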

The final output should look like this:

BikeId   Borough       Pct
    1     K&C          0.02
    1     Hammersmith  0.45
    7     K&C          0.32

To achieve this, I tried to implement the following function:

smplData <- function(df) {
  # initialize an empty data frame
  summDf <- data.frame(BikeId = character(), Borough = character(), Pct = double())

  # create a vector of unique borough names
  boro <- unique(df[, "Start.Borough"])
  for (i in 1:length(boro)) {
    # loop through each borough and create a frequency table of bike IDs
    bkCntBor <- table(df[df$Start.Borough == boro[i], "Bike.Id"])
    # total number of rentals in this borough
    borCnt <- nrow(df[df$Start.Borough == boro[i], ])
    for (j in 1:length(bkCntBor)) {
      # loop through each bike in the ith borough and compute the jth bike's ratio
      bkPct <- as.vector(bkCntBor[j]) / borCnt
      # temporary data frame holding a single row: bike, borough, ratio
      dfTmp <- data.frame(BikeId = names(bkCntBor[j]), Borough = boro[i],
                          Pct = bkPct)
      # append to the summary table
      summDf <<- rbind(summDf, dfTmp)
    }
  }
}

The head of the df data set looks like this:

>head(df)
Bike.Id Start.Borough Rental.Id
      1           K&C  61349872
      1           K&C  61361611
      1   Royal Parks  61362295
      1           K&C  61364627
      1           K&C  61367817
      1           H&F  61368333

When I run the function, I get the following error after one record has been inserted into summDf:

Error in data.frame(BikeId = names(bkCntBor[j]), Borough = boro[i], Pct = bkPct) : arguments imply differing number of rows: 0, 1
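
One possible cause, assuming the full data set contains missing values in Start.Borough (the sample shown here does not, so this is a guess): unique() keeps NA as a "borough", comparing against NA returns NA for every row so only NA Bike.Id values get selected, and table() drops those, leaving an empty frequency table. For an empty table, 1:length() is c(1, 0) rather than an empty sequence, so the inner loop still runs. The sketch below reproduces those pieces on a hypothetical three-row data frame, and shows that a zero-length column beside a length-1 column is exactly what data.frame() rejects with this message.

```r
# Hypothetical data with one missing borough (assumption, not from the question)
df <- data.frame(Bike.Id = c(1, 1, 7),
                 Start.Borough = c("K&C", NA, "K&C"))

boro <- unique(df[, "Start.Borough"])   # "K&C" and NA
sel  <- df$Start.Borough == boro[2]     # comparing with NA gives NA for every row
bkCntBor <- table(df[sel, "Bike.Id"])   # only NA ids are selected; table() drops NA
length(bkCntBor)                        # 0

1:length(bkCntBor)     # c(1, 0): the inner loop still runs, including j = 0
seq_along(bkCntBor)    # integer(0): the loop body would be skipped

# Zero-length BikeId/Pct next to a length-1 Borough triggers the reported error
msg <- tryCatch(
  data.frame(BikeId = character(0), Borough = "K&C", Pct = numeric(0)),
  error = conditionMessage
)
msg
```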

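I can run the function's code in the console by passing one value at a time for i and j, but when I run it as a function I get the error above. Any help would be much appreciated. Here is some sample data: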

Bike.Id    Start.Borough
1            K&C      
1            K&C    
1            K&C    
7            K&C  
7            K&C  
1            Hammersmith
1            Hammersmith 
7            Hammersmith 
9            Hammersmith
9            Westminster

Answer 1

Here is an option using dplyr:

library(dplyr)
dd %>% 
  group_by(Start.Borough, Bike.Id) %>% 
  summarize(n=n()) %>%
  mutate(pct = n / sum(n)) %>%
  select(-n)

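First we use group_by() and summarize() to count the rentals for each borough/bike combination. summarize() drops the last grouping variable (Bike.Id), so the result is still grouped by Start.Borough; mutate() then divides each borough/bike count by the total count over all bikes in that borough.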
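
For comparison, a base R sketch of the same computation without dplyr. It cross-tabulates rentals with table() and uses prop.table() with margin = 1, so each count is divided by its borough's row total; pairs that never occurred come out as 0 and are dropped. The column names follow the question's desired output (Borough, BikeId, Pct).

```r
dd <- data.frame(Bike.Id = c(1, 1, 1, 7, 7, 1, 1, 7, 9, 9),
                 Start.Borough = c("K&C", "K&C", "K&C", "K&C", "K&C", "Hammersmith",
                                   "Hammersmith", "Hammersmith", "Hammersmith", "Westminster"))

# Cross-tabulate rentals: one row per borough, one column per bike
counts <- table(Borough = dd$Start.Borough, BikeId = dd$Bike.Id)

# margin = 1: divide each cell by its row (borough) total
shares <- prop.table(counts, margin = 1)

# Long format, then drop bike/borough pairs that never occurred
res <- as.data.frame(shares, responseName = "Pct")
res <- res[res$Pct > 0, ]
res
```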

  Start.Borough Bike.Id   pct
         <fctr>   <int> <dbl>
1   Hammersmith       1  0.50
2   Hammersmith       7  0.25
3   Hammersmith       9  0.25
4           K&C       1  0.60
5           K&C       7  0.40
6   Westminster       9  1.00

The output above was produced with this sample input:

dd <- data.frame(Bike.Id = c(1, 1, 1, 7, 7, 1, 1, 7, 9, 9), 
    Start.Borough = c("K&C", "K&C", "K&C", "K&C", "K&C", "Hammersmith", 
    "Hammersmith", "Hammersmith", "Hammersmith", "Westminster"))
