Question

I have the following data frame df:

LeftOrRight SpeedCategory   NumThruLanes
R           25to45          3             
L           45to62          2           
R           Gt62            1

I want to group it by SpeedCategory and loop through the other columns to get the frequency of each unique code within each speed category, like this:

                 25to45 45to62 Gt62
LeftOrRight    L      0      1    0
               R      1      0    1
NumThruLanes   1      0      0    1
               2      0      1    0
               3      1      0    0

The closest I have gotten is:

for (col in df){
tbl <- table(col, df$SpeedCategory)
print(tbl)
}

which prints the following (first for LeftOrRight, then for NumThruLanes):

col   25to45 45to62 Gt62
  L        0      1    0
  R        1      0    1

col   25to45 45to62 Gt62
  1        0      0    1
  2        0      1    0
  3        1      0    0

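I'm pretty sure I could do this with aggregate() or with group_by from dplyr, but I'm new to R and can't figure out the syntax. In pandas I would use a MultiIndex, but I don't know what the R equivalent is, which makes it hard to google.

I'd like to do it all in one go, or with a loop, since I have a dozen or so columns to go through.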

Answer 1

The tables package makes it easy to format tables in very specific ways. The syntax takes some getting used to, but for this problem it is quite simple:

exd <- read.table(text = "LeftOrRight SpeedCategory   NumThruLanes
R           25to45          3             
L           45to62          2           
R           Gt62            1", header = TRUE)       

## to get counts by default we need everything to be categorical,
## so convert every column (including the numeric NumThruLanes) to a factor
exd[] <- lapply(exd, factor)

library(tables)
tabular(LeftOrRight + NumThruLanes ~ SpeedCategory, data = exd)

##                SpeedCategory            
##                25to45        45to62 Gt62
## LeftOrRight  L 0             1      0   
##              R 1             0      1   
## NumThruLanes 1 0             0      1   
##              2 0             1      0   
##              3 1             0      0

If you have many columns to iterate over, you can construct the formula programmatically, e.g.,

tabular(as.formula(paste(paste(names(exd)[-2], collapse = " + "),
                         names(exd)[2], sep = " ~ ")),
        data = exd)

As a bonus, there are html and latex methods that make it easy to mark up your table for inclusion in an article or report.

Answer 2

You can do it all at once by using lapply() instead of a for loop:

tab_list <- lapply(df[, -2], function(col) table(col, df$SpeedCategory))
tab_list
## $LeftOrRight
##    
## col 25to45 45to62 Gt62
##   L      0      1    0
##   R      1      0    1
## 
## $NumThruLanes
##    
## col 25to45 45to62 Gt62
##   1      0      0    1
##   2      0      1    0
##   3      1      0    0

You can then combine the tables into one with do.call() and rbind():

do.call(rbind, tab_list)
##   25to45 45to62 Gt62
## L      0      1    0
## R      1      0    1
## 1      0      0    1
## 2      0      1    0
## 3      1      0    0
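In the combined table a row label such as `1` no longer says which column it came from. A minimal base R sketch, assuming `df` is built as in the question, that keeps the result a plain count matrix but prefixes every row name with its source column (the `Column: code` label format is only an illustrative choice):

```r
# Rebuild the example data from the question
df <- data.frame(
  LeftOrRight   = c("R", "L", "R"),
  SpeedCategory = c("25to45", "45to62", "Gt62"),
  NumThruLanes  = c(3, 2, 1)
)

# One contingency table per non-grouping column, as above
tab_list <- lapply(df[, -2], function(col) table(col, df$SpeedCategory))
combined <- do.call(rbind, tab_list)

# Relabel rows as "Column: code" so the source column is not lost
rownames(combined) <- unlist(
  Map(function(nm, tab) paste(nm, rownames(tab), sep = ": "),
      names(tab_list), tab_list),
  use.names = FALSE
)
combined
```

Unlike the data-frame approach, the counts stay numeric, so the result can still be summed or converted to proportions directly.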

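You can also get a column in the output that indicates which column of the original data frame each row came from. To do that, you need a somewhat more involved lapply() over the column names: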

tab_list <- lapply(names(df)[-2], function(col) {
  tab <- table(df[, col], df[, "SpeedCategory"])
  name_col <- c(col, rep("", nrow(tab) - 1))
  mat <- cbind(name_col, rownames(tab), tab)
  as.data.frame(mat)
  })
do.call(rbind, tab_list)
##       name_col V2 25to45 45to62 Gt62
## L  LeftOrRight  L      0      1    0
## R               R      1      0    1
## 1 NumThruLanes  1      0      0    1
## 2               2      0      1    0
## 3               3      1      0    0

Answer 3

This doesn't do everything in one go, but it might point you in the right direction:

library(reshape2)

dcast(df, LeftOrRight ~ SpeedCategory, fun.aggregate = length)
dcast(df, NumThruLanes ~ SpeedCategory, fun.aggregate = length)
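The two calls above differ only in the left-hand side of the formula, so they extend to a dozen columns with a loop. A sketch, assuming reshape2 is installed and SpeedCategory is the grouping column; the names `vars` and `counts` are illustrative:

```r
library(reshape2)

# Example data from the question
df <- data.frame(
  LeftOrRight   = c("R", "L", "R"),
  SpeedCategory = c("25to45", "45to62", "Gt62"),
  NumThruLanes  = c(3, 2, 1)
)

# Every column except the grouping column
vars <- setdiff(names(df), "SpeedCategory")

# One dcast() call per column, with the formula built from the column name;
# length() only counts rows, so value.var just needs to name some column
counts <- lapply(vars, function(v) {
  dcast(df, as.formula(paste(v, "~ SpeedCategory")),
        fun.aggregate = length, value.var = "SpeedCategory", fill = 0)
})
names(counts) <- vars
counts
```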

Answer 4

Using dcast from the reshape2 package, you can do the following:

library("reshape2")

DF=read.table(text="LeftOrRight SpeedCategory   NumThruLanes
R           25to45          3             
L           45to62          2           
R           Gt62            1",header=TRUE,stringsAsFactors=FALSE)

LR_Stat = dcast(DF,LeftOrRight ~ SpeedCategory,length,fill=0)
LR_Stat
#  LeftOrRight 25to45 45to62 Gt62
#1           L      0      1    0
#2           R      1      0    1

Lanes_Stat = dcast(DF,NumThruLanes ~ SpeedCategory,length,fill=0)
Lanes_Stat
#  NumThruLanes 25to45 45to62 Gt62
#1            1      0      0    1
#2            2      0      1    0
#3            3      1      0    0

Note that for LR_Stat, the 45to62 count for L should be 1 in the expected output.
