Cut data by group and create a frequency table

Time: 2015-04-30 01:03:49

Tags: r

I have a data frame df consisting of locations and hours:

### dummy data
set.seed(1)
location <- c("loc1", "loc2", "loc3")
locations <- sample(location, size=50, replace=TRUE)
hours <- runif(50, min=0, max=20)

df <- data.frame(locations, hours)

I can cut the data into one-hour bins and create a frequency table of the bins:

### cut data and create frequency table
c <- cut(df$hours, breaks=seq(0,20, by=1), include.lowest=TRUE)
t <- data.frame(table(c))
head(t)

      c Freq
1 [0,1]    0
2 (1,2]    4
3 (2,3]    2
4 (3,4]    0
5 (4,5]    4
6 (5,6]    2

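However, I can't work out how to group the data by location first.

How can I use the locations variable to group the data so that the output looks like this?
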
  location    c Freq
1   loc1  [0,1]    x1
2   loc1  (1,2]    x2
3   loc1  (2,3]    x3
4   loc1  (3,4]    x4
5   loc1  (4,5]    x5
6   loc1  (5,6]    x6
    ...
    loc2  [0,1]    y1
    loc2  (1,2]    y2
    ...

2 answers:

Answer 0 (score: 2)

You could try:

library(dplyr)
df %>% 
  mutate(hours = cut(hours, breaks=seq(0,20, by=1), include.lowest=TRUE)) %>%
  table() %>% data.frame() %>% arrange(locations, hours)

Which gives:

#  locations hours Freq
#1      loc1 [0,1]    0
#2      loc1 (1,2]    1
#3      loc1 (2,3]    1
#4      loc1 (3,4]    0
#5      loc1 (4,5]    0
#6      loc1 (5,6]    1

Answer 1 (score: 1)

t <- data.frame(table(df$locations, c))
head(t[order(t$Var1), ])


   Var1     c Freq
1  loc1 [0,1]    0
4  loc1 (1,2]    1
7  loc1 (2,3]    1
10 loc1 (3,4]    0
13 loc1 (4,5]    0
16 loc1 (5,6]    1

Alternatively, cbind the cut variable onto the data frame first. That may be safer if you plan to process the data later.
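
The cbind-first suggestion could look something like the sketch below. It rebuilds the dummy data from the question so that it runs on its own; the names df2, bin, and freq are illustrative and do not appear in the original answer.

```r
# Recreate the dummy data from the question
set.seed(1)
location <- c("loc1", "loc2", "loc3")
df <- data.frame(locations = sample(location, size = 50, replace = TRUE),
                 hours = runif(50, min = 0, max = 20))

# Bind the binned hours onto the data frame as a column (hypothetical name: bin),
# so each bin stays on the same row as its location
df2 <- cbind(df, bin = cut(df$hours, breaks = seq(0, 20, by = 1),
                           include.lowest = TRUE))

# Cross-tabulate location by bin and flatten to long format;
# naming the table() arguments sets the resulting column names
freq <- as.data.frame(table(locations = df2$locations, bin = df2$bin))

# Sort by location, then by bin (bin is a factor, so it sorts in interval order)
freq <- freq[order(freq$locations, freq$bin), ]
head(freq)
```

Keeping the bin as a column of the data frame means that any later subsetting or reordering of rows carries the bin along with its location and hours, which is presumably what "safer" refers to.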