Bucketing values into ranges and creating a grid in R

Time: 2017-11-18 00:58:50

Tags: r

Trying to learn more R... I want to find a clean, easy-to-follow way to take an orders data frame:

  customerID  Timestamp freq  lat
1          1 2017-01-01    2 31.0
2          2 2017-01-01    3 90.5
3          3 2017-01-01    1  NaN
4          4 2017-01-01    1  NaN
5          1 2017-02-01    2 31.0
6          2 2017-03-01    3 90.5
7          2 2017-07-01    3 90.5

and build a grid of counts from a set of buckets for lat and freq. Buckets:

  • freq (1, 2-4, 5+)
  • lat (0-30, 31-60, 61+)

e.g.

        lat
freq    61+  31-60  0-30
5+        0      0     0
2-4       3      2     0
1         0      0     2

Dput:

> dput(orders)
structure(list(customerID = c(1L, 2L, 3L, 4L, 1L, 2L, 2L), Timestamp =
structure(c(17167,
17167, 17167, 17167, 17198, 17226, 17348), class = "Date"), freq = c(2L, 
3L, 1L, 1L, 2L, 3L, 3L), lat = c(31, 90.5, NaN, NaN, 31, 90.5, 
90.5)), .Names = c("customerID", "Timestamp", "freq", "lat"), row.names = 
c(NA, 7L), class = "data.frame")

Update

Still working on this... I used cut to do the grouping, though I am not sure that is the best route. I still don't know how to build the grid.

e.g.

orders$freq_range <- cut(orders$freq, breaks=c(0,1,4,100000), labels=c("1","2-4","5+"))

1 Answer:

Answer 0: (score: 0)

You can use table to get cross-tabulated output:

df

customerID  Timestamp  freq   lat
1           2017-01-01  2    31.0
2           2017-01-01  3    90.5
3           2017-01-01  1      NA
4           2017-01-01  1      NA
1           2017-02-01  2    31.0
2           2017-03-01  3    90.5
2           2017-07-01  3    90.5
2           2017-07-01  5    90.5
3           2017-07-01  6   100.5

df$a <- cut(df$freq, breaks=c(0,1,4,100000), labels=c("1","2-4","5+"))

df$b <- cut(df$lat, breaks=c(0,30,60,100000), labels=c("0-30","31-60","60+"))

df

customerID  Timestamp freq   lat   a     b
 1          2017-01-01    2  31.0 2-4 31-60
 2          2017-01-01    3  90.5 2-4   60+
 3          2017-01-01    1    NA   1  <NA>
 4          2017-01-01    1    NA   1  <NA>
 1          2017-02-01    2  31.0 2-4 31-60
 2          2017-03-01    3  90.5 2-4   60+
 2          2017-07-01    3  90.5 2-4   60+
 2          2017-07-01    5  90.5  5+   60+
 3          2017-07-01    6 100.5  5+   60+
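One detail of the cut calls above is worth checking: by default cut builds right-closed intervals, here (0,30], (30,60] and (60,100000]. A value of exactly 0 therefore falls outside the first bucket and becomes NA, and a value such as 30.5 lands in the bucket labelled "31-60". The sketch below illustrates this on a made-up vector of lat values (not from the question), and shows include.lowest = TRUE as one possible way to keep 0 in the first bucket.

```r
# Hypothetical lat values chosen to sit on or near the bucket edges
lat    <- c(0, 30, 30.5, 31, 60, 61, NaN)
breaks <- c(0, 30, 60, 100000)
labs   <- c("0-30", "31-60", "60+")

# Default: intervals are (0,30], (30,60], (60,1e5], so 0 is not covered
default_bins <- cut(lat, breaks = breaks, labels = labs)

# include.lowest = TRUE closes the first interval on the left: [0,30]
closed_bins <- cut(lat, breaks = breaks, labels = labs, include.lowest = TRUE)

# Side-by-side view of how each value is bucketed; NaN stays NA either way
print(data.frame(lat, default_bins, closed_bins))
```

Whether 0 should count as "0-30" depends on the data, so treat include.lowest as an option to consider rather than a required fix.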

table(df$a, df$b)

       0-30 31-60 60+
  1      0     0   0
  2-4    0     2   3
  5+     0     0   2
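The output above uses the answer's extended data and the default bucket order. As a rough sketch of how the same approach could be applied to the question's original seven rows, in the layout the question asks for (largest buckets first), the code below rebuilds orders from the dput, reverses the factor levels, and passes useNA = "ifany" so that rows with a missing lat show up in their own column instead of being dropped. The names freq_bin, lat_bin and grid, the Inf upper break, and the "61+" label are choices made for this illustration. The question's example grid appears to count the two NaN rows under 0-30; that is not assumed here.

```r
# Rebuild the question's data (same values as the dput output)
orders <- data.frame(
  customerID = c(1L, 2L, 3L, 4L, 1L, 2L, 2L),
  Timestamp  = as.Date(c("2017-01-01", "2017-01-01", "2017-01-01", "2017-01-01",
                         "2017-02-01", "2017-03-01", "2017-07-01")),
  freq       = c(2L, 3L, 1L, 1L, 2L, 3L, 3L),
  lat        = c(31, 90.5, NaN, NaN, 31, 90.5, 90.5)
)

# Bucket both columns; Inf as the last break avoids an arbitrary upper limit
freq_bin <- cut(orders$freq, breaks = c(0, 1, 4, Inf),
                labels = c("1", "2-4", "5+"))
lat_bin  <- cut(orders$lat, breaks = c(0, 30, 60, Inf),
                labels = c("0-30", "31-60", "61+"))

# Reverse the level order so the largest bucket is listed first
freq_bin <- factor(freq_bin, levels = rev(levels(freq_bin)))
lat_bin  <- factor(lat_bin,  levels = rev(levels(lat_bin)))

# Cross-tabulate; useNA = "ifany" adds an <NA> column for the NaN lat rows
grid <- table(freq = freq_bin, lat = lat_bin, useNA = "ifany")
print(grid)
```

Empty buckets such as 5+ still appear as all-zero rows because table counts every factor level, including unused ones.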