I'm trying to learn more R and want a clean, easy-to-follow way to take an orders data frame:
  customerID  Timestamp freq  lat
1          1 2017-01-01    2 31.0
2          2 2017-01-01    3 90.5
3          3 2017-01-01    1  NaN
4          4 2017-01-01    1  NaN
5          1 2017-02-01    2 31.0
6          2 2017-03-01    3 90.5
7          2 2017-07-01    3 90.5
and create a grid of counts based on a series of buckets for lat and freq, e.g.:
       lat
freq   61+  31-60  0-30
5+       0      0     0
2-4      3      2     0
1        0      0     2
dput output:
> dput(orders)
structure(list(customerID = c(1L, 2L, 3L, 4L, 1L, 2L, 2L),
    Timestamp = structure(c(17167, 17167, 17167, 17167, 17198, 17226, 17348), class = "Date"),
    freq = c(2L, 3L, 1L, 1L, 2L, 3L, 3L),
    lat = c(31, 90.5, NaN, NaN, 31, 90.5, 90.5)),
    .Names = c("customerID", "Timestamp", "freq", "lat"),
    row.names = c(NA, 7L), class = "data.frame")
Update
I've been working on this. I'm using cut to do the bucketing, though I'm not sure that's the best route, and I don't know how to build the grid. E.g.:
orders$freq_range <- cut(orders$freq, breaks=c(0,1,4,100000), labels=c("1","2-4","5+"))
Answer 0 (score: 0)
You can use table to get a cross-tabulation:
df
customerID  Timestamp freq   lat
         1 2017-01-01    2  31.0
         2 2017-01-01    3  90.5
         3 2017-01-01    1    NA
         4 2017-01-01    1    NA
         1 2017-02-01    2  31.0
         2 2017-03-01    3  90.5
         2 2017-07-01    3  90.5
         2 2017-07-01    5  90.5
         3 2017-07-01    6 100.5
df$a <- cut(df$freq, breaks = c(0, 1, 4, 100000), labels = c("1", "2-4", "5+"))
df$b <- cut(df$lat, breaks = c(0, 30, 60, 100000), labels = c("0-30", "31-60", "60+"))
df
customerID  Timestamp freq   lat   a     b
         1 2017-01-01    2  31.0 2-4 31-60
         2 2017-01-01    3  90.5 2-4   60+
         3 2017-01-01    1    NA   1  <NA>
         4 2017-01-01    1    NA   1  <NA>
         1 2017-02-01    2  31.0 2-4 31-60
         2 2017-03-01    3  90.5 2-4   60+
         2 2017-07-01    3  90.5 2-4   60+
         2 2017-07-01    5  90.5  5+   60+
         3 2017-07-01    6 100.5  5+   60+
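A side note on how cut assigns these buckets, as a minimal sketch on made-up values (the lat vector below is hypothetical, not taken from the question). By default the intervals are right-closed, so the breaks above mean (0,30], (30,60] and (60,100000]. A value exactly equal to the lowest break gets NA unless include.lowest = TRUE is passed, and missing input stays NA, which is where the <NA> entries in column b come from.

```r
# Hypothetical values chosen to sit on and around the break points
lat <- c(0, 30, 30.5, 60, 61, NaN)
breaks <- c(0, 30, 60, 100000)
labels <- c("0-30", "31-60", "60+")

# Default: right-closed intervals, lowest break excluded
default_bins <- cut(lat, breaks = breaks, labels = labels)

# include.lowest = TRUE also puts the value 0 into the first bucket
closed_bins <- cut(lat, breaks = breaks, labels = labels, include.lowest = TRUE)

print(data.frame(lat, default_bins, closed_bins))
```

If a lat of exactly 0 is possible in the real data, include.lowest = TRUE is probably what you want.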
table(df$a, df$b)
      0-30 31-60 60+
  1      0     0   0
  2-4    0     2   3
  5+     0     0   2
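To get closer to the layout sketched in the question (highest bucket first, and the rows with missing lat still counted), one possible variation is to reverse the factor levels and pass useNA = "ifany" to table. This is only a sketch: it rebuilds the original seven-row orders data without the Timestamp column, uses Inf as the top break instead of 100000, and the names freq_range, lat_range and grid are placeholders.

```r
# Original orders data from the question (Timestamp omitted, it is not needed here)
orders <- data.frame(
  customerID = c(1L, 2L, 3L, 4L, 1L, 2L, 2L),
  freq = c(2L, 3L, 1L, 1L, 2L, 3L, 3L),
  lat = c(31, 90.5, NaN, NaN, 31, 90.5, 90.5)
)

# Bucket both variables; Inf as the last break is an assumption of this sketch
freq_range <- cut(orders$freq, breaks = c(0, 1, 4, Inf), labels = c("1", "2-4", "5+"))
lat_range <- cut(orders$lat, breaks = c(0, 30, 60, Inf), labels = c("0-30", "31-60", "61+"))

# Reverse the level order so the highest bucket is listed first
freq_range <- factor(freq_range, levels = rev(levels(freq_range)))
lat_range <- factor(lat_range, levels = rev(levels(lat_range)))

# useNA = "ifany" adds an <NA> column for the rows whose lat is missing
grid <- table(freq = freq_range, lat = lat_range, useNA = "ifany")
print(grid)
```

Note that the two customers with missing lat end up in a separate <NA> column here rather than under 0-30 as in the question's sketch. If they should count as 0-30, the missing values would have to be recoded before calling cut.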