I'd like to build a hexbin plot in which each bin shows the ratio between the class 1 and class 2 points that fall into that bin (either logged or not).
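```r
library(hexbin)

x <- rnorm(10000)
y <- rnorm(10000)
h <- hexbin(x, y)
plot(h)

# class labels: the first 2000 points are class 1, the remaining 8000 are class 2
l <- as.factor(c(rep(1, 2000), rep(2, 8000)))
```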
Any suggestions on how to implement this? Is there a way to apply a function to each bin based on that bin's statistics?
Answer 0 (score: 3)
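@cryo111's answer contains the most important ingredient: `IDs = TRUE`. After that, it's just a matter of deciding what you want to do with `Inf` (bins with no class 2 points) and how much you need to scale the ratios to get integers that produce a nice plot.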
```r
library(hexbin)
library(data.table)

set.seed(1)
x <- rnorm(10000)
y <- rnorm(10000)
h <- hexbin(x, y, IDs = TRUE)   # IDs = TRUE stores each point's cell ID in h@cID

# put all the relevant data in a data.table
# (toy labels: three out of every four points are class 1)
dt <- data.table(x, y, l = rep(c(1, 1, 1, 2), length.out = length(x)), cID = h@cID)

# group by cID and calculate whatever statistic you like;
# in this case the ratio of 1's to 2's, with Inf's set equal to the largest finite ratio.
# Then scale up (the factor 10 was chosen by hand to get a prettier graph),
# convert to integer and store the result in h@count.
h@count <- dt[, list(ratio = sum(l == 1) / sum(l == 2)), keyby = cID][
  , ratio := ifelse(is.infinite(ratio), max(ratio[is.finite(ratio)]), ratio)][
  , as.integer(ratio * 10)]

plot(h)
```
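If you'd rather not depend on data.table, the same per-cell step can be sketched in base R with `tapply()`. This is a swapped-in technique, not the answerer's code: `cell_ratio_counts` is a hypothetical helper name, and it assumes `cID` is the `h@cID` vector from `hexbin(..., IDs = TRUE)` and that the class labels are 1 and 2.

```r
# Hypothetical helper (not part of hexbin): per-cell ratio of class-1 to class-2
# points, with Inf capped at the largest finite ratio, scaled and made integer.
# Assumes `cID` is h@cID from hexbin(..., IDs = TRUE) and `l` holds labels 1/2.
cell_ratio_counts <- function(cID, l, scale = 10) {
  n1 <- tapply(l == 1, cID, sum)   # class-1 points per cell, cells in ascending ID order
  n2 <- tapply(l == 2, cID, sum)   # class-2 points per cell
  ratio <- n1 / n2                 # Inf where a cell has no class-2 points
  finite <- is.finite(ratio)
  # cap Inf at the largest finite ratio (if every cell is Inf, the result is NA)
  if (any(finite)) ratio[!finite] <- max(ratio[finite])
  as.integer(ratio * scale)        # as.integer() also drops the cell-ID names
}

# Toy check: cell 5 holds two class-1 points and one class-2 point (ratio 2),
# cell 9 holds a single class-1 point (ratio Inf, capped to 2).
cell_ratio_counts(c(5, 5, 5, 9), c(1, 1, 2, 1))

# With hexbin (commented out so this block runs without the package):
# h <- hexbin::hexbin(x, y, IDs = TRUE)
# h@count <- cell_ratio_counts(h@cID, l)
# plot(h)
```

Because `tapply()` orders its result by sorted cell ID, just as `keyby = cID` does, the values should line up with `h@cell`.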
Answer 1 (score: 1)
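You can determine the number of class 1 and class 2 points in each bin with:

```r
library(hexbin)
library(plyr)

x <- rnorm(10000)
y <- rnorm(10000)

# generate the hexbin object with IDs = TRUE;
# the object then includes a slot with a vector cID,
# and cID maps point (x[i], y[i]) to cell number cID[i]
HexObj <- hexbin(x, y, IDs = TRUE)

# count statistics for the first 2000 points (class 1) and the rest (class 2);
# plyr::count() returns a data frame with columns x (cell ID) and freq
CountDF <- merge(plyr::count(HexObj@cID[1:2000]),
                 plyr::count(HexObj@cID[2001:length(x)]),
                 by = "x",
                 all = TRUE)

# cells that contain only one class get NA after the merge; replace NAs by 0
CountDF[is.na(CountDF)] <- 0

# check that all points are included (should equal length(x), i.e. 10000)
sum(CountDF$freq.x) + sum(CountDF$freq.y)
```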
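A base-R alternative to the `plyr::count()` + `merge()` step is a two-way `table()` of cell ID by class, which fills in zeros directly. This is a substitute technique, not the answerer's code, and the vectors below are toy stand-ins for `HexObj@cID` and the class labels.

```r
# Toy stand-ins: cID plays the role of HexObj@cID, cls the class of each point.
cID <- c(3, 3, 7, 12, 12, 12)
cls <- c(1, 2, 2, 1, 1, 2)

# rows = cells, columns = classes; cells lacking a class get a 0, not NA
tab <- table(cell = cID, class = cls)
tab

# every point is counted exactly once
sum(tab) == length(cID)

# On the real data this would be something like:
# tab <- table(cell = HexObj@cID, class = rep(1:2, c(2000, length(x) - 2000)))
```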
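Plotting them is another matter, though. For example, what if a bin contains no class 2 points? Then the ratio is undefined (R would give `Inf`).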
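Also, as far as I know, `hexbin` is just a two-dimensional histogram: it counts the number of points falling into each bin. I don't think it can handle non-integer data as in your case.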
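If a value that is always defined is acceptable, one option (a substitute for the ratio, not what either answer does) is the share of class-1 points in each cell, n1 / (n1 + n2). It lies in [0, 1] for every non-empty cell, so there is no `Inf` to handle, and it can be scaled to integers as in the first answer. A minimal base-R sketch; `class1_share_counts` is a hypothetical helper, and `cID`/`l` are assumed to be `h@cID` and labels 1/2.

```r
# Hypothetical helper (not part of hexbin): share of class-1 points per cell,
# scaled to integers in 0..levels so it can be stored in a hexbin count slot.
# Assumes `cID` is h@cID from hexbin(..., IDs = TRUE) and `l` holds labels 1/2.
class1_share_counts <- function(cID, l, levels = 100) {
  share <- tapply(l == 1, cID, mean)   # n1 / (n1 + n2), cells in ascending ID order
  as.integer(round(share * levels))    # 0 = only class 2, `levels` = only class 1
}

# Toy check: cell 3 has 1 of 2 points in class 1, cell 7 has none,
# cell 12 has 2 of 3.
class1_share_counts(c(3, 3, 7, 12, 12, 12), c(1, 2, 2, 1, 1, 2))

# With hexbin (commented out so this block runs without the package):
# h <- hexbin::hexbin(x, y, IDs = TRUE)
# h@count <- class1_share_counts(h@cID, l)
# plot(h)
```

One thing to check: `plot()` on a hexbin object appears to skip cells whose count is below its minimum-count threshold (`mincnt`, 1 by default), so cells with a share of 0 may not be drawn; adding 1 to every value is a simple workaround.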