Question

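I have a data frame with m rows and n columns, where all values are normalized to lie between 0 and 1.

I want each point to define an n-dimensional cube with side length 0.2 (preferably with the point at the cube's center, if its value on each axis allows that), and to count how many data points fall inside that cube.

For example: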

df <- structure(list(x1 = c(0, 0.01, 0.05, 0.07, 0.1, 0.11, 0.16, 0.18, 
0.2, 0.25, 0.5), x2 = c(0.05, 0.3, 0.1, 0.17, 0.38, 0.01, 0.04, 
0.05, 0.11, 0.21, 0.26), x3 = c(0.4, 0.07, 0.09, 0.1, 0.23, 0.4, 
0.2, 0.11, 0.01, 0.34, 0.22)), row.names = c(NA, -11L), class = c("tbl_df", 
"tbl", "data.frame"))

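The first point cannot be the center of its cube, because its x1 and x2 values are too close to 0. The cube it defines is given by the constraints: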

x1 >= 0 and x1 <= 0.2
x2 >= 0 and x2 <= 0.2
x3 >= 0.3 and x3 <= 0.5

So the first cube contains only the points (0, 0.05, 0.4) and (0.11, 0.01, 0.4).

The second point defines the cube:

x1 >= 0 and x1 <= 0.2
x2 >= 0.2 and x2 <= 0.4
x3 >= 0 and x3 <= 0.2

and contains only itself.
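The clamping rule behind these two examples can be sketched in base R as follows. This is only an illustration: `cube_bounds` is a hypothetical helper name, and it assumes the cube is shifted just far enough to stay inside [0, 1] on every axis.

```r
# Hypothetical helper: lower and upper bounds of the cube defined by one point.
# Assumption: the cube is centered on the point where possible and shifted
# inward where it would otherwise leave [0, 1].
cube_bounds <- function(p, side = 0.2) {
  lower <- pmin(pmax(p - side / 2, 0), 1 - side)
  rbind(lower = lower, upper = lower + side)
}

# The first two points of the example data
cube_bounds(c(x1 = 0,    x2 = 0.05, x3 = 0.4))
cube_bounds(c(x1 = 0.01, x2 = 0.3,  x3 = 0.07))
```

Because the bounds come from floating-point arithmetic, comparing them with `all.equal` is safer than comparing with `==`.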

Now I would like to be able to do this filtering efficiently for arbitrary n and m (using base R or dplyr, please).

Any ideas?

Answer 1

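This looks at the distance between each point and the center of each cube. Any point whose maximum distance from a cube's center (taken over all dimensions) is less than or equal to 0.1 lies within that cube.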

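# Lower edge of each cube, clamped to [0, 0.8] using the identities
# max(a, b) = 0.5*((a + b) + abs(a - b)) and min(a, b) = 0.5*((a + b) - abs(a - b))
lower_edge = 0.5*((df - 0.1) + abs(df - 0.1))                  # max(df - 0.1, 0)
lower_edge = 0.5*((lower_edge + 0.8) - abs(lower_edge - 0.8))  # min(lower_edge, 0.8)
upper_edge = lower_edge + 0.2
cube_center = 0.5*(lower_edge + upper_edge)
m = NROW(df)
# Chebyshev ("maximum") distances; keep the block with cube centers in rows and points in columns
dists = as.matrix(dist(rbind(df, cube_center), method = "maximum"))[(m+1):(2*m), 1:m]
# For each cube, count the points within 0.1 of its center in every dimension
apply(dists, 1, function(x) sum(x <= 0.1))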

(I am assuming you do not want any cube to extend outside [0,1]^n.)
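As a possible variation on the same idea, the counts can be computed one cube at a time, which avoids the 2m x 2m matrix that `dist` builds on the stacked data. The sketch below is not part of the original answer: `count_in_cubes` is a hypothetical name, and the small tolerance `tol` is an added assumption meant to keep points that sit exactly on a cube face from being dropped by floating-point rounding.

```r
# Example data from the question (a plain data.frame; a tibble works the same way)
df <- data.frame(
  x1 = c(0, 0.01, 0.05, 0.07, 0.1, 0.11, 0.16, 0.18, 0.2, 0.25, 0.5),
  x2 = c(0.05, 0.3, 0.1, 0.17, 0.38, 0.01, 0.04, 0.05, 0.11, 0.21, 0.26),
  x3 = c(0.4, 0.07, 0.09, 0.1, 0.23, 0.4, 0.2, 0.11, 0.01, 0.34, 0.22)
)

# Hypothetical helper: for each row, count the points inside the cube it defines.
# Assumption: tol is an illustrative tolerance, not something the answer specifies.
count_in_cubes <- function(df, side = 0.2, tol = 1e-12) {
  X <- as.matrix(df)
  half <- side / 2
  # Clamp each center so that the cube stays inside [0, 1]^n
  centers <- pmin(pmax(X, half), 1 - half)
  vapply(seq_len(nrow(X)), function(i) {
    # Per-axis distance from cube center i to every point
    d <- abs(sweep(X, 2, centers[i, ]))
    # A point is inside the cube if it is within half a side on every axis
    sum(rowSums(d <= half + tol) == ncol(X))
  }, integer(1))
}

counts <- count_in_cubes(df)
counts[1:2]  # the question's worked examples give 2 and 1 for these two cubes
```

This still does O(m^2 n) work, but it only holds one m x n matrix of differences in memory at a time.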
