How can I count the number of data points within a region using segments()?

Time: 2017-05-23 21:58:54

Tags: r csv plot

Is there a way to find out how many data points lie within a specified region in R? For example, here is my code:

data = read.csv("data.csv")
plot(data$X, data$Y, ylab = "Y", xlab = "X", pch = 1, col = unclass(data$classes), cex = 0.5, cex.axis = 0.5, main = "Y vs X Plot")
par(fig = c(0, 1, 0, 1), oma = c(0, 0, 0, 0), mar = c(0, 0, 0, 0), new = TRUE)
segments(-0.02, 0.36, 0.15, 0.36, col = c("purple"), lty = 1, lwd = 1)
segments(0.15, 0.35, 0.15, 0.9, col = c("purple"), lty = 1, lwd = 1)

This gives me the following plot:

Plot of Y vs X

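I want to know how many red circles fall inside the rectangular region on the left (the one with the purple border) or lie on its boundary. Is there a way to do this in R?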

2 answers:

Answer 0 (score: 1)

This doesn't use segments() itself, but here is a way to do it using the coordinates directly.

I don't have your data, so here is some fake data:

x <- rnorm(100)
y <- rnorm(100)
class <- sample(1:2, 100, replace = TRUE)
plot(x, y, col = class)

# region we're interested in
xmin <- 0
xmax <- 1
ymin <- 0
ymax <- 1
rect(xmin, ymin, xmax, ymax)  
# rect was easier for what I wanted to show, but you can use numbers the same way

How many points are inside the box?

sum(x>xmin & x<xmax & y>ymin & y<ymax & class==2)
# [1] 11
# your results will vary

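How many points are on the border of the box?

# a point is on the border if it sits on one of the four edges (corners included)
on_border <- (x >= xmin & x <= xmax & (y == ymin | y == ymax)) |
             (y >= ymin & y <= ymax & (x == xmin | x == xmax))
sum(on_border & class == 2)
# [1] 0
# again, your results will vary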
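
Since the question asks for points that are inside the rectangle or on its boundary, the two counts can also be obtained in one step by making the comparisons inclusive. A minimal sketch, assuming a hypothetical helper named count_in_rect and made-up sample points (neither comes from the original post):

```r
# Hypothetical helper: count points inside a rectangle, border included.
count_in_rect <- function(x, y, xmin, xmax, ymin, ymax) {
  # >= and <= make the test inclusive, so points on the border count too;
  # na.rm = TRUE skips points with missing coordinates
  sum(x >= xmin & x <= xmax & y >= ymin & y <= ymax, na.rm = TRUE)
}

# Made-up points for illustration only:
# one strictly inside, one on an edge, one on a corner, two outside
px <- c(0.5, 0.0, 1.0, 1.5, 0.25)
py <- c(0.5, 0.5, 1.0, 0.5, -0.1)

n_in <- count_in_rect(px, py, xmin = 0, xmax = 1, ymin = 0, ymax = 1)
print(n_in)
```

With real measurements, an exact == test for border points only matches coordinates that are exactly equal to the limit, so the inclusive count is usually the more practical of the two.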

Answer 1 (score: 0)

One approach is to add a column to the existing data frame.

library(ggplot2)
library(dplyr)
library(reshape2)
x <- runif(100, 0, 1)
y <- runif(100, 0, 1)
class <- round(runif(100, 0, 1))
df <- data.frame(x, y, class)
## plot - sanity check
ggplot(df, aes(x, y, color = as.character(class))) + geom_point()

## here x limits from 0 to 0.15 and y limits from 0 to 0.36

df <- df %>% mutate(div = ifelse(x < 0.15 & y < 0.36, 'Y', 'N'))
## count points per region flag and class, then spread classes into columns
df %>% group_by(div, class) %>% summarise(cc = n()) %>% dcast(., div ~ class, value.var = "cc")
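
The same kind of cross-tabulation can also be produced without extra packages. A rough sketch using base R's table(), assuming made-up data and the same limits as above (the names pts and in_region are illustrative, not from the original post):

```r
# Made-up data for illustration only
pts <- data.frame(
  x     = c(0.05, 0.10, 0.50, 0.12, 0.90),
  y     = c(0.10, 0.40, 0.20, 0.30, 0.80),
  class = c(0, 1, 1, 1, 0)
)

# TRUE where a point falls in the region (same limits as above)
in_region <- pts$x < 0.15 & pts$y < 0.36

# Rows: in the region or not; columns: class
counts <- table(in_region = in_region, class = pts$class)
print(counts)
```

Individual cells can then be read by name, for example counts["TRUE", "1"] for the class 1 points inside the region.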