Question

plot(USArrests$Murder, USArrests$UrbanPop, 
     xlab="murder", ylab="% urban population", pch=20, col="grey",
     ylim=c(20, 100), xlim=c(0, 20))
text(USArrests$Murder, USArrests$UrbanPop, labels=rownames(USArrests), 
     cex=0.7, pos=3)

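I have tried everything: I reduced the font size with cex, changed the label position, adjusted ylim and xlim to make room, and also tried changing the margins, but none of that really helped, so I removed those changes again. At this point I don't know how to do this with base R tools. I do know the ggplot approach, which is easy. But I would like to know whether the same task can be done with the basic plot() and text() code.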

Answer 1

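To find neighbors that are too close together, you can run a kmeans() cluster analysis on the data. This is admittedly a hack, though!

First, subset the data.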

dat <- USArrests[c("Murder", "UrbanPop")]

Set a seed and play around with it: different seeds give different results.

set.seed(42)

Use kmeans() to find the clusters. The centers option sets the number of clusters, so experiment with it.

dat$cl <- kmeans(dat, centers=10, nstart=5)$cluster
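One way to experiment with centers is to look at how large the biggest cluster gets, since labels within a cluster only alternate between two positions. The following is a minimal sketch, not part of the original answer; the names xy, ks and largest are hypothetical, and the candidate values 5, 8 and 10 are arbitrary.

```r
# Hypothetical helper objects; `xy` is the same subset as `dat` above,
# kept under a different name so that `dat` is not overwritten.
xy <- USArrests[c("Murder", "UrbanPop")]
ks <- c(5, 8, 10)  # arbitrary candidate values for `centers`

set.seed(42)
# For each candidate, record the size of the largest cluster.
largest <- sapply(ks, function(k) {
  cl <- kmeans(xy, centers = k, nstart = 5)$cluster
  max(table(cl))
})
names(largest) <- ks
print(largest)
```

Smaller largest clusters mean fewer labels competing for the same two positions, at the cost of more clusters overall.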

Now split the data by cluster and assign alternating pos numbers, which are used later to position the labels in the text() call.

l <- split(dat, dat$cl)
l <- lapply(l, function(x) {
  # a cluster with a single observation gets pos 2 (left);
  # otherwise alternate 4 (right), 2 (left), 4, 2, ... within the cluster
  x$pos <- if (nrow(x) == 1) 2 else ifelse(seq_len(nrow(x)) %% 2 == 0, 2, 4)
  x
})

Reassemble.

dat <- do.call(rbind, unname(l))
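To see what the split, assign and reassemble steps produce, here is a minimal sketch on a made-up data frame. The names toy, parts and alt_pos are hypothetical and not part of the answer; the rule mirrors the one described above (a lone point gets pos 2, otherwise positions alternate 4, 2, 4, ...).

```r
# Hypothetical toy data: cluster 1 has three points, cluster 2 has one
toy <- data.frame(x  = c(1, 1.1, 1.2, 5),
                  y  = c(1, 1.1, 0.9, 5),
                  cl = c(1, 1, 1, 2),
                  row.names = c("a", "b", "c", "d"))

# Hypothetical helper: pos values for a cluster of n points
alt_pos <- function(n) {
  if (n == 1) 2 else ifelse(seq_len(n) %% 2 == 0, 2, 4)
}

# Split by cluster and add a pos column to each piece
parts <- lapply(split(toy, toy$cl), function(d) {
  d$pos <- alt_pos(nrow(d))
  d
})

# unname() keeps rbind from prefixing the row names with the cluster id
toy <- do.call(rbind, unname(parts))
print(toy$pos)  # 4 2 4 2
```

Keeping the row names intact matters here because they are used as the labels in the later text() call.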

Now draw the plot to a PNG at a fairly high resolution; I chose 800x800.

png("plot.png", width=800, height=800, units="px")
plot(dat$Murder, dat$UrbanPop, xlab="murder", ylab="% urban population", 
     pch=20, col="grey",  ylim=c(20, 100), xlim=c(0, 20))
# draw the labels in two passes, one per value of the `pos` column
for (p in c(4, 2)) {
  sub <- dat[dat$pos == p, ]
  text(sub$Murder, sub$UrbanPop, labels=rownames(sub), cex=0.7, pos=p)
}
dev.off()

This writes the resulting plot to plot.png.

I'm sure you can optimize this further.
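One possible simplification, offered as a sketch rather than as the answer's method: text() recycles pos across labels, so a vector with one pos per point can be passed in a single call. The names xy, cl, pos_vec and out below are hypothetical, and pdf() with a temporary file is used only as an assumption to keep the sketch independent of the available PNG device.

```r
xy <- USArrests[c("Murder", "UrbanPop")]

set.seed(42)
cl <- kmeans(xy, centers = 10, nstart = 5)$cluster

# Per cluster: a lone point gets 2, otherwise alternate 4, 2, 4, ...
# ave() applies the function within each cluster and keeps the row order.
pos_vec <- ave(seq_along(cl), cl, FUN = function(i) {
  if (length(i) == 1) 2 else ifelse(seq_along(i) %% 2 == 0, 2, 4)
})

out <- tempfile(fileext = ".pdf")  # hypothetical output location
pdf(out, width = 8, height = 8)
plot(xy$Murder, xy$UrbanPop, xlab = "murder", ylab = "% urban population",
     pch = 20, col = "grey", ylim = c(20, 100), xlim = c(0, 20))
# one call: pos is matched to the labels element by element
text(xy$Murder, xy$UrbanPop, labels = rownames(xy), cex = 0.7, pos = pos_vec)
dev.off()
```

Because the data frame is never split and reassembled, the row order and row names stay untouched, which removes the need for the do.call(rbind, ...) step.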

How to fix the overlap problem
