Question

I pulled a fairly large amount of data from a MySQL database. It is about 150 MB.

Then I plotted some of the fields:

> qplot(myValues$average_submitted_chrg_amt, myValues$average_Medicare_payment_amt, data=myValues, color=nppes_provider_country,xlim=c(0,10000),ylim=c(0,4000),alpha=0.01)

Just for fun, here is the resulting chart:

[image: scatter plot produced by the qplot call above]

I would like to take a random sample of rows from the SQL query and plot it again.

Is there a way to plot a subset of myValues?

Answer 1

If you want a random subset from the MySQL query itself, here are two ways. To get a sample of approximately 10% of the rows:

select t.*
from (<your query here>) t
where rand() < 0.1;

To get a random sample of exactly n rows, do the following:

select t.*
from (<your query here>) t
order by rand()
limit <n>;

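The first method is faster: it filters the rows in a single pass, whereas order by rand() has to sort the whole result set before the limit is applied.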
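If the data is already in R, the same two ideas can be mimicked on a data frame, which makes the difference between "about 10%" and "exactly n rows" easy to see. This is only a sketch: the data frame below is a made-up stand-in for the query result, and its columns (id, value) are hypothetical.

```r
# Hypothetical stand-in for the query result; only the row count matters here.
set.seed(42)
myValues <- data.frame(id = 1:10000, value = rnorm(10000))

# Analogue of `where rand() < 0.1`:
# keep each row independently with probability 0.1.
approx_sample <- myValues[runif(nrow(myValues)) < 0.1, ]

# Analogue of `order by rand() limit <n>`:
# draw exactly n distinct row indices at random.
n <- 1000
exact_sample <- myValues[sample(nrow(myValues), n), ]

nrow(approx_sample)  # close to 1000, but not guaranteed to equal it
nrow(exact_sample)   # always exactly n
```

The filter version gives a sample whose size varies from run to run, which is usually fine for plotting; use the second form when the exact row count matters.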

Answer 2

You can use sample to pick the rows to include in the subset, and use [ to extract those rows from the data.

This samples 5 numbers from 1 to 10, without replacement:

sample(10, 5)
#[1]  5  7  8  3 10

If we sample again, we will most likely get a different sample:

sample(10, 5)
#[1] 10  2  6  1  9

To make the sampling reproducible, we can set the seed (see ?set.seed):

set.seed(1) ; sample(10, 5)
# [1] 3 4 5 7 2
set.seed(1) ; sample(10, 5)
# [1] 3 4 5 7 2
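A quick way to confirm the reproducibility claim is to compare two seeded draws directly. Note that R 3.6.0 changed the default sampling algorithm, so the specific numbers printed for a given seed can differ between R versions; the sketch below therefore checks that the two draws are equal instead of relying on particular values. The variable names first and second are only for illustration.

```r
# Draw twice, resetting the seed before each draw.
set.seed(1)
first <- sample(10, 5)

set.seed(1)
second <- sample(10, 5)

# Same seed, same call: the two draws are the same.
identical(first, second)  # TRUE
```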

For your plot, here is an example using the built-in mtcars dataset. You can use sample to pick the rows:

library(ggplot2)

data(mtcars)

set.seed(1)
qplot(mpg, wt, data=mtcars[sample(nrow(mtcars), 20), ], geom="point")

mtcars[sample(nrow(mtcars), 20), ] draws 20 random rows from the dataset.
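Applied to the data in the question, the same pattern might look like the sketch below. The real myValues is not available here, so the code builds a synthetic data frame that only reuses the column names from the question; the values, the sample size of 5000, and the name mySubset are assumptions for illustration. It uses ggplot() with geom_point() instead of qplot(), which recent ggplot2 releases mark as deprecated.

```r
library(ggplot2)

# Synthetic stand-in for the query result (column names taken from the question,
# values invented purely so the example runs).
set.seed(1)
n_rows <- 50000
myValues <- data.frame(
  average_submitted_chrg_amt = runif(n_rows, 0, 10000),
  nppes_provider_country = sample(c("US", "CA"), n_rows, replace = TRUE)
)
myValues$average_Medicare_payment_amt <-
  0.3 * myValues$average_submitted_chrg_amt + rnorm(n_rows, sd = 200)

# Draw the subset once and keep it, so the plot can be reproduced later.
# min() guards against asking for more rows than the data contains.
n_sample <- min(5000, nrow(myValues))
mySubset <- myValues[sample(nrow(myValues), n_sample), ]

# coord_cartesian() zooms to the axis ranges without dropping any rows.
p <- ggplot(mySubset,
            aes(x = average_submitted_chrg_amt,
                y = average_Medicare_payment_amt,
                colour = nppes_provider_country)) +
  geom_point(alpha = 0.1) +
  coord_cartesian(xlim = c(0, 10000), ylim = c(0, 4000))
print(p)
```

Storing the subset in its own variable, instead of sampling inline inside the plotting call, means the same rows can be reused for other plots or summaries.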
