Question

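I have a data frame with 1617 observations and 202 variables, including a variable State. There are 52 distinct states. I want to either randomly select 5 states and keep all entries for those 5 states, or pull all entries for 5 specific states.

I tried this: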

A <- subset(Iped, STABBR == c("PA", "DC", "MD", "DE", "VA"))

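However, this does not return all entries with those values. It selects only 45 of roughly 230 entries.

I would like to subset the data to 5 states and count the entries in each state.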

Answer 1

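To get 5 random states, use sample(unique(Iped$State), 5), assuming Iped is the name of your data frame.

Your final subset would then be A <- subset(Iped, State %in% sample(unique(Iped$State), 5)) (substitute STABBR for State if that is the column holding the state codes; use the same column in both places).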
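
The reason the original attempt drops rows is that == compares element by element and recycles the five-element vector along the column, so a row is kept only when its state happens to line up with the recycled position. Here is a minimal sketch of the difference, using a made-up toy data frame (the names toy and wanted and all values are assumptions for illustration, not the Iped data):

```r
# Hypothetical stand-in for Iped: 6 states, 5 rows each (30 rows)
toy <- data.frame(
  STABBR = rep(c("PA", "DC", "MD", "DE", "VA", "NY"), each = 5),
  value  = 1:30,
  stringsAsFactors = FALSE
)
wanted <- c("PA", "DC", "MD", "DE", "VA")

# `==` recycles `wanted` along the column, so a row matches only if its
# state equals the recycled element at that same position
by_eq <- subset(toy, STABBR == wanted)

# `%in%` tests each row's state for membership in `wanted`
by_in <- subset(toy, STABBR %in% wanted)

nrow(by_eq)  # 5: only one row per state lines up
nrow(by_in)  # 25: every row for the five states

# Count the entries per selected state
counts <- table(by_in$STABBR)
counts
```

The same table() call works on a subset built from a random draw of states, which covers the "count entries in each state" part of the question.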

Answer 2

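I may not fully understand your question, and it is harder to answer without a reproducible example. That said, here is a data.table solution that I think you can use: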

# load the library
library(data.table)

# define example data
set.seed(1)
states <- data.table(a = 1:1000, State = sample(LETTERS, 1000, TRUE))

# Before the first comma: keep rows whose State is in a random sample of
#   5 distinct states (drawn without replacement).
# .N: count the rows.
# by: do the count separately for each State.
states[State %in% sample(unique(State), 5, FALSE), .N, by = State]
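
Because sample() is called inside the filter, the one-liner does not keep the drawn states anywhere, so the same five cannot be reused for both the subset and the counts. Here is a minimal sketch that stores the draw first (the names picked, sub and counts are illustrative assumptions, and the data are the same made-up example as above):

```r
library(data.table)

# Same made-up example data as in the answer
set.seed(1)
states <- data.table(a = 1:1000, State = sample(LETTERS, 1000, TRUE))

# Draw five distinct states once and keep them for reuse
picked <- sample(unique(states$State), 5)

# All rows belonging to the picked states
sub <- states[State %in% picked]

# Number of rows per picked state
counts <- sub[, .N, by = State]
counts
```

For five specific states instead of a random draw, picked could simply be assigned a character vector of the desired state codes and the rest stays the same.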

Selecting rows with multiple values (a, b, c) in column x