My dataset looks like this (call it data_xy):
id X Y
1 5 10
1 6 11
1 4 8
2 3 9
2 3 12
3 4 10
...
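It holds observations from N ids in total, and each id has several rows of measurements.
I want to bootstrap the ids with replacement. The bootstrapped ids will very likely contain duplicates:
b_idx <- sample.int(N, N, replace = TRUE)
might well give
b_idx=c(1,1,3,4,4,4....)
How do I then use b_idx to create the bootstrap sample? If I do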
data_xy[data_xy$id==b_idx,]
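each id (and its repeated measurements) will appear only once in my bootstrap dataset. What I really want is for the rows of id=k to be replicated as many times as k appears in b_idx. How can I do this?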
Answer 0: (score: 0)
I use the 'matches' function from the grr package.
# b_idx and data_xy$id are the names used in the question (R is case-sensitive)
Indices <- unlist(matches(b_idx, data_xy$id, list=TRUE))
b.data <- data_xy[Indices, ]
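If you would rather not depend on an extra package, the same idea can be sketched in base R. This is only an illustration under some assumptions: the toy data_xy copies the six rows shown in the question, and the names ids, rows_by_id, b_rows and b_data are made up for this sketch. It groups row numbers by id with split() and then looks up the rows of every drawn id, so an id drawn k times contributes its rows k times.

```r
# Toy data shaped like the question's data_xy (values copied from the question).
data_xy <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  X  = c(5, 6, 4, 3, 3, 4),
  Y  = c(10, 11, 8, 9, 12, 10)
)

# Distinct ids; these are the units being resampled.
ids <- unique(data_xy$id)

set.seed(1)  # only to make the sketch reproducible
# Draw ids with replacement (indexing avoids sample()'s length-1 special case).
b_idx <- ids[sample.int(length(ids), replace = TRUE)]

# Row numbers belonging to each id, as a list named by id.
rows_by_id <- split(seq_len(nrow(data_xy)), data_xy$id)

# Look up the rows of every drawn id; duplicates in b_idx are kept.
b_rows <- unlist(rows_by_id[as.character(b_idx)], use.names = FALSE)
b_data <- data_xy[b_rows, ]
```

Looking ids up by name in rows_by_id assumes that as.character() of an id matches the group names produced by split(), which holds for integer-valued or character ids like the ones in the question.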
Answer 1: (score: 0)
You don't actually need to work with the IDs directly; you can just sample row numbers and use them to index the data.frame.
If you pass the same integer twice, you get that row twice. Here is an example of the principle:
# How many rows in the data.frame?
n <- nrow(mtcars)
# Sample them
mtcars[sample(x = n, size = n, replace = TRUE), ]
If you aren't already aware of it, be sure to check out the boot package, which automates a lot of bootstrap schemes for you.
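As a rough sketch of how boot could be applied to the question's setting, one option is to pass the vector of distinct ids as the data, so that boot() resamples ids rather than rows, and to expand the drawn ids back into rows inside the statistic. Everything specific here is an assumption for illustration: the toy data_xy copies the rows from the question, the statistic (the mean of Y) is arbitrary, and stat_by_id and rows_by_id are hypothetical names.

```r
library(boot)

# Toy data shaped like the question's data_xy (values copied from the question).
data_xy <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  X  = c(5, 6, 4, 3, 3, 4),
  Y  = c(10, 11, 8, 9, 12, 10)
)

ids <- unique(data_xy$id)
# Row numbers belonging to each id, as a list named by id.
rows_by_id <- split(seq_len(nrow(data_xy)), data_xy$id)

# boot() calls the statistic with the data (here: the ids) and resampled indices.
stat_by_id <- function(ids, i) {
  # Expand the drawn ids into row numbers, keeping repeated ids.
  rows <- unlist(rows_by_id[as.character(ids[i])], use.names = FALSE)
  mean(data_xy$Y[rows])  # illustrative statistic only
}

set.seed(1)  # only to make the sketch reproducible
boot_out <- boot(data = ids, statistic = stat_by_id, R = 200)
```

boot_out$t0 holds the statistic on the original data and boot_out$t holds the 200 bootstrap replicates, each computed from a sample in which whole ids, with all their repeated measurements, were drawn with replacement.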