R: bootstrapping a data set with repeated measures

Time: 2016-07-19 19:18:34

Tags: r bootstrapping

My data set looks like this (call it data_xy):

id X Y
1  5 10
1  6 11
1  4 8
2  3 9
2  3 12
3  4 10
...

The observations come from N ids in total. Each id has several rows of measurements.

I want to bootstrap the ids with replacement. The bootstrapped ids will most likely contain duplicates.

b_idx <- sample.int(N, N, replace = TRUE)

will most likely give something like

b_idx=c(1,1,3,4,4,4....)

How do I then create the bootstrap sample from b_idx? If I do

data_xy[data_xy$id==b_idx,]

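each id (and its repeated measurements) will appear only once in my bootstrap data set. What I really want is to replicate the rows of id = k as many times as k appears in b_idx. How can I do that?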

2 Answers:

Answer 0: (score: 0)

I use the 'matches' function from the grr package.

Indices <- unlist(matches(b_idx, data_xy$id, list=TRUE))

b.data <- data_xy[Indices, ]
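
For anyone who would rather not add a dependency, here is a rough base R sketch of the same idea. It is not the answer's method, just one way to get the same kind of index vector: look up the row numbers of each sampled id and stack them, so an id drawn three times contributes its rows three times. The toy data are copied from the question; the names rows_by_id and b_data are made up for illustration.

```r
# Toy data from the question
data_xy <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  X  = c(5, 6, 4, 3, 3, 4),
  Y  = c(10, 11, 8, 9, 12, 10)
)

# Sample ids (not rows) with replacement
ids <- unique(data_xy$id)
set.seed(1)
b_idx <- ids[sample.int(length(ids), replace = TRUE)]

# Row numbers belonging to each id, as a list named by id
rows_by_id <- split(seq_len(nrow(data_xy)), data_xy$id)

# One block of row numbers per sampled id, duplicates included
indices <- unlist(rows_by_id[as.character(b_idx)], use.names = FALSE)
b_data <- data_xy[indices, ]
```

Indexing the list by name is what keeps the duplicates: a logical filter such as data_xy$id %in% b_idx can only return each row once, however often its id was drawn.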

Answer 1: (score: 0)

You don't actually need to work with the ids directly; you can simply sample row numbers and use them to index the data.frame.

If you pass in the same integer twice, you get that row twice. Here is an example of the principle:

# How many rows in the data.frame?
n <- nrow(mtcars)

# Sample them
mtcars[sample(x = n, size = n, replace = TRUE), ]
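
To see the duplication without any randomness, a fixed index vector works just as well. This is only a small illustration; the name dup is arbitrary.

```r
# Row 1 is requested twice, so it comes back twice
dup <- mtcars[c(1, 1, 3), ]

nrow(dup)                                        # three rows in total
identical(unname(dup[1, ]), unname(dup[2, ]))    # same values in rows 1 and 2
```

R makes the row names of the repeated rows unique automatically, so the result is still a valid data.frame.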

If you don't know it already, be sure to check out the boot package, which automates many bootstrapping schemes for you.
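
As a hedged sketch of how boot could be pointed at the ids rather than the rows: hand boot() the vector of unique ids as its data, and let the statistic function expand the resampled ids back into rows. The statistic used here (the mean of Y) and the names mean_y, full_data and row_lookup are assumptions made for the example, not something from the question.

```r
library(boot)

# Toy data from the question
data_xy <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  X  = c(5, 6, 4, 3, 3, 4),
  Y  = c(10, 11, 8, 9, 12, 10)
)

ids <- unique(data_xy$id)
rows_by_id <- split(seq_len(nrow(data_xy)), data_xy$id)

# boot() calls the statistic with the data (here: the ids) and a vector of
# resampled positions; extra arguments are passed through unchanged.
mean_y <- function(id_vec, i, full_data, row_lookup) {
  picked <- unlist(row_lookup[as.character(id_vec[i])], use.names = FALSE)
  mean(full_data$Y[picked])
}

set.seed(42)
res <- boot(data = ids, statistic = mean_y, R = 200,
            full_data = data_xy, row_lookup = rows_by_id)
res$t0   # statistic on the original data
```

Each of the 200 replicates in res$t is then computed on a data set in which whole ids, with all their repeated measurements, were drawn with replacement.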