Question

I would like to sample rows of a table with row-specific probabilities.

My table has about 50 million rows, and I want to sample 500,000 of them (i.e. 1%). This takes hours. Do you know how to make it more efficient, e.g. by using some C++ package (although `sample` and related functions already seem to be written in C)?

The command I have used so far:

Thanks!

Answer 1

Well, this would be much faster:

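# draw 500,000 row indices without replacement, weighted by prob_vector
ind <- sample.int(nrow(myTable), 500000, prob = prob_vector)
# sort the indices so the rows are read in ascending order
ind <- sort(ind)
myTableSample <- myTable[ind, ]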

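Before sorting, you are doing completely random access into the table. After sorting, the rows are read in ascending order, which is much better in terms of CPU cache utilization.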
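To see the effect on your own machine, here is a minimal sketch on a small synthetic table. The table contents, the sizes `n` and `k`, and the uniform weights are all assumptions made up for illustration, and the timings will depend on your hardware and data, so treat it as a way to measure rather than as a guaranteed speedup.

```r
# Synthetic stand-in for the real table (names and sizes are assumptions)
set.seed(42)
n <- 1e5                      # far smaller than 50 million so it runs quickly
k <- 1e3                      # 1% sample
myTable <- data.frame(id = seq_len(n), value = rnorm(n))
prob_vector <- runif(n)       # hypothetical row-specific weights

# Draw row indices without replacement, weighted by prob_vector
ind <- sample.int(n, k, prob = prob_vector)

# Subset with the indices in drawn (random) order vs. ascending order
t_unsorted <- system.time(s1 <- myTable[ind, ])
t_sorted   <- system.time(s2 <- myTable[sort(ind), ])

# Both samples contain the same rows; only the row order differs
stopifnot(setequal(s1$id, s2$id))
print(c(unsorted = t_unsorted[["elapsed"]], sorted = t_sorted[["elapsed"]]))
```

Note that sorting only changes the order of the sampled rows, not which rows are selected. If you need the rows in the order they were drawn, keep the unsorted `ind` around and reorder afterwards.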

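Of course, this is still not the fastest option. You could write this row subsetting in C, which (based on my earlier experience) is much faster than `[ , ]` indexing.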

R: Sampling rows from a large table with row-specific probabilities
