Question

I have a dataset assigned to a variable named 'temps', with the columns 'date', 'temperature', and 'country'.
I want to do something like this, which I can do in SQL:

SELECT * FROM temps WHERE country != 'mycountry'

How can I do a similar selection in R?

Answer 1

We can use similar syntax in base R:

temps[temps$country != "mycountry",]
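
One caveat, offered as a sketch rather than as part of the original answer: if 'country' contains NA, the comparison yields NA for those rows, and bracket indexing then returns an all-NA row, whereas the SQL query would simply leave NULL rows out. Wrapping the condition in which() is one way to get closer to the SQL behavior. The sample rows below are made up for illustration.

```r
# Hypothetical sample data with a missing country (not from the question)
temps <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-02", "2020-01-03")),
  temperature = c(10.5, 12.1, 9.8),
  country = c("mycountry", "other", NA)
)

# Plain logical indexing: the NA comparison produces an extra all-NA row
with_na <- temps[temps$country != "mycountry", ]

# which() keeps only positions where the comparison is TRUE,
# so rows with a missing country are dropped, as in SQL
without_na <- temps[which(temps$country != "mycountry"), ]
```

subset() also drops rows where the condition is NA, so it behaves like the which() version here.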

Benchmarks

set.seed(24)
temps1 <- data.frame(country = sample(LETTERS, 1e7, replace=TRUE),
                  val = rnorm(1e7))
system.time(temps1[!temps1$country %in% "A",])
#  user  system elapsed 
#   0.92    0.11    1.04 
system.time(temps1[temps1$country != "A",])
#   user  system elapsed 
#   0.70    0.17    0.88

If we use package solutions:

library(sqldf)
system.time(sqldf("SELECT * FROM temps1 WHERE country != 'A'"))
#   user  system elapsed 
# 12.78    0.37   13.15 

library(data.table)
system.time(setDT(temps1, key = 'country')[!("A")])
#   user  system elapsed 
#  0.62    0.19    0.37

Answer 2

This should do it.


Answer 3

Here are sqldf and base R approaches, showing the source and sample output based on the input shown in the Note below.

1) sqldf

library(sqldf)
sqldf("SELECT * FROM temps WHERE country != 'mycountry'")
##   country value
## 1   other     2

2) base R

subset(temps, country != "mycountry")
##   country value
## 2   other     2

Note: The test data used above is shown below. Next time, please provide reproducible sample data like this in the question.

# test data
temps <- data.frame(country = c("mycountry", "other"), value = 1:2)
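
As a quick sanity check on this test data, the following sketch compares bracket indexing with subset(), and shows a rough analogue of SQL's NOT IN for excluding several countries at once. The value "othercountry" is invented for illustration and does not appear in the question.

```r
# Test data from the Note above
temps <- data.frame(country = c("mycountry", "other"), value = 1:2)

# Bracket indexing and subset() should select the same row here
by_index  <- temps[temps$country != "mycountry", ]
by_subset <- subset(temps, country != "mycountry")
same_rows <- isTRUE(all.equal(by_index, by_subset))

# Rough analogue of SQL's NOT IN; "othercountry" is a made-up value
excluded <- c("mycountry", "othercountry")
not_in <- temps[!temps$country %in% excluded, ]
```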
