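Filtering by a date range in sqldf

Time: 2017-12-21 21:45:52

Tags: r sqldf

I am trying to use sqldf to filter a data frame by a date range, as in the sample code below. My data look like the sample data below. The datedf data frame returned by sqldf has no records, even though the SHV data frame does contain records in that date range. Can anyone see what I am doing wrong and tell me how to filter by a date range in sqldf? Dates are always tricky for me.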

Code:
datedf <- sqldf("select field1
                      , fieldDate
                 from SHV
                 where fieldDate between '2004-01-01' and '2005-01-01'
                ")


Data:

dput(SHV[1:50,c("field1","fieldDate")])
structure(list(field1 = c(1378L, 1653L, 1882L, 2400L, 
2305L, 2051L, 2051L, 2051L, 1796L, 2054L, 2568L, 1290L, 1804L, 
1804L, 3855L, 1297L, 2321L, 2321L, 2321L, 2071L, 2071L, 2074L, 
2588L, 1567L, 1317L, 1317L, 808L, 808L, 1321L, 2350L, 1586L, 
2613L, 1590L, 2614L, 2107L, 1340L, 1085L, 1085L, 2365L, 1344L, 
1601L, 1858L, 1603L, 1603L, 1860L, 2376L, 1355L, 1867L, 2382L, 
1872L), fieldDate = structure(c(12551, NA, NA, 14057, 15337, 
12919, 13336, 10325, 14984, 15643, 12864, 11242, 10749, 11207, 
10602, NA, 12646, 15649, NA, NA, NA, NA, NA, 17015, 13938, NA, 
16693, NA, NA, 12634, 12614, 10689, 12755, 10844, 11375, 4899, 
17298, 10905, 11450, NA, 10330, 15429, 12634, 10504, 12625, 11081, 
10939, NA, 12934, 11176), class = "Date")), .Names = c("field1", 
"fieldDate"), row.names = c(NA, 50L), class = "data.frame")

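3 answers:

Answer 0: (score: 0)

In this data sample you have no records in the date range checked here (note that the filter below uses 2010, not the 2004 range from the question). The only rows returned are the all-NA placeholder rows that R emits where fieldDate is NA: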

SHV[SHV$fieldDate >= "2010-01-01" & SHV$fieldDate < "2011-01-01",]
  field1 fieldDate
NA        NA      <NA>
NA.1      NA      <NA>
NA.2      NA      <NA>
NA.3      NA      <NA>
NA.4      NA      <NA>
NA.5      NA      <NA>
NA.6      NA      <NA>
NA.7      NA      <NA>
NA.8      NA      <NA>
NA.9      NA      <NA>
NA.10     NA      <NA>
NA.11     NA      <NA>
NA.12     NA      <NA>
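
The all-NA rows above come from indexing with a logical vector that contains NA. A minimal sketch of the effect and one way to avoid it, using a small hypothetical data frame `df` (made up for illustration, not the SHV data):

```r
# Hypothetical toy data; not the SHV data from the question.
df <- data.frame(
  id = 1:4,
  d  = as.Date(c("2004-05-13", NA, "2010-06-01", NA))
)

lo <- as.Date("2004-01-01")
hi <- as.Date("2005-01-01")

# A logical index containing NA yields one all-NA row per NA.
with_na <- df[df$d >= lo & df$d < hi, ]

# which() keeps only the positions where the condition is TRUE.
without_na <- df[which(df$d >= lo & df$d < hi), ]

nrow(with_na)     # one match plus two NA placeholder rows
nrow(without_na)  # only the matching row
```

Wrapping the condition in `which()` is one option; adding `!is.na(df$d) &` to the condition should behave the same way.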

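Answer 1: (score: 0)

According to the sqldf() documentation, dates need to be supplied as numeric values for them to be compared as dates. A Date column reaches SQL as a number of days since 1970-01-01, so the quoted strings in the original query never match it. You can do the conversion with sprintf() when building the SQL query.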

SHV <- structure(list(field1 = c(1378L, 1653L, 1882L, 2400L,
    2305L, 2051L, 2051L, 2051L, 1796L, 2054L, 2568L, 1290L, 1804L,
    1804L, 3855L, 1297L, 2321L, 2321L, 2321L, 2071L, 2071L, 2074L,
    2588L, 1567L, 1317L, 1317L, 808L, 808L, 1321L, 2350L, 1586L,
    2613L, 1590L, 2614L, 2107L, 1340L, 1085L, 1085L, 2365L, 1344L,
    1601L, 1858L, 1603L, 1603L, 1860L, 2376L, 1355L, 1867L, 2382L,
    1872L), fieldDate = structure(c(12551, NA, NA, 14057, 15337,
    12919, 13336, 10325, 14984, 15643, 12864, 11242, 10749, 11207,
    10602, NA, 12646, 15649, NA, NA, NA, NA, NA, 17015, 13938, NA,
    16693, NA, NA, 12634, 12614, 10689, 12755, 10844, 11375, 4899,
    17298, 10905, 11450, NA, 10330, 15429, 12634, 10504, 12625, 11081,
    10939, NA, 12934, 11176), class = "Date")), .Names = c("field1",
    "fieldDate"), row.names = c(NA, 50L), class = "data.frame")

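library(sqldf)

# Build the bounds as day counts since 1970-01-01, which is how R stores a Date
sqlStmt <- paste("select field1, fieldDate from SHV",
                 "where fieldDate between ",
                 sprintf("%d and %d",
                         as.integer(as.Date('2004-01-01', '%Y-%m-%d')),
                         as.integer(as.Date('2005-01-01', '%Y-%m-%d'))))
datedf <- sqldf(sqlStmt)
datedf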

> datedf
  field1  fieldDate
1   1378 2004-05-13
2   2321 2004-08-16
3   2350 2004-08-04
4   1586 2004-07-15
5   1590 2004-12-03
6   1603 2004-08-04
7   1860 2004-07-26
> 

The sprintf() call converts the dates to numeric values, which ensures that the between operator in SQL works correctly.

> sqlStmt
[1] "select field1, fieldDate from SHV where fieldDate between  12418 and 12784"
>
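
To make the numeric bounds less mysterious, here is a minimal base R sketch (it does not call sqldf) showing that the two numbers in the statement are simply the dates counted in days since 1970-01-01. The variable names `lo_num`, `hi_num`, and `sql` are illustrative only:

```r
lo <- as.Date("2004-01-01")
hi <- as.Date("2005-01-01")

# A Date is stored as the number of days since 1970-01-01.
lo_num <- as.integer(lo)
hi_num <- as.integer(hi)

# Assemble the same kind of statement as above from the numeric bounds.
sql <- sprintf(
  "select field1, fieldDate from SHV where fieldDate between %d and %d",
  lo_num, hi_num
)
sql

# Round trip: the number converts back to the original date.
as.Date(lo_num, origin = "1970-01-01")
```

Note that between is inclusive at both ends, so a row dated 2005-01-01 would be returned too.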

Answer 2: (score: -1)

According to this article, the date field should be converted to character before running sqldf.

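
Before passing any dates to SQLdf, we need to convert them to strings first. Otherwise, SQLdf will try to treat them as numbers, which causes a lot of heartache.

...

Instead, we should convert the DateCreated column to a string rather than a date. SQL will then actually convert it from a string to a date.

Confused? Imagine how I felt when I was trying to work this out for myself.

So your code could be: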

SHV$fieldDate <- as.character(SHV$fieldDate)

datedf <- sqldf("
  SELECT
    field1,
    fieldDate
  FROM SHV
  WHERE fieldDate between '2004-01-01' and '2005-01-01'
  --WHERE '2004-01-01' <= fieldDate --and fieldDate <= '2005-01-01'
  ORDER BY fieldDate
")

# Both should equal 7.  Verify that null rows are handled as desired.
nrow(datedf)
sum(as.Date('2004-01-01') <= SHV$fieldDate & SHV$fieldDate <= as.Date('2005-01-01'), na.rm = TRUE)

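I wish it explained more about when the variable holding the date is converted to an actual date. If you want to read more, @g-grothendieck's SO response takes a different approach and makes the data types match inside the sqldf query.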
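
One way to see why the character approach can work: dates written as zero-padded YYYY-MM-DD strings sort as text in the same order as they sort chronologically. A minimal base R sketch with made-up values (the vectors `d` and `s` are hypothetical, not the SHV data):

```r
# Hypothetical dates, including a missing value.
d <- as.Date(c("2003-12-31", "2004-05-13", "2005-01-01", NA, "2005-01-02"))
s <- as.character(d)

# Filter once by comparing text and once by comparing Date values.
by_text <- which(s >= "2004-01-01" & s <= "2005-01-01")
by_date <- which(d >= as.Date("2004-01-01") & d <= as.Date("2005-01-01"))

# Both approaches select the same positions.
identical(by_text, by_date)
```

This holds only for the ISO-style format; strings such as "1/5/2004" would not compare in date order.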