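Subsetting data by date range / weekdays in R

Asked: 2014-06-03 21:05:20

Tags: r date subset

I am trying to keep only the rows of my data set that fall on the weekdays "Thursday", "Friday" and "Saturday", based on its "Date" variable.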

> head(tidyFile)
            Date     Time Global_active_power Global_reactive_power Voltage Global_intensity
66637 2007-02-01 00:00:00               0.326                 0.128  243.15              1.4
66638 2007-02-01 00:01:00               0.326                 0.130  243.32              1.4
66639 2007-02-01 00:02:00               0.324                 0.132  243.51              1.4
66640 2007-02-01 00:03:00               0.324                 0.134  243.90              1.4
66641 2007-02-01 00:04:00               0.322                 0.130  243.16              1.4
66642 2007-02-01 00:05:00               0.320                 0.126  242.29              1.4
      Sub_metering_1 Sub_metering_2 Sub_metering_3
66637              0              0              0
66638              0              0              0
66639              0              0              0
66640              0              0              0
66641              0              0              0
66642              0              0              0

I used the following code to subset the data to the date range I need:

tidyFile <- newFile[newFile$Date >= "2007-02-01" & newFile$Date <= "2007-02-02", ] 

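But there may be something wrong with the way I subset, because when I ask for "Thurs", "Fri" and "Sat" within this subset I get NA values, which cannot be right. Should I take the time into account as well, to make sure I can include the dates above?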

Finally, I need to subset my data further by "Thurs", "Fri" and "Sat", and I cannot seem to manage that. I tried the following:

library(lubridate)
with(tidyFile[wday(tidyFile, label=T) == "Thurs" & "Fri" & "Sat"])

This returns the error message:

Error in wday(tidyFile, label = T) : unused argument (label = T)

Update

These are the steps I took to create my script:

## STEP 1: Set working directory
setwd("/Users/usaid/datasciencecoursera/data/") 

## STEP 2: Create a new object 'newFile' and read .txt file into R
newFile <- read.table("course_4_proj_1.txt", header=TRUE, sep=";", na.strings = "?", nrows= 1000000, stringsAsFactors=FALSE,  as.is=TRUE)  

## STEP 3: Create a new object 'newFile$Date' and format dates (into date class)
newFile$Date <- as.Date(newFile$Date, format = "%d/%m/%Y") 
newFile$Date <- strptime(newFile$Date, format = "%d/%m/%Y", tz = "")

## STEP 4: Create a new object 'tidyFile' and subset data based on date range provided in Project 1 instructions
tidyFile <- newFile[newFile$Date >= "2007-02-01" & newFile$Date <= "2007-02-02", ] 

## STEP 5: Subset data by "Thurs", "Fri", "Sat"
library(lubridate)
with(tidyFile, wday(Date, label = TRUE))
days <- with(tidyFile, wday(Date, label = TRUE) %in% c("Thurs","Fri","Sat"))
tidyFile[days, ]

When I run Step 5, I get the error message mentioned below.

1 Answer:

Answer 0: (score: 1)

Does this help at all?

## snippet of your data, not all columns
dat <- read.table(text = "            Date     Time Global_active_power Global_reactive_power Voltage Global_intensity
66637 2007-02-01 00:00:00               0.326                 0.128  243.15              1.4
66638 2007-02-01 00:01:00               0.326                 0.130  243.32              1.4
66639 2007-02-01 00:02:00               0.324                 0.132  243.51              1.4
66640 2007-02-01 00:03:00               0.324                 0.134  243.90              1.4
66641 2007-02-01 00:04:00               0.322                 0.130  243.16              1.4
66642 2007-02-01 00:05:00               0.320                 0.126  242.29              1.4
", header = TRUE)

## Make Date an actual Date
dat <- transform(dat, Date = as.Date(Date))
## Load lubridate
require("lubridate")
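
One possible source of the NA values mentioned in the question is its Step 3, which parses Date twice: first with as.Date() and then again with strptime(), both using the day/month/year format. The sketch below uses a made-up vector `raw` (an assumption, standing in for the unparsed Date column) to show that a single as.Date() call is enough, and that re-parsing the converted dates with the old format gives NA.

```r
## Hypothetical raw values in the day/month/year layout matched by "%d/%m/%Y"
raw <- c("1/2/2007", "2/2/2007", "3/2/2007")

## Parse once; the result is a Date vector that prints as "2007-02-01" and so on
d <- as.Date(raw, format = "%d/%m/%Y")

## Parsing again with the old format fails, because the values now look like "2007-02-01"
bad <- strptime(d, format = "%d/%m/%Y", tz = "")
all(is.na(bad))

## Date-range subset, comparing Date with Date rather than with character strings
in_range <- d >= as.Date("2007-02-01") & d <= as.Date("2007-02-02")
d[in_range]
```

If this is what happened in the real script, dropping the strptime() line from Step 3 should leave a usable Date column.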

wday() returns the day of the week for a Date:
with(dat, wday(Date, label = TRUE))

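Now we need to add the comparison against the options you listed. This is done with the %in% binary operator. The right-hand side of %in% takes a vector of values to match against, so c("Thurs", "Fri", "Sat") goes on the right of %in%, as in: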

with(dat, wday(Date, label = TRUE) %in% c("Thurs","Fri","Sat"))

Using the snippet of data you showed:

> with(dat, wday(Date, label = TRUE) %in% c("Thurs","Fri","Sat"))
[1] TRUE TRUE TRUE TRUE TRUE TRUE
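
To see why the comparison attempted in the question cannot work, note that `&` is not defined for character strings, so `== "Thurs" & "Fri" & "Sat"` stops with an error instead of testing three alternatives. A minimal base R sketch, using a made-up vector `days` (an assumption, not the real data), shows the long-hand form that `%in%` replaces:

```r
## Hypothetical weekday labels, spelled the way the question spells them
days <- c("Wed", "Thurs", "Fri", "Sat", "Sun")

## Chained == comparisons joined with | give one TRUE/FALSE per element
by_or <- days == "Thurs" | days == "Fri" | days == "Sat"

## %in% expresses the same test more compactly
by_in <- days %in% c("Thurs", "Fri", "Sat")

## Both approaches flag the same elements
identical(by_or, by_in)
```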

To finish, you need something like:
take <- with(dat, wday(Date, label = TRUE) %in% c("Thurs","Fri","Sat"))
dat[take, ]

Here that selects every row, but I assume your real data set contains more than just these few records.
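
One caveat: the text labels returned by wday(label = TRUE) depend on the lubridate version and the locale (more recent versions return "Thu" rather than "Thurs", as far as I know), so matching on strings can silently select nothing. A sketch of an alternative that avoids labels altogether, using base R's as.POSIXlt() and a made-up data frame `demo` (an assumption, not the real data):

```r
## Hypothetical data: one row per day for a week starting Thursday 2007-02-01
demo <- data.frame(Date = as.Date("2007-02-01") + 0:6, value = 1:7)

## as.POSIXlt()$wday numbers the days from 0 (Sunday) to 6 (Saturday)
day_num <- as.POSIXlt(demo$Date)$wday

## Thursday = 4, Friday = 5, Saturday = 6
take <- day_num %in% 4:6
demo[take, ]
```

Because the comparison is numeric, it behaves the same regardless of how the weekday names are spelled on a given system.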