Question

I am trying to subset my data so that it contains only the weekdays "Thurs", "Fri" and "Sat", based on the "Date" variable in my dataset.

> head(tidyFile)
            Date     Time Global_active_power Global_reactive_power Voltage Global_intensity
66637 2007-02-01 00:00:00               0.326                 0.128  243.15              1.4
66638 2007-02-01 00:01:00               0.326                 0.130  243.32              1.4
66639 2007-02-01 00:02:00               0.324                 0.132  243.51              1.4
66640 2007-02-01 00:03:00               0.324                 0.134  243.90              1.4
66641 2007-02-01 00:04:00               0.322                 0.130  243.16              1.4
66642 2007-02-01 00:05:00               0.320                 0.126  242.29              1.4
      Sub_metering_1 Sub_metering_2 Sub_metering_3
66637              0              0              0
66638              0              0              0
66639              0              0              0
66640              0              0              0
66641              0              0              0
66642              0              0              0

I subset the data to the date range I need with the following code:

tidyFile <- newFile[newFile$Date >= "2007-02-01" & newFile$Date <= "2007-02-02", ]

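But something may be wrong with the way I subset, because when I ask for "Thurs", "Fri" and "Sat" in this subset I get NA values, which cannot be right. Should I also subset by time to make sure the dates mentioned above are included?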

Finally, I need to subset my data further by "Thurs", "Fri" and "Sat", and I can't seem to do that. I tried the following:

library(lubridate)
with(tidyFile[wday(tidyFile, label=T) == "Thurs" & "Fri" & "Sat"])

which returns the error message:

Error in wday(tidyFile, label = T) : unused argument (label = T)

Update

These are the steps I took to create the script:

## STEP 1: Set working directory
setwd("/Users/usaid/datasciencecoursera/data/") 

## STEP 2: Create a new object 'newFile' and read .txt file into R
newFile <- read.table("course_4_proj_1.txt", header=TRUE, sep=";", na.strings = "?", nrows= 1000000, stringsAsFactors=FALSE,  as.is=TRUE)  

## STEP 3: Create a new object 'newFile$Date' and format dates (into date class)
newFile$Date <- as.Date(newFile$Date, format = "%d/%m/%Y") 
newFile$Date <- strptime(newFile$Date, format = "%d/%m/%Y", tz = "")

## STEP 4: Create a new object 'tidyFile' and subset data based on date range provided in Project 1 instructions
tidyFile <- newFile[newFile$Date >= "2007-02-01" & newFile$Date <= "2007-02-02", ] 

## STEP 5: Subset data by "Thurs", "Fri", "Sat"
library(lubridate)
with(tidyFile, wday(Date, label = TRUE))
days <- with(tidyFile, wday(Date, label = TRUE) %in% c("Thurs","Fri","Sat"))
tidyFile[days, ]

When I run step 5, I get the error message mentioned below.

Answer 1

Does this help at all?

## snippet of your data, not all columns
dat <- read.table(text = "            Date     Time Global_active_power Global_reactive_power Voltage Global_intensity
66637 2007-02-01 00:00:00               0.326                 0.128  243.15              1.4
66638 2007-02-01 00:01:00               0.326                 0.130  243.32              1.4
66639 2007-02-01 00:02:00               0.324                 0.132  243.51              1.4
66640 2007-02-01 00:03:00               0.324                 0.134  243.90              1.4
66641 2007-02-01 00:04:00               0.322                 0.130  243.16              1.4
66642 2007-02-01 00:05:00               0.320                 0.126  242.29              1.4
", header = TRUE)

## Make Date an actual Date
dat <- transform(dat, Date = as.Date(Date))
## Load lubridate
require("lubridate")

Get wday() to return the day of the week for Date:

with(dat, wday(Date, label = TRUE))

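Now we need to add the comparison with the options you listed. This is done with the %in% binary operator. The right-hand side of %in% takes a vector of values to match against, so you need to put c("Thurs", "Fri", "Sat") on the right of %in%, as in: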

with(dat, wday(Date, label = TRUE) %in% c("Thurs","Fri","Sat"))

Using the snippet of data you showed, this gives:

> with(dat, wday(Date, label = TRUE) %in% c("Thurs","Fri","Sat"))
[1] TRUE TRUE TRUE TRUE TRUE TRUE
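
The strings returned by wday(label = TRUE) depend on the lubridate version and the locale (newer releases return "Thu" where older ones returned "Thurs"), so a comparison against fixed labels can silently match nothing. A minimal sketch of the same test on numeric weekdays, using only base R, might look like this. The dates vector is a made-up example, not the asker's data.

```r
# Hypothetical example dates; 2007-02-01 was a Thursday.
dates <- as.Date(c("2007-02-01", "2007-02-02", "2007-02-03", "2007-02-04"))

# as.POSIXlt()$wday counts days since Sunday:
# 0 = Sunday, ..., 4 = Thursday, 5 = Friday, 6 = Saturday.
# The numbers do not depend on the locale or on lubridate.
day_num <- as.POSIXlt(dates)$wday

# TRUE for Thursday, Friday and Saturday, FALSE otherwise
keep <- day_num %in% c(4, 5, 6)
dates[keep]
```

If you prefer to stay with labels, checking levels(wday(Date, label = TRUE)) first shows the exact spellings your installation uses.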

To finish, you need something like:

take <- with(dat, wday(Date, label = TRUE) %in% c("Thurs","Fri","Sat"))
dat[take, ]

Here that selects every row, but I assume your real data set contains more than just these few records.
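
As for the NA values: step 3 of the script in the question first converts Date with as.Date() and then runs strptime() on the result with the original "%d/%m/%Y" format. By then the values print as "2007-02-01", so that second parse cannot match and returns NA. A rough sketch of the whole pipeline with a single parse is below. The raw data frame and its values are invented for illustration, and the date range is widened to 2007-02-03 only so that a Saturday can appear at all.

```r
# Hypothetical raw data in day/month/year text form.
raw <- data.frame(
  Date = c("31/1/2007", "1/2/2007", "2/2/2007", "3/2/2007", "4/2/2007"),
  Voltage = c(241.00, 243.15, 243.32, 243.51, 242.29),
  stringsAsFactors = FALSE
)

# Parse the text once. After this, Date has class "Date".
raw$Date <- as.Date(raw$Date, format = "%d/%m/%Y")

# Re-parsing the converted column with the old format yields only NA,
# which is what the second assignment in step 3 does.
reparsed <- strptime(raw$Date, format = "%d/%m/%Y", tz = "")
all(is.na(reparsed))

# Subset by date range, comparing Date against Date.
in_range <- raw$Date >= as.Date("2007-02-01") &
  raw$Date <= as.Date("2007-02-03")
tidy <- raw[in_range, ]

# Keep Thursday (4), Friday (5) and Saturday (6); $wday counts from Sunday = 0.
thu_to_sat <- as.POSIXlt(tidy$Date)$wday %in% 4:6
tidy[thu_to_sat, ]
```

Note that a range ending on 2007-02-02, as in the question, covers only a Thursday and a Friday, so no Saturday rows can come out of it.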
