I'm trying to subset my data to only the following specific weekdays, "Thursday", "Friday" and "Saturday", based on the "Date" variable in my dataset.
> head(tidyFile)
Date Time Global_active_power Global_reactive_power Voltage Global_intensity
66637 2007-02-01 00:00:00 0.326 0.128 243.15 1.4
66638 2007-02-01 00:01:00 0.326 0.130 243.32 1.4
66639 2007-02-01 00:02:00 0.324 0.132 243.51 1.4
66640 2007-02-01 00:03:00 0.324 0.134 243.90 1.4
66641 2007-02-01 00:04:00 0.322 0.130 243.16 1.4
66642 2007-02-01 00:05:00 0.320 0.126 242.29 1.4
Sub_metering_1 Sub_metering_2 Sub_metering_3
66637 0 0 0
66638 0 0 0
66639 0 0 0
66640 0 0 0
66641 0 0 0
66642 0 0 0
I used the following code to subset the data to the date range I need:
tidyFile <- newFile[newFile$Date >= "2007-02-01" & newFile$Date <= "2007-02-02", ]
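But something may be wrong with the way I subset, because when I call "Thurs", "Fri" and "Sat" on this subset I get NA values, which is probably not right. Should I be working with the Time column as well to make sure those dates are included?
Finally, I need to further subset my data by "Thurs", "Fri" and "Sat", and I can't seem to do it. I tried the following: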
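A minimal sketch of the date-range step, assuming the Date column already holds real Date values (the toy data frame below is hypothetical): comparing a Date column against Date bounds keeps every row on both days, whatever the Time column says, so Time is not needed for a whole-day range.

```r
# Hypothetical toy data: a few readings around the target range
dat <- data.frame(
  Date = as.Date(c("2007-01-31", "2007-02-01", "2007-02-01",
                   "2007-02-02", "2007-02-03")),
  Time = c("23:59:00", "00:00:00", "23:59:00", "12:00:00", "00:00:00"),
  stringsAsFactors = FALSE
)

# Compare Date against Date bounds; the time of day plays no role here
in_range <- dat$Date >= as.Date("2007-02-01") & dat$Date <= as.Date("2007-02-02")
tidy <- dat[in_range, ]
print(tidy)  # the three rows dated 2007-02-01 and 2007-02-02
```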
library(lubridate)
with(tidyFile[wday(tidyFile, label=T) == "Thurs" & "Fri" & "Sat"])
which returns the error message:
Error in wday(tidyFile, label = T) : unused argument (label = T)
UPDATE
These are the steps I took to create my script:
## STEP 1: Set working directory
setwd("/Users/usaid/datasciencecoursera/data/")
## STEP 2: Create a new object 'newFile' and read .txt file into R
newFile <- read.table("course_4_proj_1.txt", header=TRUE, sep=";", na.strings = "?", nrows= 1000000, stringsAsFactors=FALSE, as.is=TRUE)
## STEP 3: Create a new object 'newFile$Date' and format dates (into date class)
newFile$Date <- as.Date(newFile$Date, format = "%d/%m/%Y")
newFile$Date <- strptime(newFile$Date, format = "%d/%m/%Y", tz = "")
## STEP 4: Create a new object 'tidyFile' and subset data based on date range provided in Project 1 instructions
tidyFile <- newFile[newFile$Date >= "2007-02-01" & newFile$Date <= "2007-02-02", ]
## STEP 5: Subset data by "Thurs", "Fri", "Sat"
library(lubridate)
with(tidyFile, wday(Date, label = TRUE))
days <- with(tidyFile, wday(Date, label = TRUE) %in% c("Thurs","Fri","Sat"))
tidyFile[days, ]
When I run STEP 5, I get the error message shown above.
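A likely source of the NA values is STEP 3 itself. The sketch below reproduces it on two hypothetical raw values in the dd/mm/yyyy layout that the format string assumes: as.Date() parses them correctly, but strptime() then turns the resulting Date back into text ("2007-02-01"), which no longer matches "%d/%m/%Y", so every value becomes NA and the comparisons in STEP 4 return NA too. Keeping only the as.Date() line avoids this.

```r
# Hypothetical raw values in the dd/mm/yyyy layout that STEP 3's format expects
raw <- c("1/2/2007", "2/2/2007")

# First line of STEP 3: parsing with the matching format gives proper Dates
d <- as.Date(raw, format = "%d/%m/%Y")
print(d)  # "2007-02-01" "2007-02-02"

# Second line of STEP 3: strptime() coerces the Dates to "2007-02-01" etc.,
# which does not match "%d/%m/%Y", so every element becomes NA
bad <- strptime(d, format = "%d/%m/%Y", tz = "")
print(is.na(bad))  # TRUE TRUE

# Any comparison with NA is NA, so the STEP 4 subset picks rows of NAs
print(bad >= "2007-02-01")
```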
Answer (score: 1)
Does this help?
## snippet of your data, not all columns
dat <- read.table(text = " Date Time Global_active_power Global_reactive_power Voltage Global_intensity
66637 2007-02-01 00:00:00 0.326 0.128 243.15 1.4
66638 2007-02-01 00:01:00 0.326 0.130 243.32 1.4
66639 2007-02-01 00:02:00 0.324 0.132 243.51 1.4
66640 2007-02-01 00:03:00 0.324 0.134 243.90 1.4
66641 2007-02-01 00:04:00 0.322 0.130 243.16 1.4
66642 2007-02-01 00:05:00 0.320 0.126 242.29 1.4
", header = TRUE)
## Make Date an actual Date
dat <- transform(dat, Date = as.Date(Date))
## Load lubridate
require("lubridate")
Get wday() to return the day of the week for each Date:
with(dat, wday(Date, label = TRUE))
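If lubridate is not available, base R can give the same information. The sketch below uses the dates from the snippet plus two following days (a hypothetical extension); the numeric forms do not depend on the session's language, whereas weekdays() returns names in the current locale.

```r
# Dates from the question's snippet plus the next three days (hypothetical)
d <- as.Date(c("2007-02-01", "2007-02-02", "2007-02-03", "2007-02-04"))

# POSIXlt$wday: 0 = Sunday, 1 = Monday, ..., 6 = Saturday
wd <- as.POSIXlt(d)$wday
print(wd)    # 4 5 6 0

# "%u": ISO weekday as text, 1 = Monday, ..., 7 = Sunday
iso <- format(d, "%u")
print(iso)   # "4" "5" "6" "7"

# Names exist too, but they are in the session's locale language
print(weekdays(d, abbreviate = TRUE))
```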
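Now we need to add the comparison against the options you listed. This is done with the %in% binary operator. The right-hand side of %in% must be a vector of values to match against, so you put c("Thurs", "Fri", "Sat") on the right-hand side of %in%, as in: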
with(dat, wday(Date, label = TRUE) %in% c("Thurs","Fri","Sat"))
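To see why %in% is needed, the sketch below compares it with the expression from the question on a hypothetical vector of day labels. %in% tests each element for membership in the set, which is the same as joining separate == tests with |. Writing == "Thurs" & "Fri" & "Sat" fails even once wday() works, because & only accepts logical or numeric operands, not strings.

```r
# Hypothetical day labels
labs <- c("Thu", "Fri", "Sat", "Sun")

# Membership test, element by element
a <- labs %in% c("Thu", "Fri", "Sat")
print(a)  # TRUE TRUE TRUE FALSE

# Equivalent long form: separate == comparisons joined with |
b <- labs == "Thu" | labs == "Fri" | labs == "Sat"

# The form from the question: & cannot combine a logical vector with "Fri"
res <- tryCatch(labs == "Thu" & "Fri" & "Sat",
                error = function(e) conditionMessage(e))
print(res)  # the error message text
```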
Using the snippet of data you showed:
> with(dat, wday(Date, label = TRUE) %in% c("Thurs","Fri","Sat"))
[1] TRUE TRUE TRUE TRUE TRUE TRUE
To finish, you need something like:
take <- with(dat, wday(Date, label = TRUE) %in% c("Thurs","Fri","Sat"))
dat[take, ]
which keeps every row here, but I assume your real dataset has more than these few records.
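One caveat: the text of the labels from wday(label = TRUE) depends on the lubridate version and the locale. Recent versions use the locale's abbreviations (in English typically "Thu", not "Thurs"), in which case the %in% test above silently drops Thursday. Comparing weekday numbers avoids this; with lubridate's default week start (Sunday = 1) that would be wday(Date) %in% 5:7. The sketch below does the same in base R on a hypothetical data frame.

```r
# Hypothetical data spanning Wednesday 31 Jan to Sunday 4 Feb 2007
dat <- data.frame(
  Date = as.Date(c("2007-01-31", "2007-02-01", "2007-02-02",
                   "2007-02-03", "2007-02-04")),
  Global_active_power = c(0.500, 0.326, 0.400, 0.300, 0.200)
)

# POSIXlt weekday numbers: Thursday = 4, Friday = 5, Saturday = 6
take <- as.POSIXlt(dat$Date)$wday %in% 4:6
thu_to_sat <- dat[take, ]
print(thu_to_sat)  # rows for 2007-02-01, 2007-02-02 and 2007-02-03
```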