Does lubridate not like subsetting?

Asked: 2017-07-11 19:05:57

Tags: r ggplot2 dplyr lubridate

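This question is related to one I asked earlier. I have spent some time thinking about how to state my problem more clearly, and I apologize for the wordy question. Any input is greatly appreciated.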

Below is a 100-row excerpt from the much larger dataset I am working with.

    SPD_2015 <- structure(list(summarized.offense.description = c("ASSAULT", 
"THREATS", "CAR PROWL", "SHOPLIFTING", "MAIL THEFT", "THREATS", 
"DISTURBANCE", "STOLEN PROPERTY", "TRESPASS", "VEHICLE THEFT", 
"CAR PROWL", "THREATS", "STOLEN PROPERTY", "VEHICLE THEFT", "BURGLARY-SECURE PARKING-RES", 
"CAR PROWL", "THREATS", "BIKE THEFT", "BURGLARY", "ASSAULT", 
"STOLEN PROPERTY", "DISTURBANCE", "VEHICLE THEFT", "CAR PROWL", 
"OTHER PROPERTY", "ASSAULT", "PROPERTY DAMAGE", "BURGLARY-SECURE PARKING-RES", 
"ANIMAL COMPLAINT", "OTHER PROPERTY", "BURGLARY", "BURGLARY", 
"CAR PROWL", "SHOPLIFTING", "BURGLARY", "PROPERTY DAMAGE", "DISTURBANCE", 
"PROPERTY DAMAGE", "STOLEN PROPERTY", "OTHER PROPERTY", "MAIL THEFT", 
"PROPERTY DAMAGE", "VEHICLE THEFT", "OTHER PROPERTY", "ROBBERY", 
"CAR PROWL", "NARCOTICS", "OTHER PROPERTY", "BURGLARY", "DISTURBANCE", 
"ASSAULT", "BURGLARY-SECURE PARKING-RES", "OTHER PROPERTY", "FRAUD", 
"SHOPLIFTING", "OTHER PROPERTY", "OTHER PROPERTY", "DISTURBANCE", 
"CAR PROWL", "STOLEN PROPERTY", "OTHER PROPERTY", "OTHER PROPERTY", 
"VIOLATION OF COURT ORDER", "DISTURBANCE", "NARCOTICS", "ASSAULT", 
"DISTURBANCE", "TRESPASS", "NARCOTICS", "CAR PROWL", "NARCOTICS", 
"OTHER PROPERTY", "CAR PROWL", "CAR PROWL", "ASSAULT", "TRAFFIC", 
"OTHER PROPERTY", "CAR PROWL", "PROSTITUTION", "OTHER PROPERTY", 
"OTHER PROPERTY", "ASSAULT", "BURGLARY", "DISTURBANCE", "PROPERTY DAMAGE", 
"PROPERTY DAMAGE", "BURGLARY", "VEHICLE THEFT", "FRAUD", "VEHICLE THEFT", 
"FRAUD", "CAR PROWL", "BIKE THEFT", "CAR PROWL", "WARRANT ARREST", 
"STOLEN PROPERTY", "CAR PROWL", "PROPERTY DAMAGE", "VEHICLE THEFT", 
"BIKE THEFT"), occurred.date.or.date.range.start = c("04/17/2015 01:10:00 AM", 
"11/15/2015 12:04:00 PM", "05/29/2015 08:00:00 PM", "12/15/2015 02:25:00 PM", 
"07/28/2015 12:00:00 AM", "02/24/2015 06:01:00 PM", "05/24/2015 04:20:00 PM", 
"03/13/2015 02:04:00 PM", "06/14/2015 08:00:00 AM", "05/19/2015 03:18:00 PM", 
"07/18/2015 06:00:00 AM", "05/11/2015 05:16:00 PM", "01/08/2015 12:52:00 PM", 
"06/17/2015 05:00:00 PM", "07/04/2015 12:00:00 AM", "10/26/2015 12:12:00 AM", 
"05/01/2015 12:00:00 PM", "07/02/2015 10:00:00 PM", "01/10/2015 07:30:00 PM", 
"02/17/2015 01:29:00 PM", "12/17/2015 02:26:00 AM", "08/04/2015 10:49:00 PM", 
"10/27/2015 12:29:00 AM", "07/29/2015 03:00:00 PM", "10/24/2015 06:30:00 PM", 
"02/20/2015 03:07:00 AM", "11/11/2015 09:00:00 AM", "03/24/2015 10:00:00 PM", 
"11/03/2015 08:47:00 PM", "04/15/2015 02:00:00 PM", "07/15/2015 03:00:00 PM", 
"11/17/2015 08:30:00 AM", "09/22/2015 05:00:00 PM", "02/09/2015 09:19:00 AM", 
"01/07/2015 08:30:00 AM", "05/01/2015 07:30:00 AM", "04/26/2015 03:30:00 AM", 
"04/18/2015 03:00:00 AM", "10/01/2015 08:00:00 PM", "05/07/2015 01:00:00 AM", 
"02/05/2015 03:15:00 PM", "01/18/2015 05:00:00 PM", "10/17/2015 11:00:00 PM", 
"03/23/2015 05:35:00 PM", "02/16/2015 07:25:00 PM", "07/30/2015 08:00:00 PM", 
"11/10/2015 02:28:00 PM", "03/14/2015 10:10:00 AM", "12/10/2015 08:26:00 PM", 
"10/05/2015 01:45:00 AM", "02/16/2015 01:56:00 PM", "10/19/2015 06:27:00 PM", 
"12/01/2015 07:30:00 AM", "01/28/2015 08:40:00 PM", "05/01/2015 01:40:00 PM", 
"10/30/2015 03:15:00 AM", "09/04/2015 03:34:00 PM", "06/06/2015 04:53:00 PM", 
"07/22/2015 06:20:00 AM", "12/11/2015 01:41:00 PM", "05/20/2015 01:09:00 PM", 
"09/18/2015 12:00:00 PM", "07/08/2015 11:05:00 PM", "02/22/2015 01:38:00 AM", 
"07/22/2015 01:12:00 PM", "09/07/2015 10:43:00 AM", "08/11/2015 04:00:00 PM", 
"10/13/2015 06:33:00 AM", "10/10/2015 05:32:00 PM", "11/15/2015 07:09:00 PM", 
"11/19/2015 03:05:00 PM", "04/08/2015 04:33:00 PM", "05/11/2015 12:01:00 AM", 
"04/21/2015 06:15:00 PM", "06/13/2015 10:29:00 AM", "06/22/2015 06:41:00 PM", 
"09/03/2015 08:00:00 AM", "04/08/2015 06:00:00 PM", "07/17/2015 08:00:00 PM", 
"08/29/2015 09:00:00 AM", "04/28/2015 01:46:00 PM", "09/07/2015 07:00:00 PM", 
"12/30/2015 06:30:00 AM", "08/29/2015 11:37:00 PM", "08/24/2015 10:00:00 PM", 
"06/17/2015 07:02:00 AM", "02/14/2015 10:21:00 PM", "03/29/2015 07:00:00 PM", 
"10/01/2015 07:15:00 AM", "06/14/2015 03:00:00 PM", "12/16/2014 09:00:00 AM", 
"02/14/2015 07:54:00 PM", "10/02/2015 08:17:00 AM", "05/14/2015 08:30:00 AM", 
"07/07/2015 10:15:00 AM", "04/07/2015 01:48:00 AM", "11/02/2015 11:00:00 PM", 
"04/16/2015 03:00:00 PM", "08/22/2015 08:09:00 AM", "10/24/2015 05:00:00 PM"
)), .Names = c("summarized.offense.description", "occurred.date.or.date.range.start"
), row.names = c(NA, -100L), class = c("tbl_df", "tbl", "data.frame"
))

I used the following code to extract the time data from a pre-existing column:

# Packages used below
library(dplyr)
library(stringr)
library(lubridate)

# Split the time portion out of occurred.date.or.date.range.start
SPD_2015 <- mutate(SPD_2015, occurred.time = str_sub(SPD_2015$occurred.date.or.date.range.start, -11, -1))

# Convert the character time in occurred.time to a lubridate Period
SPD_2015$occurred.time <- strptime(SPD_2015$occurred.time, "%I:%M:%S %p") %>%
  str_sub(-8, -1) %>%
  hms()
# Create occurred.time.hour so the hour value can be isolated
SPD_2015 <- mutate(SPD_2015, occurred.time.hour = hour(occurred.time))

Now I have a column containing the isolated hour at which each crime occurred, which I can plot with ggplot2. However, if I subset the data with dplyr:

#filtering data for only car prowl
car.prowl <- filter(SPD_2015, summarized.offense.description == "CAR PROWL")

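the time values in the occurred.date.or.date.range.start, occurred.time and occurred.time.hour columns of the newly created data frame (car.prowl) no longer match. The occurred.time.hour column still matches the source correctly, but the occurred.time column has changed.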

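Just to add to this: I created a separate data frame for car prowls because, when I originally tried to plot the time of occurrence of the crimes with ggplot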

ggplot(car.prowl, aes(hour(occurred.time))) +
  geom_bar()

I would get the error: "Error: Aesthetics must be either length 1 or the same as the data (14): x". That makes sense, and I understand it.

> dim(car.prowl)
[1] 14  4

But car.prowl has 14 rows, and when I run the following code:

> length(hour(car.prowl$occurred.time))
[1] 100

it shows the length of the original dataset (100) rather than the subset length of 14.

Can anyone suggest a solution or a workaround? Thanks.

1 Answer:

Answer 0 (score: 1)

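Interesting question. Let's first get the output you need for plotting. We can use mdy_hms to convert the character strings to date-times, which is probably more robust than the original str_sub approach. After that, hour can extract the hour from each date-time.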

library(tidyverse)
library(lubridate)
library(stringr)

SPD_2015_updated <- SPD_2015 %>%
  mutate(occurred.time = mdy_hms(occurred.date.or.date.range.start)) %>%
  mutate(occurred.time.hour = hour(occurred.time))

car.prowl_updated <- SPD_2015_updated %>%
  filter(summarized.offense.description == "CAR PROWL")

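Run glimpse(SPD_2015_updated) and glimpse(car.prowl_updated). You can see that every record matches. occurred.time is a date-time column, and occurred.time.hour is an integer column. I think these data frames are ready for your plots.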
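As a quick check of that claim, here is a minimal sketch on three rows copied from the sample data in the question. The names demo, demo_prowl and hour_plot are made up for this illustration; the rest follows the same mdy_hms, hour and filter steps as above.

```r
library(dplyr)
library(lubridate)
library(ggplot2)

# Three rows copied from the question's sample data (names here are illustrative)
demo <- tibble(
  summarized.offense.description = c("ASSAULT", "CAR PROWL", "CAR PROWL"),
  occurred.date.or.date.range.start = c("04/17/2015 01:10:00 AM",
                                        "05/29/2015 08:00:00 PM",
                                        "07/18/2015 06:00:00 AM")
)

# Parse to date-time, derive the hour, then subset
demo_prowl <- demo %>%
  mutate(occurred.time = mdy_hms(occurred.date.or.date.range.start),
         occurred.time.hour = hour(occurred.time)) %>%
  filter(summarized.offense.description == "CAR PROWL")

# After filtering, the column lengths agree with the number of rows
nrow(demo_prowl)
length(demo_prowl$occurred.time)
length(demo_prowl$occurred.time.hour)

# Bar chart of crimes per hour, built from the plain integer column
hour_plot <- ggplot(demo_prowl, aes(occurred.time.hour)) +
  geom_bar()
print(hour_plot)
```

Plotting the precomputed occurred.time.hour column, rather than calling hour() inside aes(), keeps the aesthetic the same length as the filtered data.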

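As for what went wrong with your original approach, I don't fully understand it. However, if you run glimpse(car.prowl), you can see that occurred.time is an S4: Period. That may be the key to why dplyr::filter does not work here. If I have time, I will look further into why dplyr::filter fails to subset the original data frame.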
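One possible explanation, offered as an assumption rather than a confirmed diagnosis: a Period is an S4 object that keeps the seconds in its underlying data and the other units in separate slots (hour, minute and so on). If a subsetting routine only subsets the underlying data and leaves the slots untouched, the slots keep their original length, which would be consistent with length(hour(car.prowl$occurred.time)) returning 100. The sketch below uses made-up times (p and q are illustrative names) to show the slot layout and that lubridate's own `[` method keeps the slots in sync.

```r
library(lubridate)

# A small Period vector standing in for occurred.time (made-up values)
p <- hms(c("01:10:00", "12:04:00", "20:00:00", "14:25:00"))

isS4(p)          # Period is an S4 class
slotNames(p)     # seconds live in .Data; other units have their own slots
length(p@hour)   # one hour value per element

# Subsetting with lubridate's `[` method subsets every slot together
q <- p[c(1, 3)]
length(q@hour)
hour(q)
```

If that assumption holds, storing the hour as a plain integer column (or the full timestamp as a date-time, as above) before filtering avoids the issue, because ordinary vectors have no extra slots to fall out of step.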