My data frame looks like this:
Date Time Consumption kVARh kW weekday
2 2016-12-13 0:15:00 90.144 0.000 360.576 Tue
3 2016-12-13 0:30:00 90.144 0.000 360.576 Tue
4 2016-12-13 0:45:00 91.584 0.000 366.336 Tue
5 2016-12-13 1:00:00 93.888 0.000 375.552 Tue
6 2016-12-13 1:15:00 88.416 0.000 353.664 Tue
7 2016-12-13 1:30:00 88.704 0.000 354.816 Tue
8 2016-12-13 1:45:00 91.296 0.000 365.184 Tue
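I read the data from a CSV in which the date came in as a factor; I converted it with as.character and then with as.Date. I then added a column that tells me the day of the week: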
sigEx1DF$weekday <- format(as.Date(sigEx1DF$Date), "%a")
I then converted that column to an ordered factor running from Sunday to Saturday.
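A conversion like the one described could be sketched as follows. The sample vector `days` and the name `day_levels` are hypothetical; the level labels assume an English locale, because the abbreviations produced by format(..., "%a") are locale-dependent.

```r
# Hypothetical sample of abbreviated day names, as produced by format(date, "%a")
days <- c("Tue", "Wed", "Sun", "Sat", "Mon")

# Explicit level order from Sunday to Saturday (assumes English abbreviations)
day_levels <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")

# ordered = TRUE makes the factor respect that order when sorting or comparing
weekday <- factor(days, levels = day_levels, ordered = TRUE)

# Sorting now follows the calendar order rather than the alphabet
sorted_days <- sort(weekday)
print(sorted_days)
```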
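This is fine-grained data from a smart meter that measures usage (consumption) every 15 minutes. kW is Consumption*4. I need to take the average for each weekday and then get the maximum of those averages, but when I subset the data frame, I get this: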
Date Time Consumption kVARh kW weekday
3 2016-12-13 0:30:00 90.144 0.000 360.576 Tue
8 2016-12-13 1:45:00 91.296 0.000 365.184 Tue
13 2016-12-13 3:00:00 93.600 0.000 374.400 Tue
18 2016-12-13 4:15:00 93.312 0.000 373.248 Tue
23 2016-12-13 5:30:00 107.424 0.000 429.696 Tue
28 2016-12-13 6:45:00 103.968 0.000 415.872 Tue
33 2016-12-13 8:00:00 108.576 0.000 434.304 Tue
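Several of the 15-minute intervals are now missing (rows 4-7, for example). I don't see anything different about rows 4-7, yet they are gone after subsetting.
Here is the code I used to subset: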
bldg1_Wkdy <- subset(sort.df, weekday == c("Mon","Tue","Wed","Thu","Fri"),
select = c("Date","Time","Consumption","kVARh","kW","weekday"))
This is the structure of the data frame before subsetting:
'data.frame': 72888 obs. of 6 variables:
$ Date : Date, format: "2016-12-13" "2016-12-13" "2016-12-13" ...
$ Time : Factor w/ 108 levels "0:00:00","0:15:00",..: 2 3 4 5 6 7 8 49 50 51 ...
$ Consumption: num 90.1 90.1 91.6 93.9 88.4 ...
$ kVARh : num 0 0 0 0 0 0 0 0 0 0 ...
$ kW : num 361 361 366 376 354 ...
$ weekday : Ord.factor w/ 7 levels "Sun"<"Mon"<"Tue"<..: 3 3 3 3 3 3 3 3 3 3 ...
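I go from 72888 observations to 10427 observations for weekdays and 10368 observations for weekends and, as noted above, many rows seem to be missing at random. Some intervals have zero consumption (the power may have gone out because of a storm or something else), but those intervals actually do appear in the subset data, so the zeros do not seem to be the cause of the problem. Thanks for your help!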
Answer 0 (score: 0)
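You should use weekday %in% c("Mon","Tue","Wed","Thu","Fri") instead of weekday == c("Mon","Tue","Wed","Thu","Fri"). With ==, R recycles the five-element vector along the column and compares element by element, so each row is tested against only one of the day names; %in% tests every row for membership in the whole set. See the minimal test below, which shows how %in% works: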
> subset(x, weekday == c("Mon","Tue","Wed","Thu","Fri"))
weekday
NA <NA>
> subset(x, weekday %in% c("Mon","Tue","Wed","Thu","Fri"))
weekday
1 Tue
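To make the recycling visible on more than one row, here is a minimal sketch on made-up data. The data frame `df`, its ten kW values, and the names `wkdays`, `eq_mask`, `kept`, and `avg` are all hypothetical and are not taken from the question's data set. It also shows one possible way to get the per-weekday mean and its maximum once the subset is correct.

```r
# Hypothetical toy data: ten consecutive readings, all on a Tuesday
df <- data.frame(
  kW = c(360.576, 360.576, 366.336, 375.552, 353.664,
         354.816, 365.184, 370.000, 372.000, 374.400),
  weekday = factor(rep("Tue", 10),
                   levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
                   ordered = TRUE)
)
wkdays <- c("Mon", "Tue", "Wed", "Thu", "Fri")

# == recycles wkdays along the column: row i is compared only with
# wkdays[((i - 1) %% 5) + 1], so "Tue" matches only where "Tue" lines up
eq_mask <- df$weekday == wkdays
print(which(eq_mask))

# Row counts kept by each form of the condition
n_eq <- nrow(subset(df, weekday == wkdays))    # only the rows that line up
n_in <- nrow(subset(df, weekday %in% wkdays))  # every weekday row
print(c(equals = n_eq, in_set = n_in))

# One way to get the mean kW per weekday and then the largest of those means
kept <- subset(df, weekday %in% wkdays)
avg <- tapply(kept$kW, droplevels(kept$weekday), mean)
print(max(avg))
```

In this toy example only every fifth row survives the == version, which is the same pattern as the surviving rows in the question (3, 8, 13, 18, ...). When the number of rows is not a multiple of 5, R also warns that the longer object length is not a multiple of the shorter one.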