Question

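I have a numeric variable, call it "Blah". Blah is measured at various intervals throughout the day and is always increasing. I want to find the difference between the first and last observations of Blah for each day, and produce a table of the total daily increase in Blah.

A slight complication is that if Blah gets high enough, it resets to a very low number. This always happens at the same (currently unknown) number, and at most once per day.

Some details that may be important:

Blah is also measured at several named locations. I'd like a data frame of daily totals by location. :)

The time variable is formatted as "mm/dd/yyyy hh:mm:ss".

Here is the outline I came up with. One problem I have is that I haven't worked with POSIXct objects before, and I don't know how to grab these values and make this happen.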

A<-first value of Day
B<-last value of Day
C<-Maximum value of Blah from a day where reset happens (last value before reset)

For (each Day)
   For (each Location)

     If A < B 
        Then 
           DayTotal = B-A
        Else
            DayTotal = (C-A)+B

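Edit:

I had some of the data formatted incorrectly here. The correct format is below.

Thanks in advance for your help!

- Michael

Also, on a day when Blah resets, A always exceeds B.

Edit number 2

I'm a terrible person. The data actually looks like this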

   DESCRIPTION  rawCount   localDateTime
1   Arch Exit    33166  2014-05-23 07:55:05
2   Arch Exit    33167  2014-05-23 08:00:06
3   Arch Exit    33170  2014-05-23 08:10:06
4   Arch Exit    33173  2014-05-23 08:15:05
5   Arch Exit    33175  2014-05-23 08:20:05
6   Arch Exit    33178  2014-05-23 08:25:06
7   Northside    48073  2014-05-24 15:01:40
8   Northside    48119  2014-05-24 15:05:49
9   Northside    48167  2014-05-24 15:10:59
10  Northside    48237  2014-05-24 15:20:49
11  Northside       73  2014-05-24 15:25:59
12  Northside      350  2014-05-24 15:35:49
13  Northside     1430  2014-05-24 15:44:06
14  Northside     2554  2014-05-24 16:00:49

(Assuming the data above is complete for each day) I'd like the result to look like

DESCRIPTION  totalCount     Date
Arch Exit       12       2014-05-23
Northside      2718      2014-05-24

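Another edit

OK, so using the answer below, I did the following, which I think works.

rawDiff is a variable that already existed (computed in Excel... yikes), parse_date_time is a function from the lubridate package, "Full" is my data, and "localDate" is the date variable I want.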

library(lubridate)  # provides parse_date_time()
blahblah <- with(Full, tapply(rawDiff,
                              list(parse_date_time(localDate, "mdy"), DESCRIPTION),
                              function(x) sum(x[x >= 0])))

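The NAs behaved a bit oddly, and using a separate precomputed difference variable seemed to help. Also, when the counter resets the difference is negative, so I just kept the non-negative differences.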

Answer 1

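@MrFlick's answer can easily be adapted to your new data, but I'll share a variant to show that, since you have already defined the logic, it is easy to translate it almost verbatim.

We start with a simple function that looks at a single vector.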

myFun <- function(x) {
  A <- x[1]                    # What's the first value?
  B <- x[length(x)]            # What's the last value?
  if (B < A) {                 # If the last value is less than the first
    FLAG <- which(diff(x) < 0) # Identify where the value changes...
    C <- x[FLAG]               # ... and extract it
    C - A + B                  # Calculate according to your defined logic
  } else {                     # Otherwise, if things look straightforward
    B - A                      # Just calculate the difference
  }
}
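
As a quick sanity check, the function can be applied directly to the two count vectors from Edit 2. This is only a sketch: the vector names `arch_exit` and `northside` are made up for illustration, and `myFun` is repeated so the snippet runs on its own.

```r
# Same logic as myFun above, repeated so this sketch is self-contained
myFun <- function(x) {
  A <- x[1]
  B <- x[length(x)]
  if (B < A) {
    C <- x[which(diff(x) < 0)]  # last value before the reset
    C - A + B
  } else {
    B - A
  }
}

# Counts copied from the Edit 2 data (vector names are illustrative)
arch_exit <- c(33166, 33167, 33170, 33173, 33175, 33178)         # no reset
northside <- c(48073, 48119, 48167, 48237, 73, 350, 1430, 2554)  # one reset

myFun(arch_exit)  # 12
myFun(northside)  # 2718
```

Note that this relies on the stated assumption of at most one reset per day; with two resets, `C` would have two elements and so would the result.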

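Once you have that function, you can use one of the many "aggregation" functions available in R: tapply, by, or aggregate. These aggregation functions handle the "for each day, for each location" part of your logic.

Here it is with aggregate, since that most closely matches your desired output: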

aggregate(rawCount ~ DESCRIPTION + as.Date(localDateTime), mydf, myFun)
#   DESCRIPTION as.Date(localDateTime) rawCount
# 1   Arch Exit             2014-05-23       12
# 2   Northside             2014-05-24     2718

For this, I used the following sample data:

mydf <- structure(list(
  DESCRIPTION = c("Arch Exit", "Arch Exit", "Arch Exit", "Arch Exit", 
                  "Arch Exit", "Arch Exit", "Northside", "Northside", 
                  "Northside", "Northside", "Northside", "Northside", 
                  "Northside", "Northside"), 
  rawCount = c(33166L, 33167L, 33170L, 33173L, 33175L, 33178L, 48073L, 
               48119L, 48167L, 48237L, 73L, 350L, 1430L, 2554L), 
  localDateTime = structure(c(1400831705, 1400832006, 1400832606, 
                              1400832905, 1400833205, 1400833506, 
                              1400943700, 1400943949, 1400944259, 
                              1400944849, 1400945159, 1400945749, 
                              1400946246, 1400947249), 
                            class = c("POSIXct", "POSIXt"), tzone = "GMT")), 
                  .Names = c("DESCRIPTION", "rawCount", "localDateTime"), 
                  row.names = c("1", "2", "3", "4", "5", "6", "7", "8", 
                                "9", "10", "11", "12", "13", "14"), 
                  class = "data.frame")
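
If the column names should match the requested output (DESCRIPTION, totalCount, Date), one option is to compute the date column first and rename the result afterwards. The following is a minimal sketch under that assumption; it rebuilds the same sample rows in a more compact form and repeats `myFun` so it can be run by itself.

```r
# Same logic as myFun above
myFun <- function(x) {
  A <- x[1]
  B <- x[length(x)]
  if (B < A) x[which(diff(x) < 0)] - A + B else B - A
}

# Compact rebuild of the sample data (same counts and timestamps)
mydf <- data.frame(
  DESCRIPTION = rep(c("Arch Exit", "Northside"), times = c(6, 8)),
  rawCount = c(33166, 33167, 33170, 33173, 33175, 33178,
               48073, 48119, 48167, 48237, 73, 350, 1430, 2554),
  localDateTime = as.POSIXct(
    c("2014-05-23 07:55:05", "2014-05-23 08:00:06", "2014-05-23 08:10:06",
      "2014-05-23 08:15:05", "2014-05-23 08:20:05", "2014-05-23 08:25:06",
      "2014-05-24 15:01:40", "2014-05-24 15:05:49", "2014-05-24 15:10:59",
      "2014-05-24 15:20:49", "2014-05-24 15:25:59", "2014-05-24 15:35:49",
      "2014-05-24 15:44:06", "2014-05-24 16:00:49"),
    tz = "GMT")
)

# Add a plain Date column, aggregate by location and day, then rename
mydf$Date <- as.Date(mydf$localDateTime, tz = "GMT")
out <- aggregate(rawCount ~ DESCRIPTION + Date, data = mydf, FUN = myFun)
names(out)[names(out) == "rawCount"] <- "totalCount"
out <- out[, c("DESCRIPTION", "totalCount", "Date")]
out
```

The rows must already be in time order within each location and day, because `myFun` treats the first and last elements of each group as the first and last observations.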

Answer 2

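When asking for help like this, it is very useful to provide sample data and the desired output. Since you didn't provide any, I'll use this (updated to match the variable names in Edit 2)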

#sample data
set.seed(15)
dd<-data.frame(
    DESCRIPTION=rep(letters[1:3], 9*5),
    rawCount=cumsum(rpois(3*5*9, 4)) %% 75,
    localDateTime=rep(seq(as.POSIXct("2001-01-01"), as.POSIXct("2001-01-03"), 
        by="6 hours"), each=5*3)
)

I'll also define a helper function that drops the time from a POSIXct value by converting it down to the simple "Date" class

droptime<-as.Date
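
For timestamps that are still character strings in the "mm/dd/yyyy hh:mm:ss" format mentioned in the question, a sketch of the parse-then-drop-time step might look like the following. The example strings and the choice of "UTC" as the time zone are assumptions made for illustration.

```r
# Hypothetical timestamps in the "mm/dd/yyyy hh:mm:ss" format from the question
stamps <- c("06/24/2014 08:15:05", "06/24/2014 23:59:59", "06/25/2014 00:00:01")

# Parse the strings into POSIXct; the time zone here is an assumption
parsed <- as.POSIXct(stamps, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# Drop the time of day; passing tz explicitly keeps the calendar day
# consistent with the zone used for parsing
days <- as.Date(parsed, tz = "UTC")
format(days)  # "2014-06-24" "2014-06-24" "2014-06-25"
```

Passing `tz` explicitly matters for observations close to midnight, where converting in a different zone could shift them to the neighbouring day.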

Then we can do

with(dd, tapply(rawCount, list(droptime(localDateTime), DESCRIPTION), function(x) {
    d <- diff(x)
    d[d<0] <- tail(x,-1)[d<0]
    sum(d)
}))

or, to get the form shown in Edit 2,

aggregate(rawCount~droptime(localDateTime)+DESCRIPTION, dd, FUN=function(x) {
    d <- diff(x)
    d[d<0] <- tail(x,-1)[d<0]
    sum(d)
})

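What this does is, for each location/date combination, compute the total increase over that day. I rewrote your definition slightly to look at pairwise differences: if a difference is negative, we assume the counter has restarted at zero (this also covers the case where the number resets twice in a day, even though that won't happen for you). The tapply version will return a matrix of the form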

             a   b   c
2001-01-01 221 233 243
2001-01-02 230 232 219
2001-01-03  32  34  36

with string versions of the Date values as rownames and the locations as colnames, or

  droptime(localDateTime) DESCRIPTION rawCount
1              2001-01-01           a      221
2              2001-01-02           a      230
3              2001-01-03           a       32
4              2001-01-01           b      233
5              2001-01-02           b      232
6              2001-01-03           b       34
7              2001-01-01           c      243
8              2001-01-02           c      219
9              2001-01-03           c       36

with the aggregate method (where the Date class is preserved).
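
To see what the pairwise-difference function does on a day with a reset, here is a small sketch that applies it to the Northside counts from Edit 2. The function name `sumIncreases` and the `two_resets` vector are made up for illustration; the body is the same anonymous function used in the calls above.

```r
# Sum of pairwise increases, treating a negative step as a restart from zero
sumIncreases <- function(x) {      # hypothetical name for the anonymous function
  d <- diff(x)                     # step between consecutive observations
  d[d < 0] <- tail(x, -1)[d < 0]   # on a reset, count the new value itself
  sum(d)
}

northside <- c(48073, 48119, 48167, 48237, 73, 350, 1430, 2554)
diff(northside)          # 46 48 70 -48164 277 1080 1124
sumIncreases(northside)  # 2718

two_resets <- c(90, 95, 3, 10, 2, 6)  # made-up vector with two resets
sumIncreases(two_resets)              # 5 + 3 + 7 + 2 + 4 = 21
```

With exactly one reset this gives the same total as the (C - A) + B logic from the question: the steps before the reset add up to C - A, and the steps from the reset onward add up to B.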

To use the updated sample data (Edit 1), you can use

sapply(xx[-1], function(x,g) {
    tapply(x, g, function(x) {
        d <- diff(x)
        d[d<0] <- tail(x,-1)[d<0]
        sum(d)
    })  
}, g=xx[[1]])

to get

  06/24/2014 06/25/2014
A          8         52
B          4         57

How to find conditional differences by day in R
