Question

I'm a new R user and I'm a bit stuck. My data looks like this:

dates        temp
01/31/2011    40
01/30/2011    34
01/29/2011    30
01/28/2011    52
01/27/2011    39
01/26/2011    37
...
01/01/2011    31

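I want to keep only the temperatures below 40 degrees, along with the start and end dates of each such stretch and how many days it lasted, for example: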

from         to           days
01/29/2011   01/30/2011     2
01/26/2011   01/27/2011     2

I tried using difftime, but it didn't work. Maybe there is a function for this.

Any help would be greatly appreciated.

Answer 1

I'd do something like this. I'm using data.table here.

df <- read.table(header=TRUE, text="dates        temp
01/31/2011    40
01/30/2011    34
01/29/2011    30
01/28/2011    52
01/27/2011    39
01/26/2011    37", stringsAsFactors=FALSE)

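library(data.table)
dt <- data.table(df)
# date.form: the dates column parsed as Date
# id: running count of temp >= 40, so every run of temp < 40 shares one id
# then keep only the rows with temp < 40
dt <- dt[, `:=`(date.form = as.Date(dates, format = "%m/%d/%Y"),
                id = cumsum(temp >= 40))][temp < 40]
# one row per run: first date, last date, number of rows (.N)
dt[, list(from = min(date.form), to = max(date.form), count = .N), by = id]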

#    id       from         to count
# 1:  1 2011-01-29 2011-01-30     2
# 2:  2 2011-01-26 2011-01-27     2

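The idea is to first create a column with dates converted to Date format. Then another column, id, marks where temp >= 40 and is used to group the values that lie between two temp >= 40 entries. That is, if you have c(40, 34, 30, 52, 39, 37), you want c(1,1,1,2,2,2): everything between values >= 40 must belong to the same group (34, 30 -> 1 and 39, 37 -> 2). Once that is done, I drop the temp >= 40 entries.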
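To see the grouping step in isolation, here is a small base R sketch on the six example temperatures from the question (the names temp, id and groups are just for illustration):

```r
# Temperatures in the order they appear in the question (newest date first)
temp <- c(40, 34, 30, 52, 39, 37)

# Each temp >= 40 adds 1 to the running total, so the total steps up at every
# warm reading and the cold readings after it inherit that value
id <- cumsum(temp >= 40)
id
# [1] 1 1 1 2 2 2

# Keeping only the cold readings leaves two groups: 34, 30 -> 1 and 39, 37 -> 2
groups <- split(temp[temp < 40], id[temp < 40])
groups
```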

Then you can group by this id and take min, max and length(.) (which data.table provides as .N).
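Note that count (.N) is the number of rows in each run, which equals the number of days only if there is exactly one reading per day with no missing dates. If you want the calendar span instead, which is what difftime is for, a minimal sketch, assuming from and to are Date vectors like the ones in the output above (hardcoded here for illustration):

```r
# Hardcoded from/to dates taken from the output above (illustrative only)
from <- as.Date(c("2011-01-29", "2011-01-26"))
to   <- as.Date(c("2011-01-30", "2011-01-27"))

# difftime gives the gap between the endpoints; + 1 makes the count inclusive
days <- as.numeric(difftime(to, from, units = "days")) + 1
days
# [1] 2 2
```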

Answer 2

Not as elegant as Arun's data.table solution, but here is a base R solution:

DF <- read.table(text = "dates        temp\n01/31/2011    40\n01/30/2011    34\n01/29/2011    30\n01/28/2011    52\n01/27/2011    39\n01/26/2011    37", 
    header = TRUE, stringsAsFactors = FALSE)

DF$dates <- as.Date(DF$dates, format = "%m/%d/%Y")  # parse the dates
DF <- DF[order(DF$dates), ]                           # sort ascending by date
DF$ID <- cumsum(DF$temp >= 40)                        # new ID after each temp >= 40
DF2 <- DF[DF$temp < 40, ]                             # keep only the days below 40

# split:  split DF2 into a list of data frames, one per ID
# lapply: apply the function to each element of that list
# do.call(rbind, ...): bind the per-group results into one data frame

do.call(rbind, lapply(split(DF2, DF2$ID), function(x)
            data.frame(from = min(x$dates),
                       to = max(x$dates),
                       count = length(x$dates))))
##         from         to count
## 0 2011-01-26 2011-01-27     2
## 1 2011-01-29 2011-01-30     2

Answer 3

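First read in the data. read.zoo handles many details in one line, including sorting the data into ascending date order and converting the dates to class "Date". If z is the resulting zoo object, coredata(z) gives the temperatures and time(z) gives the dates.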

Lines <- "
dates        temp
01/31/2011    40
01/30/2011    34
01/29/2011    30
01/28/2011    52
01/27/2011    39
01/26/2011    37
"

library(zoo)
z <- read.zoo(text = Lines, header = TRUE, format = "%m/%d/%Y")
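If zoo is not available, a base R sketch of the same preprocessing (read, parse the dates as Date, sort ascending) might look like this; df, tt and temps are illustrative names standing in for the zoo object, time(z) and coredata(z):

```r
# Same six rows as Lines above
Lines <- "
dates        temp
01/31/2011    40
01/30/2011    34
01/29/2011    30
01/28/2011    52
01/27/2011    39
01/26/2011    37
"

df <- read.table(text = Lines, header = TRUE, stringsAsFactors = FALSE)
df$dates <- as.Date(df$dates, format = "%m/%d/%Y")  # character -> Date
df <- df[order(df$dates), ]                           # ascending, as read.zoo does

tt    <- df$dates  # plays the role of time(z)
temps <- df$temp   # plays the role of coredata(z)
temps
# [1] 37 39 52 30 34 40
```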

The key to all of this is using rle to compute lengths and values, from which we can derive all the quantities we need:

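tt <- time(z)                              # the dates, ascending
with(rle(coredata(z) < 40), {              # runs of TRUE (below 40) / FALSE
   to <- cumsum(lengths)[values]           # last index of each below-40 run
   lengths <- lengths[values]              # length of each below-40 run
   from <- to - lengths + 1                # first index of each below-40 run
   data.frame(from = tt[from], to = tt[to], days = lengths)
})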
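To make the rle step concrete, here is the same index arithmetic traced on the six example temperatures in ascending date order (plain base R, values hardcoded for illustration):

```r
# Temperatures for 01/26 .. 01/31, i.e. the order read.zoo returns them in
temps <- c(37, 39, 52, 30, 34, 40)

r <- rle(temps < 40)
r$lengths   # [1] 2 1 2 1
r$values    # [1]  TRUE FALSE  TRUE FALSE

to   <- cumsum(r$lengths)[r$values]  # last index of each below-40 run: 2 5
days <- r$lengths[r$values]          # length of each below-40 run:     2 2
from <- to - days + 1                # first index of each run:         1 4
```

Indices 1..2 and 4..5 correspond to 01/26-01/27 and 01/29-01/30, matching the output below.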

Using the first 6 rows of the input data shown, we get:

        from         to days
1 2011-01-26 2011-01-27    2
2 2011-01-29 2011-01-30    2
