Question

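I have a data frame in R that I loaded from a CSV file, and I am trying to find the maximum temperature for each day. The data.frame is formatted so that col(1) is the Date (YYYY-MM-DD HH:mm format) and col(2) is the temperature at that date/time. I tried splitting the data into subsets from the top down (years, then the months within each year, then the days within each month), but found that very complicated.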

Here is a sample of the data frame:

                 Date Unit Temp
1 2012-10-21 21:14:00    C 82.5
2 2012-10-21 21:34:00    C 37.5
3 2012-10-21 21:54:00    C 20.0
4 2012-10-21 22:14:00    C 26.5
5 2012-10-21 22:34:00    C 20.0
6 2012-10-21 22:54:00    C 19.0

Answer 1

The apply.daily function in the xts package does exactly what you want.

install.packages("xts")
require('xts')

tmp <- data.frame(Date = seq(as.POSIXct("2013-06-18 10:00"),
    length.out = 100, by = "6 hours"),
    Unit = "C",
    Temp = rnorm(n = 100, mean = 20, sd = 5)) # thanks to dickoa for this code

head(tmp)
data <- xts(x=tmp[ ,3], order.by=tmp[,1])
attr(data, 'Unit') <- tmp[,'Unit']
attr(data, 'Unit')

dMax <- apply.daily(data, max)
head(dMax)
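As a rough sketch of how the result could be turned back into an ordinary data frame: the readings below are partly taken from the question and partly made up (the 2012-10-22 rows are hypothetical), and the names temps and daily are illustrative only. The sketch assumes the timestamps are in UTC; apply.daily labels each result with the last timestamp of its day, so as.Date is used to recover the calendar day.

```r
library(xts)

# Two rows from the question plus two made-up rows for a second day.
stamps <- as.POSIXct(c("2012-10-21 21:14:00", "2012-10-21 21:34:00",
                       "2012-10-22 00:14:00", "2012-10-22 00:34:00"),
                     tz = "UTC")
temps <- xts(c(82.5, 37.5, 18.5, 21.0), order.by = stamps)

# Daily maxima; the index of each result is the last observation of that day.
dMax <- apply.daily(temps, max)

# Convert back to a plain data frame keyed by calendar day.
daily <- data.frame(Day = as.Date(index(dMax), tz = "UTC"),
                    MaxTemp = as.numeric(dMax))
print(daily)
```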

Answer 2

I would create a day-of-year (DoY) column and then use the aggregate function to find the maximum temperature for each DoY.

For example, suppose your data.frame is called Data and has two columns: the first named "Date" and the second named "Temperature". I would do the following:

Data[,"DoY"] <- format(Data[,"Date"], format="%j") # make sure that Data[,"Date"] is already a date-time class, e.g., see as.POSIXct()
MaxTemps <- aggregate(Data[,"Temperature"], by=list(Data[,"DoY"]), FUN=max) # can add na.rm=TRUE if there are missing values

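MaxTemps should contain the maximum temperature observed on each day. However, if your data set spans multiple years, so that, for example, day 169 (today) occurs more than once (e.g., today and one year ago), you can do the following: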

Data[,"DoY"] <- format(Data[,"Date"], format="%Y_%j") # note the date format, which is unique for every combination of year and day of year
MaxTemps <- aggregate(Data[,"Temperature"], by=list(Data[,"DoY"]), FUN=max) # can add na.rm=TRUE if there are missing values
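A variation on the same idea, sketched in base R only, is to group by the calendar date itself, which is unique across years without building a year/day string. The six readings from the question are used below, together with two made-up readings for 2012-10-22 so that there is more than one day to aggregate; the column name Day and the UTC time zone are assumptions.

```r
# Sample rows from the question, plus two hypothetical rows for a second day.
Data <- data.frame(
  Date = as.POSIXct(c("2012-10-21 21:14:00", "2012-10-21 21:34:00",
                      "2012-10-21 21:54:00", "2012-10-21 22:14:00",
                      "2012-10-21 22:34:00", "2012-10-21 22:54:00",
                      "2012-10-22 00:14:00", "2012-10-22 00:34:00"),
                    tz = "UTC"),
  Temperature = c(82.5, 37.5, 20.0, 26.5, 20.0, 19.0, 18.5, 21.0)
)

# Use the calendar date as the grouping key (assumes UTC timestamps).
Data$Day <- as.Date(Data$Date, tz = "UTC")

# One row per day with the maximum temperature observed on that day.
MaxTemps <- aggregate(Temperature ~ Day, data = Data, FUN = max)
print(MaxTemps)
```

With these rows the result should have two rows, with a maximum of 82.5 for 2012-10-21 and 21.0 for the made-up second day.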

I hope this helps!

Answer 3

This is not easy to answer without a reproducible example.

That said, you can use lubridate (date handling) and plyr (split-apply) to solve this problem.

Let's create some data similar to yours

set.seed(123)
tmp <- data.frame(Date = seq(as.POSIXct("2013-06-18 10:00"),
                  length.out = 100, by = "6 hours"),
                  Unit = "C",
                  Temp = rnorm(n = 100, mean = 20, sd = 5))
str(tmp)
## 'data.frame':    100 obs. of  3 variables:
##  $ Date: POSIXct, format: "2013-06-18 10:00:00" ...
##  $ Unit: Factor w/ 1 level "C": 1 1 1 1 1 1 1 1 1 1 ...
##  $ Temp: num  17.2 18.8 27.8 20.4 20.6 ...


write.csv(tmp, "/tmp/tmp.csv", row.names = FALSE)
rm(tmp)

Now we can compute the maxima

require(lubridate)
require(plyr)

### "NULL" skips the second column (the unit) on import
tmp <- read.csv("/tmp/tmp.csv",
                colClasses = c("POSIXct", "NULL", "numeric"))


tmp <- transform(tmp, jday = yday(Date))


ddply(tmp, .(jday), summarise, max_temp = max(Temp))

##    jday max_temp
## 1   169   27.794
## 2   170   28.575
## 3   171   26.120
## 4   172   22.004
## 5   173   28.935
## 6   174   18.910
## 7   175   24.189
## 8   176   26.269
## 9   177   24.476
## 10  178   23.443
## 11  179   18.960
## 12  180   30.845
## 13  181   23.900
## 14  182   26.843
## 15  183   27.582
## 16  184   21.898
...................
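One caveat: yday alone repeats every year, so data covering more than one year would merge, say, day 169 of 2013 with day 169 of 2014. A possible adjustment, sketched here with made-up readings (the values and the column name yr are hypothetical), is to group by year and day of year together:

```r
library(lubridate)
library(plyr)

# Made-up readings: June 18 is day 169 in both 2013 and 2014.
tmp <- data.frame(
  Date = as.POSIXct(c("2013-06-18 10:00:00", "2013-06-18 16:00:00",
                      "2014-06-18 10:00:00", "2014-06-18 16:00:00"),
                    tz = "UTC"),
  Temp = c(20.1, 24.3, 18.7, 22.9)
)

# Add both the year and the day of year as grouping columns.
tmp <- transform(tmp, yr = year(Date), jday = yday(Date))

# Grouping by (yr, jday) keeps the two years separate.
daily_max <- ddply(tmp, .(yr, jday), summarise, max_temp = max(Temp))
print(daily_max)
```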

Answer 4

I assume you have a data frame called df with the variables date and temp. This code is untested, but with a bit of luck it will work.

library(lubridate)
df$justday <- floor_date(df$date, "day")

# for just the maxima, you could use this:
tapply(df$temp, df$justday, max)

# if you would rather have the results in a data frame, use this:
aggregate(temp ~ justday, data=df, FUN=max)
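Since the code above is described as untested, here is a small self-contained check of the same approach. The data frame is hypothetical (two readings from the question plus two made-up readings for the following day), and UTC is assumed as the time zone. Note that aggregate needs an explicit FUN argument.

```r
library(lubridate)

# Hypothetical data frame using the names assumed in this answer.
df <- data.frame(
  date = as.POSIXct(c("2012-10-21 21:14:00", "2012-10-21 21:34:00",
                      "2012-10-22 00:14:00", "2012-10-22 00:34:00"),
                    tz = "UTC"),
  temp = c(82.5, 37.5, 18.5, 21.0)
)

# Round every timestamp down to midnight of its day.
df$justday <- floor_date(df$date, "day")

# Daily maxima as a named vector and as a data frame.
max_vec <- tapply(df$temp, df$justday, max)
daily_max <- aggregate(temp ~ justday, data = df, FUN = max)
print(daily_max)
```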

Date and temperature data frame: finding the maximum temperature for each day in R
