Create output based on multiple conditions

Date: 2015-09-15 00:38:03

Tags: r

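I have 2 data frames for 2 stacks that give information about potential emissions. One data frame gives the time windows (in hours) during which the system is switched on and off in each of 4 seasons. Each season starts on a specific date. The second file gives me the stack details.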

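I am trying to test how to do this with some sample files. So far, following a Stack Overflow example, I have managed to write a function that creates a data frame with the dates I want and a column giving the season for each date. I am now really struggling with the programming concepts needed to combine the 3 data frames into the output template I want to set up.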

To show you my sample inputs:

Stack_info file:

[image: sample Stack_info table]

Sample seasonal profile, showing when the system is on or off:

[image: sample seasonal profile]

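The output should be a data frame for each year in the following format (only the text in black is part of the output; the red text just explains what the values are):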

[image: desired output template]

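What I find most difficult is that each year's output file has a unique first row, while the second row is repeated for each pollutant. From row 3 onwards come the hourly data for all 8760 hours, and this whole block then has to be repeated for the next pollutant.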

So far I have managed to write a function that assigns a season to each day of the year. For example:

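#function to create seasons
## look-up table with one row per calendar day of a leap year (2012), so Feb-29 is included;
## a Date sequence (rather than POSIXct + seconds) is not affected by daylight saving
lut = data.frame(all_dates = seq(as.Date("2012-01-01"), as.Date("2012-12-31"), by = "day"),
                 season = NA)
## "Mon-DD" key, e.g. "Jan-01" (%b gives English month names only in an English locale)
lut = within(lut, { month_day = format(all_dates, "%b-%d") })

## helper: row index of a given "Mon-DD" day in lut
d = function(month_day) which(lut$month_day == month_day)

## assign seasons by ranges of days
lut[c(d("Jan-01"):d("Mar-15"), d("Nov-08"):d("Dec-31")), "season"] = "winter"
lut[c(d("Mar-16"):d("Apr-30")), "season"] = "spring"
lut[c(d("May-01"):d("Sep-27")), "season"] = "summer"
lut[c(d("Sep-28"):d("Nov-07")), "season"] = "autumn"
rownames(lut) = lut$month_day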

## create a data frame of dates and assign seasons via the look-up table
dates = data.frame(dates = seq(as.Date("2010-01-01"), as.Date("2012-12-31"), by = 1))

dates = within(dates, {
  season = lut[strftime(dates, "%b-%d"), "season"]
})

This gives me a data frame of dates. My other 2 sample data frames (as shown in the images) are:

structure(list(`Source no` = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Source = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L), .Label = c("Stack 1", "Stack 2"), class = "factor"), 
    Period = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), Day = structure(c(2L, 
    6L, 7L, 5L, 1L, 3L, 4L, 2L, 6L, 7L, 5L, 1L, 3L, 4L, 2L, 6L, 
    7L, 5L, 1L, 3L, 4L), .Label = c("Fri", "Mon", "Sat", "Sun", 
    "Thu", "Tue", "Wed"), class = "factor"), `Spring On` = c(0L, 
    0L, 0L, 0L, 0L, 0L, 0L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 15L, 
    15L, 15L, 15L, 15L, 15L, 15L), `Spring Off` = c(23L, 23L, 
    23L, 23L, 23L, 23L, 23L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 18L, 
    18L, 18L, 18L, 18L, 18L, 18L), `Summer On` = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L), .Label = "off", class = "factor"), `Summer Off` = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L), .Label = "off", class = "factor"), `Autumn On` = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L), .Label = "off", class = "factor"), `Autumn Off` = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L), .Label = "off", class = "factor"), `Winter On` = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
    2L, 2L, 2L, 2L, 2L), .Label = c("0", "off"), class = "factor"), 
    `Winter Off` = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("23", 
    "off"), class = "factor")), .Names = c("Source no", "Source", 
"Period", "Day", "Spring On", "Spring Off", "Summer On", "Summer Off", 
"Autumn On", "Autumn Off", "Winter On", "Winter Off"), class = "data.frame", row.names = c(NA, 
-21L)) -> profile

structure(list(SNAME = structure(1:2, .Label = c("Stack 1", "Stack 2"
), class = "factor"), ISVARY = c(1L, 4L), VELVOL = c(1L, 4L), 
    TEMPDENS = c(0L, 2L), `DUM 1` = c(999L, 999L), `DUM 2` = c(999L, 
    999L), NPOL = c(2L, 2L), `EXIT VEL` = c(26.2, 22.4), TEMP = c(341L, 
    328L), `STACK DIAM` = c(1.5, 2.5), W = c(0L, 15L), Nox = c(39, 
    33.3), Sox = c(15.5, 17.9)), .Names = c("SNAME", "ISVARY", 
"VELVOL", "TEMPDENS", "DUM 1", "DUM 2", "NPOL", "EXIT VEL", "TEMP", 
"STACK DIAM", "W", "Nox", "Sox"), class = "data.frame", row.names = c(NA, 
-2L)) -> stack_info

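If anyone could give me some guidance on how to approach the programming part, that would be very helpful, as I don't know how to create separate output data frames for 2010, 2011 and 2012.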

1 Answer:

Answer 0 (score: 2)

Your data is not organised in an ideal way for processing. You might want to have a look at Hadley Wickham's paper on tidy data.

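Given your desired output, you need a data frame with one row for each hour in which a particular machine (stack n) is switched on. I therefore suggest first creating a data frame that contains every hour of a given year: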

# every hour of 2010: 8760 rows (UTC avoids daylight-saving gaps and repeats)
d.out = data.frame(dates = seq(from = as.POSIXct("2010-01-01 00:00", tz = "UTC"),
                               to   = as.POSIXct("2010-12-31 23:00", tz = "UTC"),
                               by   = 3600))
d.out$year = as.numeric(format(d.out$dates, "%Y"))
d.out$month = as.numeric(format(d.out$dates, "%m"))
d.out$day = as.numeric(format(d.out$dates, "%d"))
d.out$hour = as.numeric(format(d.out$dates, "%H"))
d.out$weekday = as.character(format(d.out$dates, "%a"))
d.out$doj = as.numeric(format(d.out$dates, "%j"))
# season from day of year; these cut-offs assume a non-leap year (day 75 = Mar-16)
d.out$season = "Winter"
d.out$season[d.out$doj >= 75 & d.out$doj < 121] = "Spring"
d.out$season[d.out$doj >= 121 & d.out$doj < 271] = "Summer"
d.out$season[d.out$doj >= 271 & d.out$doj < 312] = "Autumn"
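The calendar above covers 2010 only, and day-of-year cut-offs such as 75 or 312 shift by one day in a leap year like 2012. Since the question needs 2010, 2011 and 2012, here is a minimal sketch of one way to build one hourly data frame per year. The function name `hourly_calendar` is hypothetical; the season boundaries are copied from the question's look-up table and encoded as month-day numbers (Mar-16 becomes 316), which keeps them on the right calendar day in every year. As in the answer, the weekday abbreviations assume an English locale.

```r
# Hypothetical helper (not from the answer): hourly calendar for any year.
hourly_calendar = function(year) {
  # every hour of the year in UTC, so daylight saving neither skips nor repeats hours
  dates = seq(from = as.POSIXct(sprintf("%d-01-01 00:00", year), tz = "UTC"),
              to   = as.POSIXct(sprintf("%d-12-31 23:00", year), tz = "UTC"),
              by   = "hour")
  # month and day as one number (Mar-16 -> 316), valid in leap years too
  md = as.integer(format(dates, "%m%d"))
  # first day of Spring, Summer, Autumn and (second) Winter, from the question's lut
  breaks = c(316, 501, 928, 1108)
  seasons = c("Winter", "Spring", "Summer", "Autumn", "Winter")
  data.frame(dates   = dates,
             year    = as.integer(format(dates, "%Y")),
             day     = as.integer(format(dates, "%d")),
             hour    = as.integer(format(dates, "%H")),
             weekday = format(dates, "%a"),  # "Mon", "Tue", ... in an English locale
             season  = seasons[findInterval(md, breaks) + 1],
             stringsAsFactors = FALSE)
}

# one data frame per year, kept in a list named by year
calendars = lapply(2010:2012, hourly_calendar)
names(calendars) = 2010:2012
sapply(calendars, nrow)
```

Each element has the `dates`, `year`, `day`, `weekday`, `season` and `hour` columns that the join and summary steps use, so it could stand in for `d.out` one year at a time. Note that 2012 has 8784 hours rather than 8760.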

The goal is to join this data frame with your profile data frame. Before joining, the profile data frame has to be rearranged:

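library(dplyr)
library(tidyr)

profile_new =
profile %>%
    # wide -> long: one row per stack, period, day and "Season On/Off" column
    gather(season, hour, -c(`Source no`, Source, Period, Day)) %>%
    # split e.g. "Spring On" into season = "Spring" and status = "On"
    extract(season, c("season", "status"), "(\\w+?)\\s(\\w+)") %>%
    # drop seasons in which the stack does not run
    filter(hour != "off") %>%
    mutate(Day = as.character(Day), hour = as.numeric(hour)) %>%
    # long -> wide again: separate On and Off columns
    spread(status, hour)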
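To see the target shape without tidyr, here is a base-R sketch of the same wide-to-long reshape. It uses a loop over seasons plus `rbind`, not the gather/spread approach above, and runs on a small hypothetical excerpt of the sample profile (Stack 1 period 1 and Stack 2 period 2 on a Monday, Spring and Winter columns only, stored as text as they are after `gather`). Each remaining row carries one On/Off window, which is the shape `profile_new` has.

```r
# Hypothetical excerpt of the wide profile (values taken from the sample data)
wide = data.frame(Source = c("Stack 1", "Stack 2"), Period = c(1L, 2L), Day = c("Mon", "Mon"),
                  `Spring On` = c("0", "15"), `Spring Off` = c("23", "18"),
                  `Winter On` = c("0", "off"), `Winter Off` = c("23", "off"),
                  check.names = FALSE, stringsAsFactors = FALSE)

seasons = c("Spring", "Winter")
# one block of rows per season, with that season's On/Off columns renamed
long = do.call(rbind, lapply(seasons, function(s) {
  data.frame(Source = wide$Source, Period = wide$Period, Day = wide$Day, season = s,
             On  = wide[[paste(s, "On")]],
             Off = wide[[paste(s, "Off")]],
             stringsAsFactors = FALSE)
}))
long = long[long$On != "off", ]   # drop seasons in which the stack never runs
long$On  = as.numeric(long$On)
long$Off = as.numeric(long$Off)
long
```

Stack 2's winter row disappears because that period is "off", leaving three On/Off windows.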

Now the three data frames can easily be joined, bringing together all the information needed to create the output:

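d.out %>%
    # attach the On/Off windows for the matching weekday and season
    inner_join(profile_new, by = c("weekday" = "Day", "season" = "season")) %>%
    group_by(Source, dates, year, day, weekday, season, hour) %>%
    # the stack is on if the hour lies inside any of its periods (bounds inclusive)
    summarise(status = any(hour >= On & hour <= Off)) %>%
    ungroup() %>%
    inner_join(stack_info, by = c("Source" = "SNAME")) %>%
    # emission rates only while the stack is on, 0 otherwise
    mutate(Nox = ifelse(status, Nox, 0),
           Sox = ifelse(status, Sox, 0)) %>%
    arrange(Source, year, dates, hour) %>%
    select(Source, year, day, weekday, season, hour, `EXIT VEL`, TEMP, `STACK DIAM`, W, Nox, Sox)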
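The key rule in the pipeline is `any(hour >= On & hour <= Off)`: a stack counts as on in an hour if that hour lies inside any of its periods, with both bounds inclusive. A minimal base-R sketch of that rule for one day, using Stack 2's two spring windows from the sample profile (the names `windows`, `status` and `nox` are only illustrative):

```r
# Stack 2's spring windows from the sample profile:
# period 1 runs from hour 6 to 7, period 2 from hour 15 to 18 (bounds inclusive)
windows = data.frame(On = c(6, 15), Off = c(7, 18))

hours = 0:23
# on in an hour if that hour falls inside ANY of the stack's periods,
# the same test as any(hour >= On & hour <= Off) in summarise()
status = sapply(hours, function(h) any(h >= windows$On & h <= windows$Off))
# emission rate while on, 0 while off (33.3 is Stack 2's Nox value)
nox = ifelse(status, 33.3, 0)
hours[status]
```

`hours[status]` returns 6, 7, 15, 16, 17 and 18, so Stack 2 emits Nox in six hours of a spring day. If the Off hour was meant to be exclusive, `h < windows$Off` would be the test instead.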

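Obviously, this is not yet in the format you posted. From here you can write the data frame to a CSV file and stack the blocks by appending (write.csv ignores append, so use write.table(..., sep = ",", append = TRUE)).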
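A minimal sketch of that appending approach for the layout described in the question: a unique first row, then for each pollutant a pollutant row followed by one line per hour. Because the output template image is not available, the header fields and the single value per hourly line are assumptions, and `write_stack_year` is a hypothetical helper; the hourly vectors would come from the `Nox`/`Sox` columns of the joined data.

```r
# Hypothetical helper: write one stack-year file in the layout described in the
# question (header content and one value per hourly line are assumptions).
write_stack_year = function(path, header, hourly) {
  # row 1: unique header line for this stack
  write.table(t(header), path, sep = ",", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  for (pol in names(hourly)) {
    # repeated block: pollutant name, then one line per hour of the year
    write.table(pol, path, sep = ",", quote = FALSE, append = TRUE,
                row.names = FALSE, col.names = FALSE)
    write.table(hourly[[pol]], path, sep = ",", quote = FALSE, append = TRUE,
                row.names = FALSE, col.names = FALSE)
  }
  invisible(path)
}

path = tempfile(fileext = ".csv")
write_stack_year(path,
                 header = c("Stack 1", 26.2, 341, 1.5),
                 hourly = list(Nox = rep(39, 8760), Sox = rep(15.5, 8760)))
head(readLines(path), 4)
```

For separate files per year, the same call can sit in a loop over 2010:2012 with a file name built from the year; remember that 2012 needs 8784 hourly values.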