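Using dplyr::mutate to apply a sunrise-calculation function to a data frame?

Time: 2018-12-10 03:59:21

Tags: r dplyr

I'm having trouble with a function I wrote when I try to apply it to a data frame to mutate a new column.

I want to add a column to a data frame that computes the sunrise/sunset time for every row, based on the existing latitude, longitude and date columns. The sunrise/sunset calculation comes from the "sunriset" function in the maptools package.

Here is my function: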

library(maptools)
library(tidyverse)

sunrise.set2 <- function (lat, long, date, timezone = "UTC", direction = c("sunrise", "sunset"), num.days = 1) 
{
        lat.long <- matrix(c(long, lat), nrow = 1)
        day <- as.POSIXct(date, tz = timezone)
        sequence <- seq(from = day, length.out = num.days, by = "days")
        sunrise <- sunriset(lat.long, sequence, direction = "sunrise", 
                            POSIXct = TRUE)
        sunset <- sunriset(lat.long, sequence, direction = "sunset", 
                           POSIXct = TRUE)
        ss <- data.frame(sunrise, sunset)
        ss <- ss[, -c(1, 3)]
        colnames(ss) <- c("sunrise", "sunset")

        if (direction == "sunrise") {
                return(ss[1,1])     
        } else {
                return(ss[1,2])
        }       
}

When I run the function on a single input, I get the expected output:

sunrise.set2(41.2, -73.2, "2018-12-09 07:34:0", timezone="EST", 
    direction = "sunset", num.days = 1)
[1] "2018-12-09 16:23:46 EST"

However, when I try to do this on a data frame to mutate a new column, like this:

df <- df %>% 
    mutate(set = sunrise.set2(Latitude, Longitude, LocalDateTime, timezone="UTC", num.days = 1, direction = "sunset"))

I get the following error:

Error in mutate_impl(.data, dots) : 
  Evaluation error: 'from' must be of length 1.

The dput of my df is below. I suspect I'm not doing something right to properly vectorize my function, but I'm not sure what.

Thanks

dput(df):

structure(list(Latitude = c(20.666, 20.676, 20.686, 20.696, 20.706, 
20.716, 20.726, 20.736, 20.746, 20.756, 20.766, 20.776), Longitude = c(-156.449, 
-156.459, -156.469, -156.479, -156.489, -156.499, -156.509, -156.519, 
-156.529, -156.539, -156.549, -156.559), LocalDateTime = structure(c(1534318440, 
1534404840, 1534491240, 1534577640, 1534664040, 1534750440, 1534836840, 
1534923240, 1535009640, 1535096040, 1535182440, 1535268840), class = c("POSIXct", 
"POSIXt"), tzone = "UTC")), .Names = c("Latitude", "Longitude", 
"LocalDateTime"), row.names = c(NA, -12L), class = c("tbl_df", 
"tbl", "data.frame"), spec = structure(list(cols = structure(list(
    Latitude = structure(list(), class = c("collector_double", 
    "collector")), Longitude = structure(list(), class = c("collector_double", 
    "collector")), LocalDateTime = structure(list(format = "%m/%d/%Y %H:%M"), .Names = "format", class = c("collector_datetime", 
    "collector"))), .Names = c("Latitude", "Longitude", "LocalDateTime"
)), default = structure(list(), class = c("collector_guess", 
"collector"))), .Names = c("cols", "default"), class = "col_spec"))

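1 answer:

Answer 0 (score: 2)

The problem is indeed that your function is not vectorized as written: it breaks as soon as you give it more than one value. One workaround (as Suliman suggested) is to use rowwise() or a variant of apply, but that makes your function do a lot of unnecessary work.

It's better to vectorize it, because maptools::sunriset is vectorized too. First suggestion: debug or rewrite it using vectors as input, and you will quickly see which lines behave unexpectedly. Let's go through it line by line, with your original lines commented out wherever I replaced them with something else: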
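The failure can be reproduced without maptools. As a minimal sketch in base R: seq() on date-times accepts only a single start value, which is what the seq(from = day, ...) line in the question runs into once mutate() passes a whole column. The helper name first_day_seq is hypothetical and only stands in for that one line; the lapply() call at the end illustrates the row-by-row style of workaround.

```r
# Hypothetical stand-in for the seq() line in the question's function.
first_day_seq <- function(date, num.days = 1) {
  day <- as.POSIXct(date, tz = "UTC")
  seq(from = day, length.out = num.days, by = "days")
}

one  <- as.POSIXct("2018-08-15 07:34:00", tz = "UTC")
many <- one + 86400 * 0:2   # three date-times, like a data frame column

# A single start value works.
single <- first_day_seq(one)

# A vector of start values fails; capture the error message instead of stopping.
msg <- tryCatch(first_day_seq(many), error = function(e) conditionMessage(e))
print(msg)

# Row-by-row workaround: call the scalar function once per element.
per_row <- lapply(many, first_day_seq)
length(per_row)
```

The workaround runs, but it calls the function once per row, which is the extra work a vectorized version avoids.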

library(maptools)
library(tidyverse)

# sunrise.set2 <- function (lat, long, date, timezone = "UTC", direction = c("sunrise", "sunset"), num.days = 1) 
sunrise.set2 <- function (lat, long, date, timezone = "UTC", direction = c("sunrise", "sunset"))
# Why an argument saying how many days? You have the length of your dates
{
        #lat.long <- matrix(c(long, lat), nrow = 1)
        lat.long <- cbind(long, lat)
        day <- as.POSIXct(date, tz = timezone)
        # sequence <- seq(from = day, length.out = num.days, by = "days") # Your days object is fine
        sunrise <- sunriset(lat.long, day, direction = "sunrise", 
                            POSIXct = TRUE)
        sunset <- sunriset(lat.long, day, direction = "sunset", 
                           POSIXct = TRUE)
        # I've replaced sequence with day here
        ss <- data.frame(sunrise, sunset)
        ss <- ss[, -c(1, 3)]
        colnames(ss) <- c("sunrise", "sunset")

        if (direction == "sunrise") {
                #return(ss[1,1])
                return(ss[,1])
        } else {
                #return(ss[1,2])
                return(ss[,2])
        }       
}

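But looking at what the function does, I think there is still a lot of redundant work that serves no purpose.

  • You are calculating both sunrise and sunset, only to use one of them. You can simply pass the direction argument through to sunriset without even inspecting it.
  • Is it useful to ask for a separate date and time zone? When your user gives you a POSIXt object, the time zone is included. Accepting a string as the date is nice, but it only works if the string is in the right format. For simplicity I only ask for a POSIXct (as in your example data.frame).
  • Why build a data.frame and assign names just before returning? Once you subset, all of that is dropped again.

This means your function can be a lot shorter: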
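The last point can be checked in base R with made-up values. As a small sketch: pulling a single column out of a plain data.frame returns a bare vector, so the column names assigned just before are gone from the result. The numbers below are placeholders, not real sunrise times.

```r
# Placeholder values standing in for the two computed columns.
ss <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
colnames(ss) <- c("sunrise", "sunset")

# Same kind of subsetting as return(ss[, 1]) in the function.
col <- ss[, 1]

is.data.frame(col)    # the data.frame wrapper is dropped
is.null(names(col))   # no "sunrise" label survives on the vector
```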

sunrise.set2 <- function(lat, lon, date, direction = c("sunrise", "sunset")) {
  lat.long <- cbind(lon, lat)
  sunriset(lat.long, date, direction=direction, POSIXct.out=TRUE)[,2]
}

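If you don't control your inputs, you may want to add some checks, but in general I find it most useful to focus only on what you want to accomplish.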
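As a rough sketch of what such checks could look like, using only base R: the helper name check_sun_inputs and the specific rules are assumptions for illustration, not part of maptools or of the function above.

```r
# Hypothetical validation helper; the rules are illustrative assumptions.
check_sun_inputs <- function(lat, lon, date, direction = c("sunrise", "sunset")) {
  direction <- match.arg(direction)          # reject anything but the two choices
  stopifnot(
    is.numeric(lat), is.numeric(lon),
    inherits(date, "POSIXct"),               # the time zone travels with the object
    length(lat) == length(lon),
    length(lat) == length(date),
    all(abs(lat) <= 90), all(abs(lon) <= 180)
  )
  direction                                  # return the validated choice
}

when <- as.POSIXct("2018-08-15 07:34:00", tz = "UTC") + 86400 * 0:1
ok <- check_sun_inputs(c(20.666, 20.676), c(-156.449, -156.459), when, "sunset")

# A character date is rejected up front instead of failing later.
bad <- tryCatch(
  check_sun_inputs(20.666, -156.449, "2018-08-15", "sunset"),
  error = function(e) "rejected"
)
```

Calling a helper like this at the top of sunrise.set2 would turn a confusing downstream error into an immediate one, at the cost of a few extra lines.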