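I want to write an R function with some optional parameters. It should subset some data by two core parameters, and then I would like to be able to optionally pass further constraints. For example: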
# user=* and type=* stand for the wildcard I am after; this is not valid R as written
filter_func <- function(start_datetime, end_datetime, user=*, type=*){
  as.data.frame(subset(df, format(df$datetime,"%Y-%m-%d %H:%M:%S") > start_datetime &
                           format(df$datetime,"%Y-%m-%d %H:%M:%S") < end_datetime &
                           df$user == user &
                           df$type == type))
}
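So... if I pass an argument, it constrains the result to that value in the user or type column, but if I don't, it uses the wildcard and gets everything in the column?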
I have seen examples here that use %in% or grepl(), but those seem geared more toward cases where you have part of a string and want the rest, like a partial match that gets new_york, new_york_city, and new_york_state. I only want values that match the argument exactly!
Edit: now with an example.
So... ideally, given something like this:
start            | end              | user | type |
-----------------|------------------|------|------|
2017-01-01 11:00 | 2017-01-01 20:00 | usr1 | typ1 |
2017-01-01 12:00 | 2017-01-01 19:00 | usr2 | typ2 |
2017-01-01 02:00 | 2017-01-01 03:00 | usr2 | typ1 |
2017-03-01 01:00 | 2017-03-01 09:00 | usr1 | typ2 |
2017-04-01 05:00 | 2017-04-01 07:00 | usr3 | typ4 |
2017-05-01 01:00 | 2017-05-01 08:00 | usr2 | typ5 |
using my function
filter_func("2017-01-01 00:00", "2017-01-01 23:59")
gets me:
start            | end              | user | type |
-----------------|------------------|------|------|
2017-01-01 11:00 | 2017-01-01 20:00 | usr1 | typ1 |
2017-01-01 12:00 | 2017-01-01 19:00 | usr2 | typ2 |
2017-01-01 02:00 | 2017-01-01 03:00 | usr2 | typ1 |
but if I add an argument
filter_func("2017-01-01 00:00", "2017-01-01 23:59", "usr2")
I get:
start            | end              | user | type |
-----------------|------------------|------|------|
2017-01-01 12:00 | 2017-01-01 19:00 | usr2 | typ2 |
2017-01-01 02:00 | 2017-01-01 03:00 | usr2 | typ1 |
or even
filter_func("2017-01-01 00:00", "2017-01-01 23:59", "usr2", "typ2")
which gives:
start            | end              | user | type |
-----------------|------------------|------|------|
2017-01-01 12:00 | 2017-01-01 19:00 | usr2 | typ2 |
Answer 0 (score: 1)
You need to use grepl() for pattern matching.
filter_func <- function(start_datetime, end_datetime, user_ = '.*', type_ = '.*'){
  # '.*' matches any string, so an omitted argument leaves that column unconstrained
  subset(df, as.POSIXlt(df$start) > as.POSIXlt(start_datetime) &
             as.POSIXlt(df$end) < as.POSIXlt(end_datetime) &
             grepl(user_, df$user) &
             grepl(type_, df$type))
}
filter_func(start_datetime='2017-01-01 00:00', end_datetime='2017-01-01 23:59')
#             start              end user type
#1 2017-01-01 11:00 2017-01-01 20:00 usr1 typ1
#2 2017-01-01 12:00 2017-01-01 19:00 usr2 typ2
#3 2017-01-01 02:00 2017-01-01 03:00 usr2 typ1
filter_func(start_datetime='2017-01-01 00:00', end_datetime='2017-01-01 23:59', user_='usr2')
#             start              end user type
#2 2017-01-01 12:00 2017-01-01 19:00 usr2 typ2
#3 2017-01-01 02:00 2017-01-01 03:00 usr2 typ1
filter_func(start_datetime='2017-01-01 00:00', end_datetime='2017-01-01 23:59', user_='usr2', type_='typ2')
#             start              end user type
#2 2017-01-01 12:00 2017-01-01 19:00 usr2 typ2
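One caveat, given that the question asks for exact matches only: grepl() treats its pattern as a regular expression and matches substrings, so a pattern of usr2 would also match a value such as usr20. The following is a minimal sketch of that difference, not part of the answer above. The vector users is made up for illustration, and anchoring the pattern with ^ and $ is one possible way to force an exact match.

```r
# Hypothetical values for illustration; not taken from the question's data.
users <- c("usr2", "usr20", "usr1")

# Unanchored pattern: TRUE for any value that contains "usr2".
partial <- grepl("usr2", users)

# Anchored pattern: TRUE only for values that are exactly "usr2".
exact <- grepl("^usr2$", users)

# A pattern such as ".*" matches every string, so it acts as "no constraint".
everything <- grepl(".*", users)

print(partial)
print(exact)
print(everything)
```

If the argument is meant to be a literal value rather than a pattern, comparing with == or %in% avoids regular expressions altogether.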
Answer 1 (score: 1)
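First, a few points:
- Bracket indexing with [ is safer than subset for programmatic use.
- You are using format, which converts datetime objects to strings; you need something like as.POSIXct, which parses strings into datetimes. You could do this inside the function, but you should do it before the function, because you will always want the datetimes parsed and doing so repeatedly is pointless.
- For the optional constraints, you need control flow such as if. You still need to check whether the variables exist. Two options:
  - missing, which is built for checking whether function arguments exist.
  - Set a default of NULL and use is.null.
- (The < operator will try to coerce objects that are not of the same class.)
Then,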
df <- data.frame(start = c("2017-01-01 11:00", "2017-01-01 12:00", "2017-01-01 02:00",
                           "2017-03-01 01:00", "2017-04-01 05:00", "2017-05-01 01:00"),
                 end = c("2017-01-01 20:00", "2017-01-01 19:00", "2017-01-01 03:00",
                         "2017-03-01 09:00", "2017-04-01 07:00", "2017-05-01 08:00"),
                 user = c("usr1", "usr2", "usr2", "usr1", "usr3", "usr2"),
                 type = c("typ1", "typ2", "typ1", "typ2", "typ4", "typ5"))
# parse in two steps if you like, e.g. df$start <- as.POSIXct(df$start)
df[1:2] <- lapply(df[1:2], as.POSIXct)
filter_func <- function(x, start_time, end_time, usr, typ = NULL){
  # core constraint: always applied
  x <- x[x$start > start_time & x$end < end_time, ]
  # optional constraint, checked with missing()
  if (!missing(usr)) {
    x <- x[x$user %in% usr, ]
  }
  # optional constraint, checked with a NULL default
  if (!is.null(typ)) {
    x <- x[x$type %in% typ, ]
  }
  x
}
And test it:
str(df)
#> 'data.frame':    6 obs. of  4 variables:
#>  $ start: POSIXct, format: "2017-01-01 11:00:00" "2017-01-01 12:00:00" ...
#>  $ end  : POSIXct, format: "2017-01-01 20:00:00" "2017-01-01 19:00:00" ...
#>  $ user : Factor w/ 3 levels "usr1","usr2",..: 1 2 2 1 3 2
#>  $ type : Factor w/ 4 levels "typ1","typ2",..: 1 2 1 2 3 4
filter_func(df, as.POSIXct('2017-01-01 00:00'), as.POSIXct('2017-01-01 23:59'))
#>                 start                 end user type
#> 1 2017-01-01 11:00:00 2017-01-01 20:00:00 usr1 typ1
#> 2 2017-01-01 12:00:00 2017-01-01 19:00:00 usr2 typ2
#> 3 2017-01-01 02:00:00 2017-01-01 03:00:00 usr2 typ1
filter_func(df, '2017-01-01 00:00', '2017-01-01 23:59')
#>                 start                 end user type
#> 1 2017-01-01 11:00:00 2017-01-01 20:00:00 usr1 typ1
#> 2 2017-01-01 12:00:00 2017-01-01 19:00:00 usr2 typ2
#> 3 2017-01-01 02:00:00 2017-01-01 03:00:00 usr2 typ1
filter_func(df, '2017-01-01 00:00', '2017-01-01 23:59', 'usr2')
#>                 start                 end user type
#> 2 2017-01-01 12:00:00 2017-01-01 19:00:00 usr2 typ2
#> 3 2017-01-01 02:00:00 2017-01-01 03:00:00 usr2 typ1
filter_func(df, '2017-01-01 00:00', '2017-01-01 23:59', 'usr2', 'typ2')
#>                 start                 end user type
#> 2 2017-01-01 12:00:00 2017-01-01 19:00:00 usr2 typ2
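To see the two ways of checking for an optional argument side by side, here is a small sketch. The functions describe_missing and describe_null are hypothetical names used only for illustration; they just report whether the optional argument was supplied, without touching any data.

```r
# Hypothetical helpers for illustration only.

# Variant 1: no default; missing() reports whether the caller supplied usr.
describe_missing <- function(usr) {
  if (missing(usr)) "no user filter" else paste("filter on", usr)
}

# Variant 2: default of NULL; is.null() reports whether typ was left unset.
describe_null <- function(typ = NULL) {
  if (is.null(typ)) "no type filter" else paste("filter on", typ)
}

r1 <- describe_missing()        # argument omitted
r2 <- describe_missing("usr2")  # argument supplied
r3 <- describe_null()           # default NULL used
r4 <- describe_null("typ2")     # argument supplied
r5 <- describe_null(NULL)       # NULL passed explicitly, treated like omitted

print(c(r1, r2, r3, r4, r5))
```

One practical difference is that the NULL default can be passed explicitly, which may be convenient when another function forwards its own optional arguments.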