I have the following block of code that I need to repeat often:
library(data.table)
library(magrittr)  # provides %>%
flights <- fread("https://raw.githubusercontent.com/wiki/arunsrinivasan/flights/NYCflights14/flights14.csv")
flights$origin %>% table()
flights[grepl("jfk", origin, ignore.case = TRUE),
        origin := "0"
        ][grepl("ewr|lga", origin, ignore.case = TRUE),
          origin := "1"
          ][, origin := as.numeric(origin)]
flights$origin %>% table()
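This is my attempt at wrapping it in a function that takes n regular expressions and their replacements and applies them to any given column of the dataset.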
my_function <- function(regex, replacement, column) {
  flights[, column, with = FALSE] %>% table()
  for (i in seq_along(regex)) {
    flights[grepl(regex[i], column, ignore.case = TRUE),
            column := replacement[i],
            with = FALSE]
  }
  flights[, column := as.numeric(column)]
  flights[, column, with = FALSE] %>% table()
}
But this produces the following warning messages:
Warning messages:
1: In `[.data.table`(flights, grepl(regex[i], column, ignore.case = TRUE), :
with=FALSE together with := was deprecated in v1.9.4 released Oct 2014. Please wrap the LHS of := with parentheses; e.g., DT[,(myVar):=sum(b),by=a] to assign to column name(s) held in variable myVar. See ?':=' for other examples. As warned in 2014, this is now a warning.
2: In eval(jsub, SDenv, parent.frame()) : NAs introduced by coercion
Any help would be greatly appreciated. Thanks a lot.
Answer 0 (score: 1)
Figured it out; I'll leave my solution here for the benefit of everyone else.
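Instead of with = FALSE, wrap the column variable in () on the left-hand side of := to refer to the column by name.
To refer to the column by name inside another function call (such as grepl()), use the get() function.
my_function <- function(regex, # Vector of regex strings to match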
replacement, # Vector of strings to replace the matches from 'regex' arg
column, # Column to operate on
as.numeric = FALSE # Optional arg; convert 'column' to numeric as final step?
) {
  cat("Converting:..")
  flights[, column, with = FALSE] %>% table() %>% print()
  for (i in seq_along(regex)) {
    # get(column) reads the column's values; (column) assigns to it by name
    flights[grepl(regex[i], get(column), ignore.case = TRUE, perl = TRUE),
            (column) := replacement[i]]
  }
  if (as.numeric) {
    flights[, (column) := as.numeric(get(column))]
  }
  cat("to:..")
  flights[, column, with = FALSE] %>% table() %>% print()
}
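
For anyone who wants to try the idea without downloading the CSV, here is a minimal sketch on a toy table. The names dt and recode_column and the to_numeric argument are hypothetical, made up for this illustration; the only change from the function above is that the table is passed in as an argument instead of being read from the global environment. The last line shows where the "NAs introduced by coercion" warning in the question most likely came from.

```r
library(data.table)

# Toy stand-in for the flights data (made-up values, not the real CSV)
dt <- data.table(origin = c("JFK", "EWR", "LGA", "JFK"))

# Hypothetical helper: same idea as my_function, but the table is an argument
recode_column <- function(dt, regex, replacement, column, to_numeric = FALSE) {
  for (i in seq_along(regex)) {
    # get(column) reads the column's values; (column) assigns to it by name
    dt[grepl(regex[i], get(column), ignore.case = TRUE),
       (column) := replacement[i]]
  }
  if (to_numeric) {
    dt[, (column) := as.numeric(get(column))]
  }
  invisible(dt)  # := has already modified dt by reference
}

recode_column(dt, c("jfk", "ewr|lga"), c("0", "1"), "origin", to_numeric = TRUE)
print(table(dt$origin))

# Inside the function, `column` is just the string "origin", so
# as.numeric(column) tries to convert the name itself, not the data.
suppressWarnings(as.numeric("origin"))  # NA
```

Because := works by reference, dt is updated in place and there is no need to reassign the result.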