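I have been trying to work out how to set the class of a data frame column based on the column's name.
Suppose I have a data frame with named columns, and I want to add a class to a column via a function. The class is determined in a second function from the column's name: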
library(dplyr)
df1 <- data.frame(hello = 1:4, world = 2:5)
add_class <- function(x, my_class) {
  structure(x, class = c(class(x), my_class))
}
get_class_by_column_name <- function(column_name) {
  if (grepl("hello", tolower(column_name))) {
    return("greeting")
  } else {
    return("probably_not_greeting")
  }
}
Both functions work as expected on their own:
> class(df1$hello)
[1] "integer"
> df1$hello <- add_class(df1$hello, "class_added_manually")
> class(df1$hello)
[1] "integer" "class_added_manually"
> df1$hello <- add_class(df1$hello, get_class_by_column_name("hello"))
> class(df1$hello)
[1] "integer" "class_added_manually" "greeting"
But I cannot figure out how to combine them. This does not work:
set_classes_by_column_names <- function(df) {
  classes_df <- data.frame(name = names(df), class = '') %>%
    rowwise %>%
    mutate(class = get_class_by_column_name(name))
  print(classes_df)
  for (i in 1:length(classes_df$name)) {
    add_class(my_column = df[,classes_df$name[i]], # select column by name
              my_class = classes_df$class[i]) # use column name as function argument to find class
  }
  return(df)
}
The name-to-class lookup still works, but the custom class does not seem to be added.
> df2 <- data.frame(hello = 1:4, world = 2:5)
> class(df2$hello)
[1] "integer"
> df2 <- set_classes_by_column_names(df2)
Source: local data frame [2 x 2]
Groups: <by row>
# A tibble: 2 x 2
name class
<fct> <chr>
1 hello greeting
2 world probably_not_greeting
> class(df2$hello)
[1] "integer"
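What is going wrong here?
Also, I would like to know whether the for (i in 1:length(classes_df$name)) {...} part can be replaced by a dplyr pipeline. The difficulty is that there seems to be no function that mutates a data frame column while passing the column's name as an argument, and my get_class_by_column_name needs that name.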
Answer 0 (score: 4)
This can be done in a pipeline with the purrr package:
library(dplyr)
library(purrr)
set_class_by_name <- function(col, name) {
  if (grepl("hello", name)) {
    new_class <- "greeting"
  } else {
    new_class <- "probably_not_greeting"
  }
  return(structure(col, class = c(class(col), new_class)))
}
df2 <- df1 %>%
  imap_dfc(set_class_by_name)
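The trick is purrr::imap, which performs an apply-type operation over a list and additionally passes each element's name as the second argument, so the name is readily available inside the custom function. The _dfc suffix binds the output (a list of columns) back into a data frame by column.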
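As for why the loop in the question leaves df2 unchanged: R functions return a modified copy instead of changing their argument in place, and the loop never assigns the value returned by add_class back into df. The call would also have to use the argument names that add_class actually defines (x and my_class). The sketch below is one way to repair the loop in base R, reusing the two helpers from the question; the name set_classes_loop is made up for this illustration.

```r
# Helpers as defined in the question
add_class <- function(x, my_class) {
  structure(x, class = c(class(x), my_class))
}
get_class_by_column_name <- function(column_name) {
  if (grepl("hello", tolower(column_name))) "greeting" else "probably_not_greeting"
}

# Hypothetical name: a repaired version of the loop from the question
set_classes_loop <- function(df) {
  for (nm in names(df)) {
    # Assign the result back; otherwise the classed copy is discarded
    df[[nm]] <- add_class(df[[nm]], get_class_by_column_name(nm))
  }
  df
}

df2 <- set_classes_loop(data.frame(hello = 1:4, world = 2:5))
class(df2$hello)  # "integer" "greeting"
class(df2$world)  # "integer" "probably_not_greeting"
```

Looping over names(df) directly also removes the need for the intermediate classes_df lookup table.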
Answer 1 (score: 2)
You can combine mutate_at with dplyr's selection helpers such as starts_with, ends_with and contains (add_class is the function from the question):
# stringsAsFactors = TRUE reproduces the "factor" class shown below on R >= 4.0
df1 <- data.frame(hello = 1:4, world = 2:5,
                  cello = c('a', 'b'), sword = c(TRUE, FALSE),
                  stringsAsFactors = TRUE)
df2 <-
  df1 %>%
  mutate_at(vars(starts_with('h')), add_class, 'zebra') %>%
  mutate_at(vars(ends_with('d')), add_class, 'cow') %>%
  mutate_at(vars(contains('cel')), add_class, 'giraffe')
lapply(df2, class)
#
# $`hello`
# [1] "integer" "zebra"
#
# $world
# [1] "integer" "cow"
#
# $cello
# [1] "factor" "giraffe"
#
# $sword
# [1] "logical" "cow"
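In more recent dplyr releases mutate_at is superseded by across(). The following is a minimal sketch, assuming dplyr 1.0.0 or later: inside across(), cur_column() returns the name of the column currently being processed, so the name-based lookup from the question can be called directly in the pipeline. The two helpers are copied from the question.

```r
library(dplyr)

# Helpers as defined in the question
add_class <- function(x, my_class) {
  structure(x, class = c(class(x), my_class))
}
get_class_by_column_name <- function(column_name) {
  if (grepl("hello", tolower(column_name))) "greeting" else "probably_not_greeting"
}

df1 <- data.frame(hello = 1:4, world = 2:5)

# cur_column() supplies the column name that the lookup function needs
df2 <- df1 %>%
  mutate(across(everything(),
                ~ add_class(.x, get_class_by_column_name(cur_column()))))

lapply(df2, class)
```

Compared with chaining several mutate_at calls, this keeps the name-to-class rule in one function instead of spreading it over the selection helpers.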
Answer 2 (score: 1)
Here is an attempt at modifying your second function:
Edit:
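get_class_by_column_name <- function(column_name) {
  # The value of an assignment is its right-hand side, so the function
  # returns element [[2]] of c("character", <new class>), i.e. the new class name
  if (tolower(column_name) %in% c("hello")) {
    class(column_name) <- append(class(column_name), "greeting")[[2]]
  } else {
    class(column_name) <- append(class(column_name), "probably_not_greeting")[[2]]
  }
}
# Map over the column names, then flatten to a named character vector
unlist(Map(get_class_by_column_name, names(df1)))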
hello world
"greeting" "probably_not_greeting"
Original:
get_class_by_column_name <- function(column_name) {
  if (grepl("hello", tolower(column_name))) {
    class(column_name) <- append(class(column_name), "greeting")
    return(class(column_name))
  } else {
    class(column_name) <- append(class(column_name), "probably_not_greeting")
    return(class(column_name))
  }
}
Map(get_class_by_column_name, names(df1))
Result:
$hello
[1] "character" "greeting"
$world
[1] "character" "probably_not_greeting"
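Both versions above only compute class names from the column names; they do not attach them to the columns. Here is a minimal base R sketch of that last step, assuming add_class and the original get_class_by_column_name from the question: vapply builds a named vector of class names, Map then walks over the columns and that vector in parallel, and assigning into df1[] keeps the data frame structure.

```r
# Helpers as defined in the question
add_class <- function(x, my_class) {
  structure(x, class = c(class(x), my_class))
}
get_class_by_column_name <- function(column_name) {
  if (grepl("hello", tolower(column_name))) "greeting" else "probably_not_greeting"
}

df1 <- data.frame(hello = 1:4, world = 2:5)

# Named character vector: one class name per column
new_classes <- vapply(names(df1), get_class_by_column_name, character(1))

# Pair each column with its class name; df1[] <- ... replaces the columns
# while leaving df1 a data frame
df1[] <- Map(add_class, df1, new_classes)

lapply(df1, class)
```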