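So I have a data frame "fish8", and I'm trying to write a function that removes every row that is empty in any of three of its columns (BIN, collectors, country). The problem is that the code does not work when it runs inside the function, but it does work when it runs outside it. I have many other similar functions in my script and they work fine, so why doesn't this one?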
#so it doesn't work when I run it like this
remove_empties=function(fish8){
  fish8<<-fish8[!(fish8$BIN == "" | is.na(fish8$BIN)), ]
  fish8<<-fish8[!(fish8$collectors == "" | is.na(fish8$collectors)), ]
  fish8<<-fish8[!(fish8$country == "" | is.na(fish8$country)), ]
}
remove_empties(fish8)
#but it runs like this
fish8<-fish8[!(fish8$BIN == "" | is.na(fish8$BIN)), ]
fish8<-fish8[!(fish8$collectors == "" | is.na(fish8$collectors)), ]
fish8<-fish8[!(fish8$country == "" | is.na(fish8$country)), ]
Answer 0 (score: 2)
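The problem is one of variable scope. Inside the function, fish8 is the function's argument, a local variable that is separate from the fish8 in the global environment. Each <<- writes to the global fish8 but reads from the local one, which is never updated, so the filters do not accumulate. See https://www.r-bloggers.com/dont-run-afoul-of-scoping-rules-in-r/: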
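What happens with <<- is that it starts walking up the environment tree from child to parent until it either finds a match or ends up in the global (top) environment. This is a way to initiate a tree walk (like automatic searching), but with dire consequences, because you are making an assignment outside the current scope! Only the first match it finds gets changed, whether or not it is in the global environment.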
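To make that concrete, here is a minimal sketch on a small made-up data frame (the name `toy` and its values are assumptions for illustration, not the asker's data). The right-hand side of each `<<-` reads the function's local argument, which never changes, so each assignment overwrites the previous result instead of building on it.

```r
# Hypothetical toy data, not the asker's fish8
toy <- data.frame(a = c("", "p", "q"), b = c("u", "", "v"),
                  stringsAsFactors = FALSE)

broken <- function(toy) {
  # Right-hand side: the local argument toy (always all 3 rows).
  # Left-hand side: the toy found by walking up from the enclosing environment.
  toy <<- toy[!(toy$a == "" | is.na(toy$a)), ]
  toy <<- toy[!(toy$b == "" | is.na(toy$b)), ]  # overwrites the previous result
}

broken(toy)
print(toy)  # only the filter on b was applied; the row with a == "" is still there
```

With a plain `<-` inside the function and an assignment at the call site, each filter would instead operate on the result of the previous one.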
Your options include:
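# Option 1: filter the function's local copy, return it, and reassign at the call site
remove_empties = function(fish8) {
  fish8 <- fish8[!(fish8$x == '' | is.na(fish8$x)), ]
  fish8 <- fish8[!(fish8$y == '' | is.na(fish8$y)), ]
  fish8  # return the filtered local copy
}
fish8 <- remove_empties(fish8)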
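# Option 2: same idea, with an argument name that cannot be confused with the global fish8
remove_empties2 = function(fish) {
  fish <- fish[!(fish$x == '' | is.na(fish$x)), ]
  fish <- fish[!(fish$y == '' | is.na(fish$y)), ]
  fish  # return the filtered local copy
}
fish8 <- remove_empties2(fish8)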
# Option 3: keep <<-, but apply both filters in a single assignment to the global fish8
remove_empties3 = function(fish) {
  fish8 <<- fish[!(fish$x == '' | is.na(fish$x))
                 & !(fish$y == '' | is.na(fish$y)), ]
}
remove_empties3(fish8)
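The options above repeat one filter per column. As a sketch only, the same idea can be written once for any set of columns. The names `drop_empty_rows`, `cols`, and `sample_df` are hypothetical and introduced here for illustration; they are not part of the original answer.

```r
# Hypothetical helper: drop rows where any of the named columns is "" or NA.
drop_empty_rows <- function(data, cols) {
  for (col in cols) {
    values <- data[[col]]
    # is.na() comes first so that NA entries yield TRUE rather than NA
    keep <- !(is.na(values) | values == "")
    data <- data[keep, , drop = FALSE]
  }
  data  # return the filtered copy; the caller decides where to store it
}

# Made-up example data
sample_df <- data.frame(x = c("a", "", NA, "d"), y = c("f", "g", "h", ""),
                        stringsAsFactors = FALSE)
sample_df <- drop_empty_rows(sample_df, c("x", "y"))
print(sample_df)  # only the first row (x = "a", y = "f") remains
```

Returning the value and reassigning it at the call site avoids `<<-` entirely, so the function works on any data frame rather than one hard-coded global.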
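Alternatively, replace the empty strings with NA and then use na.omit(). I also dropped the function wrapper: this is at most one line longer than a function call, and it only needs to run once, because empty strings should not be reintroduced:
fish8[fish8 == ''] <- NA_character_
fish8 <- na.omit(fish8)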
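One thing to keep in mind with this approach: `na.omit()` drops a row if any column contains `NA`, not only the columns of interest. The sketch below uses made-up data (the `demo` data frame and its `notes` column are assumptions for illustration) to show the difference, along with one way to restrict the check to chosen columns using `complete.cases()`.

```r
# Made-up data: notes is a column we do not want to filter on
demo <- data.frame(x = c("a", "", "c", "d"),
                   y = c("f", "g", NA, "i"),
                   notes = c(NA, "ok", "ok", "ok"),
                   stringsAsFactors = FALSE)

demo[demo == ""] <- NA_character_   # turn empty strings into NA

all_cols <- na.omit(demo)           # drops rows with NA in ANY column
some_cols <- demo[complete.cases(demo[, c("x", "y")]), ]  # checks only x and y

nrow(all_cols)   # 1: only the last row has no NA at all
nrow(some_cols)  # 2: the first row is kept despite its NA in notes
```

If the data frame has only the columns being checked, as in the example data of this answer, the two approaches give the same rows.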
Data:
set.seed(1)
x <- sample(c('', NA_character_, letters[1:5]), 20, replace = TRUE)
y <- sample(c('', NA_character_, letters[6:10]), 20, replace = TRUE)
fish8 <- data.frame(x, y)