What 1-2 letter object names conflict with existing R objects?

Time: 2011-08-08 08:48:18

Tags: r naming-conventions

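To make my code more readable, I like to avoid the names of objects that already exist when creating new objects. Because of the package-based nature of R, and because functions are first-class objects, it is easy to overwrite common functions that are not in base R (a common package might use a short function name, but without knowing which package to load there is no way to check for it). Objects such as the built-in logicals T and F also cause trouble.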
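
To illustrate the kind of trouble meant here, a minimal sketch (the function name `shadow_demo` is made up for this example): T is an ordinary variable rather than a reserved word, so it can be silently reassigned, whereas a non-function object named c does not stop calls to c(), because R skips non-function objects when it looks up the name in a call.

```r
# Hypothetical helper, only for illustration; everything happens in its local scope.
shadow_demo <- function() {
  T <- FALSE   # legal: T is a plain variable that base R binds to TRUE
  c <- 10      # a numeric object named like the base function c()
  list(
    local_T        = T,        # the local binding wins
    base_T         = base::T,  # the original value is still reachable
    still_callable = c(1, 2),  # call lookup skips the numeric c
    local_c        = c         # but the bare name now refers to 10
  )
}

res <- shadow_demo()
str(res)
```

So overwriting T changes results quietly, while overwriting a function name with data mostly hurts readability.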

Some examples that come to mind are:

One letter

  • C
  • t
  • T / F
  • J

Two letters

  • df

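A better solution might be to avoid short names altogether in favor of more descriptive ones, and I generally try to do that as a matter of habit. Yet "df" for a function that manipulates a generic data.frame is plenty descriptive, and a longer name adds little, so short names have their uses. In addition, for SO questions where the larger context isn't necessarily known, coming up with descriptive names is nearly impossible.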

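What other one- and two-letter variable names conflict with existing R objects? Which of those are common enough that they should be avoided? If they are not in base, please list the package as well. The best answers will involve at least some code; please provide it if used.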

Note that I am not asking whether overwriting functions that already exist is advisable. That question has already been addressed on SO:

In R, what exactly is the problem with having variables with the same name as base R functions?

For a visualization of some of the answers here, see this question on CV (Cross Validated):

https://stats.stackexchange.com/questions/13999/visualizing-2-letter-combinations

2 answers:

Answer 0: (score: 18)

apropos is ideal for this:

apropos("^[[:alpha:]]{1,2}$")

With no packages loaded, this returns:

 [1] "ar" "as" "by" "c"  "C"  "cm" "D"  "de" "df" "dt" "el" "F"  "gc" "gl"
[15] "I"  "if" "Im" "is" "lh" "lm" "ls" "pf" "pi" "pt" "q"  "qf" "qr" "qt"
[29] "Re" "rf" "rm" "rt" "sd" "t"  "T"  "ts" "vi"

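The exact contents depend on the search list. If you care about conflicts with packages you commonly use, try loading a few of them and re-running it.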
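
As a small sketch of that dependence on the search list, the snippet below compares the result before and after attaching one package. The choice of splines is only an assumption for illustration (it ships with R and exports bs() and ns()); the variable names are made up.

```r
pattern <- "^[[:alpha:]]{1,2}$"

before <- unique(apropos(pattern))  # short names visible right now
library(splines)                    # illustrative choice; any package would do
after  <- unique(apropos(pattern))  # short names visible after attaching it

# Names that became visible only because of the newly attached package.
# This is empty if splines was already on the search list.
new_names <- setdiff(after, before)
new_names
```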


I loaded all (>200) of the packages installed on my machine with this:

lapply(rownames(installed.packages()), require, character.only = TRUE)

Then I re-ran the call to apropos, wrapping it in unique, since there were a few duplicates.

one_or_two <- unique(apropos("^[[:alpha:]]{1,2}$"))

This returned:

  [1] "Ad" "am" "ar" "as" "bc" "bd" "bp" "br" "BR" "bs" "by" "c"  "C" 
 [14] "cc" "cd" "ch" "ci" "CJ" "ck" "Cl" "cm" "cn" "cq" "cs" "Cs" "cv"
 [27] "d"  "D"  "dc" "dd" "de" "df" "dg" "dn" "do" "ds" "dt" "e"  "E" 
 [40] "el" "ES" "F"  "FF" "fn" "gc" "gl" "go" "H"  "Hi" "hm" "I"  "ic"
 [53] "id" "ID" "if" "IJ" "Im" "In" "ip" "is" "J"  "lh" "ll" "lm" "lo"
 [66] "Lo" "ls" "lu" "m"  "MH" "mn" "ms" "N"  "nc" "nd" "nn" "ns" "on"
 [79] "Op" "P"  "pa" "pf" "pi" "Pi" "pm" "pp" "ps" "pt" "q"  "qf" "qq"
 [92] "qr" "qt" "r"  "Re" "rf" "rk" "rl" "rm" "rt" "s"  "sc" "sd" "SJ"
[105] "sn" "sp" "ss" "t"  "T"  "te" "tr" "ts" "tt" "tz" "ug" "UG" "UN"
[118] "V"  "VA" "Vd" "vi" "Vo" "w"  "W"  "y"

You can see where they came from with

lapply(one_or_two, find)
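
If a flat table is easier to scan than a list, one possible way to arrange the same information is sketched below. The names `short_names`, `origins` and `origin_table` are made up for this example; the lookup itself is still just apropos and find.

```r
short_names <- unique(apropos("^[[:alpha:]]{1,2}$"))

# find() may return several locations for one name, so collapse them into one string.
origins <- vapply(short_names,
                  function(nm) paste(find(nm), collapse = ", "),
                  character(1))

origin_table <- data.frame(name  = short_names,
                           where = unname(origins),
                           stringsAsFactors = FALSE)
head(origin_table)
```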

Answer 1: (score: 4)

I have been thinking about this some more. Here is a list of one-letter object names in base R:

> var.names <- c(letters,LETTERS)
> var.names[sapply(var.names,exists)]
[1] "c" "q" "t" "C" "D" "F" "I" "T" "X"
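
One caveat about the call above: exists() searches the whole search path, including the global environment and every attached package, so the result can contain names that are not in the base package itself. A minimal sketch that restricts the lookup to the base package's environment, assuming that is what is wanted (the variable names are made up):

```r
one_letter <- c(letters, LETTERS)

# inherits = FALSE keeps exists() from falling through to enclosing
# environments (the global environment and the rest of the search path).
in_base <- vapply(one_letter, exists, logical(1),
                  envir = baseenv(), inherits = FALSE)

base_only <- one_letter[in_base]
base_only
```

Names such as C and D drop out under this check because they are defined in stats rather than in base.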

One- and two-letter object names in base R:

one.letter.names <- c(letters,LETTERS)

N <- length(one.letter.names)


first <- rep(one.letter.names,N)
second <- rep(one.letter.names,each=N)

two.letter.names <- paste(first,second,sep="")

var.names <- c(one.letter.names,two.letter.names)

> var.names[sapply(var.names,exists)]
[1] "c"  "d"  "q"  "t"  "C"  "D"  "F"  "I"  "J"  "N"  "T"  "X"  "bc" "gc"
[15] "id" "sd" "de" "Re" "df" "if" "pf" "qf" "rf" "lh" "pi" "vi" "el" "gl"
[29] "ll" "cm" "lm" "rm" "Im" "sp" "qq" "ar" "qr" "tr" "as" "bs" "is" "ls"
[43] "ns" "ps" "ts" "dt" "pt" "qt" "rt" "tt" "by" "VA" "UN"

That is a much bigger list than I initially suspected, although I would never think of naming a variable "if", so to a certain degree it makes sense.

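This still doesn't capture object names that are not in base, or give any sense of which functions are best avoided. I think a better answer would either use expert opinion to work out which functions are important (e.g. overwriting c is probably a worse idea than overwriting qf) or apply a data-mining approach to a pile of R code to see which short-named functions are used most often.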
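
A rough sketch of what the data-mining idea could look like on a tiny scale: parse some R code, keep only the tokens the parser marks as function calls, and count the short ones. The `code_text` string is a made-up stand-in for a real corpus of scripts, and this is only one possible approach.

```r
# Made-up sample standing in for a real body of R code.
code_text <- "x <- c(1, 2, 3)\ny <- t(matrix(x))\nz <- c(x, sd(x))"

# keep.source = TRUE is needed so that parse data is attached to the result.
parsed <- parse(text = code_text, keep.source = TRUE)
tokens <- utils::getParseData(parsed)

# Function-call names only, then restrict to one- and two-letter names.
calls       <- tokens$text[tokens$token == "SYMBOL_FUNCTION_CALL"]
short_calls <- calls[nchar(calls) <= 2]

counts <- sort(table(short_calls), decreasing = TRUE)
counts
```

Run over many real scripts instead of one string, the same counts would give a crude ranking of which short names are most worth avoiding.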