Question

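I regularly prepare statistical summary tables that are shared at work. These tables usually contain the same kinds of data and column headers (e.g. number of bylaw violations, number of units, etc.). I often use shorthand column names in my R data frames ("nbbldg", "nbunits", "nbvl") or other column names inherited from imported tables. Here is an example: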

df <-
  data.frame(
    DESCRIPTION_TXT_BLW = c(
      "Missing plumbing fixture",
      "Improperly installed heating unit",
      "Loose or damaged siding",
      "Peeling paint"
    ),
    DESCR_UNIT = c("Apartment", "Apartment", "Common area", "Common area"),
    nbvl = as.integer(c(12, 4, 76, 4))
  )

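Before exporting to csv, I then convert the column names to their "readable" counterparts with the following function (an example list is provided):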

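# Return the column names of df, replacing every name that has an entry in
# codetotext with its readable counterpart; other names are kept as they are.
changecolnames <- function(df, codetotext) {
  vapply(names(df), function(x) {
    if (x %in% names(codetotext)) {
      codetotext[[x]]
    } else {
      x
    }
  }, character(1), USE.NAMES = FALSE)  # plain character vector, not a list
}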

readablecolnames <-
      list(
        "DESCR_UNIT" = "Description of unit",
        "DESCRIPTION_TXT_BLW" = "Description of bylaw violation",
        "nbvl" = "Number of bylaw violations"
      )

names(df)<-changecolnames(df, readablecolnames)

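So far I have kept project-specific lists that let me convert column names. I would like to aggregate these disparate lists into one global list that I can access from any R project (in RStudio) and keep adding to. My goal is to avoid creating a list in every project and instead refer to a master "library" of sorts that is easy to update. What is the best way to accomplish this?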

Answer 1

What I would do is keep a central R file that contains this list of names, and then source it to load the list into each project.

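If you would rather keep the name pairs in a .csv file, this R file can build the list of names from that single file instead of holding the pairs itself: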

name_pairs.csv:

short_name,full_name
DESCR_UNIT,Description of unit
DESCRIPTION_TXT_BLW,Description of bylaw violation
nbvl,Number of bylaw violations
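
Because the pairs live in a plain CSV, extending the central list amounts to appending a row. The following is a minimal sketch of that idea, not part of the original answer: add_name_pair is a hypothetical helper, the "nbunits" pair is made up for illustration, and the demonstration writes to a temporary file rather than to the real central file.

```r
# Hypothetical helper: append one short/full name pair to the central CSV,
# refusing to add a short name that is already defined.
add_name_pair <- function(csv_path, short_name, full_name) {
  pairs <- read.csv(csv_path, stringsAsFactors = FALSE)
  if (short_name %in% pairs$short_name) {
    stop("short name already defined: ", short_name)
  }
  pairs <- rbind(
    pairs,
    data.frame(short_name = short_name, full_name = full_name,
               stringsAsFactors = FALSE)
  )
  write.csv(pairs, csv_path, row.names = FALSE)
  invisible(pairs)
}

# Demonstration on a temporary file standing in for name_pairs.csv
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "short_name,full_name",
  "DESCR_UNIT,Description of unit",
  "nbvl,Number of bylaw violations"
), csv_path)

add_name_pair(csv_path, "nbunits", "Number of units")
read.csv(csv_path, stringsAsFactors = FALSE)
```

Note that write.csv quotes character fields by default; read.csv and read.table with sep = ',' read the quoted values back unchanged.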

load_name_pairs.R:

name_pairs <- read.table('~/Desktop/test/name_pairs.csv', sep = ',',
                         header = TRUE, stringsAsFactors = FALSE)

readablecolnames <- name_pairs$full_name
names(readablecolnames) <- name_pairs$short_name
rm(name_pairs)

At the start of an R project:

source('~/Desktop/test/load_name_pairs.R')
readablecolnames


           DESCR_UNIT              DESCRIPTION_TXT_BLW                             nbvl 
"Description of unit" "Description of bylaw violation"     "Number of bylaw violations"

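As you can see, calling source on load_name_pairs.R runs all of the code in the sourced file and creates its objects in the session that called it (the global environment by default). So a single line in the project file is enough to load and parse the central .csv file and make the result available in the project.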
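
Since readablecolnames is a named character vector, the renaming step from the question can also be written as a vectorized lookup. This is a minimal sketch under a few assumptions: the vector is typed out inline rather than sourced so that the snippet runs on its own, the data frame is a shortened version of the one in the question, and other_col is a hypothetical column with no entry in the lookup.

```r
# Stand-in for the vector that load_name_pairs.R creates
readablecolnames <- c(
  DESCR_UNIT = "Description of unit",
  DESCRIPTION_TXT_BLW = "Description of bylaw violation",
  nbvl = "Number of bylaw violations"
)

df <- data.frame(
  DESCRIPTION_TXT_BLW = c("Missing plumbing fixture", "Peeling paint"),
  DESCR_UNIT = c("Apartment", "Common area"),
  nbvl = c(12L, 4L),
  other_col = c("a", "b")  # hypothetical column without a readable name
)

# Look up every column name at once; names without an entry come back as NA
new_names <- unname(readablecolnames[names(df)])

# Keep the original name wherever the lookup found nothing
names(df) <- ifelse(is.na(new_names), names(df), new_names)
names(df)
```

The changecolnames function from the question should also work unchanged with this vector, because `[[` indexes a named character vector by name just as it does a list.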
