How to update column names in R without typing them as strings

Posted: 2018-03-16 07:39:17

Tags: r string dataframe

I am using the WHO macro to convert anthropometric measurements into z-scores.

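For the purposes of this question: calling the who2007 function requires supplying the name of the data frame and then only the bare names of the variables (columns), just as in ggplot. The problem is that if the column is named Age, entering argument=Age is not the same as entering argument='Age'. The former returns a double, but the latter returns a list. I assume this is the difference between df$Age and df['Age'].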
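The guess about df$Age versus df['Age'] can be checked on a small toy data frame (the column name Age and its values are invented for illustration). $ and [[ return the column vector, while single-bracket indexing by name returns a one-column data frame, which is a list:

```r
# Toy data frame with made-up values
df <- data.frame(Age = c(12.5, 30, 48))

a <- df$Age       # `$` extracts the column itself: a double vector
b <- df["Age"]    # single `[` keeps the data frame structure: a one-column data.frame
d <- df[["Age"]]  # `[[` with a string also extracts the column vector
e <- df[, "Age"]  # matrix-style indexing drops a single column to a vector

typeof(a)         # "double"
typeof(b)         # "list" (a data.frame is a list of columns)
class(b)          # "data.frame"
identical(a, d)   # TRUE
identical(a, e)   # TRUE
```

So a column name held in a string, say col <- "Age", gives the double vector through df[[col]]; the difficulty with who2007 is that it never uses the value of such a string, only the text of the argument as typed.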

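I have a vector containing only the column names, and I need to run the same code repeatedly with a different column each time. If I pass the corresponding entry of that character vector, the function throws an error because internally it encounters a list instead of a double. How can I get around this? One way I can think of is to use column numbers, or some grep method to find the column numbers, but is there a better way?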

ADDENDUM

Here is the relevant part of the function source code, which I think explains the problem:

who2007 <- function(FileLab="Temp",FilePath="C:\\Documents and Settings",mydf,sex,age,weight,height,oedema=rep("n",dim(mydf)[1]),sw=rep(1,dim(mydf)[1])) {

#############################################################################
###########   Calculating the z-scores for all indicators
#############################################################################

   old <- options(warn=(-1))

   sex.x<-as.character(get(deparse(substitute(mydf)))[,deparse(substitute(sex))])
   age.x<-as.double(get(deparse(substitute(mydf)))[,deparse(substitute(age))])
   weight.x<-as.double(get(deparse(substitute(mydf)))[,deparse(substitute(weight))])
   height.x<-as.double(get(deparse(substitute(mydf)))[,deparse(substitute(height))])
   if(!missing(oedema)) oedema.vec<-as.character(get(deparse(substitute(mydf)))[,deparse(substitute(oedema))]) else oedema.vec<-oedema
   if(!missing(sw)) sw<-as.double(get(deparse(substitute(mydf)))[,deparse(substitute(sw))]) else sw<-as.double(sw)
   sw<-ifelse(is.na(sw),0,sw)

   sex.vec<-NULL
   sex.vec<-ifelse(sex.x!="NA" & (sex.x=="m" | sex.x=="M" | sex.x=="1"),1,ifelse(sex.x!="NA" & (sex.x=="f" | sex.x=="F" | sex.x=="2"),2,NA))
   age.vec<-age.x
   height.vec<-height.x
   oedema.vec<-ifelse(oedema.vec=="n" | oedema.vec=="N","n",ifelse(oedema.vec=="y" | oedema.vec=="Y","y","n"))

   mat<-cbind.data.frame(age.x,as.double(sex.vec),weight.x,height.x,oedema.vec,sw,stringsAsFactors=F)
   names(mat)<-c("age.mo","sex","weight","height","oedema","sw")

   mat$cbmi<-mat$weight/((height.vec/100)^2)
   mat$zhfa<-NULL
   mat$fhfa<-NULL
   mat$zwfa<-NULL
   mat$fwfa<-NULL
   mat$zbfa<-NULL
   mat$fbfa<-NULL

#############################################################################
###########   Calculating the z-scores for all indicators
#############################################################################

   cat("Please wait while calculating z-scores...\n")

   ### Height-for-age z-score

   mat<-calc.zhfa(mat,hfawho2007)

   ### Weight-for-age z-score

   mat<-calc.zwei(mat,wfawho2007)

   ### BMI-for-age z-score

   mat<-calc.zbmi(mat,bfawho2007)

   #### Rounding the z-scores to two decimals

   mat$zhfa<-rounde(mat$zhfa,digits=2)
   mat$zwfa<-rounde(mat$zwfa,digits=2)
   mat$zbfa<-rounde(mat$zbfa,digits=2)

   #### Flagging z-score values for individual indicators

   mat$fhfa<-ifelse(abs(mat$zhfa) > 6,1,0)
   mat$fwfa<-ifelse(mat$zwfa > 5 | mat$zwfa < (-6),1,0)
   mat$fbfa<-ifelse(abs(mat$zbfa) > 5,1,0)

   # if() needs a single TRUE/FALSE, so set only the affected rows via an index
   idx<-which(is.na(mat$age.mo) & mat$oedema=="y")
   mat$fhfa[idx]<-NA
   mat$zwfa[idx]<-NA
   mat$zbfa[idx]<-NA

   mat<-cbind.data.frame(mydf,mat[,-c(2:6)])
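The deparse(substitute(...)) calls near the top of this function may explain part of the behaviour: they turn the text of the argument expression, not its value, into the column name. A hypothetical one-line helper, arg_text(), reproduces just that step on a made-up data frame:

```r
# Hypothetical helper reproducing only the deparse(substitute(...)) step of who2007
arg_text <- function(col) deparse(substitute(col))

df <- data.frame(Age = c(12.5, 30, 48))   # made-up data

col_bare   <- arg_text(Age)      # "Age": a valid column name
col_quoted <- arg_text("Age")    # "\"Age\"": the quote marks become part of the name
cols <- c("Age")
col_var    <- arg_text(cols[1])  # "cols[1]": the expression text, not the value "Age"

df[, col_bare]                   # 12.5 30.0 48.0

# The quoted form names a column that does not exist
res <- tryCatch(df[, col_quoted], error = conditionMessage)
res                              # "undefined columns selected"
```

In a loop, age = cols[i] would likewise become the column name "cols[i]", not the string stored in cols[i].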

ADDENDUM 2

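The script may also be run by several users, some of whom may not be able to modify the source code. Is there a way to do this without modifying the function's source code?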
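One approach that leaves the function untouched is to turn each string into a symbol with base R's as.name() and build the call with do.call(), so that substitute() inside the function sees a bare name, exactly as if it had been typed. The sketch uses a hypothetical stand-in, col_mean(), that copies the get(deparse(substitute(...))) pattern from who2007 (the real function also needs the WHO reference tables); the data frame df and its values are made up.

```r
# Hypothetical stand-in for who2007: same non-standard evaluation pattern,
# looking up the data frame by name and the column from the unevaluated argument
col_mean <- function(mydf, col) {
  x <- as.double(get(deparse(substitute(mydf)))[, deparse(substitute(col))])
  mean(x)
}

# Example data with made-up values
df <- data.frame(Age = c(12, 24, 36), Weight = c(9.5, 12.0, 14.3))

col_mean(df, Age)              # 24: column typed as a bare name

cols <- c("Age", "Weight")     # column names held as strings

# as.name() turns each string into a symbol; do.call() then builds and
# evaluates the call col_mean(mydf = df, col = Age), so substitute() sees a name
res <- sapply(cols, function(cl) {
  do.call(col_mean, list(mydf = as.name("df"), col = as.name(cl)))
})
res                            # named numeric vector: Age, Weight
```

Applied to who2007, the list passed to do.call() would presumably hold as.name() symbols for sex, age, weight and height, plus as.name() of the data frame's name for mydf; this is an assumption based on the source above, not something checked against the WHO macro itself. Because of the get() call, the data frame must also exist under that name somewhere the function can find it, such as the global environment.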

1 Answer

Answer 0 (score: 1)

We can check whether the input data frame has the required columns, and then drop the "deparse/get" step altogether, for example:

who2007 <- function(FileLab = "Temp", FilePath = "C:\\Documents and Settings",
                    mydf,
                    oedema = rep("n", nrow(mydf)), sw = rep(1, nrow(mydf))) {

  # stop early if any required column is missing
  if (!all(c("sex", "age", "weight", "height") %in% colnames(mydf)))
    stop("mydf must have 'sex', 'age', 'weight', 'height' columns")

  sex.x <- mydf$sex
  age.x <- mydf$age
  # ...
  # some code
  # ...

  # return
  list(sex.x, age.x)
}

Test:

# example data frame
x <- head(mtcars)

# this errors because the required columns are missing
who2007(mydf = x)
# Error in who2007(mydf = x) :
#   mydf must have 'sex', 'age', 'weight', 'height' columns

# now update columns with required column names, and it works fine:
colnames(x)[1:4] <- c("sex", "age", "weight", "height")
who2007(mydf = x)
# [[1]]
# [1] 21.0 21.0 22.8 21.4 18.7 18.1
# 
# [[2]]
# [1] 6 6 4 6 8 6
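The version above fixes the column names inside the function. If renaming the columns of every data frame is inconvenient, a variant could take the column names as ordinary string arguments (standard evaluation) and extract them with [[, so the names can come straight from a character vector. The name who2007_se and its argument defaults are hypothetical, not part of the WHO macro:

```r
# Hypothetical variant: column names are passed as strings, not bare names
who2007_se <- function(mydf, sex = "sex", age = "age",
                       weight = "weight", height = "height") {
  needed <- c(sex, age, weight, height)
  missing_cols <- setdiff(needed, colnames(mydf))
  if (length(missing_cols) > 0) {
    stop("mydf is missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  sex.x <- as.character(mydf[[sex]])  # [[ with a string returns the column vector
  age.x <- as.double(mydf[[age]])
  # ... remaining z-score code would go here ...
  list(sex.x, age.x)
}

x <- head(mtcars)
out <- who2007_se(x, sex = "am", age = "cyl", weight = "wt", height = "hp")
out[[2]]   # 6 6 4 6 8 6

# Looping over a vector of column names now works directly
ages <- lapply(setNames(nm = c("cyl", "gear")), function(cl) {
  who2007_se(x, sex = "am", age = cl, weight = "wt", height = "hp")[[2]]
})

# Missing columns are reported by name
err <- tryCatch(who2007_se(x), error = conditionMessage)
```

The trade-off is that callers must quote the column names, which is the opposite of the bare-name interface the WHO macro uses, so this fits best when the names already live in a character vector.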