Partial string replacement of row names in a data frame

Asked: 2017-03-06 02:18:08

Tags: r string-substitution rowname

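My question is more about improving my coding skills than about solving a problem: I already found a solution, but I find it inelegant.

I am working on a more complex version of what was posted here. I am running multiple linear regressions and want to export the coefficients from all of them to a single CSV file. Using this information, I was able to generate a list of all the coefficients and convert it to a list of data frames. My list of data frames looks like this: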

> coef.df
[[1]]
                    Estimate Std. Error    z value     Pr(>|z|)
(Intercept)      -0.08670899   0.357377 -0.2426261 0.8082950694
Var.0.0.Type.4   22.46262205   5.935317  3.7845698 0.0001539747

[[2]]
                   Estimate Std. Error    z value     Pr(>|z|)
(Intercept)      -0.1682616  0.3590799 -0.4685911 6.393619e-01
Var.0.5.Type.4   15.4974199  3.8693290  4.0051957 6.196616e-05

[[3]]
                   Estimate Std. Error    z value     Pr(>|z|)
(Intercept)      -0.1832488  0.3532577 -0.5187397 6.039423e-01
Var.1.0.Type.4   10.1225605  2.4475064  4.1358668 3.536172e-05

And so on.
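For context, a list shaped like coef.df could be built along the following lines. This is only a sketch: the post does not show the model-fitting code, so the use of glm() with a binomial family (suggested by the "z value" columns), the built-in mtcars data, and the names predictors and models are all assumptions made for illustration.

```r
# Hypothetical predictors taken from the built-in mtcars data set
predictors <- c("wt", "hp")

# Fit one single-predictor logistic regression per variable (assumed model type)
models <- lapply(predictors, function(p) {
  glm(reformulate(p, response = "am"), data = mtcars, family = binomial)
})

# coef(summary(m)) is the coefficient matrix; convert each one to a data frame
coef.df <- lapply(models, function(m) as.data.frame(coef(summary(m))))

coef.df[[1]]
```

Each element then has the rows "(Intercept)" and the predictor name, and the four columns shown above.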

When I simply try to convert this list to a CSV file, the row names get mangled (every "(Intercept)" term after the first has a number appended):

                  Estimate Std. Error     z value     Pr(>|z|)
(Intercept)    -0.08670899  0.3573770 -0.24262609 8.082951e-01
Var.0.0.Type.4 22.46262205  5.9353171  3.78456983 1.539747e-04
(Intercept)1   -0.16826164  0.3590799 -0.46859114 6.393619e-01
Var.0.5.Type.4 15.49741993  3.8693290  4.00519568 6.196616e-05
(Intercept)2   -0.18324877  0.3532577 -0.51873968 6.039423e-01
Var.1.0.Type.4 10.12256045  2.4475064  4.13586682 3.536172e-05
(Intercept)3   -0.14188918  0.3426645 -0.41407607 6.788184e-01
Var.1.5.Type.4  6.32348365  1.5164421  4.16994719 3.046702e-05
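The appended digits are consistent with how rbind() on data frames de-duplicates row names. Below is a minimal sketch, assuming the tables were combined with do.call(rbind, ...) (the post does not say how) and using made-up toy values rather than the real estimates:

```r
# Toy stand-ins for two coefficient tables (values are placeholders)
a <- data.frame(Estimate = c(-0.09, 22.46),
                row.names = c("(Intercept)", "Var.0.0.Type.4"))
b <- data.frame(Estimate = c(-0.17, 15.50),
                row.names = c("(Intercept)", "Var.0.5.Type.4"))

# rbind() on data frames makes duplicated row names unique by appending digits
combined <- do.call(rbind, list(a, b))
rownames(combined)
# [1] "(Intercept)"    "Var.0.0.Type.4" "(Intercept)1"   "Var.0.5.Type.4"
```

Renaming the intercept rows before binding, or overwriting the row names afterwards, avoids the automatic suffixes.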

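I know that rows must have unique names, and I would like to customize them using the name of the second coefficient in each model. What I want is a CSV file that contains all the information in the following format, with the row names adjusted to show which variable a given intercept belongs to: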

                          Estimate Std. Error    z value     Pr(>|z|)
(Intercept.0.0.Type.4)   -0.0867089   0.357377  -0.2426261 0.8082950694
Var.0.0.Type.4           22.4626220   5.935317   3.7845698 0.0001539747
(Intercept.0.5.Type.4)   -0.1682616   0.359079  -0.4685911 6.393619e-01
Var.0.5.Type.4           15.4974199   3.869329   4.0051957 6.196616e-05
(Intercept.1.0.Type.4)   -0.1832488   0.353257  -0.5187397 6.039423e-01
Var.1.0.Type.4           10.1225605   2.447506   4.1358668 3.536172e-05

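I don't have much experience with partial string replacement, and although I managed to do it, I don't think my code is the most straightforward. Here is how I got this result: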

#I created a vector containing all row names
df.names <- unlist(lapply(coef.df,rownames)) 
> df.names
 [1] "(Intercept)"    "Var.0.0.Type.4" "(Intercept)"    "Var.0.5.Type.4"
 [5] "(Intercept)"    "Var.1.0.Type.4" "(Intercept)"    "Var.1.5.Type.4"
 [9] "(Intercept)"    "Var.0.0.Type.5" "(Intercept)"    "Var.0.5.Type.5"
[13] "(Intercept)"    "Var.1.0.Type.5" "(Intercept)"    "Var.1.5.Type.5"
#I created a vector with all "(Intercept)" elements from df.names
inter.lm <- df.names[c(TRUE, FALSE)] 
> inter.lm
[1] "(Intercept)" "(Intercept)" "(Intercept)" "(Intercept)" "(Intercept)"
[6] "(Intercept)" "(Intercept)" "(Intercept)"
#I created a vector with all remaining elements from df.names 
var.lm <- df.names[c(FALSE, TRUE)]
> var.lm
[1] "Var.0.0.Type.4" "Var.0.5.Type.4" "Var.1.0.Type.4" "Var.1.5.Type.4" 
[5] "Var.0.0.Type.5" "Var.0.5.Type.5" "Var.1.0.Type.5" "Var.1.5.Type.5"
#I removed the "Var" part from all elements in var.lm
var.temp <- gsub("Var(.*)", "\\1", var.lm)
> var.temp
[1] ".0.0.Type.4" ".0.5.Type.4" ".1.0.Type.4" ".1.5.Type.4" ".0.0.Type.5"
[6] ".0.5.Type.5" ".1.0.Type.5" ".1.5.Type.5"
#I removed the ")" part from all elements in inter.lm
inter.temp <- gsub("\\)", "", inter.lm) 
> inter.temp
[1] "(Intercept" "(Intercept" "(Intercept" "(Intercept" "(Intercept"
[6] "(Intercept" "(Intercept" "(Intercept"
#I pasted together vectors inter.temp and var.temp to get the required names
inter.new <- paste(inter.temp,var.temp,")",sep="")
> inter.new
[1] "(Intercept.0.0.Type.4)" "(Intercept.0.5.Type.4)" "(Intercept.1.0.Type.4)"   
[4] "(Intercept.1.5.Type.4)" "(Intercept.0.0.Type.5)" "(Intercept.0.5.Type.5)"
[7] "(Intercept.1.0.Type.5)" "(Intercept.1.5.Type.5)"
#I merged the inter.new and var.lm vectors to get the correct naming
df.names <- c(rbind(inter.new, var.lm))
> df.names
 [1] "(Intercept.0.0.Type.4)" "Var.0.0.Type.4"
 [3] "(Intercept.0.5.Type.4)" "Var.0.5.Type.4"
 [5] "(Intercept.1.0.Type.4)" "Var.1.0.Type.4"
 [7] "(Intercept.1.5.Type.4)" "Var.1.5.Type.4"
 [9] "(Intercept.0.0.Type.5)" "Var.0.0.Type.5"
[11] "(Intercept.0.5.Type.5)" "Var.0.5.Type.5"
[13] "(Intercept.1.0.Type.5)" "Var.1.0.Type.5"
[15] "(Intercept.1.5.Type.5)" "Var.1.5.Type.5"
#Finally I changed the row names
rownames(final.df) <- df.names

Is there a simpler/shorter way to get the names I want?
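One possibly shorter route is to do the whole replacement with a single sub() call on the name vector. This is a sketch rather than a definitive answer: it assumes every "(Intercept)" entry is immediately followed by its variable name and that the variable names start with "Var", as in the vectors above. The helper name is.int and the toy vector are introduced here for illustration.

```r
# Toy name vector in the same alternating layout as df.names in the post
df.names <- c("(Intercept)", "Var.0.0.Type.4",
              "(Intercept)", "Var.0.5.Type.4")

# Logical index of the intercept entries; each is followed by its variable
is.int <- df.names == "(Intercept)"

# Swap the "Var" prefix for "(Intercept" and close the parenthesis
df.names[is.int] <- sub("^Var(.*)$", "(Intercept\\1)",
                        df.names[which(is.int) + 1])

df.names
# [1] "(Intercept.0.0.Type.4)" "Var.0.0.Type.4"
# [3] "(Intercept.0.5.Type.4)" "Var.0.5.Type.4"
```

The capture group keeps everything after "Var", so the separate steps of stripping ")" and pasting it back are no longer needed.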

0 answers:

No answers