I have a named numeric vector var (taken from the output of predict.cv.glmnet):
var<-c(5.74,0.00,0.15,0.00,0.04,0.00,0.00,0.00,1.81,0.00)
names(var) <- c("(Intercept)","as.factor(holiday)1","as.factor(season)2","as.factor(season)3","as.factor(season)4","as.factor(weathersit)2","as.factor(weathersit)3","windspeed","temp","hum")
(Intercept) as.factor(holiday)1 as.factor(season)2 as.factor(season)3 as.factor(season)4 as.factor(weathersit)2
5.74 0.00 0.15 0.00 0.04 0.00
as.factor(weathersit)3 windspeed temp hum
0.00 0.00 1.81 0.00
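I want to extract the names of the variables with non-zero values, collapsing factor levels: if at least one level of a factor is non-zero, the whole factor should be included, and the output should omit the factor levels. I am looking for code that gives me this result: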
"(Intercept)" "as.factor(season)" "temp"
I also have a variable fac holding the available factor names:
fac<-c("as.factor(holiday)","as.factor(season)","as.factor(weathersit)")
"as.factor(holiday)" "as.factor(season)" "as.factor(weathersit)"
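I was thinking of aggregating the terms with similar names while dropping their levels, and then checking whether the sum for each aggregated factor is > 0, but I could not work out how to code it.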
Answer 0 (score: 0)
I played around with which and regular expressions:
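var <- c(5.74,0.00,0.15,0.00,0.04,0.00,0.00,0.00,1.81,0.00)
names(var) <- c("(Intercept)","as.factor(holiday)1","as.factor(season)2","as.factor(season)3","as.factor(season)4","as.factor(weathersit)2","as.factor(weathersit)3","windspeed","temp","hum")
# Keep the names of the non-zero coefficients
X <- names(var)[which(var != 0)]
# Indices of the factor terms among those names
n <- grep("^as[.]factor", X)
# Strip the trailing numeric level: "as.factor(season)2" becomes "as.factor(season)"
X[n] <- gsub("\\)[0-9]+$", ")", X[n])
# Collapse several levels of the same factor into one entry
X <- unique(X)
X
#[1] "(Intercept)" "as.factor(season)" "temp"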
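which selects the non-zero components. grep finds the indices of the factor terms, gsub then strips the factor levels, and unique removes the duplicates that result. Note that the pattern assumes the factor levels are purely numeric.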
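The aggregation idea described in the question (group the terms by factor, then check whether each group has a non-zero total) can also be written directly with the fac vector instead of a regular expression on the level suffix. The following is only a minimal sketch under a few assumptions: the helper names group, totals and selected are made up for illustration, absolute values are summed so that coefficients of opposite sign cannot cancel each other out, and startsWith requires R 3.3.0 or later.

```r
# Coefficient vector and factor names as given in the question
var <- c(5.74, 0.00, 0.15, 0.00, 0.04, 0.00, 0.00, 0.00, 1.81, 0.00)
names(var) <- c("(Intercept)", "as.factor(holiday)1", "as.factor(season)2",
                "as.factor(season)3", "as.factor(season)4",
                "as.factor(weathersit)2", "as.factor(weathersit)3",
                "windspeed", "temp", "hum")
fac <- c("as.factor(holiday)", "as.factor(season)", "as.factor(weathersit)")

# Map each coefficient name to its group: the matching entry of fac if the
# name starts with one, otherwise the name itself (hypothetical helper: group)
group <- names(var)
for (f in fac) {
  group[startsWith(names(var), f)] <- f
}

# Sum absolute values per group, keeping the order of first appearance.
# abs() is an assumption of this sketch, not part of the original question.
totals <- tapply(abs(var), factor(group, levels = unique(group)), sum)

# Names of the groups with a non-zero total
selected <- names(totals)[totals > 0]
selected
```

Because the grouping relies on prefix matching against fac rather than on stripping digits, this variant should also cope with non-numeric level labels, provided no entry of fac is a prefix of an unrelated variable name.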