I have a simple data frame containing information about open-source software releases, as shown below:
> head(a, n=50)
Project ID Latest Release
1 14 dhiggen_merge-5.0
2 11 r2-00
3 2 Snapshots
4 70 1.90
5 72 2.5
6 30 AfterStep 2.00.beta5
7 38 1.0
8 7 gedit 0.9.5
9 92 1.0b
10 93 2001-11-19
11 68 1.9.97
12 15 3.0-RC8
13 47 3.23.52
14 3 7.5
15 12 0.9.7
16 19 2.0.5a
17 31 wm-session-hacks-0.1.0
18 75 1.16r6.1
19 16 udb-1.8-29
20 21 0.1
21 64 0.6.2
22 34 0.3.1
23 35
24 99 2.0.8
25 44 1.2.6.1
26 22 0.94.3
27 32 1.5.0
28 78 .92q
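I wrote the following transformation function to add a new data frame column of class factor that rates each project's maturity, based on very simple conditions: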
prjMaturity <- function (indicator, data) {
  var <- data[["Latest Release"]]
  rx <- "^(.*-)?([[:digit:]]+\\.)?([[:digit:]]+\\.)?(\\*|[[:digit:]]+)$"
  major <- gsub(rx, "\\2", var)
  major <- substr(major, 1, nchar(major)-1)
  major <- as.numeric(major)
  if (major > 0 && major < 1) maturity <- "Alpha/Beta"
  if (major >= 1 && major <= 2) matirity <- "Stable"
  if (major > 2) maturity <- "Mature"
  data["Project Maturity"] <- as.factor(maturity)
  if (DEBUG2) {print(summary(data)); message("")}
  return (data)
}
However, running this code produces unexpectedly wrong results along with warnings:
Project ID Latest Release Project Maturity
Length:28 Length:28 Mature:28
Class1:avector Class1:avector
Class2:avector Class2:avector
Class3:character Class3:character
Mode :character Mode :character
Warning messages:
1: In (function (indicator, data) : NAs introduced by coercion
2: In if (major > 2) maturity <- "Mature" :
the condition has length > 1 and only the first element will be used
What am I doing wrong, or what am I missing? Thanks!
Answer 0 (score: 1)
You can use cut() (see ?cut):
major
[1] 5.00 NA NA 1.00 2.00 NA 1.00 NA 1.00 NA 1.00 NA 3.00 7.00 0.00
[16] NA 0.00 NA NA 0.00 0.00 0.00 NA 2.00 NA 0.00 1.00 0.92
cut(major, breaks=c(0+0.01,1-0.01,2,Inf),include.lowest=TRUE,labels=c("Alpha/Beta","Stable","Mature"))
[1] Mature <NA> <NA> Stable Stable <NA>
[7] Stable <NA> Stable <NA> Stable <NA>
[13] Mature Mature <NA> <NA> <NA> <NA>
[19] <NA> <NA> <NA> <NA> <NA> Stable
[25] <NA> <NA> Stable Alpha/Beta
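For context, the warnings in the question arise because `if` expects a single TRUE/FALSE value but is given a whole vector, so only the first element decides the result for every row (recent R versions report this as an error rather than a warning). `cut()` is vectorized and bins each element separately. The following is a minimal sketch using a few values copied from the `major` vector above; the names `major_sample` and `maturity` are illustrative only and not part of the original answer.

```r
# A few values taken from the `major` vector shown above
major_sample <- c(5, NA, 1, 2, 0, 0.92)

# Same breaks and labels as in the answer.
# Resulting bins: [0.01, 0.99], (0.99, 2], (2, Inf]
maturity <- cut(major_sample,
                breaks = c(0 + 0.01, 1 - 0.01, 2, Inf),
                include.lowest = TRUE,
                labels = c("Alpha/Beta", "Stable", "Mature"))

print(maturity)
```

Note that a major version of exactly 0 falls below the lowest break (0.01) and therefore becomes NA, as it does in the output above.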
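Answer 1 (score: 0)
I originally posted this as an update to the question, then decided to post it as my own answer, given the changes I made.
I improved my code, including the regular expression, so that it handles the variety found in the existing data: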
prjMaturity <- function (indicator, data) {
  # do not process, if target column (type) already exists
  if (is.factor(data[["Project Maturity"]])) {
    message("Project Maturity: ", appendLF = FALSE)
    message("Not processing - Transformation already performed!\n")
    # return the data unchanged, so that `data <- prjMaturity(...)` does not yield NULL
    return (invisible(data))
  }

  var <- data[["Latest Release"]]
  # optional non-digit prefix, then the major version number, then "." or "-"
  rx <- "^([^[:digit:]]*)([[:digit:]]+)(\\.|-)+(.*)$"
  # strings that do not match are left unchanged and become NA in as.numeric()
  major <- gsub(rx, "\\2", var)
  major <- as.numeric(major)
  # bins: [0, 1) Alpha/Beta, [1, 2) Stable, [2, Inf] Mature
  data[["Project Maturity"]] <-
    cut(major, breaks = c(0, 1, 2, Inf), include.lowest = TRUE,
        right = FALSE, labels=c("Alpha/Beta", "Stable", "Mature"))
  if (DEBUG2) {print(summary(data)); message("")}
  return (data)
}
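To illustrate what the regular expression and the `cut()` call in this function do, here is a small standalone sketch. It applies the same pattern, breaks, and labels to a handful of release strings taken from the data frame in the question; the names `versions`, `major`, and `maturity` are chosen for illustration, and `suppressWarnings()` is added here only to silence the expected coercion warning for strings that do not match.

```r
# Same pattern as in the function above
rx <- "^([^[:digit:]]*)([[:digit:]]+)(\\.|-)+(.*)$"

# A few release strings from the question's data frame
versions <- c("1.90", "gedit 0.9.5", "3.0-RC8", "udb-1.8-29", "Snapshots")

# Extract the major version; non-matching strings stay unchanged and become NA
major <- suppressWarnings(as.numeric(gsub(rx, "\\2", versions)))

# Left-closed bins: [0, 1) Alpha/Beta, [1, 2) Stable, [2, Inf] Mature
maturity <- cut(major, breaks = c(0, 1, 2, Inf), include.lowest = TRUE,
                right = FALSE, labels = c("Alpha/Beta", "Stable", "Mature"))

print(data.frame(versions, major, maturity))
```

With `right = FALSE`, a major version of exactly 1 is classed as Stable and exactly 2 as Mature, which differs slightly from the conditions in the original question (where 2 counted as Stable).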