I have a data frame:
dput(Data1)
structure(list(Emp.ID = c(182038L, 191854L), Project.Acquired.Skill = structure(c(2L,
1L), .Label = c("Architecting (10),Cognos TM1 (4),Support Function (3)",
"SAS (76),SAS Analytics (76),SAS BI (76),SAS data modeling tool (63),ClearCase (18),SQL (18),SQL Server (18),SQL SERVER 2000 (18),SQL SERVER 2005 (18),Excel (16),Oracle (16),AS400 (10)"
), class = "factor")), .Names = c("Emp.ID", "Project.Acquired.Skill"
), class = "data.frame", row.names = c(NA, -2L))
str(Data1)
'data.frame': 2 obs. of 2 variables:
$ Emp.ID : int 182038 191854
$ Project.Acquired.Skill: Factor w/ 2 levels "Architecting (10),Cognos TM1 (4),Support Function (3)",..: 2 1
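The column Project.Acquired.Skill is a factor whose values look like Architecting (10),Cognos TM1 (4),Support Function (3). I need to remove the counts in parentheses (the digits 0-9, the parentheses ( ) and the whitespace before them) to get Architecting,Cognos TM1,Support Function. Digits that are part of a skill name, such as TM1 or AS400, must stay.
The problem I'm facing is that the column is encoded as a factor.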
My output should look like this:
Emp.ID  Project.Acquired.Skill
182038  SAS,SAS Analytics,SAS BI,SAS data modeling tool,ClearCase,SQL,SQL Server,SQL SERVER 2000,SQL SERVER 2005,Excel,Oracle,AS400
191854  Architecting,Cognos TM1,Support Function
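On the factor issue: gsub() accepts a factor and coerces it with as.character(), so the encoding does not by itself stop the replacement from working. If you would rather keep the column a factor, one option is to edit its levels, since each distinct string is stored once as a level. This is a minimal sketch, assuming Data1 is rebuilt from the dput() output above and that every count has the form " (n)" with n made only of digits.

```r
# Rebuild the example data (same values as the dput() output above)
Data1 <- data.frame(
  Emp.ID = c(182038L, 191854L),
  Project.Acquired.Skill = factor(c(
    "SAS (76),SAS Analytics (76),SAS BI (76),SAS data modeling tool (63),ClearCase (18),SQL (18),SQL Server (18),SQL SERVER 2000 (18),SQL SERVER 2005 (18),Excel (16),Oracle (16),AS400 (10)",
    "Architecting (10),Cognos TM1 (4),Support Function (3)"
  ))
)

f <- Data1$Project.Acquired.Skill
# Rewrite only the level labels; the integer codes are untouched,
# so the column remains a factor.
# Assumption: every count looks like " (n)" where n is one or more digits.
levels(f) <- gsub("\\s*\\([0-9]+\\)", "", levels(f))
Data1$Project.Acquired.Skill <- f

str(Data1)
```

Working on levels() touches each distinct string once, which can matter when many rows share the same skill string.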
Answer 0 (score: 2)
Use a character-class regular expression in gsub:
transform(Data1, Project.Acquired.Skill=gsub("\\s[0-9()]+","",Project.Acquired.Skill))
Emp.ID
1 182038
2 191854
Project.Acquired.Skill
1 SAS,SAS Analytics,SAS BI,SAS data modeling tool,ClearCase,SQL,SQL Server,SQL SERVER,SQL SERVER,Excel,Oracle,AS400
2 Architecting,Cognos TM1,Support Function
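One caveat with this pattern: [0-9()]+ also matches a run of digits that has no parentheses, so " 2000" and " 2005" are removed too, which is why row 1 shows SQL SERVER,SQL SERVER instead of the desired SQL SERVER 2000,SQL SERVER 2005. The sketch below keeps the character-class style but requires literal parentheses around the digits; the vector x is simply the two skill strings copied from Data1.

```r
# The two skill strings from Data1, as a plain character vector
x <- c(
  "SAS (76),SAS Analytics (76),SAS BI (76),SAS data modeling tool (63),ClearCase (18),SQL (18),SQL Server (18),SQL SERVER 2000 (18),SQL SERVER 2005 (18),Excel (16),Oracle (16),AS400 (10)",
  "Architecting (10),Cognos TM1 (4),Support Function (3)"
)

# Pattern from the answer: no parentheses are required,
# so " 2000" and " 2005" are stripped as well
loose <- gsub("\\s[0-9()]+", "", x)

# [(] and [)] are character classes matching a literal parenthesis,
# so only " (digits)" groups are removed
cleaned <- gsub("\\s[(][0-9]+[)]", "", x)
cleaned
```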
Answer 1 (score: 2)
# gsub() coerces the factor column to character; "\\s\\(\\d+\\)" removes each " (n)" count
(Data1[, 2] <- gsub("\\s\\(\\d+\\)", "", Data1[, 2]))
# [1] "SAS,SAS Analytics,SAS BI,SAS data modeling tool,ClearCase,SQL,SQL Server,SQL SERVER 2000,SQL SERVER 2005,Excel,Oracle,AS400"
# [2] "Architecting,Cognos TM1,Support Function"
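Because gsub() returns a character vector, this assignment replaces the factor column with a character column, which also resolves the factor problem from the question. The sketch below rebuilds Data1 from the question, checks the resulting class, and converts back with factor() in case a factor is needed downstream; whether you want that last step depends on your later code.

```r
# Rebuild the example data (same values as the dput() output in the question)
Data1 <- data.frame(
  Emp.ID = c(182038L, 191854L),
  Project.Acquired.Skill = factor(c(
    "SAS (76),SAS Analytics (76),SAS BI (76),SAS data modeling tool (63),ClearCase (18),SQL (18),SQL Server (18),SQL SERVER 2000 (18),SQL SERVER 2005 (18),Excel (16),Oracle (16),AS400 (10)",
    "Architecting (10),Cognos TM1 (4),Support Function (3)"
  ))
)

# gsub() accepts the factor but returns a character vector,
# so the whole column is replaced and changes type
Data1[, 2] <- gsub("\\s\\(\\d+\\)", "", Data1[, 2])
cls_after <- class(Data1[, 2])
cls_after

# Only if a factor is needed later: rebuild it from the cleaned strings
Data1[, 2] <- factor(Data1[, 2])
levels(Data1[, 2])
```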
Answer 2 (score: 1)
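library(qdap)
# strip() removes digits and punctuation except the characters in char.keep; it also removes digits inside names (AS400, TM1)
gsub(" ,", ",", strip(Data1[, 2], char.keep = ",", lower.case = FALSE))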
## [1] "SAS,SAS Analytics,SAS BI,SAS data modeling tool,ClearCase,SQL,SQL Server,SQL SERVER ,SQL SERVER ,Excel,Oracle,AS"
## [2] "Architecting,Cognos TM,Support Function"
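Note that this output differs from the desired one: AS400 becomes AS and Cognos TM1 becomes Cognos TM, because every digit is removed, not only the counts. The base R sketch below imitates that blanket digit removal (it is an illustration, not qdap's implementation) next to a pattern that removes only the " (n)" groups; the skills vector is a hypothetical sample taken from the question's data.

```r
# Hypothetical sample of individual skills taken from the question's data
skills <- c("AS400 (10)", "Cognos TM1 (4)", "SQL SERVER 2000 (18)")

# Blanket removal of all digits and parentheses, then trim leftover spaces
all_digits <- trimws(gsub("[0-9()]", "", skills))
all_digits    # "AS" "Cognos TM" "SQL SERVER"

# Targeted removal: only optional whitespace followed by "(digits)"
counts_only <- gsub("\\s*\\([0-9]+\\)", "", skills)
counts_only   # "AS400" "Cognos TM1" "SQL SERVER 2000"
```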