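I recently started learning R and am trying to explore more of it by automating a process. Below is some sample data; I am trying to create a new column by finding and replacing specific text within a label (the designation name).
I now have to process large amounts of new data, and I would like to automate this with R instead of using Excel formulas.
Dataset: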
strings<-c("Zonal Manager","Department Manager","Network Manager","Head of Sales","Account Manager","Alliance Manager","Additional Manager","Senior Vice President","General manager","Senior Analyst", "Solution Architect","AGM")
The R code I used:
t<-data.frame(strings,stringsAsFactors = FALSE)
colnames(t)[1]<-"Designations"
y<-sub(".*Manager*","Manager",strings,ignore.case = TRUE)
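Challenge:
With this code, everything is changed to Manager, but I also need the other designations replaced with their main themes.
I tried ifelse statements, grep, grepl, str, sub, and so on, but I did not get what I wanted.
Because the main theme can appear anywhere in the title, I cannot use the first/second/last word as a delimiter. For example: Chief Information Officer, Commercial Finance Manager, or AGM.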
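Excel work:
I have already coded 300 main themes as...
Manager (for all General Managers, Assistant Managers, Sales Managers, etc.)
Architect (Solution Arch, Sr. Arch, etc.)
Director (Senior Director, Director, Assistant Director, etc.)
Senior Analyst
Analyst
Head (for Head of Sales)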
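What I am looking for: I need to create a new column in R in which the text is replaced with the relevant main theme, the same way I do it in Excel.
It would also be fine if I could match the designations against the main themes I have already coded in Excel, using R (like vlookup in Excel).
Expected result: (image not included)
Thanks in advance for your help!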
Yes, that is exactly what I am working on. Thanks!! But when I upload a new dataset (an Excel file) and use
df %>%
mutate(theme=gsub(".*(Manager|Lead|Director|Head|Administrator|Executive|Executive|VP|President|Consultant|CFO|CTO|CEO|CMO|CDO|CIO|COO|Cheif Executive Officer|Chief Technological Officer|Chief Digital Officer|Chief Financial Officer|Chief Marketing Officer|Chief Digital Officer|Chief Information Officer,Chief Operations Officer)).*","\\1",Designations,ignore.case = TRUE))
it does not work. Is there something else I should correct?
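A few observations on the pattern above, offered as guesses rather than a confirmed diagnosis: it seems to contain one closing parenthesis too many, a comma where `|` was probably intended (between Chief Information Officer and Chief Operations Officer), and the misspelling Cheif. In addition, with ignore.case = TRUE a short abbreviation such as CTO can match inside a longer word ("Director" contains "cto"). The sketch below builds the alternation from a character vector, which makes such slips easier to spot, and adds word boundaries. The names themes, pattern and sample_titles are hypothetical, the sample titles are made up for illustration, and perl = TRUE is an addition not in the original code.

```r
# Theme list copied from the pattern above, with the duplicates removed
# and "Cheif" corrected to "Chief". All variable names are hypothetical.
themes <- c("Manager", "Lead", "Director", "Head", "Administrator", "Executive",
            "VP", "President", "Consultant", "CFO", "CTO", "CEO", "CMO", "CDO",
            "CIO", "COO", "Chief Executive Officer", "Chief Technological Officer",
            "Chief Digital Officer", "Chief Financial Officer",
            "Chief Marketing Officer", "Chief Information Officer",
            "Chief Operations Officer")

# Join the alternatives with "|" and require word boundaries (\\b) so that
# an abbreviation such as "CTO" cannot match inside "Director".
pattern <- paste0(".*\\b(", paste(themes, collapse = "|"), ")\\b.*")

# Made-up titles, only for illustration.
sample_titles <- c("Regional Sales Manager", "Senior Director", "CFO", "Intern")

# Titles without any matching keyword are returned unchanged.
theme <- gsub(pattern, "\\1", sample_titles, ignore.case = TRUE, perl = TRUE)
print(data.frame(Designations = sample_titles, theme = theme))
```

Because `.*` is greedy, this keeps the last keyword found in each title, as in the original pattern.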
Answer 0: (score: 2)
Data:
strings<-c("Zonal Manager","Department Manager","Network Manager","Head of Sales","Account Manager",
"Alliance Manager","Additional Manager","Senior Vice President","General manager","Senior Analyst", "Solution Architect","AGM")
You need to prepare a good lookup table (complete it and make it as accurate as you can):
lu_table <- data.frame(new = c("Manager", "Architect","Director"), old = c("Manager|GM","Architect|Arch","Director"), stringsAsFactors = FALSE)
Then you can let mapply do the work:
mapply(function(new,old) {ans <- strings; ans[grepl(old,ans)]<-new; strings <<- ans; return(NULL)}, new = lu_table$new, old = lu_table$old)
Now look at strings:
> strings
[1] "Manager" "Manager" "Manager" "Head of Sales" "Manager" "Manager"
[7] "Manager" "Senior Vice President" "General manager" "Senior Analyst" "Architect" "Manager"
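Please note:
This solution uses <<-, so it may not be the best solution, but it works in this case. The match is also case-sensitive, which is why "General manager" is left unchanged above.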
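The same lookup can probably be written without <<-. Here is a minimal sketch using the same lu_table. The function name apply_lookup is hypothetical, and ignore.case = TRUE is an addition that is not part of the answer above (it lets "General manager" match as well).

```r
strings <- c("Zonal Manager", "Department Manager", "Network Manager",
             "Head of Sales", "Account Manager", "Alliance Manager",
             "Additional Manager", "Senior Vice President", "General manager",
             "Senior Analyst", "Solution Architect", "AGM")

lu_table <- data.frame(new = c("Manager", "Architect", "Director"),
                       old = c("Manager|GM", "Architect|Arch", "Director"),
                       stringsAsFactors = FALSE)

# Hypothetical helper: returns a recoded copy instead of modifying a
# variable in the enclosing environment.
apply_lookup <- function(x, lookup) {
  result <- x
  for (i in seq_len(nrow(lookup))) {
    # Test against the original text; if several rows match the same
    # element, the last matching row of the lookup table wins.
    hit <- grepl(lookup$old[i], x, ignore.case = TRUE)
    result[hit] <- lookup$new[i]
  }
  result
}

themes <- apply_lookup(strings, lu_table)
print(data.frame(Designations = strings, theme = themes))
```

Elements that match no row of the lookup table (for example "Head of Sales") keep their original text, so extending lu_table is still the main work.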
Answer 1: (score: 1)
Do you mean something like this?
library(dplyr)
strings <-
  c(
    "Zonal Manager",
    "Department Manager",
    "Network Manager",
    "Head of Sales",
    "Account Manager",
    "Alliance Manager",
    "Additional Manager",
    "Senior Vice President",
    "General manager",
    "Senior Analyst",
    "Solution Architect",
    "AGM"
  )
df = data.frame(Designations = strings)
df %>%
  mutate(
    theme = gsub(
      ".*(manager|head|analyst|architect|agm|director|president).*",
      "\\1",
      Designations,
      ignore.case = TRUE
    )
  )
#> Designations theme
#> 1 Zonal Manager Manager
#> 2 Department Manager Manager
#> 3 Network Manager Manager
#> 4 Head of Sales Head
#> 5 Account Manager Manager
#> 6 Alliance Manager Manager
#> 7 Additional Manager Manager
#> 8 Senior Vice President President
#> 9 General manager manager
#> 10 Senior Analyst Analyst
#> 11 Solution Architect Architect
#> 12 AGM AGM
Created on 2018-10-04 by the reprex package (v0.2.1)
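Two properties of the gsub approach are worth keeping in mind: a title with no matching keyword is returned unchanged, and the extracted text keeps the case of the input (row 9 above is "manager", not "Manager"). One possible way to handle both is sketched below with base R's regexpr and regmatches. The names canonical and extract_theme are hypothetical, and "Intern" is a made-up title used only to show the no-match case. Unlike the greedy `.*( ... ).*` pattern, this takes the first keyword found in each title rather than the last.

```r
# Hypothetical mapping from lower-case keyword to the spelling wanted
# in the new column.
canonical <- c(manager = "Manager", head = "Head", analyst = "Analyst",
               architect = "Architect", agm = "AGM", director = "Director",
               president = "President")

# Hypothetical helper: returns the canonical theme, or NA when no
# keyword occurs in the title.
extract_theme <- function(x, keywords = names(canonical)) {
  pattern <- paste(keywords, collapse = "|")
  pos <- regexpr(pattern, x, ignore.case = TRUE)  # -1 where nothing matches
  hit <- !is.na(pos) & pos != -1
  found <- rep(NA_character_, length(x))
  found[hit] <- regmatches(x, pos)                # matched substrings only
  unname(canonical[tolower(found)])
}

titles <- c("General manager", "Head of Sales", "Intern", "AGM")
print(data.frame(Designations = titles, theme = extract_theme(titles)))
```

Getting NA for unmatched titles makes it easy to list the designations that still need a rule, for example with titles[is.na(extract_theme(titles))].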