I am trying to do the following.
**Job Title,Department**
"443.ENGINEER IV - INFORMATION SECURITY","INFORMATION SECURITY"
"443.MANAGER - INFORMATION SECURITY","INFORMATION SECURITY"
"443.SENIOR THREAT INTELLIGENCE MANAGER","INFORMATION SECURITY"
"443.SR ENGINEER - INFORMATION SECURITY","INFORMATION SECURITY"
"443.SR MANAGER - INFORMATION SECURITY","INFORMATION SECURITY"
"543.ENGINEER III - INFRASTRUCTURE","RELATIONAL LAB"
"543.MANAGER - SOFTWARE DEVELOPMENT","RELATIONAL LAB"
"543.SR ENGINEER - DEVELOPMENT","RELATIONAL LAB"
"543.SR ENGINEER - INFRASTRUCTURE","RELATIONAL LAB"
"640.SVP - ARCHITECTURE & TECH SERVICES","ASSET MANAGEMENT"
"643.CORPORATE PROGRAMS PROJECT MANAGER III","CORPORATE INFORMATION SERVICES"
"643.DIRECTOR - CIS PROGRAMS","CORPORATE INFORMATION SERVICES"
"643.ENGINEER III - SECURITY ANALYST","PHYSICAL SECURITY"
"643.OPERATIONS ANALYST IV","DATA CENTER SERVICES"
"643.PROJECT MANAGER IV","CORPORATE INFORMATION SERVICES"
"643.PROJECT MANAGER VI","CORPORATE INFORMATION SERVICES"
"643.SR MANAGER - SECURITY","PHYSICAL SECURITY"
"643.TECHNICAL PROJECT MANAGER III","CORPORATE INFORMATION SERVICES"
"743.ASSET MGMT ANALYST III","DATA CENTER SERVICES"
"743.ASSET MGMT ANALYST IV","DATA CENTER SERVICES"
"743.BUSINESS OPERATIONS ANALYST III","DATA CENTER SERVICES"
"743.DIRECTOR - DATA CENTER OPERATIONS","DATA CENTER SERVICES"
"743.ENGINEER II - DATA CENTER OPS","DATA CENTER SERVICES"
"743.ENGINEER II - TECHNICAL OPERATIONS","DATA CENTER SERVICES"
"743.ENGINEER III - DATA CENTER OPS","DATA CENTER SERVICES"
"743.ENGINEER III - TECHNICAL OPERATIONS","DATA CENTER SERVICES"
Parse the Job Title into the following, depending on which of the formats above it has:
a. (job code).(job title) - (group)
b. (job code).(job title)
I want to split them into separate entries in my dictionary (four of them):
a. job code
b. job title
c. group + department
d. department
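I can't even get the regex to match. I have tried regex tools and looked at previous questions, with no luck. I've put my code below.
Here is the relevant part; I can't figure out why the regex doesn't match.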
"643.PROJECT MANAGER VI","CORPORATE INFORMATION SERVICES"
"643.SR MANAGER - SECURITY","PHYSICAL SECURITY"
"643.TECHNICAL PROJECT MANAGER III","CORPORATE INFORMATION SERVICES"
m = re.search("\d+\.\D+-\D+", string)
It should match all of the values in the first field of the file above.
Answer 0 (score: 1)
Your regex is as follows:
"\d+\.\D+-\D+"
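I'm not sure which string you're trying to match, but apparently it's one of these:
1. `"643.PROJECT MANAGER VI"`
2. `"CORPORATE INFORMATION SERVICES"`
3. `"643.SR MANAGER - SECURITY"`
4. `"PHYSICAL SECURITY"`
5. `"643.TECHNICAL PROJECT MANAGER III"`
6. `"CORPORATE INFORMATION SERVICES"`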
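The third one matches, so it can't be the one you're complaining about.
The even-numbered ones obviously won't match, because they don't start with digits.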
So I assume it's #1 or #5 that surprises you. Neither of them contains a `-`, so they won't match.
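The reasoning above can be checked directly. The following is a minimal sketch that runs the question's pattern (written here as a raw string) against the six quoted strings from the question's snippet; the variable names are only illustrative.

```python
import re

# The pattern from the question, written as a raw string.
pattern = re.compile(r"\d+\.\D+-\D+")

# The six quoted strings from the snippet in the question.
candidates = [
    "643.PROJECT MANAGER VI",
    "CORPORATE INFORMATION SERVICES",
    "643.SR MANAGER - SECURITY",
    "PHYSICAL SECURITY",
    "643.TECHNICAL PROJECT MANAGER III",
    "CORPORATE INFORMATION SERVICES",
]

# True where the pattern is found somewhere in the string.
matches = [pattern.search(s) is not None for s in candidates]

for number, (s, matched) in enumerate(zip(candidates, matches), start=1):
    print(number, matched, s)
```

Only the third string contains digits, a dot, and a hyphen in that order, so it should be the only one reported as matching.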
Some side notes:
- `-` is a member of the `\D` class, which can cause confusion if you don't use `?` to change the greediness.
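- You really should use raw strings (or escape the backslashes, if you prefer). The fact that `d` and `D` happen not to be in the current set of backslash escape characters is not something you want to rely on, especially if you want other people, who may not have memorized that list, to read your code.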
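- If you're trying to split these things into separate parts, you don't want to just match the whole thing; you want to add capture groups.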
- Also, I assume the spaces aren't supposed to be part of the parsed strings, right?
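The first two side notes can be illustrated with a short sketch. The two-hyphen title below is hypothetical (nothing in the question's data looks like it); it is only there to make the greedy/lazy difference visible. The `\b` comparison is likewise just an example of an escape that, unlike `\d`, Python does interpret in a normal string.

```python
import re

# Hypothetical title with two hyphens, used only to show the difference.
title = "643.SR MANAGER - SECURITY - EAST"

# Greedy \D+ swallows the first hyphen and splits at the last one.
greedy = re.search(r"(\d+)\.(\D+)-(\D+)", title).groups()

# Lazy \D+? stops at the first hyphen it can.
lazy = re.search(r"(\d+)\.(\D+?)-(\D+)", title).groups()

print(greedy)
print(lazy)

# In a normal string "\b" is a single backspace character; in a raw string
# it is a backslash plus "b", which the regex engine reads as a word boundary.
plain_found = re.search("\bSR", title) is not None
raw_found = re.search(r"\bSR", title) is not None
print(plain_found, raw_found)
```

Note that neither version strips the spaces around the hyphen, which is what the `\s*` pieces in the pattern further down are for.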
So, your regex should look like this:
r"(\d+)\.(\D+?)\s*-\s*(\D+)"
For example:
>>> import re
>>> s = "643.SR MANAGER - SECURITY"
>>> m = re.search(r"(\d+)\.(\D+?)\s*-\s*(\D+)", s)
>>> print(m.groups())
('643', 'SR MANAGER', 'SECURITY')
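The pattern above only handles titles of the form (job code).(job title) - (group). To cover the hyphen-less form as well, one option is to make the group part optional. The following is only a sketch under a few assumptions: the function name `parse_row`, the dictionary keys, and the way "group + department" is joined (with `" / "`, falling back to the department alone when there is no group) are all guesses at what the question wants, not something it states.

```python
import csv
import io
import re

# The " - group" part is optional, so titles without a hyphen still match.
TITLE_RE = re.compile(r"(\d+)\.(\D+?)(?:\s*-\s*(\D+))?$")


def parse_row(title, department):
    """Split one (title, department) pair into four dictionary entries."""
    m = TITLE_RE.match(title)
    if m is None:
        return None  # title is in neither of the two expected formats
    code, job_title, group = m.groups()
    return {
        "job_code": code,
        "job_title": job_title,
        # Assumption: "group + department" means the two joined together.
        "group_department": f"{group} / {department}" if group else department,
        "department": department,
    }


# Two rows copied from the question's data; csv handles the quoting.
sample = '''"643.PROJECT MANAGER VI","CORPORATE INFORMATION SERVICES"
"643.SR MANAGER - SECURITY","PHYSICAL SECURITY"
'''

rows = [parse_row(title, dept) for title, dept in csv.reader(io.StringIO(sample))]
for row in rows:
    print(row)
```

Using the `csv` module to split the two quoted fields first keeps the regex focused on the title alone, so a comma or hyphen inside a department name cannot confuse it.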