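Regex search/match on a CSV in Python

Time: 2014-08-03 05:20:25

Tags: python regex

I am trying to do the following.

  1. Read in a CSV file that contains multiple fields (I have put a copy of the file below, with only a few of the fields):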

    Job Title,Department
    "443.ENGINEER IV - INFORMATION SECURITY","INFORMATION SECURITY"
    "443.MANAGER - INFORMATION SECURITY","INFORMATION SECURITY"
    "443.SENIOR THREAT INTELLIGENCE MANAGER","INFORMATION SECURITY"
    "443.SR ENGINEER - INFORMATION SECURITY","INFORMATION SECURITY"
    "443.SR MANAGER - INFORMATION SECURITY","INFORMATION SECURITY"
    "543.ENGINEER III - INFRASTRUCTURE","RELATIONAL LAB"
    "543.MANAGER - SOFTWARE DEVELOPMENT","RELATIONAL LAB"
    "543.SR ENGINEER - DEVELOPMENT","RELATIONAL LAB"
    "543.SR ENGINEER - INFRASTRUCTURE","RELATIONAL LAB"
    "640.SVP - ARCHITECTURE & TECH SERVICES","ASSET MANAGEMENT"
    "643.CORPORATE PROGRAMS PROJECT MANAGER III","CORPORATE INFORMATION SERVICES"
    "643.DIRECTOR - CIS PROGRAMS","CORPORATE INFORMATION SERVICES"
    "643.ENGINEER III - SECURITY ANALYST","PHYSICAL SECURITY"
    "643.OPERATIONS ANALYST IV","DATA CENTER SERVICES"
    "643.PROJECT MANAGER IV","CORPORATE INFORMATION SERVICES"
    "643.PROJECT MANAGER VI","CORPORATE INFORMATION SERVICES"
    "643.SR MANAGER - SECURITY","PHYSICAL SECURITY"
    "643.TECHNICAL PROJECT MANAGER III","CORPORATE INFORMATION SERVICES"
    "743.ASSET MGMT ANALYST III","DATA CENTER SERVICES"
    "743.ASSET MGMT ANALYST IV","DATA CENTER SERVICES"
    "743.BUSINESS OPERATIONS ANALYST III","DATA CENTER SERVICES"
    "743.DIRECTOR - DATA CENTER OPERATIONS","DATA CENTER SERVICES"
    "743.ENGINEER II - DATA CENTER OPS","DATA CENTER SERVICES"
    "743.ENGINEER II - TECHNICAL OPERATIONS","DATA CENTER SERVICES"
    "743.ENGINEER III - DATA CENTER OPS","DATA CENTER SERVICES"
    "743.ENGINEER III - TECHNICAL OPERATIONS","DATA CENTER SERVICES"
    
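    1. Parse the job title into one of the following, depending on which format above it uses:

      a. (Job code).(Job title) - Group
      b. (Job code).(Job title)

    2. I want to split them out into separate entries in my dictionary (four of them):

      a. Job code
      b. Job title
      c. Group + Department
      d. Department

    3. I can't even get the regex to match. I have tried regex tools and looked at previous questions, with no luck. I have put my code below.

      Below is the relevant part; I can't figure out why the regex doesn't match.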

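      "643.PROJECT MANAGER VI","CORPORATE INFORMATION SERVICES"
      "643.SR MANAGER - SECURITY","PHYSICAL SECURITY"
      "643.TECHNICAL PROJECT MANAGER III","CORPORATE INFORMATION SERVICES"

      m = re.search("\d+\.\D+-\D+", string)

      It should match all the values in the first field of the file above.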

1 Answer:

Answer 0: (score: 1)

Your regex is as follows:

"\d+\.\D+-\D+"

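I'm not sure which string you're trying to match, but apparently it's one of these:

  1. "643.PROJECT MANAGER VI"
  2. "CORPORATE INFORMATION SERVICES"
  3. "643.SR MANAGER - SECURITY"
  4. "PHYSICAL SECURITY"
  5. "643.TECHNICAL PROJECT MANAGER III"
  6. "CORPORATE INFORMATION SERVICES"

    The third one matches, so it can't be the one you're complaining about.

    The even-numbered ones obviously won't match, because they don't start with digits.

    So I assume it's #1 or #5 that is surprising you. Neither of them contains a -, so they won't match.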
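To check this reasoning, here is a minimal sketch that runs the pattern from the question against the six strings listed above. The names `candidates` and `matched` are made up for illustration.

```python
import re

# The pattern from the question, written as a raw string.
pattern = r"\d+\.\D+-\D+"

# The six field values from the excerpt in the question.
candidates = [
    "643.PROJECT MANAGER VI",
    "CORPORATE INFORMATION SERVICES",
    "643.SR MANAGER - SECURITY",
    "PHYSICAL SECURITY",
    "643.TECHNICAL PROJECT MANAGER III",
    "CORPORATE INFORMATION SERVICES",
]

# Keep only the strings in which the pattern is found.
matched = [s for s in candidates if re.search(pattern, s)]
print(matched)  # ['643.SR MANAGER - SECURITY']
```

Only the string that contains both a leading job code and a hyphen survives the filter.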


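    Some side notes:

    The - character is itself a member of the \D class, which can cause confusion if you don't use ? to change the greediness.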
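A small sketch of that confusion, using a hypothetical title with two hyphens (no such row appears in the question's data). Because \D also matches -, the greedy version splits at the last hyphen, while the lazy version splits at the first.

```python
import re

# Hypothetical title containing two hyphens (not taken from the question's data).
s = "443.SR ENGINEER - INFO - SEC"

# Greedy \D+ swallows as much as possible, including the first hyphen.
greedy = re.search(r"\d+\.(\D+)-(\D+)", s)
# Lazy \D+? stops at the first hyphen it can.
lazy = re.search(r"\d+\.(\D+?)-(\D+)", s)

print(greedy.groups())  # ('SR ENGINEER - INFO ', ' SEC')
print(lazy.groups())    # ('SR ENGINEER ', ' INFO - SEC')
```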

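    You really should use raw strings (or escape the backslashes, if you prefer). The fact that d and D happen not to be in the current set of backslash escape characters is not something you want to rely on, especially if you want other people, who may not have memorized that list, to read your code.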
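To illustrate why relying on that list is fragile, here is a short sketch using \b, an escape that Python string literals do interpret (as a backspace character), while the regex engine treats the two-character sequence as a word boundary. The sample text is made up.

```python
import re

# In a normal string literal, "\b" is a single backspace character (0x08);
# in a raw string it stays as the two characters backslash and b.
print(len("\b"), len(r"\b"))  # 1 2

text = "a cat"
# Without the raw prefix the regex engine receives a backspace, not a word boundary.
print(re.search("\bcat", text))           # None
print(re.search(r"\bcat", text).group())  # cat
```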

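    If you're trying to split these strings into separate parts, you don't want to just match the whole thing; you want to add capture groups.

    Also, I assume the whitespace isn't supposed to be part of the parsed strings, right?

    So, your regex should look something like this: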

    r"(\d+)\.(\D+?)\s*-\s*(\D+)"
    

    For example:

    >>> import re
    >>> s = "643.SR MANAGER - SECURITY"
    >>> m = re.search(r"(\d+)\.(\D+?)\s*-\s*(\D+)", s)
    >>> print(m.groups())
    ('643', 'SR MANAGER', 'SECURITY')
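Putting the pieces together, the following is one possible sketch of the whole task from the question, not a definitive implementation. It makes several assumptions that the original does not state: the " - group" part is made optional so that titles without a hyphen still parse, the dictionary keys (`job_code`, `job_title`, `group_department`, `department`) and the helper `parse_row` are invented names, and "group + department" is taken to mean the two values joined with " / ". A few rows from the question are inlined so the example runs on its own.

```python
import csv
import io
import re

# A few rows from the question, inlined so the sketch is self-contained.
SAMPLE = '''Job Title,Department
"443.ENGINEER IV - INFORMATION SECURITY","INFORMATION SECURITY"
"643.PROJECT MANAGER VI","CORPORATE INFORMATION SERVICES"
"643.SR MANAGER - SECURITY","PHYSICAL SECURITY"
'''

# The " - group" part is optional, so titles without a hyphen still match.
TITLE_RE = re.compile(r"(\d+)\.(\D+?)(?:\s*-\s*(\D+))?")


def parse_row(row):
    """Return a dict with four entries, or None if the title does not parse."""
    m = TITLE_RE.fullmatch(row["Job Title"])
    if m is None:
        return None
    code, title, group = m.groups()
    department = row["Department"]
    # Assumption: "group + department" means the two joined with " / ".
    group_department = f"{group} / {department}" if group else department
    return {
        "job_code": code,
        "job_title": title,
        "group_department": group_department,
        "department": department,
    }


records = [parse_row(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
for record in records:
    print(record)
```

Using `fullmatch` forces the lazy `\D+?` to extend until either the optional group or the end of the string is reached, which is what lets one pattern cover both title formats.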