Splitting text on multiple conditions

Time: 2017-09-08 19:43:53

Tags: python-3.x text

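I want to split a text string based on multiple conditions: I want to keep all of the text that comes before a given item (a job title). There may be several spaces between the words of a title, not just the single space shown here, and I would like to handle that as well.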

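There are two problems:

  1. Looping over multiple titles (not shown here)
  2. The titles may have varying amounts of whitespace between their words

I tried the following: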

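    import re

    job_titles = ['senior payroll specialist', 'employment coordinator']

    # Adjacent string literals are concatenated, so the text can span two lines
    string = ('some text that has a bunch of words in it Blank Name senior payroll specialist '
              'with a bunch of words after this that are not needed')
    # Keep only the text before the title
    out = re.split('senior payroll specialist', string)[0]
    # Splitting again on the same title changes nothing; this is the step
    # that would need to loop over job_titles
    out = re.split('senior payroll specialist', out)[0]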
    

Thanks

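1 answer:

Answer 0: (score: 0)

Consider combining your split strings into a single regular expression. Wrapping the alternatives in a capturing group makes `re.split` keep the matched titles in the result. For example: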

bash-3.2$ python3
Python 3.6.2 (default, Jul 17 2017, 16:44:32) 
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.

>>> job_titles = ['senior payroll specialist', 'employment coordinator']
>>> string = ('some text that has a bunch of words in it '
... 'Blank Name senior payroll specialist with the words '
... 'employment coordinator and words after this that are not needed')

>>> import re, pprint
>>> pat = "(" + "|".join(job_titles) + ")"
>>> pprint.pprint( re.split( pat, string ))
['some text that has a bunch of words in it Blank Name ',
 'senior payroll specialist',
 ' with the words ',
 'employment coordinator',
 ' and words after this that are not needed']
>>>
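
The session above splits on every title but does not yet cover the two points raised in the question: variable whitespace inside a title, and keeping only the text before the first title. One possible way to handle both is sketched below. The helper name `text_before_titles` is made up for illustration and is not part of the original answer; it assumes titles are plain words separated by whitespace and that matching is case-sensitive.

```python
import re


def text_before_titles(text, titles):
    """Return the part of `text` that precedes the first matching title.

    Hypothetical helper: assumes each title is a sequence of plain words
    and that matching should be case-sensitive.
    """
    # Escape each word, then allow one or more whitespace characters
    # between the words of a title.
    alternatives = [
        r"\s+".join(re.escape(word) for word in title.split())
        for title in titles
    ]
    # Non-capturing group: the matched title itself is not kept in the result.
    pattern = re.compile("(?:" + "|".join(alternatives) + ")")
    # maxsplit=1 stops at the first title found; [0] is the text before it.
    # If no title matches, the whole text is returned.
    return pattern.split(text, maxsplit=1)[0].rstrip()


job_titles = ['senior payroll specialist', 'employment coordinator']
string = ('some text that has a bunch of words in it '
          'Blank Name senior   payroll  specialist with the words '
          'employment coordinator and words after this that are not needed')

print(text_before_titles(string, job_titles))
# some text that has a bunch of words in it Blank Name
```

If titles can differ in case, passing `re.IGNORECASE` to `re.compile` would be a natural extension, though that goes beyond what the question asks for.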