Splitting a string into a list using multiple delimiters

Time: 2019-03-11 14:19:42

Tags: python list

If I have a sentence like this:

text = "The sun shine brightly, but is very cold today!"

I can split it with split():

newArray = text.split(" ")
print(newArray)

The result will be:

['The', 'sun', 'shine', 'brightly,', 'but', 'is', 'very', 'cold', 'today!']

But what if I need to split not only on spaces, but on spaces, commas, and "Enter" (newlines)?

How can I do that?

To make this clearer, here is my code sample:

import io
from pdfminer.converter import TextConverter
from pdfminer.pdfinterp import PDFPageInterpreter
from pdfminer.pdfinterp import PDFResourceManager
from pdfminer.pdfpage import PDFPage
import re

def extract_text_from_pdf(pdf_path):
    resource_manager = PDFResourceManager()
    fake_file_handle = io.StringIO()
    converter = TextConverter(resource_manager, fake_file_handle)
    page_interpreter = PDFPageInterpreter(resource_manager, converter)
    with open(pdf_path, 'rb') as fh:
        for page in PDFPage.get_pages(fh, 
                                      caching=True,
                                      check_extractable=True):
            page_interpreter.process_page(page)
        text = fake_file_handle.getvalue()
    # close open handles
    converter.close()
    fake_file_handle.close()
    if text:
        return text


text = extract_text_from_pdf('file.pdf')
newArray = text.split(" ")
print(newArray)

3 answers:

Answer 0: (score: 3)

You can use re.split to split on several delimiters at once:

text = "The sun shine brightly, but is very cold today!"

Say you want to split on whitespace and commas:

import re
re.split(r'\s+|,\s*', text)
# ['The', 'sun', 'shine', 'brightly', 'but', 'is', 'very', 'cold', 'today!']
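
One caveat with the pattern above: if a space comes before a comma, or the text starts or ends with a delimiter, the result can contain empty strings. Below is a minimal sketch of one way around that, assuming the only delimiters are whitespace (which already covers newlines) and commas. The helper name split_words is hypothetical and not part of the original answer.

```python
import re

def split_words(text):
    """Split on any run of whitespace and/or commas, dropping empty pieces."""
    # [\s,]+ treats a mixed run such as " , \n" as one single delimiter.
    parts = re.split(r'[\s,]+', text)
    # A leading or trailing delimiter yields an empty string; filter those out.
    return [p for p in parts if p]

print(split_words("The sun shine brightly, but\nis very cold today!"))
# ['The', 'sun', 'shine', 'brightly', 'but', 'is', 'very', 'cold', 'today!']
```

The character class keeps the pattern short, and adding another delimiter (for example a semicolon) only means adding one character inside the brackets.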

Answer 1: (score: 3)

The simplest approach is probably to normalize the data: replace every comma and "Enter" (newline) with a space, then split as before. Alternatively, use the split() from … with ….
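
The normalization idea can be sketched as follows. This is only an illustration, assuming commas are the one non-whitespace delimiter; the function name normalize_and_split is hypothetical.

```python
def normalize_and_split(text):
    """Turn commas into spaces, then split on whitespace."""
    # Replace each comma with a space so it behaves like any other separator.
    normalized = text.replace(",", " ")
    # split() with no argument splits on any run of whitespace
    # (spaces, tabs, newlines) and never returns empty strings.
    return normalized.split()

print(normalize_and_split("The sun shine brightly, but\nis very cold today!"))
# ['The', 'sun', 'shine', 'brightly', 'but', 'is', 'very', 'cold', 'today!']
```

Because split() without arguments already treats newlines as whitespace, only the commas need to be replaced explicitly.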

Answer 2: (score: 2)

str.split() splits only on whitespace. To split on commas as well, use re.split(), which returns a list:

>>> import re
>>> s = "The sun shine brightly, but is very cold today!"
>>> re.split(r'\s+|,\s*', s)
['The', 'sun', 'shine', 'brightly', 'but', 'is', 'very', 'cold', 'today!']
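
re.findall can produce the same list, provided the pattern describes the words rather than the delimiters. With a delimiter pattern such as r'\s+|,\s*', findall returns the separators themselves. Here is a small sketch, assuming a "word" is any run of characters that are neither whitespace nor commas; find_tokens is a hypothetical name.

```python
import re

def find_tokens(text):
    """Return every maximal run of characters that are not whitespace or commas."""
    # [^\s,]+ matches the words themselves, so no empty strings can appear.
    return re.findall(r'[^\s,]+', text)

print(find_tokens("The sun shine brightly, but\nis very cold today!"))
# ['The', 'sun', 'shine', 'brightly', 'but', 'is', 'very', 'cold', 'today!']
```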

Hope this is useful to everyone.