"Expected string or buffer" error when splitting in Python

Asked: 2018-04-17 05:43:24

Tags: python regex

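I am trying to split a document into paragraphs and then split each paragraph into lines. I then check the lines and print the paragraph.

The code below works for a single document, but when I run the same operation on multiple documents I get an "expected string or buffer" error.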

with io.open(input_path, mode='r') as f, io.open(write_path, mode='w') as f2:
    data = f.read()
    splat = re.split(r"\n(\s)*\n", data)
    mylist=[]
    for para1 in splat:
        splat2= re.split(r"\n", para1)
        for line1 in splat2:
            pass  # PERFORM SOME OPERATION (placeholder)

Error:

<ipython-input-218-18e633df1d46> in custom_section(input_path, write_path)
     14         mylist=[]
     15         for para1 in splat:
---> 16             splat2= re.split(r"\n", para1)
     17             for line1 in splat2:
     18 #                 line1 = line1.decode("utf-8")

/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.pyc in split(pattern, string, maxsplit, flags)
    169     """Split the source string by the occurrences of the pattern,
    170     returning a list containing the resulting substrings."""
--> 171     return _compile(pattern, flags).split(string, maxsplit)
    172 
    173 def findall(pattern, string, flags=0):

TypeError: expected string or buffer

1 Answer:

Answer 0 (score: 0)

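I believe this error occurs because the list stored in the variable splat contains one or more None objects in addition to strings. re.split() includes the text of capturing groups in its result, and the group (\s)* yields None whenever it matches zero times, for example when the two newlines are directly adjacent. Passing that None to the inner re.split() call raises the TypeError. If you insist on using re.split() with this pattern, you can remove the None objects with the filter() function, like this: filter(None, splat). Note that filter(None, ...) also drops empty strings.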
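To make the cause concrete, here is a minimal sketch of the behaviour described above. The sample text in `data` is made up for illustration, and the non-capturing pattern `\n\s*\n` is an alternative that is not part of the original answer. The sketch is written for Python 3, where filter() returns an iterator (hence the list() call) and the error message reads "expected string or bytes-like object"; the traceback in the question comes from Python 2.7.

```python
import re

# Hypothetical sample text: the first paragraph break has no whitespace
# between its newlines, the second one contains a single space.
data = "line 1\nline 2\n\nline 3\n \nline 4"

# Pattern from the question: the capturing group (\s)* adds one extra item
# per split point, and that item is None when the group matched nothing.
with_group = re.split(r"\n(\s)*\n", data)

# Workaround from the answer: drop None (and any other falsy item such as "").
filtered = list(filter(None, with_group))

# Alternative (an assumption, not from the answer): use a pattern without a
# capturing group, so the result contains only the paragraphs themselves.
paragraphs = re.split(r"\n\s*\n", data)

# Split each paragraph into lines, as the question intends.
lines = [line for para in paragraphs for line in para.split("\n")]
```

With this sample, the filtered list no longer contains None but still keeps the whitespace-only string captured at the second break, so the inner loop would treat it as a paragraph. Dropping the capturing group avoids both kinds of extra item.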