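I am trying to split a document into paragraphs, and then split each paragraph into lines. I then check the lines and print the paragraphs.
The code below does this for a single document, but when I run the same code over multiple documents, I get an "expected string or buffer" error.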
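import io
import re

with io.open(input_path, mode='r') as f, io.open(write_path, mode='w') as f2:
    data = f.read()
    # Split into paragraphs on blank (or whitespace-only) lines
    splat = re.split(r"\n(\s)*\n", data)
    mylist = []
    for para1 in splat:
        # Split each paragraph into lines
        splat2 = re.split(r"\n", para1)
        for line1 in splat2:
            pass  # PERFORM SOME OPERATION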
Error:
<ipython-input-218-18e633df1d46> in custom_section(input_path, write_path)
14 mylist=[]
15 for para1 in splat:
---> 16 splat2= re.split(r"\n", para1)
17 for line1 in splat2:
18 # line1 = line1.decode("utf-8")
/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.pyc in split(pattern, string, maxsplit, flags)
169 """Split the source string by the occurrences of the pattern,
170 returning a list containing the resulting substrings."""
--> 171 return _compile(pattern, flags).split(string, maxsplit)
172
173 def findall(pattern, string, flags=0):
TypeError: expected string or buffer
Answer 0 (score: 0)
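I believe this error occurs because the list of strings stored in the variable splat contains one or more None objects. The pattern contains a capturing group, (\s), and re.split() includes the text of each captured group in the list it returns; when the group matches zero times (a blank line with no whitespace in it), that entry is None, and passing None to the inner re.split() raises the TypeError. If you insist on using re.split(), you can remove the None objects with the filter() function, like this: filter(None, splat). Note that this also drops any empty strings from the list.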
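As an alternative to filtering, here is a minimal sketch that assumes the only goal is to split on blank or whitespace-only lines. Dropping the capturing group from the pattern means re.split() has no group text to insert into the result, so no None entries should appear in the first place. The names split_paragraphs and sample are hypothetical and used only for illustration.

```python
import re


def split_paragraphs(text):
    """Split text into paragraphs separated by blank or whitespace-only lines.

    The pattern has no capturing group, so re.split() returns only the
    text between separators. Empty pieces (for example from leading or
    trailing blank lines) are dropped.
    """
    return [para for para in re.split(r"\n\s*\n", text) if para]


# Hypothetical sample: three paragraphs, the last separator contains a space.
sample = "line 1\nline 2\n\nline 3\n \nline 4"

for para in split_paragraphs(sample):
    for line in para.splitlines():
        # Replace this with the per-line operation from the question.
        print(line)
```

If the captured whitespace is not needed, this avoids the extra filtering step; a non-capturing group such as (?:\s)* would behave the same way if the grouping is wanted for readability.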