Parsing text into paragraphs with Python (loop problem)

Time: 2019-04-18 17:29:18

Tags: python loops

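I am using the Google Sheets API and Python to generate HTML markup from data entered in a spreadsheet. Sometimes users enter a longer block of text in a single cell, and I want to use Python to parse it into semantic paragraphs wherever a new line occurs.

Using str.splitlines() and a for loop gets this working conceptually, but only the first iteration of the loop is printed.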

#!/usr/bin/python

#sample text from spreadsheet
text = """Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."""

#break intro text into paragraphs
def pgparse(text):
    #split at every new line
    lines = text.splitlines()
    #wrap lines in p tags
    for i in lines:
        return '<p>'+i+'</p>'

print(pgparse(text))

Result:

<p>Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.</p>

Expected result:

<p>Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.</p>
<p>It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>

2 answers:

Answer 0: (score: 3)

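You are only returning the first line: the return statement sits inside the loop, so the function exits on the first iteration and your second line never gets wrapped. Try this: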

#!/usr/bin/python

#sample text from spreadsheet
text = """Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."""

#break intro text into paragraphs
def pgparse(text):
    #split at every new line
    lines = text.splitlines()
    #wrap lines in p tags
    return "\n".join('<p>'+i+'</p>' for i in lines)

print(pgparse(text))

This uses a generator expression to wrap each line in p tags, then joins the results back together with \n.
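If cells may contain blank lines between paragraphs (an assumption, the sample text above has none), the same join approach would emit empty <p></p> tags for them. A minimal sketch that skips blank lines could look like this; the name pgparse_skip_blank is hypothetical and not part of the original code:

```python
def pgparse_skip_blank(text):
    # Split at every new line, as in the answer above.
    lines = text.splitlines()
    # Wrap only lines that contain something other than whitespace.
    # strip() also trims stray leading/trailing spaces (an assumption
    # about what is wanted, not something the question asks for).
    return "\n".join(
        "<p>" + line.strip() + "</p>" for line in lines if line.strip()
    )


sample = "First paragraph.\n\nSecond paragraph.\n"
print(pgparse_skip_blank(sample))
```

For the sample string this should print two p elements on separate lines, with the empty line in between dropped.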

Answer 1: (score: 3)

return '<p>'+i+'</p>'

This line exits the function on the first pass through the loop. Maybe you want:

def pgparse(text):
    result = []
    #split at every new line
    lines = text.splitlines()
    #wrap lines in p tags
    for i in lines:
        result.append('<p>'+i+'</p>')
    return result
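Note that this version returns a list of strings rather than a single string, so print(pgparse(text)) would show the list itself. A small usage sketch, assuming you want one paragraph per line as in the expected result (the short sample text is made up for illustration):

```python
def pgparse(text):
    result = []
    # split at every new line
    lines = text.splitlines()
    # wrap lines in p tags
    for i in lines:
        result.append('<p>' + i + '</p>')
    return result


# Hypothetical two-line sample standing in for the spreadsheet cell.
paragraphs = pgparse("first line\nsecond line")

# Join the list back into a single string, one paragraph per line.
html = "\n".join(paragraphs)
print(html)
```

Keeping the list around can be handy if you need to process the paragraphs individually before joining them.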