Using lxml.builder

Date: 2018-04-21 19:09:55

Tags: python html python-3.x lxml

I need to generate HTML with the lxml package. The following example main function shows how I do it:

def main():
    from lxml import etree  # needed for etree.tostring() below
    from lxml.builder import E

    p_persons = []
    person = ['1']  # counter
    person.append('ID')
    person.append('0. https://www.youtube.com/watch?v=qLsn5aNaVkI 1. https://www.youtube.com/watch?v=MPbO6P3Vtx8 2. https://www.youtube.com/watch?v=jVKWPaFuNng 3. https://www.youtube.com/watch?v=9HFyB4gCOqY 4. https://www.youtube.com/watch?v=muQGef4Df_8')
    person.append('birthplace')
    p_persons.append(person)

    page = (
    E.html(
        E.body(
        E.table(
                *[E.tr(
                     *[
                        E.td(split(col)) if ind == 1 and col is not None else
                        E.td(str(col)) for ind, col in enumerate(row)
                     ]
                     ) for row in p_persons ]
                , border="2"
                )
            )
        )
    )

    with open('result.html', 'w') as f:
        f.write(etree.tostring(page, pretty_print=True).decode('utf-8'))

def split(col):
    from lxml.builder import E
    import re

    muts = re.split(r'\d\.', col)  # raw string avoids the invalid "\d" escape warning
    links = []
    for idx, mut in enumerate(muts):
        print(mut)
        links.append(str(idx + 1))
        links.append(E.a(mut, href=mut))
        links.append('\n')
    return links
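For the output step, etree.tostring can also emit the <!DOCTYPE html> line shown in the target HTML and return a str directly, so the .decode('utf-8') call is not needed. A minimal sketch, assuming the page is an element built with lxml.builder as in main(); write_page is a hypothetical helper name:

```python
from lxml import etree
from lxml.builder import E


def write_page(page, path):
    """Serialize an lxml element as an HTML document and write it to path.

    Hypothetical helper; assumes page is the <html> root element.
    """
    html = etree.tostring(
        page,
        method='html',              # HTML serialization rules (e.g. <td></td>, not <td/>)
        pretty_print=True,
        doctype='<!DOCTYPE html>',  # prepended before the root element
        encoding='unicode',         # return str instead of bytes
    )
    with open(path, 'w', encoding='utf-8') as f:
        f.write(html)


# Usage: a tiny page with the same table attribute as in main().
page = E.html(E.body(E.table(E.tr(E.td('ID')), border='2')))
write_page(page, 'result.html')
```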

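This works fine for the simple structure above, but sometimes I need to analyze the data and output it into E.td depending on its content.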

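I build the person element as a list of fields and then append it to the p_persons list for output. The second field (a string of URLs, each preceded by a counter) shows the structure that has to be output: this string needs to be split, and the URLs displayed as a numbered list inside a single E.td cell.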
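A minimal sketch of the splitting step, assuming every URL is preceded by a counter of the form "<digits>." followed by whitespace, as in the sample string. Because the string starts with a counter, re.split returns an empty first element (the original split() has the same behaviour), so empty pieces are dropped. parse_urls is a hypothetical helper name:

```python
import re


def parse_urls(text):
    """Split '0. url 1. url ...' into a list of URLs.

    Assumes each item is preceded by a counter like '0.' or '12.'
    followed by whitespace; requiring the whitespace keeps dots
    inside the URLs themselves from being treated as separators.
    """
    # \b stops the counter from matching digits glued to earlier text.
    parts = re.split(r'\s*\b\d+\.\s+', text)
    # The leading counter produces an empty first piece; drop empties.
    return [p.strip() for p in parts if p.strip()]


if __name__ == '__main__':
    sample = ('0. https://www.youtube.com/watch?v=qLsn5aNaVkI '
              '1. https://www.youtube.com/watch?v=MPbO6P3Vtx8')
    print(parse_urls(sample))
```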

But if I pass E.td(split(col)), lxml does not accept it:
Traceback (most recent call last):
 File "<stdin>", line 11, in <module>
  File "/home/user/functions.py", line 298, in rows_to_html
) for row in rows ]
  File "/home/user/functions.py", line 298, in <listcomp>
) for row in rows ]
  File "/home/user/functions.py", line 296, in <listcomp>
  E.td(str(col)) for ind, col in enumerate(row)
  File "src/lxml/builder.py", line 222, in  lxml.builder.ElementMaker.__call__
TypeError: bad argument type: list(['1', <Element a at 0x7f1900117c48>, '\n'])
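The TypeError comes from how lxml.builder treats its arguments: by default each positional argument to E.td(...) has to be a string, an element, or a dict of attributes, and a Python list is not one of the accepted types. A minimal sketch of the usual fix, unpacking the list with * so that each item becomes its own argument (the child list below mirrors the one in the traceback):

```python
from lxml import etree
from lxml.builder import E

url = 'https://www.youtube.com/watch?v=qLsn5aNaVkI'

# Same shape as the list returned by split(): text, <a> element, text.
children = ['1', E.a(url, href=url), '\n']

# E.td(children) would raise "TypeError: bad argument type: list(...)".
# Unpacking passes '1', the <a> element and '\n' as separate arguments;
# text that follows an element is stored as that element's tail.
cell = E.td(*children)

print(etree.tostring(cell, encoding='unicode'))
```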

Here is an example of the HTML I want to get:

<!DOCTYPE html>
<html>
<body>
    <table border="2">
      <tr>
        <td>ID</td>
        <td>
          <ol>
            <li>https://www.youtube.com/watch?v=qLsn5aNaVkI</li>
            <li>https://www.youtube.com/watch?v=MPbO6P3Vtx8</li>
            <li>https://www.youtube.com/watch?v=jVKWPaFuNng</li>
            <li>https://www.youtube.com/watch?v=9HFyB4gCOqY</li>
            <li>https://www.youtube.com/watch?v=muQGef4Df_8</li>
          </ol>
        </td>
        <td>birthplace</td>
      </tr>
    </table>
</body>
</html>
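One way to produce that structure with lxml.builder, sketched under the assumption that each row has already been reduced to (ID, list of URLs, birthplace): an <ol> placed directly inside the <td> gives the numbered list, so no extra <div> wrapper is needed. The helper names urls_cell and build_page and the row layout are hypothetical:

```python
from lxml import etree
from lxml.builder import E


def urls_cell(urls):
    """Return a <td> holding an <ol> with one <li> per URL."""
    return E.td(E.ol(*[E.li(url) for url in urls]))


def build_page(rows):
    """Build <html><body><table> from (id, urls, birthplace) tuples."""
    return E.html(
        E.body(
            E.table(
                *[E.tr(E.td(pid), urls_cell(urls), E.td(place))
                  for pid, urls, place in rows],
                border="2",
            )
        )
    )


if __name__ == '__main__':
    rows = [('ID',
             ['https://www.youtube.com/watch?v=qLsn5aNaVkI',
              'https://www.youtube.com/watch?v=MPbO6P3Vtx8'],
             'birthplace')]
    print(etree.tostring(build_page(rows), pretty_print=True,
                         encoding='unicode'))
```

If the items should be clickable, E.li(E.a(url, href=url)) can be used instead of E.li(url).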

What is the right way to do this? Should I wrap the URLs in a DIV or something else? I could not find a similar example online.

0 answers