Convert HTML output to a String

Date: 2017-01-09 13:30:23

Tags: html string

I want to convert HTML output (what people see in the browser) into a String (or, more likely, a set of strings).

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
<div class="vk_c _cy obcontainer card-section">
    <div class="_frf"><select class="_nif _Ohf ik4llLYIcrKE-trqxvH0pmCk" jsaction="change:r.3r2U78ZXCIY"
                              data-rtid="ik4llLYIcrKE" jsl="$x 1;" data-ved="0ahUKEwie8vbnkLXRAhWDbxQKHa5dBxUQ6WoIGzAA">
        <option value="Area">Area</option>
        <option value="Data Transfer Rate">Data Transfer Rate</option>
        <option value="Digital Storage">Digital Storage</option>
        <option value="Energy">Energy</option>
        <option value="Frequency">Frequency</option>
        <option value="Fuel Economy">Fuel Economy</option>
        <option value="Length">Length</option>
        <option value="Mass">Mass</option>
        <option value="Plane Angle">Plane Angle</option>
        <option value="Pressure">Pressure</option>
        <option value="Speed">Speed</option>
        <option value="Temperature">Temperature</option>
        <option value="Time">Time</option>
        <option selected="1" value="Volume">Volume</option>
    </select></div>
    <div class="_cif" id="_Aif"><input class="_eif ik4llLYIcrKE-TAgAjI3bJNo" value="1"
                                       jsaction="change:r.EQNHKrw0qdA;keyup:r.EQNHKrw0qdA;r.JnbD_-w_xe0"
                                       data-rtid="ik4llLYIcrKE" jsl="$x 2;"
                                       data-ved="0ahUKEwie8vbnkLXRAhWDbxQKHa5dBxUQ5WoIHDAA"> <select
            class="_dif _Ohf ik4llLYIcrKE-y69YkR-bRoA" id="_Bif" jsaction="change:r.tDOMoafrm4E"
            data-rtid="ik4llLYIcrKE" jsl="$x 3;" data-ved="0ahUKEwie8vbnkLXRAhWDbxQKHa5dBxUQ5moIHTAA">
        <option text="US liquid gallon">US liquid gallon</option>
        <option text="US liquid quart">US liquid quart</option>
        <option text="US liquid pint">US liquid pint</option>
        <option text="US legal cup">US legal cup</option>
        <option text="US fluid ounce">US fluid ounce</option>
        <option selected="1" text="US tablespoon">US tablespoon</option>
        <option text="US teaspoon">US teaspoon</option>
        <option text="Cubic metre">Cubic metre</option>
        <option text="Litre">Litre</option>
        <option text="Millilitre">Millilitre</option>
        <option text="Imperial gallon">Imperial gallon</option>
        <option text="Imperial quart">Imperial quart</option>
        <option text="Imperial pint">Imperial pint</option>
        <option text="Imperial cup">Imperial cup</option>
        <option text="Imperial fluid ounce">Imperial fluid ounce</option>
        <option text="Imperial tablespoon">Imperial tablespoon</option>
        <option text="Imperial teaspoon">Imperial teaspoon</option>
        <option text="Cubic foot">Cubic foot</option>
        <option text="Cubic inch">Cubic inch</option>
    </select></div>
    <div class="_oif">=</div>
    <div class="_cif" id="_Cif"><input class="_eif ik4llLYIcrKE-7Ob2ZtRDv2s" value="3"
                                       jsaction="change:r.Y8jfekOjBAk;keyup:r.Y8jfekOjBAk;r.GsMrmfckh-M"
                                       data-rtid="ik4llLYIcrKE" jsl="$x 4;"
                                       data-ved="0ahUKEwie8vbnkLXRAhWDbxQKHa5dBxUQ52oIHjAA"> <select
            class="_dif _Ohf ik4llLYIcrKE-EXIkszwxM2g" jsaction="change:r.xd0JMVj7UXs" data-rtid="ik4llLYIcrKE"
            jsl="$x 5;" data-ved="0ahUKEwie8vbnkLXRAhWDbxQKHa5dBxUQ6GoIHzAA">
        <option text="US liquid gallon">US liquid gallon</option>
        <option text="US liquid quart">US liquid quart</option>
        <option text="US liquid pint">US liquid pint</option>
        <option text="US legal cup">US legal cup</option>
        <option text="US fluid ounce">US fluid ounce</option>
        <option text="US tablespoon">US tablespoon</option>
        <option selected="1" text="US teaspoon">US teaspoon</option>
        <option text="Cubic metre">Cubic metre</option>
        <option text="Litre">Litre</option>
        <option text="Millilitre">Millilitre</option>
        <option text="Imperial gallon">Imperial gallon</option>
        <option text="Imperial quart">Imperial quart</option>
        <option text="Imperial pint">Imperial pint</option>
        <option text="Imperial cup">Imperial cup</option>
        <option text="Imperial fluid ounce">Imperial fluid ounce</option>
        <option text="Imperial tablespoon">Imperial tablespoon</option>
        <option text="Imperial teaspoon">Imperial teaspoon</option>
        <option text="Cubic foot">Cubic foot</option>
        <option text="Cubic inch">Cubic inch</option>
    </select></div>
</div>
</body>
</html>

Despite all this code, when you run the snippet you actually see only the text "Volume" "1" "US tablespoon" "=" "3" "US teaspoon".

I'm looking for an algorithm that returns only these texts from HTML code like this. Is that possible?

1 answer:

Answer 0 (score: 0)

Sure. If you use something like Python's BeautifulSoup, you can just load the document into it and output the text.

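#!/usr/bin/env python3
# Requires: pip install beautifulsoup4 lxml
from bs4 import BeautifulSoup


html_doc = """
    your HTML string
"""

# Parse with lxml ('html.parser' also works and needs no extra package).
soup = BeautifulSoup(html_doc, 'lxml')
# Print every text node, one per line, with surrounding whitespace stripped.
print(soup.get_text(separator='\n', strip=True))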
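Keep in mind that `get_text()` returns every text node in the document. Its output therefore also contains the labels of all the unselected `<option>`s, and it leaves out "1" and "3", because those live in the `value` attributes of the `<input>` elements rather than in text nodes. To get only what the browser shows, form controls need special handling. Below is a minimal sketch of that idea using Python's built-in `html.parser`, so no third-party package is needed. It rests on its own assumptions: CSS (e.g. `display: none`) and JavaScript are ignored, `<head>`, `<script>` and `<style>` are skipped, each `<select>` contributes only its `selected` `<option>` (or its first option if none is marked, which is what browsers display), each `<input>` contributes its `value` attribute, and `<option>` elements are assumed to have explicit closing tags, as in the question. `VisibleTextParser` and `visible_texts` are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect roughly the text a browser renders for simple markup.

    Hypothetical helper, not a real renderer: CSS and JavaScript are ignored.
    """

    SKIP = {"head", "script", "style"}  # elements whose content is never shown

    def __init__(self):
        super().__init__()
        self.texts = []
        self._skip_depth = 0        # > 0 while inside head/script/style
        self._in_select = False
        self._options = []          # (is_selected, label) for the current <select>
        self._in_option = False
        self._option_selected = False
        self._option_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIP:
            self._skip_depth += 1
        elif self._skip_depth:
            return
        elif tag == "select":
            self._in_select = True
            self._options = []
        elif tag == "option" and self._in_select:
            self._in_option = True
            # selected="1", selected="selected" and a bare `selected` all count.
            self._option_selected = "selected" in attrs
            self._option_parts = []
        elif tag == "input":
            # An <input> shows its value attribute, not a text node.
            value = (attrs.get("value") or "").strip()
            if value:
                self.texts.append(value)

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif self._skip_depth:
            return
        elif tag == "option" and self._in_option:
            self._in_option = False
            label = "".join(self._option_parts).strip()
            self._options.append((self._option_selected, label))
        elif tag == "select" and self._in_select:
            self._in_select = False
            # Show the selected option, or the first one if none is marked.
            chosen = next((label for sel, label in self._options if sel), None)
            if chosen is None and self._options:
                chosen = self._options[0][1]
            if chosen:
                self.texts.append(chosen)

    def handle_data(self, data):
        if self._skip_depth:
            return
        if self._in_option:
            self._option_parts.append(data)
        elif not self._in_select:
            # Ordinary text outside any <select>; drop pure whitespace.
            text = data.strip()
            if text:
                self.texts.append(text)


def visible_texts(html_doc):
    """Return the visible strings of html_doc in document order."""
    parser = VisibleTextParser()
    parser.feed(html_doc)
    parser.close()
    return parser.texts
```

Fed the HTML from the question, `visible_texts(html_doc)` should give `['Volume', '1', 'US tablespoon', '=', '3', 'US teaspoon']`. An event-driven parser keeps the state handling explicit; with BeautifulSoup you could express the same rules by walking `soup.body` and treating `select` and `input` tags specially.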