I want to convert HTML output (what people see in the browser) into a String (actually, probably a set of strings).
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Title</title>
</head>
<body>
<div class="vk_c _cy obcontainer card-section">
<div class="_frf"><select class="_nif _Ohf ik4llLYIcrKE-trqxvH0pmCk" jsaction="change:r.3r2U78ZXCIY"
data-rtid="ik4llLYIcrKE" jsl="$x 1;" data-ved="0ahUKEwie8vbnkLXRAhWDbxQKHa5dBxUQ6WoIGzAA">
<option value="Area">Area</option>
<option value="Data Transfer Rate">Data Transfer Rate</option>
<option value="Digital Storage">Digital Storage</option>
<option value="Energy">Energy</option>
<option value="Frequency">Frequency</option>
<option value="Fuel Economy">Fuel Economy</option>
<option value="Length">Length</option>
<option value="Mass">Mass</option>
<option value="Plane Angle">Plane Angle</option>
<option value="Pressure">Pressure</option>
<option value="Speed">Speed</option>
<option value="Temperature">Temperature</option>
<option value="Time">Time</option>
<option selected="1" value="Volume">Volume</option>
</select></div>
<div class="_cif" id="_Aif"><input class="_eif ik4llLYIcrKE-TAgAjI3bJNo" value="1"
jsaction="change:r.EQNHKrw0qdA;keyup:r.EQNHKrw0qdA;r.JnbD_-w_xe0"
data-rtid="ik4llLYIcrKE" jsl="$x 2;"
data-ved="0ahUKEwie8vbnkLXRAhWDbxQKHa5dBxUQ5WoIHDAA"> <select
class="_dif _Ohf ik4llLYIcrKE-y69YkR-bRoA" id="_Bif" jsaction="change:r.tDOMoafrm4E"
data-rtid="ik4llLYIcrKE" jsl="$x 3;" data-ved="0ahUKEwie8vbnkLXRAhWDbxQKHa5dBxUQ5moIHTAA">
<option text="US liquid gallon">US liquid gallon</option>
<option text="US liquid quart">US liquid quart</option>
<option text="US liquid pint">US liquid pint</option>
<option text="US legal cup">US legal cup</option>
<option text="US fluid ounce">US fluid ounce</option>
<option selected="1" text="US tablespoon">US tablespoon</option>
<option text="US teaspoon">US teaspoon</option>
<option text="Cubic metre">Cubic metre</option>
<option text="Litre">Litre</option>
<option text="Millilitre">Millilitre</option>
<option text="Imperial gallon">Imperial gallon</option>
<option text="Imperial quart">Imperial quart</option>
<option text="Imperial pint">Imperial pint</option>
<option text="Imperial cup">Imperial cup</option>
<option text="Imperial fluid ounce">Imperial fluid ounce</option>
<option text="Imperial tablespoon">Imperial tablespoon</option>
<option text="Imperial teaspoon">Imperial teaspoon</option>
<option text="Cubic foot">Cubic foot</option>
<option text="Cubic inch">Cubic inch</option>
</select></div>
<div class="_oif">=</div>
<div class="_cif" id="_Cif"><input class="_eif ik4llLYIcrKE-7Ob2ZtRDv2s" value="3"
jsaction="change:r.Y8jfekOjBAk;keyup:r.Y8jfekOjBAk;r.GsMrmfckh-M"
data-rtid="ik4llLYIcrKE" jsl="$x 4;"
data-ved="0ahUKEwie8vbnkLXRAhWDbxQKHa5dBxUQ52oIHjAA"> <select
class="_dif _Ohf ik4llLYIcrKE-EXIkszwxM2g" jsaction="change:r.xd0JMVj7UXs" data-rtid="ik4llLYIcrKE"
jsl="$x 5;" data-ved="0ahUKEwie8vbnkLXRAhWDbxQKHa5dBxUQ6GoIHzAA">
<option text="US liquid gallon">US liquid gallon</option>
<option text="US liquid quart">US liquid quart</option>
<option text="US liquid pint">US liquid pint</option>
<option text="US legal cup">US legal cup</option>
<option text="US fluid ounce">US fluid ounce</option>
<option text="US tablespoon">US tablespoon</option>
<option selected="1" text="US teaspoon">US teaspoon</option>
<option text="Cubic metre">Cubic metre</option>
<option text="Litre">Litre</option>
<option text="Millilitre">Millilitre</option>
<option text="Imperial gallon">Imperial gallon</option>
<option text="Imperial quart">Imperial quart</option>
<option text="Imperial pint">Imperial pint</option>
<option text="Imperial cup">Imperial cup</option>
<option text="Imperial fluid ounce">Imperial fluid ounce</option>
<option text="Imperial tablespoon">Imperial tablespoon</option>
<option text="Imperial teaspoon">Imperial teaspoon</option>
<option text="Cubic foot">Cubic foot</option>
<option text="Cubic inch">Cubic inch</option>
</select></div>
</div>
</body>
</html>
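Despite all that code, when you run the snippet the only text you actually see is "Volume" "1" "US tablespoon" "=" "3" "US teaspoon".
I am looking for an algorithm that returns only those texts from HTML like this. Is that possible?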
Answer 0: (score: 0)
Sure. If you use something like Python's BeautifulSoup, you can just load the document into it and output the text.
#!/usr/bin/env python3
from bs4 import BeautifulSoup
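# Paste the HTML document to convert here.
html_doc = """
your HTML string
"""
# The 'lxml' parser requires the lxml package; 'html.parser' is the built-in alternative.
soup = BeautifulSoup(html_doc, 'lxml')
# get_text() concatenates every text node in the document.
print(soup.get_text())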
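Note that get_text() returns every text node, so for the HTML in the question it would include the page title and the text of all the options, not only the selected ones, and it would not include the input values, which are attributes rather than text. If only the strings the browser displays are wanted, a rough sketch along the following lines might get closer. It deliberately swaps BeautifulSoup for the standard library's html.parser so that it has no dependencies. The names VisibleTextParser and visible_texts are made up for this example, and the rules it applies (skip head, script and style; show an input's value; show the selected option, or the first one if none is selected) are simplifying assumptions, not a full model of browser rendering.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Hypothetical helper: collects roughly what a browser shows for a simple form."""

    SKIPPED = {"head", "title", "script", "style"}  # contents are never rendered as text

    def __init__(self):
        super().__init__()
        self.texts = []        # visible strings, in document order
        self.skip_depth = 0    # > 0 while inside a non-rendered element
        self.options = None    # [selected, text] pairs of the currently open <select>
        self.in_option = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag == "input" and attrs.get("type") != "hidden":
            # Assumption: a non-hidden input displays its value attribute.
            if attrs.get("value"):
                self.texts.append(attrs["value"])
        elif tag == "select":
            self.options = []
        elif tag == "option" and self.options is not None:
            self.options.append(["selected" in attrs, ""])
            self.in_option = True

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "option":
            self.in_option = False
        elif tag == "select" and self.options is not None:
            # Keep the selected option, falling back to the first one.
            chosen = [text for selected, text in self.options if selected]
            if chosen:
                self.texts.append(chosen[-1].strip())
            elif self.options:
                self.texts.append(self.options[0][1].strip())
            self.options = None

    def handle_data(self, data):
        if self.in_option and self.options:
            # Accumulate the label of the option currently being read.
            self.options[-1][1] += data
        elif self.skip_depth == 0 and self.options is None and data.strip():
            # Ordinary rendered text outside of any <select>.
            self.texts.append(data.strip())


def visible_texts(html):
    """Return the list of strings judged visible in the given HTML."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return parser.texts
```

Calling visible_texts(html_doc) on the document from the question should give the six strings listed there, in order. It does not account for CSS (for example display: none) or for anything JavaScript changes after the page loads; for that you would need a real browser engine.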