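I wrote a Python script to extract some text from a few HTML elements. The script parses them fine, but the parsed data comes out with large gaps of whitespace in between. I tried the .strip() method, but it made no difference to the result. How can I fix this?
The HTML elements: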
html="""
<div class="organisation-details">
<div class="personnel shaded">
<h3>KEY PERSONNEL</h3>
<p>
Director: Andrew Bickerton<br>
Director: Andrew Connor<br>
Office Manager: Tom Marchant<br>
</p>
</div>
<div class="company-type shaded">
<h3>COMPANY TYPE</h3>
<p>
Importer
</p>
</div>
<div class="company-details shaded">
<h3>COMPANY DETAILS</h3>
<p>
Year Established: 1984 <br>
VAT No: GB 413 3611 93<br>
No of Employees: 1-20<br>
</p>
</div>
</div>
"""
The script:
from lxml.html import fromstring
tree = fromstring(html)
for title in tree.cssselect(".organisation-details"):
    # Each selector picks the <p> that immediately follows the matching <h3>.
    key = title.cssselect("h3:contains('KEY PERSONNEL')+p")[0].text_content().strip()
    details = title.cssselect("h3:contains('COMPANY DETAILS')+p")[0].text_content().strip()
    ctype = title.cssselect("h3:contains('COMPANY TYPE')+p")[0].text_content().strip()
    print(key,details,ctype)
My output:
Director: Andrew Bickerton
Director: Andrew Connor
Office Manager: Tom Marchant Year Established: 1984
VAT No: GB 413 3611 93
No of Employees: 1-20 Importer
The result I'm after (or something close to it):
Director: Andrew Bickerton
Director: Andrew Connor
Office Manager: Tom Marchant
Year Established: 1984
VAT No: GB 413 3611 93
No of Employees: 1-20
Importer
Answer 0 (score: 2)
The problem is that key, details, and ctype each contain multiple lines, with whitespace in the middle of the string. .strip() only removes whitespace at the very start and end of a string, so the indentation between the lines survives. You need to split each string on newlines and strip every piece. Something like:

    for piece in key.split('\n'):
        print(piece.strip())

and repeat for details and ctype.
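To avoid repeating that loop three times, the split-and-strip step can be wrapped in a small helper. This is only a sketch: the name clean_lines is made up for illustration and is not part of lxml, and the sample string below is an assumed stand-in for what text_content() returns for an indented paragraph.

```python
def clean_lines(text):
    """Split text into lines, strip each one, and drop lines that end up empty."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# Assumed sample: shaped like text_content() output for an indented <p> block.
key = (
    "\n        Director: Andrew Bickerton"
    "\n        Director: Andrew Connor"
    "\n        Office Manager: Tom Marchant\n    "
)

# Print one cleaned entry per line.
for line in clean_lines(key):
    print(line)
```

The same helper can then be applied to details and ctype, and the three lists concatenated before printing if a single block of output is wanted.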
Answer 1 (score: 0)
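When a browser shows you that HTML, it pays no attention to the outer whitespace at the start and end of the strings. Python (or any other programming language) takes the whitespace in a string literally. Coincidentally, I was stumped by a similar situation just yesterday.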
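A small illustration of that difference, using an assumed sample string rather than the real parsed output: .strip() only trims the two ends, while splitting on whitespace and rejoining collapses every run of whitespace into a single space, which is roughly how a browser renders ordinary text.

```python
# Assumed sample with indentation and line breaks inside the string.
text = "  \n   Year Established: 1984   \n   VAT No: GB 413 3611 93\n  "

# strip() removes leading/trailing whitespace only; the inner newline stays.
stripped = text.strip()

# split() with no arguments splits on any whitespace run; join() rebuilds
# the string with single spaces between the pieces.
collapsed = " ".join(text.split())

print(repr(stripped))
print(repr(collapsed))
```

Collapsing everything to single spaces loses the line breaks, so it fits cases where each field should end up on one line; to keep one entry per line, split on newlines first and strip each piece.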