Question

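I am using BeautifulSoup to parse an HTML page and find and extract specific items.

As far as I can tell, the problem is caused by a conflict between BeautifulSoup and the Python parser. I am looking for a specific piece of text in the HTML that leads me to the anchor tag I want to extract. I don't seem to be able to solve it. Here is my code: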

import requests
from bs4 import BeautifulSoup

with requests.Session() as s:
  r = s.get('https://www.rbkc.gov.uk/planning/searches/details.aspx?batch=20&id=PP/11/04187&type=&tab=#tabs-planning-6')
  c = s.cookies.get_dict()
soup = BeautifulSoup(r.text, 'lxml')
table = soup.find('table', {'id': 'casefiledocs'})

vals = []
rows = table.findAll('tr')
for tr in rows:
  cols = tr.findAll('td')
  for td in cols:
    # Print the cell whose text is exactly 'Application Form'
    if td.get_text().encode('utf-8') == 'Application Form':
      print td

Is there a solution? Thanks in advance.

Answer 1

Strip the whitespace:

if td.get_text().strip() == 'Application Form':
    ...
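
To see why the exact comparison can fail, here is a minimal sketch in Python 3. The helper name `cell_matches` and the sample string `raw` are made up for illustration; `raw` only imitates what `td.get_text()` might return for a cell padded with newlines and spaces, and is not taken from the actual page.

```python
def cell_matches(cell_text, label):
    """Return True if cell_text equals label once surrounding whitespace is removed."""
    return cell_text.strip() == label


# Hypothetical cell text, padded with whitespace the way table cells often are.
raw = "\n      Application Form\r\n    "

# The exact comparison fails because of the padding.
print(raw == "Application Form")

# Stripping first makes the comparison succeed.
print(cell_matches(raw, "Application Form"))
```

Inside the loop from the question this would be called as `cell_matches(td.get_text(), 'Application Form')`. BeautifulSoup's `get_text()` also accepts `strip=True`, which should have the same effect for a cell that holds a single piece of text.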
