Question

The code I'm working on retrieves a list from an HTML page that contains 2 fields, URL and title...

The URLs always start with /URL...., and I need to prepend "http://website.com" to every URL returned by re.findall.

The code so far is:

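import re
from bs4 import BeautifulSoup as bs

def get_links(html):  # hypothetical function name; the original was an excerpt
    soup = bs(html, 'html.parser')  # was "bsoup", but "soup" is used below
    tag = soup.find('div', {'class': 'item'})
    reg = re.compile('<a href="(.+?)" rel=".+?" title="(.+?)"')
    links = re.findall(reg, str(tag))
    # TODO: prepend "http://website.com" to the href "(.+?)" field
    return links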
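If you keep the regex, note that re.findall with two capture groups returns a list of (href, title) tuples, so the prefix can be added with a list comprehension. A minimal sketch, using hypothetical sample markup in place of the real page:

```python
import re

# Hypothetical sample markup standing in for str(tag) from the question.
html = (
    '<div class="item">'
    '<a href="/URL/one" rel="nofollow" title="First">First</a>'
    '<a href="/URL/two" rel="nofollow" title="Second">Second</a>'
    '</div>'
)

reg = re.compile('<a href="(.+?)" rel=".+?" title="(.+?)"')

# With two capture groups, findall returns a list of (href, title) tuples.
links = re.findall(reg, html)

# Tuples are immutable, so build new ones with the site prefix on the href.
links = [("http://website.com" + href, title) for href, title in links]
print(links)
```

Each tuple is rebuilt rather than edited in place, because tuples cannot be modified.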

Answer 1

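Try modifying the href attributes in the BeautifulSoup tree itself, prepending "http://website.com" to each link, and then use one of the following output methods:

return str(soup) gives you the whole document after the changes have been applied.

return tag.find_all('a') gives you all the link elements.

return [str(i) for i in tag.find_all('a')] gives you all the link elements converted to strings.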

Also, don't try to parse HTML with regular expressions when you already have a working parser doing that job.
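A minimal sketch of the parser-only approach, assuming BeautifulSoup 4 (bs4) with the built-in "html.parser" backend and the div.item layout from the question. The function name get_links is hypothetical, and the startswith("/") guard is an added assumption so that already absolute URLs are left alone:

```python
from bs4 import BeautifulSoup

PREFIX = "http://website.com"

def get_links(html):
    """Prefix relative hrefs inside div.item and return (url, title) pairs."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("div", {"class": "item"})
    # Tags support dict-style attribute access, so the tree is edited in place.
    for a in tag.find_all("a", href=True):
        if a["href"].startswith("/"):  # assumption: only site-relative URLs need the prefix
            a["href"] = PREFIX + a["href"]
    # Read the (now prefixed) href and the title straight from the tags.
    return [(a["href"], a.get("title")) for a in tag.find_all("a", href=True)]

html = (
    '<div class="item">'
    '<a href="/URL/one" rel="nofollow" title="First">First</a>'
    '<a href="http://other.example/x" rel="nofollow" title="Second">Second</a>'
    '</div>'
)
print(get_links(html))
```

Because the tree is modified in place, str(soup) or tag.find_all("a") would also reflect the new href values afterwards.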
