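I've been at this for a long time. I want to replace a string of text in the values returned by the each_div variable, which holds a whole bunch of parsed values from a web page.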
def scrape_page():
    create_dir(project_dir)
    page = 1
    max_page = 10
    while page < max_page:
        page = page + 1
        for each_div in soup.find_all('div',{'class':'username'}):
            f.write(str(each_div) + "\n")
If I run this code, it parses the data in the username class from the HTML page. The problem is that it returns this:
<div class="username">someone_s_username</div>
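What I've been trying to do is strip the <div class="username"> and </div> parts, so that it returns only the actual username instead of the HTML. If anyone knows how to do this, that would be great. Thank you.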
Answer 0 (score: 1)
Sure, you can use Python's string replace method:
for each_div in soup.find_all('div',{'class':'username'}):
    # each_div is a bs4 Tag, not a str, so convert it before using string methods
    each_div = str(each_div)
    each_div = each_div.replace('<div class="username">', "")
    each_div = each_div.replace("</div>", "")
    f.write(each_div + "\n")
Alternatively, you can split the string to get the part you need:
for each_div in soup.find_all('div',{'class':'username'}):
    # each_div is a bs4 Tag, not a str, so convert it before splitting
    each_div = str(each_div)
    each_div = each_div.split(">")[1]  # everything after the first ">"
    each_div = each_div.split("<")[0]  # everything before the next "<"
    f.write(each_div + "\n")
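To see what the two string-based approaches do, here is a small sketch that runs them on the sample line from the question, with no scraping involved. The variable names and the second markup string (with an extra id attribute) are made up for illustration; the point is only that both approaches depend on the markup looking exactly as expected.

```python
# Sample markup copied from the question; in the real scraper this
# would be str(each_div).
raw = '<div class="username">someone_s_username</div>'

# Approach 1: remove the known opening and closing tags.
via_replace = raw.replace('<div class="username">', "").replace("</div>", "")

# Approach 2: keep what lies between the first ">" and the next "<".
via_split = raw.split(">")[1].split("<")[0]

print(via_replace)
print(via_split)

# Hypothetical variation: one extra attribute on the div. The replace
# pattern no longer matches, so the opening tag is left in the result.
changed = '<div class="username" id="u1">someone_s_username</div>'
leftover = changed.replace('<div class="username">', "").replace("</div>", "")
print(leftover)
```

If the page markup can vary at all, asking the parser for the text is probably the safer route.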
Oh, and now that I remember, I believe you can also just do this:
for each_div in soup.find_all('div',{'class':'username'}):
    f.write(str(each_div.text) + "\n")
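For anyone who wants to try this without the rest of the scraper, here is a minimal self-contained sketch. It assumes BeautifulSoup 4 is installed (the bs4 package) and uses a made-up HTML snippet in place of the real page; only the first div comes from the question, the other two are hypothetical. It uses get_text(strip=True), which returns the same text as .text but with surrounding whitespace trimmed.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page.
html = """
<div class="username">someone_s_username</div>
<div class="username"> another_user </div>
<div class="other">not a username</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Collect only the text inside each matching div, without the tags.
usernames = [
    div.get_text(strip=True)
    for div in soup.find_all("div", class_="username")
]

print(usernames)
```

In the original loop the equivalent change would be writing each_div.get_text(strip=True) + "\n" to the file instead of str(each_div) + "\n".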