Question

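I've been tinkering with this for quite a while. I want to strip a piece of text from each value in the each_div variable, which holds a whole bunch of parsed values from a web page.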

def scrape_page():
    create_dir(project_dir)
    page = 1
    max_page = 10
    while page < max_page:
        page = page + 1
        for each_div in soup.find_all('div',{'class':'username'}):
            f.write(str(each_div) + "\n")

If I run this code, it parses the data in the username class from the HTML page. The problem is that it returns this:

<div class="username">someone_s_username</div>

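What I've been trying to work out is how to strip the <div class="username"> and </div> parts so that only the actual username is returned, not the HTML. If anyone knows how to do this, that would be great. Thank you.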

Answer 1

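Sure, you can use Python's string replace method. Note that each_div is a BeautifulSoup Tag rather than a string, so convert it with str() first:

for each_div in soup.find_all('div',{'class':'username'}):
    # each_div is a bs4 Tag; convert it to a string before calling str.replace
    username = str(each_div).replace('<div class="username">', "")
    username = username.replace("</div>", "")
    f.write(username + "\n")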

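Alternatively, you can split the string to get the part you need (again converting the Tag with str() first):

for each_div in soup.find_all('div',{'class':'username'}):
    markup = str(each_div)               # Tag -> '<div class="username">...</div>'
    username = markup.split(">")[1]      # everything after the first ">"
    username = username.split("<")[0]    # everything before the next "<"
    f.write(username + "\n")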

Oh, now I remember. I believe you can simply do this:

for each_div in soup.find_all('div',{'class':'username'}):
    f.write(str(each_div.text) + "\n")
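As a self-contained illustration of the text-extraction approach, the sketch below parses a small hand-written HTML snippet (the snippet and the variable names are assumptions, not the asker's real page) and collects the usernames with get_text(strip=True), which also trims surrounding whitespace. It assumes the bs4 package is installed and uses Python's built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Hypothetical page content standing in for the scraped HTML
html = """
<div class="username">someone_s_username</div>
<div class="username"> another_user </div>
<div class="other">not a username</div>
"""

soup = BeautifulSoup(html, "html.parser")

# get_text(strip=True) returns only the text inside each tag,
# with leading and trailing whitespace removed
usernames = [
    div.get_text(strip=True)
    for div in soup.find_all("div", {"class": "username"})
]

print(usernames)  # ['someone_s_username', 'another_user']
```

In the original loop, the same idea would amount to writing each_div.get_text(strip=True) + "\n" to the file instead of str(each_div).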
