Question

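The function doesn't raise any error, but after it runs the strings are unchanged. replace_with seems to do nothing. So I checked the types of the variables, and this is the problem: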

<class 'str'> <class 'bs4.element.Tag'>

fixed_text is of type str, while blog_text is of type Tag. I don't know how to solve this.

    def replace_urls(self):
        find_string_1 = '/blog/'
        find_string_2 = '/contakt/'
        replace_string_1 = 'blog.html'
        replace_string_2 = 'contact.html'

        exclude_dirs = ['media', 'static']

        for (root_path, dirs, files) in os.walk(f'{settings.BASE_DIR}/static/'):
            dirs[:] = [d for d in dirs if d not in exclude_dirs]
            for file in files:
                get_file = os.path.join(root_path, file)
                f = open(get_file, mode='r', encoding='utf-8')
                soup = BeautifulSoup(f, "lxml", from_encoding="utf-8")
                blog_text = soup.find('a', attrs={'href':find_string_1})
                contact_text = soup.find('a', attrs={'href':find_string_2})
                fixed_text = str(blog_text).replace(find_string_1, replace_string_1)
                fixed_text_2 = str(contact_text).replace(find_string_2, replace_string_2)
                blog_text.replace_with(fixed_text)
                contact_text.replace_with(fixed_text_2)

Answer 1

Your solution seems to work fine. However, from what I can see, what you are trying to do is replace the whole href with another one. The simplest way to do that is:

    blog_text.attrs['href'] = replace_string_1

This changes the element inside soup, so at the end you can do:

    str(soup)

and see your changes. By doing str(blog_text).replace, you are working on a string that is detached from the soup.

Minimal example:

    from bs4 import BeautifulSoup

    find_string_1 = '/blog/'
    replace_string_1 = 'blog.html'

    soup = BeautifulSoup('<a href="/blog/">the text</a>', "lxml")
    blog_text = soup.find('a', attrs={'href': find_string_1})
    blog_text.attrs['href'] = replace_string_1
    print(str(soup))

Result:

    <html><body><a href="blog.html">the text</a></body></html>

Edit: to write the changes back to the file:

    with open(some_file_name, 'wb') as f_out:
        f_out.write(soup.prettify('utf-8'))
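
Putting the pieces together, the attribute assignment and the write-back step could be combined into the directory walk from the question roughly as follows. This is only a sketch under some assumptions: the helper names rewrite_hrefs and replace_urls (as a standalone function taking base_dir) are hypothetical, only files ending in .html are processed, and the built-in "html.parser" is used instead of "lxml" so the example has no extra dependency.

```python
import os

from bs4 import BeautifulSoup

# Old href -> new href, using the values from the question.
HREF_MAP = {
    '/blog/': 'blog.html',
    '/contakt/': 'contact.html',
}


def rewrite_hrefs(html, href_map):
    """Hypothetical helper: return (new_html, number_of_links_changed)."""
    # Assumption: "html.parser" instead of "lxml", to avoid the extra dependency.
    soup = BeautifulSoup(html, 'html.parser')
    changed = 0
    for link in soup.find_all('a', href=True):
        new_href = href_map.get(link['href'])
        if new_href is not None:
            # Assigning to the attribute mutates the tag inside the soup,
            # so the change is visible in str(soup).
            link['href'] = new_href
            changed += 1
    return str(soup), changed


def replace_urls(base_dir, href_map=HREF_MAP, exclude_dirs=('media', 'static')):
    """Walk base_dir and rewrite matching links in every .html file in place."""
    for root_path, dirs, files in os.walk(base_dir):
        # Prune excluded subdirectories, as in the question.
        dirs[:] = [d for d in dirs if d not in exclude_dirs]
        for name in files:
            # Assumption: only .html files should be parsed.
            if not name.endswith('.html'):
                continue
            path = os.path.join(root_path, name)
            with open(path, encoding='utf-8') as f:
                updated, changed = rewrite_hrefs(f.read(), href_map)
            # Only touch files in which a link was actually replaced.
            if changed:
                with open(path, 'w', encoding='utf-8') as f_out:
                    f_out.write(updated)
```

Calling replace_urls(f'{settings.BASE_DIR}/static/') would mirror the original method. Looping over find_all rather than calling find once also avoids the AttributeError that blog_text.replace_with would raise on a file where find returns None.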

BeautifulSoup doesn't replace string
