Modify an HTML file (find and replace href URLs and save)

Time: 2019-05-20 11:31:38

Tags: python html replace href

EDIT1:

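I found an error in my original code that was causing the TypeError. The answer is here: BeautifulSoup - modifying all links in a piece of HTML?. The code now works correctly.

I have an HTML file in which I want to change certain href URLs to different ones and then save the result as an HTML file again. My goal is that when I open the HTML file and click a link, it takes me to a file in a local folder instead of the Internet URL (the original URL).

In other words, I want to turn <a href="http://www.somelink.com"> into <a href="C:/myFolder/myFile.html">

I tried opening the file with bs4 and using the replace function, but I get TypeError: 'NoneType' object is not callable

This is my code right now: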


from bs4 import BeautifulSoup

# Dict that maps each original link to the link that should replace it
# (the pair below is the example from the question; add more pairs as needed)
links_dict = {"http://www.somelink.com": "C:/myFolder/myFile.html"}

html_file = "original_file.html"  # placeholder path to the input HTML file

# Read the file as UTF-8 and parse it
with open(html_file, encoding="utf8") as file:
    soup = BeautifulSoup(file, "html.parser")

# Loop through every <a> tag that has an href; if that href is a key
# in 'links_dict', replace it with the mapped value
for link in soup.find_all('a', href=True):
    if link['href'] in links_dict:
        link['href'] = links_dict[link['href']]

with open("new_file.html", "w", encoding="utf8") as file:
    file.write(str(soup))

Any ideas?

1 Answer:

Answer 0: (score: 1)

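Once you have a soup to work with, look for the 'a' elements and check their 'href' attributes; if an attribute matches one of the keys in your dictionary, replace it as needed.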
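The steps just described can be sketched on an in-memory string, which makes them easy to try without touching any files. This is only a minimal sketch: the function name `replace_links`, the sample markup, and the `http://other.example` URL are made up for illustration; the mapping reuses the example pair from the question.

```python
from bs4 import BeautifulSoup


def replace_links(html, links_dict):
    """Return html with every <a href> that is a key in links_dict replaced by its value."""
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a", href=True):
        new_href = links_dict.get(a["href"])
        if new_href is not None:
            # Assigning to the attribute rewrites it in the parsed tree
            a["href"] = new_href
    return str(soup)


# Hypothetical sample input: one link that is in the mapping, one that is not
sample = (
    '<p><a href="http://www.somelink.com">one</a> '
    '<a href="http://other.example">two</a></p>'
)
mapping = {"http://www.somelink.com": "C:/myFolder/myFile.html"}

result = replace_links(sample, mapping)
print(result)
```

Links whose href is not a key in the dictionary are left untouched, so the same function can be run over pages that mix local and external links.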

I would turn 'original_link1' and the others into regular expressions so that matching becomes easier.
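One way to read the regular-expression suggestion is to store compiled patterns next to their replacements, so that small variations of the same URL (http vs. https, with or without "www." or a trailing slash) all map to one local file. The pattern, the helper name `replace_links_by_pattern`, and the sample markup below are assumptions for illustration, not something given in the question.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical (pattern, replacement) pairs; adjust the patterns to your own links
pattern_map = [
    (re.compile(r"^https?://(www\.)?somelink\.com/?$"), "C:/myFolder/myFile.html"),
]


def replace_links_by_pattern(html, pattern_map):
    """Replace each <a href> with the replacement of the first pattern that matches it."""
    soup = BeautifulSoup(html, "html.parser")
    for a in soup.find_all("a", href=True):
        for pattern, new_href in pattern_map:
            if pattern.search(a["href"]):
                a["href"] = new_href
                break  # stop at the first matching pattern
    return str(soup)


# Hypothetical sample: two spellings of the same site plus an unrelated link
sample = (
    '<a href="http://www.somelink.com">a</a>'
    '<a href="https://somelink.com/">b</a>'
    '<a href="http://other.example/page">c</a>'
)

result = replace_links_by_pattern(sample, pattern_map)
print(result)
```

A list of pairs is used here rather than a dict so that the order in which patterns are tried is explicit; if two patterns could match the same URL, the earlier one wins.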

As it happens, I believe your question has already been answered; see BeautifulSoup - modifying all links in a piece of HTML?