python: Changing a data hyperlink in an HTML file

Time: 2018-02-21 22:22:02

Tags: python html hyperlink beautifulsoup

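Our website has a link that points to a zip folder, so the line in the HTML file looks like this: <p><a href="Data/WillCounty_AddressPoint.zip">Address Points</a> (updated weekly)</p>

The zip folder will soon be renamed to include the current date, so that it looks like this: WillCounty_AddressPoint_02212018.zip

How do I change the corresponding line in the HTML?

Based on this answer, I put together a script. It runs without errors, but it does not change anything in the HTML file.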

import bs4
from bs4 import BeautifulSoup
import re
import time

data = r'\\gisfile\GISstaff\Jared\data.html' #html file location
current_time = time.strftime("_%m%d%Y") #date

#load the file
with open(data) as inf:
    txt = inf.read()
    soup = bs4.BeautifulSoup(txt)

#create new link
new_link = soup.new_tag('link', href="Data/WillCounty_AddressPoint_%m%d%Y.zip")
#insert it into the document
soup.head.append(new_link)

#save the file again
with open (data, "w") as outf:
    outf.write(str(soup))

1 answer:

Answer 0 (score: 0)

Here is how to replace the href attribute using BeautifulSoup.

from bs4 import BeautifulSoup
import time

data = r'data.html'  # HTML file location
current_time = time.strftime("_%m%d%Y")  # date suffix, e.g. _02212018

# load the file
with open(data) as inf:
    txt = inf.read()

soup = BeautifulSoup(txt, 'html.parser')
# find the first <a> element and replace its href
a = soup.find('a')
a['href'] = "WillCounty_AddressPoint%s.zip" % current_time
print(soup)

# save the file again
with open(data, "w") as outf:
    outf.write(str(soup))

Output:

<p><a href="WillCounty_AddressPoint_02212018.zip">Address Points</a> (updated weekly)</p>

The same markup is also written back to the file.
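Note that the question's original href starts with `Data/`, while the value assigned above does not. If the directory prefix should be kept, one option is to rewrite only the file-name part of the existing href. The following is a minimal sketch under that assumption; the helper name `dated_href` is hypothetical, and the markup is the sample line from the question rather than the real file.

```python
import re
import time
from bs4 import BeautifulSoup

# Sample markup from the question; a real script would read it from data.html.
html = '<p><a href="Data/WillCounty_AddressPoint.zip">Address Points</a> (updated weekly)</p>'


def dated_href(href, stamp):
    """Put stamp before '.zip', replacing an earlier _MMDDYYYY stamp if present.

    Hypothetical helper: any directory prefix such as 'Data/' is left untouched.
    """
    return re.sub(r'(?:_\d{8})?\.zip$', lambda m: stamp + '.zip', href)


soup = BeautifulSoup(html, 'html.parser')
a = soup.find('a')
a['href'] = dated_href(a['href'], time.strftime("_%m%d%Y"))
print(soup)
```

Because an existing eight-digit stamp is replaced rather than appended to, running the script again on a later day should not stack up multiple dates in the file name.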

Update: a version that works with the data in the file you provided.

from bs4 import BeautifulSoup
import time

data = r'data.html'  # HTML file location
current_time = time.strftime("_%m%d%Y")  # date suffix, e.g. _02212018

# load the file
with open(data) as inf:
    txt = inf.read()

soup = BeautifulSoup(txt, 'html.parser')
# Find the <a> element you want to change by matching its text and taking the parent.
# (string= is the current name for the older text= argument.)
a = soup.find(string="Address Points").parent
a['href'] = "WillCounty_AddressPoint%s.zip" % current_time
print(soup)

# save the file again
with open(data, "w") as outf:
    outf.write(str(soup))
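Matching on the link text works as long as the label never changes. As an alternative sketch, the element can be selected by its href instead, since BeautifulSoup accepts a compiled regular expression as an attribute filter. The two-link markup below is made up for illustration (the `index.html` link is not from the original file), and the new href mirrors the value used in the answer.

```python
import re
import time
from bs4 import BeautifulSoup

# Illustrative markup: the first link is hypothetical, the second is from the question.
html = """<p><a href="index.html">Home</a></p>
<p><a href="Data/WillCounty_AddressPoint.zip">Address Points</a> (updated weekly)</p>"""

soup = BeautifulSoup(html, 'html.parser')

# Select the link by its target rather than by its visible text.
link = soup.find('a', href=re.compile(r'WillCounty_AddressPoint.*\.zip$'))
if link is None:
    raise SystemExit("no matching link found")

link['href'] = "WillCounty_AddressPoint%s.zip" % time.strftime("_%m%d%Y")
print(soup)
```

The pattern also matches an href that already carries a date, so the same selector should keep working after the first update.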

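Note, however, that this removes blank lines; apart from that, it leaves your HTML code as it was.

Using a diff tool to compare the original file with the modified one: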

diff data\ \(copy\).html data.html 
77c77
< <p><a href="Data/WillCounty_AddressPoint.zip">Address Points</a> (updated weekly)</p>
---
> <p><a href="WillCounty_AddressPoint_02222018.zip">Address Points</a> (updated weekly)</p>
116,120d115
< 
< 
< 
< 
< 
154d148
<
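If the dropped blank lines matter, one workaround is to skip the parse-and-serialize round trip and patch the raw text instead. This is a different technique from the BeautifulSoup approach above, and a regular expression over HTML is brittle in general; the sketch assumes the href is written exactly as in the question, with double quotes and an optional `Data/` prefix. The function name `update_link` and the sample text are hypothetical.

```python
import re
import time

# Assumed href shape: href="[Data/]WillCounty_AddressPoint[_MMDDYYYY].zip"
PATTERN = re.compile(r'(href="(?:Data/)?WillCounty_AddressPoint)(?:_\d{8})?(\.zip")')


def update_link(text, stamp):
    """Swap in the new date stamp, leaving every other character as it was."""
    return PATTERN.sub(lambda m: m.group(1) + stamp + m.group(2), text)


# Sample text with blank lines, standing in for the contents of data.html.
original = (
    '<p><a href="Data/WillCounty_AddressPoint.zip">Address Points</a> (updated weekly)</p>\n'
    '\n'
    '\n'
    '<p>Other content</p>\n'
)

updated = update_link(original, time.strftime("_%m%d%Y"))
print(updated)
```

Since only the matched span is rewritten, a diff of the two files should then show just the changed link line.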