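I am finally getting close to extracting a table from a particular website, but there is one thing I can't seem to figure out how to do.
The HTML looks like this: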
<table border="1" cellpadding="5" cellspacing="0">
<tr class="bg">
<td><strong>Reference</strong></td>
<td stytle="width:100px"><strong>Description</strong></td>
<td><strong>Download Documents</strong></td>
<td stytle="width:50px"><strong>Closing Date</strong></td>
<td stytle="width:50px"><strong>Contact Details</strong></td>
<td><strong>Briefing</strong></td>
<!--<td><strong>PUBLISHED</strong></td>-->
</tr>
<tr>
<td>123456</td>
<td>text 123</td>
<td><a href="/downloads/linktofile.zip" target="_blank">Documents click here </a></td>
<td>2 weeks</td>
<td>me<br />
you</td>
<td>next week</td>
</tr>
<tr>
<td>123456</td>
<td>text 123</td>
<td><a href="/downloads/linktofile.zip" target="_blank">Documents click here </a></td>
<td>2 weeks</td>
<td>me<br />
you</td>
<td>next week</td>
</tr>
<tr>
<td>123456</td>
<td>text 123</td>
<td><a href="/downloads/downloads/linktofile.zip" target="_blank">Documents click here </a></td>
<td>2 weeks</td>
<td>me<br />
you</td><td>next week</td>
</tr>
<tr>
<td>123456</td>
<td>text 123</td>
<td><a href="/downloads/linktofile.zip" target="_blank">Documents click here </a></td>
<td>2 weeks</td>
<td>me<br />
you</td><td>next week</td>
</tr>
<tr>
<td>123456</td>
<td>text 123</td>
<td><a href="/downloads/downloads/linktofile.zip" target="_blank">Documents click here </a></td>
<td>2 weeks</td>
<td>me</td>
<td>next week</td>
</tr>
<tr>
<td>123456</td>
<td>text 123</td>
<td><a href="/downloads/linktofile.zip" target="_blank">Documents click here </a></td>
<td>2 weeks</td>
<td>me</td>
<td>next week</td>
</tr>
</table>
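I want to remove the br in the Contact Details column and show the full link instead of "Documents click here".
Note that this is a sample table, rebuilt from the original project.
My Python code runs fine, except that the content after the line break ends up on a new line throughout output.csv.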
import csv
import requests
import os
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
from bs4 import Tag

testwebsite = 'https://example.com'
uClient = uReq(testwebsite)
page_html = uClient.read()
uClient.close()
page_soup = soup(page_html, "html.parser")
testwebsitetendersaved = ""
# The table is badly formatted: it sits inside a span tag, with tables nested within tables
testwebsite_container = page_soup.find("span", id="MainContent2_ctl00_lblContent").findAll("table")[1]
for record in testwebsite_container.findAll('tr'):
    testwebsitetender = ""
    for data in record.findAll('td'):
        testwebsitetender = testwebsitetender + "," + data.text
    testwebsitetendersaved = testwebsitetendersaved + "\n" + testwebsitetender[1:]
header = "Tender Number, Description, Documents Link, Closing Date, Contact Details, Briefing" + "\n"
file = open(os.path.expanduser("output.csv"), "wb")
file.write(bytes(header, encoding="ascii", errors='ignore'))
file.write(bytes(testwebsitetendersaved, encoding="ascii", errors='ignore'))
print(testwebsitetendersaved)
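For reference, the stray line break can be reproduced in isolation. This is only a minimal sketch, assuming the real cells look like the Contact Details cell in the sample table above. The names cell_html, raw_text and clean_text are made up for the illustration.

```python
from bs4 import BeautifulSoup

# One Contact Details cell, copied from the sample table (assumed to match the real page).
cell_html = "<td>me<br />\nyou</td>"
cell = BeautifulSoup(cell_html, "html.parser").td

# .text concatenates the text nodes unchanged, so the newline after <br /> survives.
raw_text = cell.text

# get_text() with a separator and strip=True strips each text node
# and joins the pieces with a single space.
clean_text = cell.get_text(" ", strip=True)

print(repr(raw_text))
print(repr(clean_text))
```

The first value still contains the newline, which is what pushes "you" onto its own line in the CSV. The second value is the single-line form.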
Answer 0: (score: 0)
I hope this is what you are looking for.
# Continues from the question's script: os, page_soup and testwebsite are defined there.
# The table is badly formatted: it sits inside a span tag, with tables nested within tables
testwebsite_container = page_soup.find("span", id="MainContent2_ctl00_lblContent").findAll("table")[1]
header = "Tender Number, Description, Documents Link, Closing Date, Contact Details, Briefing" + "\n"
file = open(os.path.expanduser("output.csv"), "wb")
file.write(bytes(header, encoding="ascii", errors='ignore'))
skiptrcnt = 1  # skip the first tr block (the header row)
for i, record in enumerate(testwebsite_container.findAll('tr')):
    if skiptrcnt > i:
        continue
    cells = record('td')
    tnum = cells[0].text
    desc = cells[1].text
    doclink = cells[2].text
    alink = cells[2].find("a")
    # Set on every row so a row without a link neither fails nor reuses the previous URL
    doclinkurl = testwebsite + alink['href'] if alink else ""
    closingdate = cells[3].text
    # A newline follows each <br />; turn it into a space so the row stays on one line
    detail = cells[4].text.replace('\n', ' ')
    brief = cells[5].text.replace('\n', ' ')
    print(tnum, desc, doclink, doclinkurl, closingdate, detail, brief)
    # Six fields to match the header: the full URL takes the place of the link text
    testwebsitetendersaved = "{},{},{},{},{},{}\n".format(tnum, desc, doclinkurl, closingdate, detail, brief)
    file.write(bytes(testwebsitetendersaved, encoding="ascii", errors='ignore'))
file.close()
My output is:
123456 text 123 Documents click here https://example.com/downloads/linktofile.zip 2 weeks me you next week
123456 text 123 Documents click here https://example.com/downloads/linktofile.zip 2 weeks me you next week
123456 text 123 Documents click here https://example.com/downloads/downloads/linktofile.zip 2 weeks me you next week
123456 text 123 Documents click here https://example.com/downloads/linktofile.zip 2 weeks me you next week
123456 text 123 Documents click here https://example.com/downloads/downloads/linktofile.zip 2 weeks me next week
123456 text 123 Documents click here https://example.com/downloads/linktofile.zip 2 weeks me next week
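
Two details of this approach may be worth tightening: joining the URL by string concatenation, and building CSV lines by hand, which breaks as soon as a cell contains a comma. A possible variant using only the standard library is sketched below. It is not part of the code above. The helpers clean_cell and build_row, the BASE_URL constant and the raw_cells sample are hypothetical names introduced for illustration, and the raw values are assumed to be shaped like one row of the sample table.

```python
import csv
import io
from urllib.parse import urljoin

BASE_URL = "https://example.com"  # same placeholder site as in the question


def clean_cell(text):
    """Collapse any run of whitespace (including newlines) into one space."""
    return " ".join(text.split())


def build_row(cells, href):
    """cells: raw texts of the six td cells; href: link target of the third cell, or None."""
    row = [clean_cell(c) for c in cells]
    # Swap the link text for the absolute URL when a link is present.
    if href:
        row[2] = urljoin(BASE_URL, href)
    return row


# Hypothetical raw values, shaped like one row of the sample table.
raw_cells = ["123456", "text 123", "Documents click here ", "2 weeks", "me\nyou", "next week"]
row = build_row(raw_cells, "/downloads/linktofile.zip")

# csv.writer quotes fields that contain commas or quotes, which manual joining does not.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Tender Number", "Description", "Documents Link",
                 "Closing Date", "Contact Details", "Briefing"])
writer.writerow(row)
print(buffer.getvalue())
```

To write to disk instead of an in-memory buffer, the same writer can be wrapped around open("output.csv", "w", newline="", encoding="utf-8"), which also avoids the manual bytes conversion.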