我想使用BeautifulSoup从html文件中提取所有表格,如下所示,并将其写入csv。
HTML如下所示:
<h4>Site Name : Aria</h4>
<table style="width: 100%">
<tbody><tr>
<th style="width: 25%"><strong>Dn Name:</strong></th>
<td style="width: 25%"><strong>Aria</strong></td>
<th style="width: 25%"><strong>WL:</strong></th>
<td style="width: 25%"><strong> Meters (m)</strong></td>
</tr>
<tr>
<th><strong>River Name:</strong></th>
<td><strong>Ben</strong></td>
<th><strong>DL:</strong></th>
<td><strong> Meters (m)</strong></td>
</tr>
<tr>
<th><strong>Basin Name:</strong></th>
<td><strong>GAN<strong></strong></strong></td>
<th><strong>HFL:</strong></th>
<td><strong>49.4 Meters (m)<strong></strong></strong></td>
</tr>
<tr>
<th><strong>Div Name:</strong></th>
<td><a target="_blank" href="http://imd.gov.in/ onclick="window.open(this.href, this.target, 'width=1000, height=600, toolbar=no, resizable=no'); return false;">LGD-I</a></td>
<th><strong>HFL date:</strong></th>
<td>14-08-2017</td>
</tr>
</tbody></table>
<p> </p>
<table>
<tbody><tr>
<th colspan="3" style="text-align: center;"><strong>PRESENT WL</strong></th>
</tr>
<tr>
<td class="" style="width:33%; height:18px;">Date: 17-07-2018 12:00</td>
<td class="" style="width:33%;">Value: 45.43 Meters (m)</td>
<td class="" style="width:33%;">Trend: Steady</td>
</tr>
<tr>
<th colspan="3" style="text-align: center;"><strong>CUMULATIVE DAILY RF</strong></th>
</tr>
<tr>
<td style="width:33%; height:18px;">Date: 17-07-2018 08:30</td>
<td style="width:33%;">Value: 0.0 Milimiters (mm)</td>
<td style="width:33%;"></td>
</tr>
</tbody></table>
<p> </p>
<table style="width: 100%">
<tbody><tr>
<th colspan="4" style="text-align: center;"><strong>NO FORECAST</strong></th>
</tr>
</tbody></table>
</div>
我尝试从这三个表中提取文本,但无法以所需的格式写出文本
我的代码
now = datetime.datetime.now()
date = now.strftime("%d-%m-%Y")
os.chdir(r'D:\shared')
soup = BeautifulSoup(response.text,"html5lib")
tables = soup.find_all("tr")
test =[]
for table in tables:
test.append(table.get_text())
filename = 'Water'+'-'+str(date)+'.csv'
out = open(filename, mode='ab')
writer = csv.writer(out)
writer.writerow(data)
out.close()
在csv输出中,第一个表写入第一列,第二个表写入第二个表,第三个表写入第三列。
我想要以下格式的数据:
Site Name: Aria
Dn Name: Aria
WL: Meters (m)
River Name: Ben
DL: Meters (m)
Basin Name: GAN
HFL: 49.4 Meters (m)
Div Name: LGD-I)
HFL date: 14-08-2017
PRESENT WL
Date: 17-07-2018 12:00
Value: 45.43 Meters (m)
Trend: Steady
CUMULATIVE
DAILY RF
Date: 17-07-2018 08:30
Value: 0.0 Milimiters (mm)
NO FORECAST
答案 0 :(得分:2)
我对这个问题的尝试:
@if(session()->has('foo'))
@if( ! empty( session('foo')->bar ) )
{{ session('foo')->bar }}
@else
empty
@endif
@endif
打印:
data = """
<h4>Site Name : Aria</h4>
<table style="width: 100%">
<tbody><tr>
<th style="width: 25%"><strong>Dn Name:</strong></th>
<td style="width: 25%"><strong>Aria</strong></td>
<th style="width: 25%"><strong>WL:</strong></th>
<td style="width: 25%"><strong> Meters (m)</strong></td>
</tr>
<tr>
<th><strong>River Name:</strong></th>
<td><strong>Ben</strong></td>
<th><strong>DL:</strong></th>
<td><strong> Meters (m)</strong></td>
</tr>
<tr>
<th><strong>Basin Name:</strong></th>
<td><strong>GAN<strong></strong></strong></td>
<th><strong>HFL:</strong></th>
<td><strong>49.4 Meters (m)<strong></strong></strong></td>
</tr>
<tr>
<th><strong>Div Name:</strong></th>
<td><a target="_blank" href="http://imd.gov.in/ onclick="window.open(this.href, this.target, 'width=1000, height=600, toolbar=no, resizable=no'); return false;">LGD-I</a></td>
<th><strong>HFL date:</strong></th>
<td>14-08-2017</td>
</tr>
</tbody></table>
<p> </p>
<table>
<tbody><tr>
<th colspan="3" style="text-align: center;"><strong>PRESENT WL</strong></th>
</tr>
<tr>
<td class="" style="width:33%; height:18px;">Date: 17-07-2018 12:00</td>
<td class="" style="width:33%;">Value: 45.43 Meters (m)</td>
<td class="" style="width:33%;">Trend: Steady</td>
</tr>
<tr>
<th colspan="3" style="text-align: center;"><strong>CUMULATIVE DAILY RF</strong></th>
</tr>
<tr>
<td style="width:33%; height:18px;">Date: 17-07-2018 08:30</td>
<td style="width:33%;">Value: 0.0 Milimiters (mm)</td>
<td style="width:33%;"></td>
</tr>
</tbody></table>
<p> </p>
<table style="width: 100%">
<tbody><tr>
<th colspan="4" style="text-align: center;"><strong>NO FORECAST</strong></th>
</tr>
</tbody></table>
</div>"""
import os
import datetime
from bs4 import BeautifulSoup
from pprint import pprint
# For Python 2.7 the next line should be "from itertools import izip_longest"
from itertools import zip_longest
import csv
now = datetime.datetime.now()
date = now.strftime("%d-%m-%Y")
# os.chdir(r'D:\shared')
soup = BeautifulSoup(data, "lxml")
tables = []
for table in soup.find_all('table'):
current_table = []
tables.append(current_table)
for row in table.find_all("tr"):
for (th, td) in zip_longest(row.find_all('th'), row.find_all('td')):
s = ("%s %s" % (th.text.strip() if th else '', td.text.strip() if td else '')).strip()
if s:
current_table.append(s)
tables[0].insert(0, ': '.join(w.strip() for w in soup.find('h4').text.split(':')))
for table in tables:
for i in table:
print(i)
filename = 'CWC-Water'+'-'+str(date)+'.csv'
out = open(filename, mode='w')
writer = csv.writer(out)
for table in zip_longest(*tables):
writer.writerow(table)
out.close()
并以以下格式输出.csv文件(表中的3列,LibreOffice的屏幕截图):
编辑: -正确的图片