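This is my web-scraping code, which fetches the page content and exports it to a CSV file. May I know why there is a blank line between every row in the CSV file? Can this be fixed? Thanks!
Python code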
import requests
from bs4 import BeautifulSoup
import csv

session = requests.session()
payload = {"i0023": "XXXXXX",
           "i0025": "XXXXXX"
           }
session.post("http://192.168.XXX.XXX/checkLogin.cgi", data=payload)
s = session.get("http://192.168.XXX.XXX/m_departmentid.html")
soup = BeautifulSoup(s.text, "html.parser")
table = soup.find('div', attrs={"class": "ItemListComponent"})
tbody = table.find_all('tbody')
rows = []
for row in table.find_all('tr'):
    rows.append([val.text for val in row.find_all('td')[0:6]])
with open('test.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(row for row in rows if row)
HTML source
<div class="ItemListComponent">
<table>
<thead>
<tr><th rowspan="3" scope="col">Department ID</th><th colspan="5" scope="col">Page Total/Page Restriction</th><th rowspan="3" scope="col"></th></tr>
<tr><th colspan="3" scope="col">Total Prints</th><th colspan="1" scope="col">Color</th><th colspan="1" scope="col">Black & White</th></tr>
<tr><th colspan="1" scope="col">Total</th><th colspan="1" scope="col">Color</th><th colspan="1" scope="col">Black & White</th><th colspan="1" scope="col">Print</th><th colspan="1" scope="col">Print</th></tr>
</thead>
<tbody>
<tr><td>7654321</td><td>11</td><td>0</td><td>11</td><td>0</td><td>11</td><td></td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=100">0000100</a></td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(100)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(100)" />
</td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=101">0000101</a></td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(101)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(101)" />
</td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=102">0000102</a></td><td>18</td><td>5</td><td>13</td><td>5</td><td>13</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(102)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(102)" />
</td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=103">0000103</a></td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(103)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(103)" />
</td></tr>
<tr><td><a href="/m_departmentid_edit.html?id=104">0000104</a></td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td><input class="ButtonEnable" type="button" value="Delete" title="Delete" onclick="departmentIdDelete(104)"/><input class="ButtonEnable" type="button" value="Clear Count" onclick="departmentIdClear(104)" />
</td></tr>
Answer 0 (score: 1)
You are opening the file as "wb", that is, write bytes. Open it as "w" instead.
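A minimal sketch of why the mode matters in Python 3, using in-memory buffers in place of real files. The helper names write_rows_text and write_rows_binary are made up for this illustration and do not come from the question's code.

```python
import csv
import io


def write_rows_text(rows):
    """Write rows through csv.writer into a text buffer (like mode "w")."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    return buf.getvalue()


def write_rows_binary(rows):
    """Try the same with a bytes buffer (like mode "wb") and report the error."""
    buf = io.BytesIO()
    try:
        # csv.writer produces str, which a binary stream refuses to accept.
        csv.writer(buf).writerows(rows)
    except TypeError as exc:
        return type(exc).__name__
    return buf.getvalue()


sample = [["7654321", "11"]]
print(repr(write_rows_text(sample)))  # text mode works
print(write_rows_binary(sample))      # binary mode fails with TypeError
```

In Python 3 the csv module always writes str, so the target has to be a text-mode file.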
Answer 1 (score: 0)
You need to encode the strings to convert them into bytes objects.
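for row in soup.select(".ItemListComponent tbody tr")[1:215]:
    # Encode each cell's text to bytes (UTF-8 by default).
    row_text = [x.text.encode() for x in row.find_all("td")]
    # Use a bytes separator: a str separator cannot join bytes in Python 3.
    print(b",".join(row_text))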
Answer 2 (score: 0)
Thanks, everyone. I finally found the fix: the newline parameter was missing from the open() call that the csv writer writes to.
Code
import csv

import requests
from bs4 import BeautifulSoup

session = requests.session()
payload = {"i0023": "XXXXX",
           "i0025": "XXXXX"
           }
session.post("http://192.168.XXX.XXX/checkLogin.cgi", data=payload)
s = session.get("http://192.168.XXX.XXX/m_departmentid.html")
soup = BeautifulSoup(s.text, "html.parser")
table = soup.find('div', attrs={"class": "ItemListComponent"})
table_tbody = table.find('tbody')
rows = []
for row in table.find_all('tr'):
    rows.append([val.text for val in row.find_all('td')])
# Raw string so that "\t" in the path is not read as a tab character.
# newline='' stops the file object from translating the writer's line endings.
with open(r"\test.csv", 'w', newline='') as f:
    writer = csv.writer(f)
    # Skip empty rows (header rows contain only <th> cells, so no <td> values).
    writer.writerows(row for row in rows if row)
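To see what newline='' changes, here is a small sketch that writes the same rows twice and compares the raw bytes. The helper csv_bytes and the sample rows are made up for this illustration. Passing newline="\r\n" is only a way to reproduce, on any platform, the newline translation that text mode applies by default on Windows, which is presumably where the blank lines showed up.

```python
import csv
import os
import tempfile


def csv_bytes(rows, newline):
    """Write rows to a temporary CSV file and return the raw bytes on disk."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    os.close(fd)
    try:
        with open(path, "w", newline=newline) as f:
            csv.writer(f).writerows(rows)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)


rows = [["7654321", "11"], ["0000102", "18"]]

# csv.writer ends each row with "\r\n"; translating "\n" to "\r\n" again
# yields "\r\r\n", which spreadsheet programs show as an extra blank row.
translated = csv_bytes(rows, "\r\n")

# newline='' disables the translation, so each row ends with a single "\r\n".
fixed = csv_bytes(rows, "")

print(translated)
print(fixed)
```

The first result contains \r\r\n after every row, while the second contains the plain \r\n that the csv writer intended.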