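I am learning Python and trying to use it to automatically check whether library books are available.
I tried to do this with bs4, requests and partition.
This is the link I want to parse: http://catalogue.nlb.gov.sg/cgi-bin/spydus.exe/FULL/EXPNOS/BIBENQ/1592917/156302298,2
I looked at its source, and it contains this fragment: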
<tr>
<td valign="top"><a href="/cgi-bin/spydus.exe/ENQ/EXPNOS/GENENQ/1564461?LOCX=BIPL">**Bishan Public Library**</a>
<br />
</td>
<td valign="top">
<book-location data-title="The opposite of everyone" data-branch="BIPL" data-usagelevel="001" data-coursecode="" data-language="English" data-materialtype="BOOK" data-callnumber="JAC" data-itemcategory="" data-itemstatus="" data-lastreturndate="20160322"
data-accession="B31189097E" data-defaultLoc="Adult Lending">Adult Lending</book-location>
</td>
<td valign="top"><a href="/cgi-bin/spydus.exe/ENQ/EXPNOS/BIBENQ/1564461?CGS=E*English">English</a>
<br /><a href="/cgi-bin/spydus.exe/WBT/EXPNOS/BIBENQ/1564461?CNO=JAC&CNO_TYPE=B">JAC</a>
<br />
</td>
<td valign="top">**Available**
<br />
</td>
</tr>
<tr>
<td valign="top"><a href="/cgi-bin/spydus.exe/ENQ/EXPNOS/GENENQ/1564461?LOCX=BMPL">**Bukit Merah Public Library**</a>
<br />
</td>
<td valign="top">
<book-location data-title="The opposite of everyone" data-branch="BMPL" data-usagelevel="001" data-coursecode="" data-language="English" data-materialtype="BOOK" data-callnumber="JAC" data-itemcategory="" data-itemstatus="" data-lastreturndate="20160405"
data-accession="B31189102C" data-defaultLoc="Adult Lending">Adult Lending</book-location>
</td>
<td valign="top"><a href="/cgi-bin/spydus.exe/ENQ/EXPNOS/BIBENQ/1564461?CGS=E*English">English</a>
<br /><a href="/cgi-bin/spydus.exe/WBT/EXPNOS/BIBENQ/1564461?CNO=JAC&CNO_TYPE=B">JAC</a>
<br />
</td>
<td valign="top">**Available**
<br />
</td>
</tr>
The information I am trying to extract is the list of libraries where the book is available.
This is what I did:
>>> import requests, bs4
>>> res = requests.get('http://catalogue.nlb.gov.sg/cgi-bin/spydus.exe/FULL/EXPNOS/BIBENQ/1592917/156302298,2')
>>> string = bs4.BeautifulSoup(res.text)
Then I tried to turn string into a str:
>>> str(string)
It printed the entire page source and made my IDLE lag badly!
Once it stopped lagging, I did this:
>>> keyword = '<a href="/cgi-bin/spydus.exe/ENQ/EXPNOS/GENENQ/1564461?LOCX='
>>> string.partition('keyword')
Traceback (most recent call last):
  File "<pyshell#8>", line 1, in <module>
    string.partition('keyword')
TypeError: 'NoneType' object is not callable
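I don't know why this raises an error. I did turn string into a string, didn't I?
Also, I chose that keyword because it comes right before the library branch and right after the availability. So I thought that, even if the result contained a lot of other redundant markup, I would be able to see the branch where the book is available on the first line.
I am sure my approach is not the most efficient one, and I would be very grateful if you could point me to, or show me, the right way!
Sorry for the long post, but I wanted to describe my situation in as much detail as possible. Thank you for bearing with me.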
Answer 0 (score: 1)
No, you did not turn string into a Python string, because you did not assign the result of str(string) to any variable, so it was discarded:
>>> type(string)
<class 'bs4.BeautifulSoup'>
>>> type(str(string))
<type 'str'>
>>> type(string)
<class 'bs4.BeautifulSoup'>
The variable string is unchanged. Try this:
>>> string = str(string)
>>> type(string)
<type 'str'>
Now you have a str string.
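One more detail in the session above is worth checking: string.partition('keyword') passes the literal text 'keyword', not the keyword variable. The TypeError itself most likely arises because BeautifulSoup treats an unknown attribute such as .partition as a search for a <partition> tag and returns None, which then gets called. Here is a minimal sketch of str.partition on a real string; the one-row html value is a made-up stand-in for the downloaded page, not the actual response.

```python
# Hypothetical one-row stand-in for str(soup); the real page is far longer.
html = ('<td valign="top"><a href="/cgi-bin/spydus.exe/ENQ/EXPNOS/GENENQ/'
        '1564461?LOCX=BIPL">Bishan Public Library</a></td>')
keyword = '<a href="/cgi-bin/spydus.exe/ENQ/EXPNOS/GENENQ/1564461?LOCX='

# Pass the variable: partition splits at the first occurrence of its value.
before, sep, after = html.partition(keyword)
print(repr(before))
print(after)

# Pass the quoted name: the text 'keyword' does not occur in the page, so the
# separator comes back empty and the whole string stays in the first element.
head, missing_sep, tail = html.partition('keyword')
print(repr(missing_sep), repr(tail))
```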
On a related note, why not use BeautifulSoup to extract the data from the HTML? That is what it is for, and what it is good at. Here is one way:
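import requests
from bs4 import BeautifulSoup

# Download the catalogue page for the book.
html = requests.get('http://catalogue.nlb.gov.sg/cgi-bin/spydus.exe/FULL/EXPNOS/BIBENQ/1592917/156302298,2').text
# Name the parser explicitly; bs4 warns when it has to guess one.
soup = BeautifulSoup(html, 'html.parser')
# Each row of the holdings table describes one library's copy.
holdings = soup.find('table', class_='clsTab1').find_all('tr')
for holding in holdings:
    cells = holding.find_all('td')
    if cells:  # skip rows that contain no <td> cells
        library = cells[0].text
        availability = cells[-1].text
        print('{}: {}'.format(library, availability))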
Output
Ang Mo Kio Public Library: Available
Bedok Public Library: Available
Bishan Public Library: Available
Bukit Merah Public Library: Available
Central Public Library: Available
Geylang East Public Library: Available
Jurong Regional Library: Available
Jurong West Public Library: Available
library@orchard: Available
Marine Parade Public Library: Onloan - Due: 13 May 2016
Queenstown Public Library: Onloan - Due: 29 May 2016
Tampines Regional Library: Available
Toa Payoh Public Library: Available
Woodlands Regional Library: Available
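Since the original goal was to list only the branches where the book is available, the same row-by-row walk can be combined with a filter on the last cell. The sketch below runs offline against a small hand-written table: SAMPLE_HTML and the available_branches helper are illustrative assumptions, modelled on the rows quoted in the question and on the clsTab1 class used above, and the real page may be laid out differently.

```python
from bs4 import BeautifulSoup

# Hand-written sample (an assumption), reduced to two columns for brevity.
SAMPLE_HTML = """
<table class="clsTab1">
  <tr><th>Location</th><th>Status</th></tr>
  <tr><td><a href="#">Bishan Public Library</a><br /></td><td>Available<br /></td></tr>
  <tr><td><a href="#">Marine Parade Public Library</a><br /></td><td>Onloan - Due: 13 May 2016<br /></td></tr>
</table>
"""

def available_branches(html):
    """Return the branch names whose last table cell reads 'Available'."""
    soup = BeautifulSoup(html, 'html.parser')
    table = soup.find('table', class_='clsTab1')
    if table is None:  # page layout differs from what this sketch expects
        return []
    branches = []
    for row in table.find_all('tr'):
        cells = row.find_all('td')
        if not cells:  # e.g. a header row made of <th> cells
            continue
        # get_text(strip=True) drops the surrounding whitespace and newlines.
        if cells[-1].get_text(strip=True) == 'Available':
            branches.append(cells[0].get_text(strip=True))
    return branches

print(available_branches(SAMPLE_HTML))
```

To try it on the live page, pass the downloaded html string instead of SAMPLE_HTML; whether the status text matches 'Available' exactly there is something to verify against the real markup.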