I'm trying a simple script that saves data scraped with BeautifulSoup into a PostgreSQL database, and I get an error.
import psycopg2
from urllib.request import urlopen
from bs4 import BeautifulSoup
url = 'https://stackoverflow.com/questions/1125968/how-do-i-force-git-pull-to-overwrite-local-files'
page = urlopen(url)
soup = BeautifulSoup(page, 'html.parser')
In [38]: type(soup)
Out[38]: bs4.BeautifulSoup
In [39]: type(soup.text)
Out[39]: str
con = psycopg2.connect("host='localhost' dbname='google_crawl' user='crawler' password='postgres'")
cur = con.cursor()
cur.execute("CREATE TABLE Products(Id INTEGER PRIMARY KEY, website VARCHAR(20), html_code VARCHAR)")
In [36]: cur.execute("INSERT INTO Products (Id, website, html_code) VALUES(%s, %s)", ('1', 'test.com', str(soup.text)))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-36-4e9b298d0ce2> in <module>
----> 1 cur.execute("INSERT INTO Products (Id, website, html_code) VALUES(%s, %s)", ('1', 'test.com', str(soup.text)))
TypeError: not all arguments converted during string formatting
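The error suggests that one of the values could not be converted to a string, but I can't tell which one or why. Even passing dummy text instead of soup returns the same error.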
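
One thing that stands out in the INSERT above: the query has two %s placeholders, but the tuple carries three values. The sketch below reproduces the same message with plain Python % formatting, on the assumption that psycopg2 fills placeholders in a comparable way. The helper try_format and the dummy values are hypothetical, and no database is involved.

```python
# Illustration only: plain Python % formatting, not psycopg2 itself.
# Assumption: psycopg2 reports a placeholder/parameter count mismatch
# with the same TypeError that the % operator raises.

query_bad = "INSERT INTO Products (Id, website, html_code) VALUES(%s, %s)"
query_ok = "INSERT INTO Products (Id, website, html_code) VALUES(%s, %s, %s)"

# Dummy values standing in for ('1', 'test.com', str(soup.text)).
params = (1, "test.com", "dummy text")


def try_format(query, values):
    """Hypothetical helper: return the formatted string, or the error text."""
    try:
        return query % values
    except TypeError as exc:
        return "TypeError: " + str(exc)


bad_result = try_format(query_bad, params)  # two placeholders, three values
ok_result = try_format(query_ok, params)    # three placeholders, three values

print(bad_result)
print(ok_result)
```

If that assumption holds, the call would presumably work once the query has one placeholder per value, for example cur.execute(query_ok, (1, 'test.com', soup.text)). The values should still be passed as the second argument to execute rather than formatted into the string by hand, so that the driver handles the quoting.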