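I am creating a database and inserting data into it. Our backend engineer says he needs a column that stores the whole article in HTML format. But when I insert the data, it gives me an error like this:
I checked exactly where the error comes from, and I found:
It looks like this part has some quotation mark or punctuation problem, and the same line shows up multiple times. I also used the str() function to convert the formatted HTML (type() reports its data type as bs4.element.Tag) to a string, but the problem still exists.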
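The quoting problem described above can be reproduced without a database. The sketch below is only an illustration: the HTML fragment is a made-up stand-in for a real article body, and the statement is built with the same `%` string formatting as in the question. It shows that the string literal opened by `values("` is closed by the first double quote inside the HTML, so calling str() on the Tag does not help because the quotes are still in the markup.

```python
# Hypothetical HTML fragment; real article markup contains many such quotes.
html = '<section name="articleBody"><p>It\'s "quoted" text</p></section>'

# Same pattern as in the question: the value is pasted into the SQL text.
sql = 'insert into topstories(formed) values("%s")' % html
print(sql)

# Find where the SQL string literal starts and where the server would
# consider it finished (the first double quote that comes from the HTML).
value_start = sql.index('("') + 2
first_inner_quote = sql.index('"', value_start)
truncated_value = sql[value_start:first_inner_quote]

# Everything after this point is parsed as SQL, not as data.
print(truncated_value)
```

Everything after the truncated value is interpreted as part of the statement, which is presumably why the reported error points at a position in the middle of the article markup.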
My database description is:
('id', 'mediumint(9)', 'NO', 'PRI', None, 'auto_increment')
('weburl', 'varchar(200)', 'YES', '', None, '')
('picurl', 'varchar(200)', 'YES', '', None, '')
('headline', 'varchar(200)', 'YES', '', None, '')
('abstract', 'varchar(200)', 'YES', '', None, '')
('body', 'longtext', 'YES', '', None, '')
('formed', 'longtext', 'YES', '', None, '')
('term', 'varchar(50)', 'YES', '', None, '')
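The function I use to collect the full text is:
import urllib.request
from bs4 import BeautifulSoup

def GetBody(url, plain=False):
    # Fetch the HTML file
    response = urllib.request.urlopen(url)
    html_doc = response.read()
    # Parse the HTML file
    soup = BeautifulSoup(html_doc, 'html.parser')
    # Find the article body
    body = soup.find("section", {"name": "articleBody"})
    if not plain:
        # Return the bs4.element.Tag with its markup intact
        return body
    else:
        # Join the text of every paragraph into one plain string
        text = ""
        for p_tag in body.find_all('p'):
            text = ' '.join([text, p_tag.text])
        return text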
The data is then imported through this function:
def InsertDatabase(section):
    s = TopStoriesSearch(section)
    count1 = 0
    formed = []
    while count1 < len(s):
        # tr = GetBody(s[count1]['url'])
        # formed.append(str(tr))
        # count1 = count1 + 1
        # (I use the lines above to convert the HTML to a string, or use the code below)
        formed.append(GetBody(s[count1]['url']))
        count1 = count1 + 1
This is my insert code:
for each in overall:  # I save everything in this list named overall
    cur.execute('insert into topstories(formed) values("%s")' % (each["formed"]))
Are there any tips for solving this problem?
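Answer 0: (score: 0)
The syntax of the execute() function is as follows (link):
cursor.execute(operation, params=None, multi=False)
So you can pass the values to be used in the query as the params argument of execute(). In that case the connector escapes and quotes the values itself, which eliminates your problem: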
import mysql.connector

cnx = mysql.connector.connect(...)
cur = cnx.cursor()
...
for each in overall:
    # If 'each' is a dictionary containing 'formed' as key,
    # i.e. each = {..., 'formed': ..., ...}, you can do as follows.
    # params must be a tuple (note the trailing comma), and str() turns
    # a bs4.element.Tag into plain markup text.
    cur.execute('INSERT INTO topstories(formed) VALUES (%s)', (str(each['formed']),))
    # Instead of the call above, you can also pass the dictionary directly
    # if you use a named placeholder in the query (its values must already
    # be plain types such as str):
    # cur.execute('INSERT INTO topstories(formed) VALUES (%(formed)s)', each)
...
cnx.commit()
cnx.close()
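For a version that can be run without a MySQL server, here is a minimal sketch of the same idea using Python's standard-library sqlite3 module as a stand-in. This is an assumption made purely for runnability: sqlite3 uses `?` as its placeholder where MySQL Connector/Python uses `%s`, and the helper name `insert_formed`, the simplified table definition, and the sample HTML are all hypothetical.

```python
import sqlite3


def insert_formed(conn, rows):
    """Insert the 'formed' value of each row through a bound parameter."""
    cur = conn.cursor()
    for each in rows:
        # str() converts a bs4 Tag (or any object) to plain markup text;
        # (value,) is a one-element tuple, which is what execute() expects.
        cur.execute(
            "INSERT INTO topstories(formed) VALUES (?)",
            (str(each["formed"]),),
        )
    conn.commit()


# In-memory database with a simplified version of the table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE topstories (id INTEGER PRIMARY KEY AUTOINCREMENT, formed TEXT)"
)

# Hypothetical article body containing both single and double quotes.
overall = [{"formed": '<section name="articleBody"><p>It\'s "quoted"</p></section>'}]
insert_formed(conn, overall)

# Read the value back to check that it was stored unchanged.
stored = conn.execute("SELECT formed FROM topstories").fetchone()[0]
print(stored == overall[0]["formed"])
conn.close()
```

Because the value travels separately from the SQL text, the quotes inside the HTML never reach the SQL parser, and the same pattern should carry over to the `%s` placeholder shown in the answer.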