How do I insert HTML-formatted text into MySQL?

Time: 2019-06-22 02:34:08

Tags: mysql python-3.x mysql-python

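I am creating a database and inserting data into it. Our back-end engineer says he needs a column that stores the entire article in HTML format. But when I insert the data, I get an error like this: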

[screenshot: error message]

I traced the exact location the error points to and found this:

[screenshot: the text where the error occurs]

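It looks like this part has a problem with quotation marks or punctuation, and the same line appears several times. I also used str() to convert the formatted HTML (type() reports it as bs4.element.Tag) to a string, but the problem persists.

My table description is: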

('id', 'mediumint(9)', 'NO', 'PRI', None, 'auto_increment')
('weburl', 'varchar(200)', 'YES', '', None, '')
('picurl', 'varchar(200)', 'YES', '', None, '')
('headline', 'varchar(200)', 'YES', '', None, '')
('abstract', 'varchar(200)', 'YES', '', None, '')
('body', 'longtext', 'YES', '', None, '')
('formed', 'longtext', 'YES', '', None, '')
('term', 'varchar(50)', 'YES', '', None, '')

The function I use to collect the full text is:

import urllib.request

from bs4 import BeautifulSoup


def GetBody(url, plain=False):
    # Fetch the HTML file
    response = urllib.request.urlopen(url)
    html_doc = response.read()

    # Parse the HTML file
    soup = BeautifulSoup(html_doc, 'html.parser')

    # Find the article body
    body = soup.find("section", {"name": "articleBody"})

    if not plain:
        # Returns a bs4.element.Tag, not a str
        return body
    else:
        # Join the text of every <p> tag into one plain-text string
        text = ""
        for p_tag in body.find_all('p'):
            text = ' '.join([text, p_tag.text])
        return text

Then I import the data with this function:

def InsertDatabase(section):
    s = TopStoriesSearch(section)
    count1 = 0
    formed = []
    while count1 < len(s):
        # Option 1: convert the HTML to a string first
        # tr = GetBody(s[count1]['url'])
        # formed.append(str(tr))
        # count1 = count1 + 1
        # Option 2: append the value returned by GetBody as-is
        formed.append(GetBody(s[count1]['url']))
        count1 = count1 + 1

This is my insert code:

for each in overall:  # I save everything in this list named overall
    cur.execute('insert into topstories(formed) values("%s")' % (each["formed"]))

Are there any tips for solving this?

1 Answer:

Answer 0 (score: 0)

The execute() method has the following signature:

cursor.execute(operation, params=None, multi=False)

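So you can pass the values used in the query as the params argument of execute(). The connector then escapes and quotes the values itself, which eliminates your problem: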

import mysql.connector

cnx = mysql.connector.connect(...)
cur = cnx.cursor()
...
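for each in overall:
    # If 'each' is a dictionary containing 'formed' as key,
    # i.e. each = {..., 'formed': ..., ...}, pass the value as a one-element
    # tuple (note the trailing comma). str() turns a bs4 Tag into HTML text.
    cur.execute('INSERT INTO topstories(formed) VALUES (%s)', (str(each['formed']),))
    # Alternatively, use a named placeholder and pass the dictionary directly
    # (its 'formed' value must already be a str). Use one form or the other,
    # not both, or each row is inserted twice.
    # cur.execute('INSERT INTO topstories(formed) VALUES (%(formed)s)', each)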
...
cnx.commit()
cnx.close()
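
To see why the string-formatted query from the question breaks while the parameterized form does not, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in so that it runs without a MySQL server. That is an assumption made for illustration only: sqlite3 uses "?" placeholders where mysql.connector uses "%s", and the sample HTML string is hypothetical.

```python
import sqlite3

# Hypothetical sample standing in for str(GetBody(url)); it contains
# both double and single quotes, as real article HTML usually does.
html = '<section name="articleBody"><p class="lead">It\'s "quoted" text.</p></section>'

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE topstories (id INTEGER PRIMARY KEY, formed TEXT)")

# String interpolation, as in the question: the first double quote inside
# the HTML ends the SQL string literal early, so the statement is malformed.
broken_sql = 'insert into topstories(formed) values("%s")' % html
try:
    conn.execute(broken_sql)
    interpolation_failed = False
except sqlite3.Error as exc:
    interpolation_failed = True
    print("interpolated query failed:", exc)

# Parameter binding: the value travels separately from the SQL text,
# so quotes inside it cannot change the statement.
conn.execute("INSERT INTO topstories (formed) VALUES (?)", (html,))
conn.commit()

# Read the row back and compare it with what was inserted.
stored = conn.execute("SELECT formed FROM topstories").fetchone()[0]
print("round trip intact:", stored == html)
```

With mysql.connector the same idea applies, only with "%s" in the query and the tuple passed as the second argument of cur.execute(), as in the answer above.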