Need guidance on inserting data from a Python script into a MySQL database

Asked: 2015-11-20 10:54:44

Tags: python mysql web-crawler

I crawled a web page to scrape certain information such as prices, titles, and so on.

My goal now is to insert this information into a database. I have already set up the database with the corresponding fields it needs.

Here is my code:

import json
import sys
import urllib.request

import mysql.connector
from bs4 import BeautifulSoup

# Region (the list of regions to query) and Spider (the number of pages to crawl)
# are defined earlier in my script.

def trade_spider(max_pages):
    Language = "Japanese"
    partner = "La"
    location = "Tokyo"
    already_printed = set()
    for reg in Region:
        count = 0
        count1 = 0
        page = -1
        while page <= max_pages:
            page += 1
            # fetch one page of search results as JSON and pull the HTML fragment out of it
            response = urllib.request.urlopen("http://www.jsox.de/s/search.json?q=" + str(reg) + "&page=" + str(page))
            jsondata = json.loads(response.read().decode("utf-8"))
            format = (jsondata['activities'])
            g_data = format.strip("'<>()[]\"` ").replace('\'', '\"')
            soup = BeautifulSoup(g_data)

            articles = soup.find_all("article", {"class": "activity-card activity-card-horizontal "})

            try:
                connection = mysql.connector.connect(
                    host="localhost", user="root", passwd="", db="crawl")
            except:
                print("No connection to Server")
                sys.exit(0)

            cursor = connection.cursor()

            # clear the previously crawled rows for this partner/location
            cursor.execute("DELETE from prices_crawled where Location=" + str(location) + " and Partner=" + str(partner))
            connection.commit()

            for article in articles:
                # title
                headers = article.find_all("h3", {"class": "activity"})
                for header in headers:
                    header_initial = header.text.strip()
                    if header_initial not in already_printed:
                        already_printed.add(header_initial)
                        header_final = header_initial

                # price
                prices = article.find_all("span", {"class": "price"})
                for price in prices:
                    price_end = price.text.strip().replace(",", "")[2:]
                    count1 += 1
                    if count1 > count:
                        pass
                    else:
                        price_final = price_end

                # deeplink; one insert per link that has not been seen yet
                deeplinks = article.find_all("a", {"class": "activity-card"})
                for t in set(t.get("href") for t in deeplinks):
                    deeplink_initial = t
                    if deeplink_initial not in already_printed:
                        already_printed.add(deeplink_initial)
                        deeplink_final = deeplink_initial

                        cursor.execute('''INSERT INTO prices_crawled (price_id, Header, Price, Deeplink, Partner, Location, Language)
                                          VALUES (%s, %s, %s, %s, %s, %s, %s)''',
                                       ['None', header_final, price_final, deeplink_final, partner, location, Language])
                        connection.commit()

            cursor.close()
            connection.close()

trade_spider(int(Spider))

The problem is that the information never makes it into the database. On top of that, I don't get any error message, so I have no idea what I am doing wrong.
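To help narrow this down, one thing I could do is replace the existing cursor.execute call for the insert with a version that surfaces any MySQL error and reports the affected row count (just a sketch, using the same variables as in the script above):

try:
    cursor.execute('''INSERT INTO prices_crawled (price_id, Header, Price, Deeplink, Partner, Location, Language)
                      VALUES (%s, %s, %s, %s, %s, %s, %s)''',
                   ['None', header_final, price_final, deeplink_final, partner, location, Language])
    connection.commit()
    # rowcount is the number of rows affected by the last statement; it should be 1 here
    print("rows inserted:", cursor.rowcount)
except mysql.connector.Error as err:
    # any problem with the statement or the data is printed instead of passing silently
    print("MySQL error:", err)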

Can you help me? Any feedback is appreciated.

1 Answer:

Answer 0 (score: 0)

Does the delete statement work? I think the problem is in the way you are passing the variables.

Change the syntax as follows:

sql_insert_tx = "INSERT INTO euro_currencies (pk, currency, rate, date) values (null, 'USD', '%s', '%s')" % (usd, date)

cursor.execute(sql_insert_tx)
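The same idea applies to the delete statement in your question. If you would rather keep the placeholder style you already use for the insert, you can pass the values as query parameters instead of concatenating them into the string (a sketch, assuming Location and Partner are text columns):

# parameters are passed separately, so quoting and escaping are handled by the driver
cursor.execute("DELETE FROM prices_crawled WHERE Location = %s AND Partner = %s",
               (location, partner))
connection.commit()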