Python SQLite: updating a column

Date: 2017-04-26 08:56:05

Tags: python sql sqlite

I am trying to update the album_title column based on the already populated artist_name column.

I can either get the whole album_title column filled with the last album_title from the loop:

for tag in albums:
    for album in tag:
        cur.execute('INSERT OR IGNORE INTO Albums (album_title) VALUES (?)', (album, ))

        for artist in artists:
            artist = artist.string
            cur.execute('INSERT OR IGNORE INTO Artists(artist_name) VALUES (?)', (artist, ))
            cur.execute('UPDATE Artists SET album_title=? WHERE artist_name=?', (album, artist))

Or I can get only the last row updated with the correct album_title:

for tag in albums:

    for album in tag:
        cur.execute('INSERT OR IGNORE INTO Albums (album_title) VALUES (?)', (album, ))

        for artist in artists:
            artist = artist.string          
            cur.execute('INSERT OR IGNORE INTO Artists(artist_name) VALUES (?)', (artist, ))

        cur.execute('UPDATE Artists SET album_title=? WHERE artist_name=?', (album, artist))

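I understand why these problems occur, but I can't work out how to get what I want: every row updated with its correct album title. The album_title values will always be in the same order as the artist_name values.

I have seen updating columns discussed extensively here, but because of the peculiar for loops I have tangled myself in, I can't solve this one. If my problem comes from poorly structured data retrieval, I would be glad to hear how to fix it.

Full code: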

from urllib.request import Request, urlopen
from urllib.parse import urlparse
from urllib.parse import urljoin
from bs4 import BeautifulSoup

import urllib.error
import sqlite3
import json
import time
import ssl


#connect/create database
conn = sqlite3.connect('pitchscraper.sqlite')
#create way to talk to database
cur = conn.cursor()

#create table
cur.execute('''
    CREATE TABLE IF NOT EXISTS Master (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, album_title TEXT UNIQUE, artist_name TEXT UNIQUE)''')

cur.execute('''
    CREATE TABLE IF NOT EXISTS Albums (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, album_title TEXT UNIQUE)''')

cur.execute('''
    CREATE TABLE IF NOT EXISTS Artists (id INTEGER NOT NULL PRIMARY KEY AUTOINCREMENT UNIQUE, artist_name TEXT UNIQUE, album_title TEXT, FOREIGN KEY(album_title) REFERENCES Albums(album_title))''')



#open and read page
req = Request('http://pitchfork.com/reviews/albums/?page=1', headers={'User-Agent': 'Mozilla/5.0'})
pitchpage = urlopen(req).read()


#parse with beautiful soup
soup = BeautifulSoup(pitchpage, "lxml")
albums = soup('h2')
artists = soup.find_all(attrs={"class" : "artist-list"})


for tag in albums:

    for album in tag:
        cur.execute('INSERT OR IGNORE INTO Albums (album_title) VALUES (?)', (album, ))

        for artist in artists:
            artist = artist.string          
            cur.execute('INSERT OR IGNORE INTO Artists(artist_name) VALUES (?)', (artist, ))        
            cur.execute('UPDATE Artists SET album_title=? WHERE artist_name=?', (album, artist))


print()


conn.commit()

Incorrect output:

+------+-------------------------------------------+-------------+
|  id  |                artist_name                | album_title |
+------+-------------------------------------------+-------------+
| "1"  | "Sylvan Esso"                             | "Odd Hours" |
| "2"  | "Mew"                                     | "Odd Hours" |
| "3"  | "Tara Jane O’Neil"                        | "Odd Hours" |
| "4"  | "Real Life Buildings"                     | "Odd Hours" |
| "5"  | "Bruce Springsteen and the E Street Band" | "Odd Hours" |
| "6"  | "Ravyn Lenae"                             | "Odd Hours" |
| "7"  | "Tee Grizzley"                            | "Odd Hours" |
| "8"  | "Shugo Tokumaru"                          | "Odd Hours" |
| "9"  | "Woods"                                   | "Odd Hours" |
| "10" | "Formation"                               | "Odd Hours" |
| "11" | "Valgeir Sigurðsson"                      | "Odd Hours" |
| "12" | "Caddywhompus"                            | "Odd Hours" |
+------+-------------------------------------------+-------------+

Desired output:

+------+-------------------------------------------+-------------------------------+
|  id  |                artist_name                |          album_title          |
+------+-------------------------------------------+-------------------------------+
| "1"  | "Sylvan Esso"                             | "What Now"                    |
| "2"  | "Mew"                                     | "Visuals"                     |
| "3"  | "Tara Jane O’Neil"                        | "Tara Jane O'Neil"            |
| "4"  | "Real Life Buildings"                     | "Significant Weather"         |
| "5"  | "Bruce Springsteen and the E Street Band" | "Hammersmirth Odeon, London"  |
| "6"  | "Ravyn Lenae"                             | "Midnight Moonlight EP"       |
| "7"  | "Tee Grizzley"                            | "My Moment"                   |
| "8"  | "Shugo Tokumaru"                          | "TOSS"                        |
| "9"  | "Woods"                                   | "Love is Love"                |
| "10" | "Formation"                               | "Look at the Powerful People" |
| "11" | "Valgeir Sigurðsson"                      | "Dissonance"                  |
| "12" | "Caddywhompus"                            | "Odd Hours"                   |
+------+-------------------------------------------+-------------------------------+

1 answer:

Answer 0 (score: 0)

albums = soup('h2')
artists = soup.find_all(attrs={"class" : "artist-list"})

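The problem is that the artists list contains every artist on the page, so the inner loop assigns each album to all of them and the last album overwrites the rest.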
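The overwrite can be reproduced without any scraping. The following is a minimal sketch, assuming an in-memory database, a simplified Artists table, and three artist/album pairs copied from the question's tables in place of the parsed page:

```python
import sqlite3

# Assumption: three rows taken from the question's tables stand in for
# the scraped <h2> titles and "artist-list" elements.
albums = ["What Now", "Visuals", "Odd Hours"]
artists = ["Sylvan Esso", "Mew", "Caddywhompus"]

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute(
    "CREATE TABLE Artists ("
    "id INTEGER PRIMARY KEY, artist_name TEXT UNIQUE, album_title TEXT)"
)

# Same shape as the question's loops: for every album, visit ALL artists.
for album in albums:
    for artist in artists:
        cur.execute(
            "INSERT OR IGNORE INTO Artists (artist_name) VALUES (?)", (artist,)
        )
        # This UPDATE runs for every artist on every album pass,
        # so each pass overwrites what the previous pass wrote.
        cur.execute(
            "UPDATE Artists SET album_title = ? WHERE artist_name = ?",
            (album, artist),
        )

rows = cur.execute(
    "SELECT artist_name, album_title FROM Artists ORDER BY id"
).fetchall()
print(rows)
```

Every row ends up with "Odd Hours", the last album in the list, which matches the incorrect output shown in the question.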

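You have to extract the list of artists for each album inside the loop, rather than looping over the full list every time.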
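Extracting the artists per album depends on the page's markup, which is not shown here. A simpler variant is possible because the question states that album titles and artist names always appear in the same order: pair the two lists by position with zip. This is a sketch of that alternative, not the per-album extraction itself. The helper name store_pairs, the in-memory database, and the sample data are assumptions for illustration.

```python
import sqlite3

# Assumption: plain strings standing in for the scraped titles and
# artist names, in the same order as they appear on the page.
albums = ["What Now", "Visuals", "Odd Hours"]
artists = ["Sylvan Esso", "Mew", "Caddywhompus"]


def store_pairs(cur, albums, artists):
    """Hypothetical helper: store each album with the artist at the same index."""
    if len(albums) != len(artists):
        # Positional pairing is only safe when the lists line up one to one.
        raise ValueError("albums and artists must have the same length")
    for album, artist in zip(albums, artists):
        cur.execute(
            "INSERT OR IGNORE INTO Albums (album_title) VALUES (?)", (album,)
        )
        cur.execute(
            "INSERT OR IGNORE INTO Artists (artist_name) VALUES (?)", (artist,)
        )
        # One UPDATE per pair, so no row is overwritten by a later album.
        cur.execute(
            "UPDATE Artists SET album_title = ? WHERE artist_name = ?",
            (album, artist),
        )


conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Albums (id INTEGER PRIMARY KEY, album_title TEXT UNIQUE)")
cur.execute(
    "CREATE TABLE Artists ("
    "id INTEGER PRIMARY KEY, artist_name TEXT UNIQUE, album_title TEXT)"
)

store_pairs(cur, albums, artists)
conn.commit()

rows = cur.execute(
    "SELECT artist_name, album_title FROM Artists ORDER BY id"
).fetchall()
print(rows)
```

With the question's scraper, the two lists would presumably be built from the parsed page first, for example by taking the text of each h2 element and of each "artist-list" element. Note that artist_name is UNIQUE in the question's schema, so an artist with two albums on the same page would keep only the later title.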