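My goal is to run through a set of links that have been sorted into category lists. The same link may appear in several of these lists, but each link points to one page and is unique.
There are about 240 categories, and I expect most links to belong to at most 10 of them.
The script should process the links using the actual URL as the identifier and flag whether that URL appears in a given category. In the end, getting output like this should be trivial: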
For a list = [item1, item2, ..., itemN],
item1 belongs to category1, category3,
item2 belongs to category2, category5, category6,
etc.
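Given a tag table laid out as described (one row per URL, one 0/1 column per category), the "belongs to" listing above could be produced by reading each row and keeping the columns set to 1. This is only a sketch: the function name `categories_per_link`, the table name `tags` and the sample columns are hypothetical.

```python
import sqlite3


def categories_per_link(conn, table):
    """Map each providerlink to the category columns flagged 1 in a wide tag table.

    Hypothetical layout: one row per URL, a 'providerlink' column, and one
    INTEGER column per category holding 0 or 1.
    """
    cur = conn.execute(f'SELECT * FROM "{table}" ORDER BY providerlink')
    columns = [d[0] for d in cur.description]
    result = {}
    for row in cur:
        record = dict(zip(columns, row))
        link = record.pop("providerlink")
        # Keep only the category columns whose flag is set.
        result[link] = [col for col, flag in record.items() if flag == 1]
    return result


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE tags (providerlink TEXT PRIMARY KEY, "
        "category1 INTEGER DEFAULT 0, category2 INTEGER DEFAULT 0, "
        "category3 INTEGER DEFAULT 0)"
    )
    conn.execute("INSERT INTO tags VALUES ('item1', 1, 0, 1)")
    conn.execute("INSERT INTO tags VALUES ('item2', 0, 1, 0)")
    for link, cats in categories_per_link(conn, "tags").items():
        print(f"{link} belongs to {', '.join(cats)}")
```

With the sample rows this prints `item1 belongs to category1, category3` and `item2 belongs to category2`.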
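Of course, it isn't. The script currently flags each item correctly the first time it is found and then ignores it every time after that. I have been trying to restructure the UPDATE line, to no avail. I suspect something is wrong in my WHERE clause, but so far none of my changes have worked.
The Python code is below. For reference, the intended SQL commands are in a separate block after it.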
# Link is a two-part list containing
#   1. the URL, and
#   2. the name of the page the URL is for.
# CleanTag() is a function that removes spaces from the category list's
#   title for use as a column name in the table.
# DatabaseName is, probably inappropriately, a table in my SQLite database.
# TagDatabaseName is the tag table; every tag column defaults to 0.
# Tag is currently the column name in that table, which takes an integer.
# Database is the connection object for my SQLite database.
for Link in ProviderLinks:
    URL = Link[0]
    Name = Link[1]
    Tag = CleanTag(Category)
    try:
        # Raises IntegrityError if this URL is already in the table.
        DBControl.execute('INSERT OR ABORT INTO ' + DatabaseName +
                          ' (providername, providerlink) VALUES (?, ?);',
                          (Name, URL))
        LinkCount += 1
    except sqlite3.IntegrityError:
        pass
    DBControl.execute('INSERT OR IGNORE INTO ' + TagDatabaseName +
                      ' (providerlink) VALUES (?);',
                      (URL,))
    # Table and column names cannot be bound as parameters, but the URL can;
    # binding it keeps the query valid even when a URL contains a quote.
    DBControl.execute('UPDATE ' + TagDatabaseName +
                      ' SET `' + Tag + '` = 1 WHERE providerlink = ?;',
                      (URL,))
Database.commit()
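A self-contained version of the loop above can be run against an in-memory database to check whether the UPDATE is really at fault: a URL listed in two categories should end up with both flags set. The table names `providers` and `provider_tags`, the `clean_tag` stand-in for `CleanTag()`, and the sample data are all hypothetical.

```python
import sqlite3


def clean_tag(category):
    # Hypothetical stand-in for CleanTag(): strip spaces so the title can be a column name.
    return category.replace(" ", "")


def tag_links(conn, categories):
    """categories maps a category title to a list of [URL, Name] pairs."""
    cur = conn.cursor()
    new_links = 0
    for category, links in categories.items():
        tag = clean_tag(category)
        for url, name in links:
            try:
                cur.execute("INSERT OR ABORT INTO providers (providername, providerlink) "
                            "VALUES (?, ?)", (name, url))
                new_links += 1
            except sqlite3.IntegrityError:
                pass  # URL already stored; it still gets tagged below
            cur.execute("INSERT OR IGNORE INTO provider_tags (providerlink) VALUES (?)", (url,))
            cur.execute(f'UPDATE provider_tags SET "{tag}" = 1 WHERE providerlink = ?', (url,))
    conn.commit()
    return new_links


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE providers (providername TEXT, providerlink TEXT UNIQUE)")
    conn.execute('CREATE TABLE provider_tags (providerlink TEXT PRIMARY KEY, '
                 '"Category1" INTEGER DEFAULT 0, "Category2" INTEGER DEFAULT 0)')
    data = {
        "Category 1": [["http://example.com/a", "A"], ["http://example.com/b", "B"]],
        "Category 2": [["http://example.com/a", "A"]],
    }
    print(tag_links(conn, data))
    print(conn.execute("SELECT * FROM provider_tags ORDER BY providerlink").fetchall())
```

Here `http://example.com/a` ends up flagged in both categories, so if a real run shows only the first flag, the URLs themselves are worth inspecting.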
Below are the intended SQL commands, with a comment on what I mean each line to do:
INSERT OR ABORT INTO DatabaseName (providername, providerlink) VALUES (Name, URL);
-- adds the entry; raises a constraint error if an entry with the same URL is already in the table.
-- using exception to track whether new entry was added.
INSERT OR IGNORE INTO TagDatabaseName (providerlink) VALUES (URL);
-- add entry to tag database if it isn't there; default value for all tags is 0
UPDATE TagDatabaseName SET Tag = 1 WHERE providerlink = URL;
-- update entry with appropriate value for Tag.
COMMIT;
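As an alternative formulation (not the original approach), the INSERT OR IGNORE and UPDATE pair for the tag table can be collapsed into a single SQLite UPSERT. This sketch assumes SQLite 3.24 or later and a UNIQUE or PRIMARY KEY constraint on `providerlink`; the helper name `set_tag` and the sample table are hypothetical.

```python
import sqlite3


def set_tag(conn, table, tag, url):
    """Insert the URL with the tag column set to 1, or set the tag if the URL exists.

    Needs SQLite 3.24+ (UPSERT) and a UNIQUE or PRIMARY KEY constraint on
    providerlink; table and tag are interpolated, so they must be trusted names.
    """
    conn.execute(
        f'INSERT INTO "{table}" (providerlink, "{tag}") VALUES (?, 1) '
        f'ON CONFLICT(providerlink) DO UPDATE SET "{tag}" = 1',
        (url,),
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        'CREATE TABLE tags (providerlink TEXT PRIMARY KEY, '
        '"Category1" INTEGER DEFAULT 0, "Category2" INTEGER DEFAULT 0)'
    )
    set_tag(conn, "tags", "Category1", "http://example.com/a")
    set_tag(conn, "tags", "Category2", "http://example.com/a")
    conn.commit()
    print(conn.execute("SELECT * FROM tags").fetchall())
```

The second call hits the existing row and only sets `Category2`, leaving `Category1` at 1.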
Answer:
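Solved. The code works fine; what failed was my underlying assumption about the links. Even when the same entry appeared in several category lists, its links were in fact distinct URLs, so each URL was only ever seen once.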
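The answer does not say how the URLs differed. Assuming, hypothetically, that the differences were cosmetic (letter case in the host, a trailing slash, a `#fragment`), one way to catch them is to normalize each URL and group the originals that collapse to the same key. The normalization rules and the names `normalize_url` and `find_near_duplicates` are illustrative assumptions; whether such rules are safe depends on the sites involved.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url):
    # Assumed rules: lowercase the scheme and network location, strip a
    # trailing slash from the path, and drop the fragment.
    parts = urlsplit(url)
    path = parts.path.rstrip("/") or "/"
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def find_near_duplicates(urls):
    # Group original URLs by normalized form; report only groups with 2+ spellings.
    groups = defaultdict(set)
    for url in urls:
        groups[normalize_url(url)].add(url)
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


if __name__ == "__main__":
    urls = [
        "http://Example.com/page/",
        "http://example.com/page",
        "http://example.com/page#top",
        "http://example.com/other",
    ]
    print(find_near_duplicates(urls))
```

Storing `normalize_url(URL)` instead of the raw URL as `providerlink` would then make these variants share one row.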
Moral of the story: check your assumptions!