First I created my own database:
$ sqlite3 tdb
SQLite version 3.8.2 2013-12-06 14:53:30
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite>
Then I created a table:
sqlite> CREATE TABLE myt (hostName CHAR(50) PRIMARY KEY, content TEXT, checked CHAR(5));
Now I have the following script:
import sys
import requests
from bs4 import BeautifulSoup
import sqlite3 as db

headings=['title','h1','h2','h3','h4','h5','p']
hosts=['microsoft.com','stackoverflow.com','google.com','yahoo.com']
con=db.connect('tdb')
for hostName in hosts:
    cur=con.cursor()
    cur.execute('SELECT hostName FROM myt WHERE hostName=? AND checked="YES"',[hostName])
    data=cur.fetchall()
    try:
        if data[0][0]==hostName:
            continue
    except Exception, err:
        pass
    try:
        session=requests.Session()
        respons=session.get('http://%s'%hostName).content
    except KeyboardInterrupt:
        print
        sys.exit()
    try:
        soup=BeautifulSoup(respons,'lxml')
        for heading in headings:
            tags=soup.find_all(heading)
            for singleTag in tags:
                output=singleTag.text
                cur.execute('INSERT INTO myt (hostName,content,checked) VALUES (?,?,\'YES\')',[hostName,output])
        print '\n [+] Content is captured!'
    except Exception, err:
        print '\n [-] Error: %s'%err
        continue
But the first time I run the code on my computer, I get the following error for every site:
UNIQUE constraint failed: myt.hostName
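The error can be reproduced without any scraping. The sketch below (Python 3, standard library only, using an in-memory database rather than the file `tdb`) creates the same `myt` schema and inserts two rows for one host; the helper `insert` and the texts "Google" and "Search" are hypothetical stand-ins for scraped content.

```python
import sqlite3

# In-memory database with the same schema as the question's table "myt"
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE myt (hostName CHAR(50) PRIMARY KEY, content TEXT, checked CHAR(5))"
)

def insert(host, text):
    """Hypothetical helper: insert one row; hostName is the PRIMARY KEY."""
    con.execute(
        "INSERT INTO myt (hostName, content, checked) VALUES (?, ?, 'YES')",
        (host, text),
    )

insert("google.com", "Google")          # first row for this host: succeeds
try:
    insert("google.com", "Search")      # second row with the same key: rejected
    error = None
except sqlite3.IntegrityError as exc:
    error = str(exc)

print(error)
```

The exact message text depends on the SQLite version linked into Python; recent versions report a "UNIQUE constraint failed" error naming `myt.hostName`.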
Answer 0 (score: 1)
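Your hostName is the primary key, so the insert fails as soon as you try to insert the same host name a second time.
The error comes from this line: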
cur.execute('INSERT INTO myt (hostName,content,checked) VALUES (?,?,\'YES\')',[hostName,output])
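If the table should keep exactly one row per host, one option is to collect all the tag texts first and insert once per host. This is a minimal sketch assuming that joining the texts with newlines is acceptable; the `scraped` dictionary is hypothetical data standing in for the `soup.find_all(heading)` results. Note also that the question's script never calls `con.commit()`, so its inserts would not be saved to `tdb` even without the error.

```python
import sqlite3

HEADINGS = ["title", "h1", "h2", "h3", "h4", "h5", "p"]

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE myt (hostName CHAR(50) PRIMARY KEY, content TEXT, checked CHAR(5))"
)

# Hypothetical scraped texts, keyed by host and then by tag name
scraped = {
    "google.com": {"title": ["Google"], "p": ["Search", "Images"]},
    "yahoo.com": {"title": ["Yahoo"], "h1": ["News"]},
}

for host, by_tag in scraped.items():
    texts = []
    for heading in HEADINGS:
        texts.extend(by_tag.get(heading, []))
    # A single INSERT per host, so each primary key value is used only once
    con.execute(
        "INSERT INTO myt (hostName, content, checked) VALUES (?, ?, 'YES')",
        (host, "\n".join(texts)),
    )
con.commit()  # persist the rows (matters when connected to a file such as 'tdb')
```

If a host may be scraped again later, SQLite's `INSERT OR REPLACE` would overwrite the existing row instead of failing.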
Answer 1 (score: 0)
for heading in headings:
    tags=soup.find_all(heading)
    for singleTag in tags:
        output=singleTag.text
        cur.execute('INSERT INTO myt (hostName,content,checked) VALUES (?,?,\'YES\')',[hostName,output])
You run this insert for every heading and every tag, and each time it inserts a row with the same host (google.com, ..., YES).
A table's primary key must be unique, so you cannot insert the same value more than once.
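If instead you want one row per captured tag, the host name cannot be the primary key of that table. A minimal sketch, assuming a split into two tables: `hosts` (one row per host, still keyed by hostName) and `contents` (many rows per host, keyed by a surrogate `id`). The table names, the `tag` column, and the `store` helper are hypothetical, not part of the original question.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- One row per host; hostName stays unique here
    CREATE TABLE hosts (hostName CHAR(50) PRIMARY KEY, checked CHAR(5));
    -- Many rows per host; the surrogate id is the primary key instead
    CREATE TABLE contents (
        id INTEGER PRIMARY KEY,
        hostName CHAR(50) REFERENCES hosts(hostName),
        tag TEXT,
        content TEXT
    );
    """
)

def store(host, tagged_texts):
    """Hypothetical helper: record the host once and each tag text as its own row."""
    with con:  # commits on success, rolls back if an exception is raised
        con.execute("INSERT OR REPLACE INTO hosts VALUES (?, 'YES')", (host,))
        con.executemany(
            "INSERT INTO contents (hostName, tag, content) VALUES (?, ?, ?)",
            [(host, tag, text) for tag, text in tagged_texts],
        )

store("google.com", [("title", "Google"), ("p", "Search"), ("p", "Images")])
```

The "already checked" lookup from the question would then query `hosts`, while the captured text lives in `contents`.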