I am trying to extract content from a website our company created. I have created a table in MSSQL Server for the Scrapy data, and I have set up Scrapy and configured Python to crawl and extract web page data. My question is: how do I export the data scraped by Scrapy into my local MSSQL Server database?
Here is the Scrapy code that extracts the data:
import scrapy

class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = [
        'http://quotes.toscrape.com/page/1/',
        'http://quotes.toscrape.com/page/2/',
    ]

    def parse(self, response):
        for quote in response.css('div.quote'):
            yield {
                'text': quote.css('span.text::text').extract_first(),
                'author': quote.css('small.author::text').extract_first(),
                'tags': quote.css('div.tags a.tag::text').extract(),
            }
Answer 0 (score: 2)
You can use the pymssql module to send the data to SQL Server, like this:
import pymssql

class DataPipeline(object):
    def __init__(self):
        self.conn = pymssql.connect(host='host', user='user', password='passwd', database='db')
        self.cursor = self.conn.cursor()

    def process_item(self, item, spider):
        try:
            # 'tags' is a list, so join it into a single string before inserting
            self.cursor.execute(
                "INSERT INTO MYTABLE (text, author, tags) VALUES (%s, %s, %s)",
                (item['text'], item['author'], ', '.join(item['tags'])))
            self.conn.commit()
        except pymssql.Error as e:
            print(e)
        return item
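If you want to release the connection when the crawl ends, Scrapy also calls a close_spider hook on pipelines; a minimal sketch of a method you could add to the pipeline above:

    def close_spider(self, spider):
        # Called once when the spider finishes; close the cursor and connection
        self.cursor.close()
        self.conn.close()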
Also, you need to add 'spider_name.pipelines.DataPipeline': 300 to the ITEM_PIPELINES dict in your settings.
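For example, in settings.py (here spider_name stands in for your actual project package name):

# settings.py
ITEM_PIPELINES = {
    'spider_name.pipelines.DataPipeline': 300,
}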
Answer 1 (score: 0)
I think your best bet is to save the scraped data to a CSV file and then load that CSV into your SQL Server table (for example with BULK INSERT or the bcp utility; see the sketch after the code below). Or, scrape with requests and BeautifulSoup and write the CSV directly:
import csv
import requests
import bs4

res = requests.get('http://www.ebay.com/sch/i.html?LH_Complete=1&LH_Sold=1&_from=R40&_sacat=0&_nkw=gerald%20ford%20autograph&rt=nc&LH_Auction=1&_trksid=p2045573.m1684')
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, 'html.parser')

# grab all the links and store their href destinations in a list
links = [e['href'] for e in soup.find_all(class_="vip")]

# grab all the bid spans and split their contents in order to get the number only
bids = [e.span.contents[0].split(' ')[0] for e in soup.find_all("li", "lvformat")]

# grab all the prices and store them in a list
prices = [e.contents[0] for e in soup.find_all("span", "bold bidsold")]

# zip the lists together so that the entries belonging to each other
# end up in the same row
rows = list(zip(links, prices, bids))

# write each row to the CSV output file
# (newline='' prevents blank lines on Windows)
with open('ebay.csv', 'w', newline='') as csvfile:
    w = csv.writer(csvfile)
    for row in rows:
        w.writerow(row)
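For the load step, a minimal sketch that runs a T-SQL BULK INSERT through pymssql; the table name, file path, and delimiters here are assumptions, and the file must be readable by the SQL Server service account:

import pymssql

conn = pymssql.connect(host='host', user='user', password='passwd', database='db')
cursor = conn.cursor()
# MYTABLE and the path are placeholders; adjust to your schema and environment
cursor.execute("""
    BULK INSERT MYTABLE
    FROM 'C:\\data\\ebay.csv'
    WITH (FIELDTERMINATOR = ',', ROWTERMINATOR = '\\n')
""")
conn.commit()
conn.close()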