I've read several posts on StackOverflow but still don't quite understand how to do this. In Scrapy, I'm scraping books from one URL. For each book record scraped, I want to pass it into the search field of another website and extract a specific element from that site. However, Scrapy seems to stay stuck on the first website and never retrieves results from the second. (I'm essentially trying to replicate in Scrapy what can easily be done with Selenium; using Selenium & BeautifulSoup is just slower.)
import scrapy
import pandas as pd
from datetime import datetime
from timeit import default_timer as timer
from fake_useragent import UserAgent
start = timer()
d1 = datetime.now()
book = []
country = []
stmp = []
items = []
today = datetime.now()
tt = today.strftime('%Y-%m-%d_%H_%M_%S')
class CoinsSpider(scrapy.Spider):
    name = "proxies"
    custom_settings = {
        'DOWNLOAD_DELAY': 3,
        'CONCURRENT_REQUESTS_PER_DOMAIN': 3,
        'HTTPCACHE_ENABLED': True,
        'DOWNLOADER_MIDDLEWARES': {
            'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware': None,
            'scrapy_fake_useragent.middleware.RandomUserAgentMiddleware': 400,
        },
        'DEFAULT_REQUEST_HEADERS': {
            'Referer': 'http://www.google.com'
        }
    }

    def start_requests(self):
        url = "https://www.book.net/"
        yield scrapy.Request(url=url, callback=self.parse)

    def parse(self, response):
        for row in response.css("#booklisttable > tbody:nth-child(1) tr"):
            b = row.css('.title::text')[0].extract()
            book.append(b)
            try:
                request = scrapy.Request('https://www.searchbook.com/international',
                                         callback=self.parse_gua,
                                         cb_kwargs=dict(main_url=response.url))
                yield request
                print(response.headers)
                print(response.css)
                check = response.css('.fc-today__dayofmonth::text').extract()
                print(check)
            except:
                pass
            crts = row.css('.country::text')[0].extract()
            country.append(crts)
            tstmp = str(d1)
            stmp.append(tstmp)
        item = {
            "Title": book,
            "Country": country,
            "Timex": stmp,
        }
        test_df = pd.DataFrame.from_dict(item, orient='columns').replace('\n', '', regex=True)
        test_df['Joined'] = test_df['Title'] + ':' + test_df['Country']
        items.append(test_df)
        result = pd.concat([pd.DataFrame(items[i]) for i in range(len(items))], ignore_index=True)
        with open('my_books_' + tt + '.csv', 'a', newline='') as f:
            result.to_csv(f)
        print("Completed")
        end = timer()
        elapse = end - start
        print("It took " + str(elapse))
Sorry if this is a basic question, but I'd be grateful if someone could point me to the relevant documentation or a good example!