I need to scrape this website.
It appears to be built with React, so I tried extracting the data with scrapy-splash. For example, I need the "a" elements with the class "shelf-product-name". But the response is an empty array. I used the "wait" argument with about 5 seconds, but I still only get an empty array.
Answer (score: 1)
Actually, there is no need for Scrapy Splash: all the required data is already stored as JSON inside a <script> tag in the raw HTML response:
import json

import scrapy
from scrapy.crawler import CrawlerProcess


class JumboCLSpider(scrapy.Spider):
    name = "JumboCl"
    start_urls = ["https://www.jumbo.cl/lacteos-y-bebidas-vegetales/leches-blancas?page=6"]

    def parse(self, response):
        # Find the inline <script> that assigns the page state to window.__renderData
        scripts = [s for s in response.css("script::text").getall() if "window.__renderData" in s]
        if not scripts:
            return
        # Keep only the JSON after the assignment; strip whitespace and the trailing ";"
        data = scripts[0].split("window.__renderData = ")[-1].strip().rstrip(";")
        json_data = json.loads(data)
        for plp in json_data["plp"]["plp_products"]:
            for product in plp["data"]:
                # yield {"productName": product["productName"]}  # data from css: a.shelf-product-name
                yield product


if __name__ == "__main__":
    c = CrawlerProcess({"USER_AGENT": "Mozilla/5.0"})
    c.crawl(JumboCLSpider)
    c.start()
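To try the parsing step without running a crawl, here is a minimal standard-library sketch of the same idea (no Scrapy involved). html.parser collects the text of each <script> tag, and json.JSONDecoder.raw_decode decodes only the JSON value after "window.__renderData =", so a trailing semicolon or later statements in the same script do not break parsing. ScriptCollector, extract_render_data, product_names and the sample HTML are hypothetical; the nested keys (plp, plp_products, data, productName) follow the ones the spider above reads, and the real page's layout may differ.

```python
import json
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the text of every <script> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only keep text that appears inside a <script> element
        if self._in_script:
            self.scripts[-1] += data


def extract_render_data(html, marker="window.__renderData"):
    """Return the JSON value assigned to `marker`, or None if no script assigns it."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    for text in collector.scripts:
        start = text.find(marker)
        if start == -1:
            continue
        eq = text.find("=", start + len(marker))
        if eq == -1:
            continue
        # raw_decode parses one JSON value and ignores whatever follows it
        # (a trailing ";" or further statements). It does not skip leading
        # whitespace, so strip that first.
        obj, _end = json.JSONDecoder().raw_decode(text[eq + 1:].lstrip())
        return obj
    return None


def product_names(render_data):
    """Yield productName from each product, following the spider's two loops."""
    for plp in render_data["plp"]["plp_products"]:
        for product in plp["data"]:
            yield product["productName"]


# Hypothetical page: the products exist only inside the script, not as <a> tags.
SAMPLE_HTML = """<html><body><div id="root"></div>
<script>
window.__renderData = {"plp": {"plp_products": [{"data": [{"productName": "Leche A"}, {"productName": "Leche B"}]}]}};
window.__otherState = {};
</script>
</body></html>"""

if __name__ == "__main__":
    print(list(product_names(extract_render_data(SAMPLE_HTML))))
```

Because raw_decode reports where the JSON value ended instead of requiring the whole string to be JSON, it is less brittle than slicing off the last character. It still assumes the assigned value is strict JSON rather than an arbitrary JavaScript object literal.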