Using Scrapy to scrape nested JSON data?

Time: 2016-04-01 22:17:35

Tags: python json scrapy

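I'm trying to write a web app that scrapes information from the Sony PlayStation Store. I found the JSON file that contains the data I want, but how can I use Scrapy to store only certain elements of that JSON file?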

Here is part of the JSON data:

{
  "age_limit": 0,
  "attributes": {
    "facets": {
      "platform": [
        {"name": "PS4™", "count": 96, "key": "ps4"},
        {"name": "PS3™", "count": 5, "key": "ps3"},
        {"name": "PS Vita", "count": 7, "key": "vita"}
      ]
    }
  }
}

I only want the "count" value for the entry whose "name" is PS4. How can I get this in Scrapy? This is my Scrapy code so far:

from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from crossbuy.items import PS4Vita


class PS4VitaSpider(BaseSpider):
    name = "ps4vita" # Name of the spider, to be used when crawling
    allowed_domains = ["store.playstation.com"] # Where the spider is allowed to go
    start_url = "https://store.playstation.com/chihiro-api/viewfinder/US/en/999/STORE-MSF77008-9_PS4PSVCBBUNDLE?size=30&gkb=1&geoCountry=US"

    def parse(self, response):
        jsonresponse = json.loads(response)

        pass # To be changed later

Thanks!

2 answers:

Answer 0: (score: 1)

...
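def parse(self, response):
    # response.body holds the raw JSON payload; parse it into Python objects
    jsonresponse = json.loads(response.body)
    my_count = None
    # Look through the platform facets for the PS4 entry
    for platform in jsonresponse['attributes']['facets']['platform']:
        if 'PS4' in platform['name']:
            my_count = platform['count']

    yield dict(count=my_count)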
...
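
For the snippet above to run inside the question's spider, the module also needs `import json`, and Scrapy reads its initial URLs from a list attribute named `start_urls` (the question uses `start_url`). The lookup itself can be tried outside Scrapy. The following is only a sketch: the helper name `extract_platform_count` is hypothetical, and matching on the `"key"` field instead of the `"name"` substring is an assumption about which field is more stable.

```python
import json


def extract_platform_count(body, key="ps4"):
    """Return the "count" of the platform facet whose "key" matches, or None.

    Hypothetical helper, not part of Scrapy.
    """
    # Scrapy's response.body is bytes; decode it before parsing.
    if isinstance(body, bytes):
        body = body.decode("utf-8")
    data = json.loads(body)
    # Walk the nested objects, falling back to empty containers
    # if any level is missing from the payload.
    platforms = data.get("attributes", {}).get("facets", {}).get("platform", [])
    for platform in platforms:
        if platform.get("key") == key:
            return platform.get("count")
    return None


# Sample payload rebuilt from the JSON fragment in the question.
sample = json.dumps({
    "age_limit": 0,
    "attributes": {"facets": {"platform": [
        {"name": "PS4™", "count": 96, "key": "ps4"},
        {"name": "PS3™", "count": 5, "key": "ps3"},
        {"name": "PS Vita", "count": 7, "key": "vita"},
    ]}},
}).encode("utf-8")

print(extract_platform_count(sample))  # 96
```

Inside `parse`, the same helper could be called as `extract_platform_count(response.body)` and its result yielded in a dict, as in the answer above.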

Answer 1: (score: 0)

Just access the JSON data the same way you would access a Python dictionary.
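
As a rough illustration of dictionary-style access, the sketch below parses the JSON fragment from the question with the standard `json` module and indexes into it. The variable names (`raw`, `platforms`, `counts_by_key`) are illustrative only, and the data is the sample from the question, not a live response.

```python
import json

# JSON fragment from the question, as a string.
raw = """
{
  "age_limit": 0,
  "attributes": {
    "facets": {
      "platform": [
        {"name": "PS4™", "count": 96, "key": "ps4"},
        {"name": "PS3™", "count": 5, "key": "ps3"},
        {"name": "PS Vita", "count": 7, "key": "vita"}
      ]
    }
  }
}
"""

data = json.loads(raw)

# JSON objects become dicts and JSON arrays become lists,
# so nested values are reached by chaining keys and indexes.
platforms = data["attributes"]["facets"]["platform"]
first = platforms[0]
print(first["name"], first["count"])

# Build a lookup table so a count can be fetched by platform key.
counts_by_key = {p["key"]: p["count"] for p in platforms}
print(counts_by_key["ps4"])
```

In a Scrapy callback, `raw` would be replaced by the response body; the indexing afterwards stays the same.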
