Scrapy - Getting duplicate items when appending items in a for loop

Date: 2017-01-16 11:48:06

Tags: json scrapy

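I am scraping data from a JSON response. I use a for loop to extract all the data into items, but the last record overwrites all the previous records from the loop.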

Here is my code:

let userInfo =  [
  {
    id: 'id1',
    users: [
      {
        name: 'userName1',
        job: 'userJob',
      },
      {
        name: 'userName2',
        job: 'userJob',
      }
    ]
  },
  {...}
];

let source = Rx.Observable.from(userInfo)
  .concatMap(group => {
    return Rx.Observable.from(group['users'])
      .map(user => {
        user['parent'] = group.id;
        return user;
      });
  });

source.subscribe(
  res => console.log(res)
);

What am I missing?

2 answers:

Answer 0 (score: 1)

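You could try defining your item inside the loop rather than outside it, so that each iteration creates a new item instead of overwriting the same one.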

import json

def parse_centers_and_ambulances(self, response):
    json_response = json.loads(response.body_as_unicode())
    facility = MedFacilityItem()
    facility["name"] = "Med Facility #1"
    centers = []
    # med_centers = MedCenterItem()  # <-- not here: a single shared item gets overwritten
    for center in json_response:
        if center["name"].startswith("Center"):
            med_centers = MedCenterItem()  # <-- should be here: a new item per center
            med_centers["response_url"] = center["product_id"]
            med_centers["name"] = center["name"]
            # Parentheses let the expression span several lines
            med_centers["address"] = (center["name_short"] + "." +
                                      center["adr_name"] + " " +
                                      center["adr_dom"])
            med_centers["lat"] = center["latitude"]
            med_centers["lon"] = center["longitude"]
            med_centers["phoneInfo"] = [{"number": center["tel1"],
                                         "description": center["tel1_descr"]},
                                        {"number": center["tel2"],
                                         "description": center["tel2_descr"]}]
            centers.append(med_centers)

    facility["facility_type"] = centers
    return facility

Answer 1 (score: 1)

Since Scrapy items basically behave like dicts, I will use dicts in the following examples. Consider this:

In [1]: dict_list = []
   ...: d = {}
   ...: for i in range(3):
   ...:     d['i'] = i
   ...:     dict_list.append(d)
   ...: print dict_list
   ...: print [id(e) for e in dict_list]
   ...:
[{'i': 2}, {'i': 2}, {'i': 2}]
[4557722520, 4557722520, 4557722520]

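Dicts are mutable objects, and in this case you are appending the same dict instance to the list multiple times. The resulting list does not contain distinct items, only several references to the same dict object, which is why every entry shows the last value assigned and all the ids are identical. The following example shows the same behavior: it appends the same dict to the list three times and only then sets a value on it: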

In [2]: dict_list = []
   ...: d = {}
   ...: for i in range(3):
   ...:     dict_list.append(d)
   ...: d['some'] = 'value'
   ...: print dict_list
   ...:
[{'some': 'value'}, {'some': 'value'}, {'some': 'value'}]

What you need to do is create distinct dicts by initializing them inside the for loop, like this:

In [3]: dict_list = []
   ...: for i in range(3):
   ...:     d = {}
   ...:     d['i'] = i
   ...:     dict_list.append(d)
   ...: print dict_list
   ...: print [id(e) for e in dict_list]
   ...:
[{'i': 0}, {'i': 1}, {'i': 2}]
[4557901904, 4557724760, 4557843264]
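
The same pattern applied to a list of JSON records might look like the sketch below. It is written for Python 3 and uses plain dicts in place of Scrapy items; the payload, its field names, and the helper `build_centers` are made up for illustration and are not taken from the question.

```python
import json

# Hypothetical payload; the field names are assumptions for illustration only.
RAW = (
    '[{"name": "Center A", "latitude": 1.0},'
    ' {"name": "Other", "latitude": 2.0},'
    ' {"name": "Center B", "latitude": 3.0}]'
)


def build_centers(records):
    """Return one new dict per record whose name starts with 'Center'."""
    centers = []
    for record in records:
        if not record["name"].startswith("Center"):
            continue
        center = {}  # a fresh object on every iteration
        center["name"] = record["name"]
        center["lat"] = record["latitude"]
        centers.append(center)
    return centers


centers = build_centers(json.loads(RAW))
print(centers)
# The two entries are separate objects, so changing one leaves the other alone.
print(centers[0] is centers[1])
```

Because `center = {}` runs inside the loop, each appended entry is its own object; moving that line above the `for` would reproduce the duplicate-record problem from the question.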