Question

I am trying to get data from "https://realtruck.com/p/rugged-ridge-floor-mats/", but the problem is that they changed the layout. Now I am trying to get the dropdown lists.


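Problem: the number of lists is not fixed. One time there may be 5 lists, another time there may be 10, and we don't know in advance. So I want a flexible for loop that is driven by the dropdown lists.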

This is my previous code, which I want to make dynamic:

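for year in years:
    yield ...
    make_arr = ...  # makes available for this year
    for make in make_arr:
        yield ...
        models_arr = ...  # models available for this make
        for model in models_arr:
            yield ...
            body_arr = ...  # bodies available for this model
            for body in body_arr:
                yield ...
                colors_arr = ...  # colors available for this body
                for color in colors_arr:
                    yield ...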

Answer 1

The lists are dynamic (except for the year), and their data is populated through XHR calls.

HTTP POST https://uwp.thiecommerce.com/uwp-v3/rt/ordercontrols/rugged-ridge-floor-mats

The POST payload contains the user's previous selections. For example:

{"year":2018,"makeSlug":"chevy"}
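As a minimal sketch of how such a request could be built, the snippet below only constructs the POST with the standard library and does not send it. The helper name `build_options_request` is hypothetical, and the JSON `Content-Type` header is an assumption based on the payload shape, not something confirmed for this endpoint. In a Scrapy spider, `scrapy.http.JsonRequest(url, data=selection)` would probably play the same role.

```python
import json
import urllib.request

# Endpoint quoted in the answer above.
ORDER_CONTROLS_URL = (
    "https://uwp.thiecommerce.com/uwp-v3/rt/ordercontrols/rugged-ridge-floor-mats"
)


def build_options_request(selection):
    """Build (but do not send) the POST request for the next dropdown.

    `selection` holds the choices made so far, e.g. year and makeSlug.
    Hypothetical helper; the header below is an assumption.
    """
    body = json.dumps(selection).encode("utf-8")
    return urllib.request.Request(
        ORDER_CONTROLS_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_options_request({"year": 2018, "makeSlug": "chevy"})
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to `urllib.request.urlopen` would actually send it. The shape of the response is not shown in this thread, so it would need to be inspected in the browser's network tab first.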

So you need to iterate over the years and start making the XHR calls. Inside that loop, you need to nest more loops, one for each subcategory.

Something like:

for year in [2017, 2018]:  # more years go here
    # get the list of makes for this year
    for make in makes:
        # get the list of models for this make
        for model in models:
            # get the list of bodies for this model
            for body in bodies:
                # call https://uwp.thiecommerce.com/uwp-v3/rt/recs/similar/?makeSlug=chevy&modelSlug=silverado-1500&year=2017&bodySlug=crew-cab&productLineSlug=rugged-ridge-floor-mats
                # and you will have the data
                pass
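Since the question asks for a loop that adapts to however many dropdowns appear, the fixed nesting can also be written as a recursive generator. This is only a sketch under stated assumptions: `walk_dropdowns`, `fetch_next`, `FAKE_TREE` and `fake_fetch` are hypothetical names, and the in-memory table stands in for the real XHR responses, whose format is not shown in this thread.

```python
def walk_dropdowns(fetch_next, selection=None):
    """Yield every complete selection, whatever the number of dropdowns.

    `fetch_next(selection)` is a hypothetical callback. It returns
    (field_name, options) for the next dropdown, or None when the
    current selection has no further dropdown to fill in.
    """
    selection = selection or {}
    step = fetch_next(selection)
    if step is None:
        # No more dropdowns: this selection is complete.
        yield dict(selection)
        return
    field, options = step
    for option in options:
        # Recurse one level deeper with the new choice added.
        yield from walk_dropdowns(fetch_next, {**selection, field: option})


# Stand-in data (made up for illustration), keyed by the choices so far.
FAKE_TREE = {
    (): ("year", [2017, 2018]),
    (2017,): ("makeSlug", ["chevy"]),
    (2018,): ("makeSlug", ["chevy", "ford"]),
    (2017, "chevy"): ("modelSlug", ["silverado-1500"]),
}


def fake_fetch(selection):
    # In a real spider this would POST `selection` to the ordercontrols URL.
    return FAKE_TREE.get(tuple(selection.values()))


results = list(walk_dropdowns(fake_fetch))
for combo in results:
    print(combo)
```

The 2017 branch here goes three levels deep while the 2018 branch stops after two, so the same function covers 5 lists, 10 lists, or any other depth. In Scrapy the recursion would more likely be expressed as a callback that yields a new request per option and carries the selection along in `cb_kwargs` or `meta`.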

Using Python Scrapy to get data from "https://realtruck.com/p/rugged-ridge-floor-mats/"
