Selecting an element containing "1/8 $35" returns empty strings

Time: 2019-03-24 07:44:16

Tags: xpath scrapy lxml

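I am trying to select the span containing $35, and I do not want an empty string returned when that span does not exist. I have tried several different approaches, but all of them failed.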

This is what I have so far:

# Adjacent string literals are concatenated, so the long XPath stays one expression.
price = response.xpath(
    '//*[@id="content"]/div[4]/div/div[2]/div/div[2]/div[2]'
    '/ol/li/div/div/div/a/div/div[2]/div[2]/div/div'
    '/div[contains(.,"1/8")]'
    '|//*[@id="content"]/div[4]/div/div[2]/div/div[2]/div[2]'
    '/ol/li/div/div/div/a/div/div[2]/div[2]/div[1]/div[1]'
    '[not(div[contains(.,"1/8")])]'
)

for p in price:
    # Print the text nodes of the span(s) that follow each matched div.
    print(p.xpath('.//following-sibling::span/text()').getall())

The HTML looks like this. There are multiple ol elements, and each one contains many li elements:

 <li>
  <div class="sc-kAzzGY styled-components__FadeinWrapper-sc-45yec-29 cqYSfY">
    <div>
      <div>
        <a style="cursor:pointer" cursor="pointer" role="button" tabindex="0" href="/dispensaries/chill-dispensary/menu/la-confidential" class="styled-components__TouchableLink-sc-45yec-27 bVIjHi sc-EHOje cllnwZ sc-hqyNC jPTcOI">
          <div class="styled-components__MenuItemWrapper-sc-186ferk-0 iuHfFE sc-gzVnrw eJLYUk">
            <div class="styled-components__MenuAvatarContainer-sc-186ferk-1 hqZqql">
              <div class="styled-components__MenuAvatar-sc-186ferk-4 kkvwjj avatar__AvatarStyled-c9mhhj-0 kAtyaN" shape="rounded">
                <div class="styles__LazyImgWrapper-x7hlo8-0 bEVXVD">
                  <noscript><img style="width:auto;height:auto;border-radius:none" src="https://images.weedmaps.com/pictures/listings/699/945/838/square/23789926_91.jpeg" alt="" /></noscript>
                  <div><img alt="" style="width:auto;height:auto;border-radius:none" fit="fill" src="https://images.weedmaps.com/pictures/listings/699/945/838/square/23789926_91.jpeg?blur=500&amp;q=1&amp;fit=fill&amp;w=100&amp;h=100" srcSet="https://images.weedmaps.com/pictures/listings/699/945/838/square/23789926_91.jpeg?blur=500&amp;q=1&amp;w=100&amp;h=100&amp;dpr=1&amp;fit=fill 1x, https://images.weedmaps.com/pictures/listings/699/945/838/square/23789926_91.jpeg?blur=500&amp;q=1&amp;w=100&amp;h=100&amp;dpr=2&amp;fit=fill 2x, https://images.weedmaps.com/pictures/listings/699/945/838/square/23789926_91.jpeg?blur=500&amp;q=1&amp;w=100&amp;h=100&amp;dpr=3&amp;fit=fill 3x" /></div>
                </div>
              </div>
            </div>
            <div class="sc-bdVaJa styled-components__MenuDetailsContainer-sc-186ferk-3 hWqRwp sc-dVhcbM jzqdPt">
              <div class="sc-bdVaJa styled-components__CategoryName-sc-186ferk-2 bPQNrd sc-dVhcbM jlkqOh">
                <span class="styled-components__BrandCategory-sc-186ferk-5 SXkyA">Indica</span>
                <div class="sc-bdVaJa styled-components__Name-sc-186ferk-6 dPiQDX sc-dVhcbM jlkqOh">LA Confidential</div>
              </div>
              <div class="sc-bdVaJa hIWjea sc-dVhcbM gYyBEa">
                <div order="2,1" class="sc-bdVaJa gzoWJL sc-dVhcbM gJqxZa">
                  <div class="sc-bdVaJa styled-components__LabResultsWrapper-sc-186ferk-7 iOTUyO sc-dVhcbM jlkqOh"></div>
                </div>
                <div order="1,2" class="styled-components__ItemCardPrices-sc-186ferk-11 gDkxxY sc-bdVaJa gSurwb sc-dVhcbM gGLqqS">
                  <div order="2" class="sc-bdVaJa ounce styled-components__PriceType-sc-6ubro-0 juyeGj sc-dVhcbM gQAMWW">
                    <span class="styled-components__PriceAccessibilityTip-sc-6ubro-4 cpNHXt">prices by ounce</span>
                    <div class="sc-bdVaJa styled-components__PriceWrapper-sc-6ubro-2 kaihgi sc-dVhcbM jlkqOh">
                      <div class="sc-bdVaJa styled-components__UnitLabel-sc-6ubro-1 VHQoO sc-dVhcbM jlkqOh"><span aria-hidden="false">1/8</span></div>
                      <span class="styled-components__Price-sc-6ubro-3 jLflXl">
                        $<!-- -->35
                      </span>
                    </div>
                    <div class="sc-bdVaJa styled-components__PriceWrapper-sc-6ubro-2 kaihgi sc-dVhcbM jlkqOh">
                      <div class="sc-bdVaJa styled-components__UnitLabel-sc-6ubro-1 VHQoO sc-dVhcbM jlkqOh"><span aria-hidden="false">1/4</span></div>
                      <span class="styled-components__Price-sc-6ubro-3 jLflXl">
                        $<!-- -->70
                      </span>
                    </div>
                    <div class="sc-bdVaJa styled-components__PriceWrapper-sc-6ubro-2 kaihgi sc-dVhcbM jlkqOh">
                      <div class="sc-bdVaJa styled-components__UnitLabel-sc-6ubro-1 VHQoO sc-dVhcbM jlkqOh"><span aria-hidden="false">1/2</span></div>
                      <span class="styled-components__Price-sc-6ubro-3 jLflXl">
                        $<!-- -->140
                      </span>
                    </div>
                    <div class="sc-bdVaJa styled-components__PriceWrapper-sc-6ubro-2 kaihgi sc-dVhcbM jlkqOh">
                      <div class="sc-bdVaJa styled-components__UnitLabel-sc-6ubro-1 VHQoO sc-dVhcbM jlkqOh"><span aria-hidden="false">oz</span></div>
                      <span class="styled-components__Price-sc-6ubro-3 jLflXl">
                        $<!-- -->220
                      </span>
                    </div>
                  </div>
                  <div order="1" class="sc-bdVaJa gram styled-components__PriceType-sc-6ubro-0 juyeGj sc-dVhcbM cONtUi">
                    <span class="styled-components__PriceAccessibilityTip-sc-6ubro-4 cpNHXt">prices by gram</span>
                    <div class="sc-bdVaJa styled-components__PriceWrapper-sc-6ubro-2 fCMWWc sc-dVhcbM jlkqOh">
                      <div class="sc-bdVaJa styled-components__UnitLabel-sc-6ubro-1 VHQoO sc-dVhcbM jlkqOh">
                        1
                        <!-- -->g
                      </div>
                      <span class="styled-components__Price-sc-6ubro-3 jLflXl">
                        $<!-- -->12
                      </span>
                    </div>
                    <div class="sc-bdVaJa styled-components__PriceWrapper-sc-6ubro-2 fCMWWc sc-dVhcbM jlkqOh">
                      <div class="sc-bdVaJa styled-components__UnitLabel-sc-6ubro-1 VHQoO sc-dVhcbM jlkqOh">
                        2
                        <!-- -->g
                      </div>
                      <span class="styled-components__Price-sc-6ubro-3 jLflXl">
                        $<!-- -->24
                      </span>
                    </div>
                  </div>
                </div>
              </div>
            </div>
          </div>
        </a>
      </div>
    </div>
  </div>
</li>
<li>

Here is the link to the site: https://weedmaps.com/dispensaries/chill-dispensary

The output I expect is this:
['$', '35']
['$', '35']
['$', '35']
['$', '35']
['$', '35']
['$', '35']
['$', '35']
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]

But I get this:
[]
['$', '35']
[]
['$', '35']
[]
['$', '35']
[]
['$', '35']
[]
['$', '35']
[]
['$', '35']
[]
['$', '35']
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]
[]

1 answer:

Answer 0 (score: 0)

Do you want whatever value appears under 1/8, or only the cases where the value is $35?

Any value:

for price in response.xpath('//span[contains(text(),"1/8")]/parent::div'):
    price.xpath('./following-sibling::span/text()').getall()
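
If you need exactly one result per li, with a placeholder where an item has no 1/8 price, one possible approach is to loop over the li elements and search relative to each one. With Scrapy that would probably mean iterating over response.xpath('//ol/li') and running a relative XPath inside the loop. Below is a minimal sketch of the same idea using only the standard library's xml.etree.ElementTree. The markup is a simplified, made-up version of the page (the class names and the function name eighth_prices are hypothetical, and the empty comments inside the price spans are left out).

```python
import xml.etree.ElementTree as ET

# Simplified, made-up markup: two items, only the first has a 1/8 price.
SAMPLE = """
<ol>
  <li>
    <div class="wrapper"><div class="label"><span>1/8</span></div><span class="price">$35</span></div>
    <div class="wrapper"><div class="label"><span>1/4</span></div><span class="price">$70</span></div>
  </li>
  <li>
    <div class="wrapper"><div class="label">1g</div><span class="price">$12</span></div>
  </li>
</ol>
"""


def eighth_prices(markup, unit="1/8"):
    """Return one entry per <li>: the price text for `unit`, or None."""
    root = ET.fromstring(markup)
    results = []
    for li in root.iter("li"):
        found = None
        for wrapper in li.iter("div"):
            # A price wrapper has a label <div> and a price <span> as direct children.
            label = wrapper.find("div")
            price = wrapper.find("span")
            if label is None or price is None:
                continue
            if "".join(label.itertext()).strip() == unit:
                found = "".join(price.itertext()).strip()
                break
        # Always append, so items without the unit still get a slot.
        results.append(found)
    return results


print(eighth_prices(SAMPLE))  # ['$35', None]
```

Because the loop appends once per li, the result list lines up with the items on the page, which is what the expected output in the question looks like.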

Only when the value is $35:

for price in response.xpath('//span[contains(text(),"1/8")]/parent::div'):
    price.xpath('./following-sibling::span[contains(text()[2],"35")]/text()').getall()
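
A note on why text()[2] is used here: in the HTML from the question, the price span holds two text nodes, "$" and "35", separated by an empty comment. That is also why /text() gives ['$', '35'] rather than a single string. The sketch below shows the split using only the standard library (Python 3.8 or later is assumed for insert_comments). SPAN is a trimmed copy of one span from the question, and the helper name text_nodes is hypothetical.

```python
import xml.etree.ElementTree as ET

# One price span as in the question: "$", an empty comment, then the number.
SPAN = '<span class="price">$<!-- -->35</span>'


def text_nodes(markup):
    """List the non-blank text nodes that are direct children of the root."""
    # insert_comments=True keeps comments in the tree, so the text on
    # either side of a comment stays in separate nodes.
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(markup, parser=parser)
    nodes = [root.text] + [child.tail for child in root]
    return [n for n in nodes if n and n.strip()]


print(text_nodes(SPAN))  # ['$', '35']
```

The second entry is the one that text()[2] refers to in the XPath above.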