Can't get the text of a tag in an HTML page

Time: 2019-09-12 20:40:12

Tags: html dom dart

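I'm trying to do web scraping for an app, and I'm having trouble getting one element out of the HTML response. For example, I have this element in the HTML: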

<div class="cell card fr-news-box" style="flex-grow: 0;">
  <div class="card-section"><img src="https://fimgs.net/mdimg/perfume/m.53441.jpg"></div> 
  <div class="card-section">
    <p>
      <a href="https://www.fragrantica.com/perfume/Narciso-Rodriguez/Pure-Musc-For-Her-53441.html" target="_blank"> Pure Musc For Her<span class="link-span"></span></a>
    </p> 
    <p>
      <small>Narciso Rodriguez</small>
    </p>
  </div>
</div>

I'm trying to get this:

<p><small>Narciso Rodriguez</small></p>

But nothing I've tried works. Here is what I have so far:

import 'package:html/dom.dart' show Element;
import 'package:html/parser.dart' show parse;
import 'package:http/http.dart' show Client, Response;

Future<void> initiate() async {
  var client = Client();
  // Current versions of package:http take a Uri rather than a String.
  Response response = await client.get(
    Uri.parse('https://www.fragrantica.com/search/'),
  );

  var document = parse(response.body);
  List<Element> perfumes = document.getElementsByClassName('cell card fr-news-box');

  List perfumeImg = perfumes.map((element) => element.getElementsByTagName('img')[0].attributes['src']).toList();
  List perfumeLink = perfumes.map((element) => element.getElementsByTagName('a')[0].attributes['href']).toList();
  List perfumeName = perfumes.map((element) => element.getElementsByTagName('a')[0].text).toList();

  // This is the line that throws: the selector matches nothing, so [0] on the empty list raises the RangeError below.
  List perfumeBrand = perfumes.map((element) => element.getElementsByTagName('#main-content > div.grid-x.grid-margin-x > div.small-12.medium-8.large-9.cell > div > div > div > div.off-canvas-content.content1.has-reveal-left > div.grid-x.grid-padding-x.grid-padding-y > div > div:nth-child(3) > div > div > div > span > div:nth-child(1) > div:nth-child(2) > p:nth-child(2) > small')[0].text).toList();
}

The problem is the perfumeBrand line. Whenever I run it, it crashes with this error:

RangeError (RangeError (index): Invalid value: Valid value range is empty: 0)

I've tried several other solutions, but nothing works; I just can't get this <p><small>XYZ</small></p> tag.
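The error means the selector matched nothing in the parsed response: the match list was empty, and [0] on an empty list throws a RangeError. A minimal sketch of the failure and of a guarded lookup, assuming the html package; _card is a trimmed copy of the snippet above, '#main-content small' is just an example of a selector that matches nothing in it, and indexZeroThrows and firstText are hypothetical helper names:

```dart
import 'package:html/parser.dart' show parse;

// Trimmed copy of the card from the question (illustrative input only).
const _card = '''
<div class="cell card fr-news-box">
  <div class="card-section">
    <p><a href="#"> Pure Musc For Her</a></p>
    <p><small>Narciso Rodriguez</small></p>
  </div>
</div>
''';

/// True if indexing [0] into the match list throws, as in the question.
bool indexZeroThrows(String selector) {
  final matches = parse(_card).querySelectorAll(selector);
  try {
    matches[0];
    return false;
  } on RangeError {
    // Same failure as the question: the match list is empty.
    return true;
  }
}

/// Guarded lookup: returns null instead of throwing when nothing matches.
String? firstText(String selector) {
  final matches = parse(_card).querySelectorAll(selector);
  return matches.isEmpty ? null : matches.first.text.trim();
}

void main() {
  print(indexZeroThrows('#main-content small')); // true
  print(indexZeroThrows('small')); // false
  print(firstText('#main-content small')); // null
  print(firstText('small')); // Narciso Rodriguez
}
```

Checking isEmpty (or using querySelector, which returns null when nothing matches) turns a missing element into a value that can be tested for, instead of a crash.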

1 Answer:

Answer 0 (score: 0)

You can simplify the query with the querySelectorAll method.
However, because of a bug currently in the html package, you can't use ":nth-child".
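Until that is fixed, the positional check can be done in Dart after a simpler query. In CSS, :nth-child(n) means the element is the n-th (1-based) element child of its parent, counting only elements and not text nodes. A sketch assuming the html package; nthChild is a hypothetical helper that handles only a plain integer n, not the An+B forms:

```dart
import 'package:html/dom.dart';
import 'package:html/parser.dart' show parse;

/// Hypothetical helper emulating `selector:nth-child(n)` for a plain integer
/// n: keeps matches whose 1-based index among the parent's element children
/// equals n (whitespace text nodes are not counted, as in CSS).
List<Element> nthChild(Document doc, String selector, int n) =>
    doc.querySelectorAll(selector).where((e) {
      final parent = e.parent;
      return parent != null && parent.children.indexOf(e) + 1 == n;
    }).toList();

// Trimmed copy of the card from the question (illustrative input only).
const _card = '''
<div class="cell card fr-news-box">
  <div class="card-section"><img src="m.53441.jpg"></div>
  <div class="card-section">
    <p><a href="#"> Pure Musc For Her</a></p>
    <p><small>Narciso Rodriguez</small></p>
  </div>
</div>
''';

void main() {
  final doc = parse(_card);
  // Stands in for '.card-section p:nth-child(2)' followed by 'small'.
  final second = nthChild(doc, '.card-section p', 2);
  print(second.single.querySelector('small')?.text); // Narciso Rodriguez
}
```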

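import 'package:html/parser.dart';

void main() {
  var document = parse(_body);
  // Every <p> inside a .card-section of a .cell element. In this snippet,
  // index 0 is the <p> holding the link and index 1 is the <p> holding <small>.
  var perfumes = document.querySelectorAll('.cell .card-section p');

  print(perfumes[1].outerHtml);
}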

String _body = '''
<div class="cell card fr-news-box" style="flex-grow: 0;">
  <div class="card-section"><img src="https://fimgs.net/mdimg/perfume/m.53441.jpg"></div>
  <div class="card-section">
    <p>
      <a href="https://www.fragrantica.com/perfume/Narciso-Rodriguez/Pure-Musc-For-Her-53441.html" target="_blank"> Pure Musc For Her<span class="link-span"></span></a>
    </p>
    <p>
      <small>Narciso Rodriguez</small>
    </p>
  </div>
</div>
''';
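Applied to the question's code, one option is to query each card once and read every field relative to that card with querySelector, so a missing field becomes null instead of a RangeError, and the brand is reached with the short selector 'p small' rather than a long path from the page root. A sketch under those assumptions, using the html package and the card markup from the question; Perfume and extractPerfumes are hypothetical names, and the HTTP request is left out so the parsing can be run on a string:

```dart
import 'package:html/dom.dart';
import 'package:html/parser.dart' show parse;

/// Hypothetical record for one search result.
class Perfume {
  final String? img, link, name, brand;
  Perfume({this.img, this.link, this.name, this.brand});

  @override
  String toString() => '$name by $brand ($link, $img)';
}

/// Reads every field relative to its own card; querySelector returns null
/// for a missing field instead of throwing like [0] on an empty list.
List<Perfume> extractPerfumes(Document document) =>
    document.querySelectorAll('.cell.card.fr-news-box').map((card) {
      return Perfume(
        img: card.querySelector('img')?.attributes['src'],
        link: card.querySelector('a')?.attributes['href'],
        name: card.querySelector('a')?.text.trim(),
        brand: card.querySelector('p small')?.text.trim(),
      );
    }).toList();

// Card markup copied from the question (illustrative input only).
const _html = '''
<div class="cell card fr-news-box">
  <div class="card-section"><img src="https://fimgs.net/mdimg/perfume/m.53441.jpg"></div>
  <div class="card-section">
    <p><a href="https://www.fragrantica.com/perfume/Narciso-Rodriguez/Pure-Musc-For-Her-53441.html"> Pure Musc For Her<span class="link-span"></span></a></p>
    <p><small>Narciso Rodriguez</small></p>
  </div>
</div>
''';

void main() {
  for (final perfume in extractPerfumes(parse(_html))) {
    print(perfume);
  }
}
```

If the live page is structured differently from this snippet, the fields come back as null rather than throwing, which makes the mismatch easier to spot.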