Question

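I am writing a script that will periodically visit certain websites and fetch a small amount of data for later analysis. To do this I am using CSS selectors, and I have to choose between HTML classes and React ids. For maintainability, I want the selectors that are least likely to need changing. So which is better suited to this purpose: HTML class names (which can change at any time) or React ids?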

Suppose this is my HTML document:

<a class="BoxAll" reactid="11.0.7">
    <span class="BoxAllLabel" reactid="11.0.7.0"> Box1 </span>
    <span class="BoxAllLabel" reactid="11.0.7.1"> Box2 </span>
</a>

And I have these CSS selectors:

class_css = "a.BoxAll"
react_css = "a[reactid='11.0.7']"

Which approach is better from a maintainability point of view?

PS: I am developing in Python with Selenium and BeautifulSoup.

Answer 1

If a locator contains numbers, avoid it whenever possible.

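Comparing the two, I recommend the class name over the React id. React ids are usually generated dynamically, so they can differ from one render to the next and are best avoided.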
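A minimal sketch of why this matters, assuming BeautifulSoup (bs4) is installed. The second snapshot and its reactid values are made up to stand in for a later render of the same page, and `count_matches` is a hypothetical helper, not part of any library.

```python
from bs4 import BeautifulSoup

# First snapshot: the markup from the question.
render_1 = (
    '<a class="BoxAll" reactid="11.0.7">'
    '<span class="BoxAllLabel" reactid="11.0.7.0"> Box1 </span>'
    '</a>'
)
# Second snapshot: same widget, hypothetical reactid values after a re-render.
render_2 = (
    '<a class="BoxAll" reactid="14.2.3">'
    '<span class="BoxAllLabel" reactid="14.2.3.0"> Box1 </span>'
    '</a>'
)

class_css = "a.BoxAll"
react_css = "a[reactid='11.0.7']"


def count_matches(html, selector):
    """Return how many elements in `html` match the CSS `selector`."""
    return len(BeautifulSoup(html, "html.parser").select(selector))


for name, html in (("render_1", render_1), ("render_2", render_2)):
    print(name, count_matches(html, class_css), count_matches(html, react_css))
# render_1 1 1
# render_2 1 0
```

The class selector matches both snapshots, while the reactid selector silently stops matching once the generated id changes.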

For this HTML:

<a class="BoxAll" reactid="11.0.7">
    <span class="BoxAllLabel" reactid="11.0.7.0"> Box1 </span>
    <span class="BoxAllLabel" reactid="11.0.7.1"> Box2 </span>
</a>

CSS selectors:

a.BoxAll  [for anchor tag]
a.BoxAll>span:first-child   [first span tag]  
a.BoxAll>span:nth-child(2)   [second span tag]
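These selectors can be tried against the sample markup with BeautifulSoup's `select_one`. This is only a sketch, assuming bs4 is installed and using the built-in `html.parser`; the variable names are illustrative.

```python
from bs4 import BeautifulSoup

html = """
<a class="BoxAll" reactid="11.0.7">
    <span class="BoxAllLabel" reactid="11.0.7.0"> Box1 </span>
    <span class="BoxAllLabel" reactid="11.0.7.1"> Box2 </span>
</a>
"""

soup = BeautifulSoup(html, "html.parser")

# The anchor tag itself.
anchor = soup.select_one("a.BoxAll")
# First and second <span> children of the anchor.
first_span = soup.select_one("a.BoxAll > span:first-child")
second_span = soup.select_one("a.BoxAll > span:nth-child(2)")

print(anchor.name)                        # a
print(first_span.get_text(strip=True))    # Box1
print(second_span.get_text(strip=True))   # Box2
```

Note that the positional selectors (`:first-child`, `:nth-child(2)`) depend on element order, so they break if the site reorders or inserts children.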

XPath:

//a[@class='BoxAll']  
//span[contains(text(),'Box1')]  
//span[contains(text(),'Box2')]
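A rough way to try these expressions without a browser is the standard library's `xml.etree.ElementTree`. This is a sketch under two assumptions: the snippet is well-formed XML (real pages often are not), and only ElementTree's limited XPath subset is available. That subset has no `contains()` function, so the text check is emulated in plain Python by the hypothetical helper `spans_containing`. With Selenium or lxml the full expressions above can be used directly.

```python
import xml.etree.ElementTree as ET

html = """<a class="BoxAll" reactid="11.0.7">
    <span class="BoxAllLabel" reactid="11.0.7.0"> Box1 </span>
    <span class="BoxAllLabel" reactid="11.0.7.1"> Box2 </span>
</a>"""

# Wrap the snippet in a root element so ".//a" can find the anchor
# as a descendant of the root.
root = ET.fromstring(f"<root>{html}</root>")

# Equivalent of //a[@class='BoxAll'] in ElementTree's XPath subset.
anchor = root.find(".//a[@class='BoxAll']")


def spans_containing(text):
    """Emulate //span[contains(text(), text)] with a Python filter."""
    return [span for span in root.iter("span") if text in (span.text or "")]


box1 = spans_containing("Box1")
box2 = spans_containing("Box2")

print(anchor.get("reactid"))   # 11.0.7
print(box1[0].text.strip())    # Box1
print(box2[0].text.strip())    # Box2
```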

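It is best to use CSS attribute selectors with the ^, * and $ operators, which match a prefix, a substring and a suffix of an attribute value respectively.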
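As an illustration, the three operators can be applied to the class attribute of the sample markup. This is a sketch assuming bs4 is installed; the particular substrings ("Box", "Label", "All") are chosen only for this example.

```python
from bs4 import BeautifulSoup

html = """
<a class="BoxAll" reactid="11.0.7">
    <span class="BoxAllLabel" reactid="11.0.7.0"> Box1 </span>
    <span class="BoxAllLabel" reactid="11.0.7.1"> Box2 </span>
</a>
"""

soup = BeautifulSoup(html, "html.parser")

# ^= : class value starts with "Box" (the anchor and both spans).
starts_with_box = soup.select("[class^='Box']")
# *= : class value contains "Label" (both spans).
contains_label = soup.select("span[class*='Label']")
# $= : class value ends with "All" (only the anchor).
ends_with_all = soup.select("[class$='All']")

print(len(starts_with_box))   # 3
print(len(contains_label))    # 2
print(len(ends_with_all))     # 1
```

Partial matches like these keep working when a site appends or prepends generated fragments to a class name, at the cost of possibly matching more elements than intended.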
