Question

I'm using Selenium and then Beautiful Soup to scrape a web page that loads some of its content with JavaScript. Selenium gives me the plain HTML; I checked it with print and confirmed that it does contain the part I'm trying to scrape. My problem is with Beautiful Soup.

I want to find the div tags with

data-selenium="reviews-comments"

I tried

comments = soup.find_all("div", class_="comment-detail")

but this returns an empty result, possibly because the actual div tag also carries the data-selenium attribute. The exact tag in the HTML is

<div data-selenium="reviews-comments" class="comment-detail">

So I tried the following:

comments = soup.find_all("div", data-selenium="reviews-comments", class_="comment-detail")

but this raises the error

SyntaxError: keyword can't be an expression

because

data-selenium

is treated like a subtraction operation, when it is really just a hyphenated word. I tried putting it in quotes, but that didn't help.

I also tried

dct = {
    'div': '',
    'data-selenium': 'reviews-comments',
    'class': 'comment-detail'
}
comments = soup.find_all(**dct)

but

len(comments)

returns zero, i.e. comments is empty.

For clarity, this is the code that builds my soup:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

browser = webdriver.Firefox()
browser.get('http://www.agoda.com/the-coast-resort-koh-phangan/hotel/koh-phangan-th.html/')
html_source = browser.page_source
browser.quit()

soup = BeautifulSoup(html_source,'html.parser')

Any ideas on how to proceed here?

Answer 1

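The problem comes from the URL: you have an extra forward slash at the end, which returns a 404 page instead of the page you actually want. Just remove it and your code will work.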
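As a small illustration of the trailing-slash point, here is a minimal sketch that strips trailing forward slashes before the URL is handed to the browser. The helper name `normalize_url` is hypothetical and not part of Selenium or Beautiful Soup; whether a site treats the two forms differently depends on the server, so treat this as one possible guard rather than a general rule.

```python
def normalize_url(url: str) -> str:
    """Drop any trailing forward slashes from a URL (hypothetical helper)."""
    return url.rstrip("/")


# The URL from the question, with the extra slash after ".html".
raw_url = "http://www.agoda.com/the-coast-resort-koh-phangan/hotel/koh-phangan-th.html/"
url = normalize_url(raw_url)
print(url)
```

The cleaned `url` could then be passed to `browser.get(url)` in place of the literal string.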

Here is the code I used:

from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

# Load the page in Firefox so its JavaScript runs, then grab the rendered HTML.
browser = webdriver.Firefox()
# Note: no trailing slash after ".html".
browser.get('http://www.agoda.com/the-coast-resort-koh-phangan/hotel/koh-phangan-th.html')
html_source = browser.page_source
browser.quit()

soup = BeautifulSoup(html_source, 'html.parser')

# Collect every div whose class is "comment-detail".
comments = soup.find_all("div", class_="comment-detail")

print(comments)
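Separately from the URL issue, the SyntaxError in the question comes from Python itself: a keyword argument name must be a valid identifier, so `data-selenium=...` cannot be written directly. The sketch below shows three ways to filter on a hyphenated attribute with Beautiful Soup. The HTML string and its text content are made up for illustration and are not taken from the Agoda page.

```python
from bs4 import BeautifulSoup

# Made-up sample markup (an assumption for illustration only).
html = """
<div data-selenium="reviews-comments" class="comment-detail">Great stay</div>
<div class="comment-detail">No data attribute</div>
<div data-selenium="other" class="comment-detail">Different value</div>
"""

soup = BeautifulSoup(html, "html.parser")

# 1. Pass hyphenated attribute names through the attrs dictionary.
by_attrs = soup.find_all(
    "div",
    attrs={"data-selenium": "reviews-comments", "class": "comment-detail"},
)

# 2. Unpack a dictionary into keyword arguments; the tag name stays positional.
by_unpack = soup.find_all("div", **{"data-selenium": "reviews-comments"})

# 3. Use a CSS attribute selector.
by_css = soup.select('div[data-selenium="reviews-comments"]')

for tag in by_attrs:
    print(tag.get_text())
```

The `attrs` form is usually the easiest to read when several attributes are involved; the CSS selector is convenient when the filter is already expressed as a selector elsewhere.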

Beautiful Soup - hyphenated keyword, error: keyword can't be an expression
