Accessing a widget window with BeautifulSoup, Python and mechanize

Date: 2017-06-05 19:23:59

Tags: python html beautifulsoup mechanize

I am trying to scrape information from a website like this one:

https://www.glassdoor.com/Overview/Working-at-7-Eleven-EI_IE3581.11,19.htm

using Python + BeautifulSoup + mechanize.

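I have no problem accessing anything on the main site. However, I also need the information shown in the overlay window that appears when you click the "Rating Trends" button next to the bar with the stars.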

This overlay window can also be accessed directly via the URL:

https://www.glassdoor.com/Reviews/7-Eleven-Reviews-E3581.htm#trends-overallRating

The HTML associated with this page is a modified version of the original site's HTML.

However, no matter which element I try to find on that overlay-window page (via findAll), BeautifulSoup returns zero hits.

How can I solve this? I tried adding a sleep between opening the site and reading anything, but to no avail.
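A likely reason neither findAll nor a sleep helps: the `#trends-overallRating` part of the overlay URL is a fragment, and fragments are never sent to the server. mechanize (or any other HTTP client) therefore downloads the same HTML as for the URL without the fragment, and the overlay is presumably filled in afterwards by JavaScript running in the browser, which neither mechanize nor BeautifulSoup executes. The standard-library sketch below only shows what is actually requested; it does not contact the site.

```python
from urllib.parse import urldefrag

overlay_url = "https://www.glassdoor.com/Reviews/7-Eleven-Reviews-E3581.htm#trends-overallRating"

# Split off the fragment (the part after '#'); only the URL before it goes into the HTTP request.
requested_url, fragment = urldefrag(overlay_url)
print(requested_url)  # https://www.glassdoor.com/Reviews/7-Eleven-Reviews-E3581.htm
print(fragment)       # trends-overallRating
```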

Thanks!

1 answer:

Answer 0 (score: 0)

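If you are using Chrome, right-click the page background (somewhere that doesn't display any other information), choose "Inspect" from the context menu (on Windows, anyway), and then select the "Network" tab so that you can watch the network traffic. Now click "Rating Trends". The entry marked 'xhr' will be https://www.glassdoor.ca/api/employer/3581-rating.htm?locationStr=&jobTitleStr=&filterCurrentEmployee=false&filterEmploymentStatus=REGULAR&filterEmploymentStatus=PART_TIME (I very much hope!), and its content will look like this: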

{"employerId":3581,"ratings":[{"hasRating":true,"type":"overallRating","value":2.9},{"hasRating":true,"type":"ceoRating","value":0.54},{"hasRating":true,"type":"bizOutlook","value":0.35},{"hasRating":true,"type":"recommend","value":0.4},{"hasRating":true,"type":"compAndBenefits","value":2.4},{"hasRating":true,"type":"cultureAndValues","value":2.5},{"hasRating":true,"type":"careerOpportunities","value":2.5},{"hasRating":true,"type":"workLife","value":2.4},{"hasRating":true,"type":"seniorManagement","value":2.3}],"week":0,"year":0}
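Since that response is plain JSON, BeautifulSoup is not needed for it. The following is a minimal standard-library sketch, assuming the endpoint still answers a direct GET request; the browser-like User-Agent header is an assumption (some sites reject Python's default one), and the network call has not been tested here. The parsing function only relies on the structure shown in the sample response.

```python
import json
import urllib.request

# XHR endpoint as observed in Chrome's Network tab; the query string is copied unchanged.
RATING_URL = ("https://www.glassdoor.ca/api/employer/3581-rating.htm?locationStr=&jobTitleStr="
              "&filterCurrentEmployee=false&filterEmploymentStatus=REGULAR&filterEmploymentStatus=PART_TIME")


def fetch_rating_json(url=RATING_URL):
    """Download the raw JSON text (assumes a browser-like User-Agent is accepted)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return resp.read().decode("utf-8")


def ratings_by_type(json_text):
    """Map each rating 'type' to its 'value', skipping entries whose hasRating is false."""
    data = json.loads(json_text)
    return {r["type"]: r["value"] for r in data["ratings"] if r.get("hasRating")}
```

For example, `ratings_by_type(fetch_rating_json())["overallRating"]` would give 2.9 for the response shown above.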

Whether this URL can be altered to fetch information for other employers, I'm afraid I can't tell you.
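If you want to experiment with that, the number 3581 appears both in the Glassdoor page URLs (E3581) and as `employerId` in the JSON, so it seems plausible that it is the employer ID. The hypothetical helper below substitutes another ID into the same URL pattern; whether the endpoint actually accepts other IDs is unverified. The repeated `filterEmploymentStatus` parameter is why a list of pairs is passed to `urlencode` rather than a dict.

```python
from urllib.parse import urlencode


def rating_api_url(employer_id, domain="www.glassdoor.ca"):
    """Hypothetical: build the rating-trends URL for an employer ID.

    Assumes the /api/employer/<id>-rating.htm pattern seen for 7-Eleven (3581)
    also works for other employers; this has not been verified.
    """
    params = [
        ("locationStr", ""),
        ("jobTitleStr", ""),
        ("filterCurrentEmployee", "false"),
        # Repeated key: a list of pairs keeps both values, in order.
        ("filterEmploymentStatus", "REGULAR"),
        ("filterEmploymentStatus", "PART_TIME"),
    ]
    return f"https://{domain}/api/employer/{int(employer_id)}-rating.htm?{urlencode(params)}"


print(rating_api_url(3581))
```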