This is the HTML. As you can see, it contains two kinds of tags: <code> and <img>.
The part I want to focus on: if you scroll a little to the right, you'll see an img tag right after a code tag.
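The main problem is that I want all of the code tags, and I'm using bs4 for that, but I can't get the code tags that come immediately after the img tags. I don't know why. Any ideas?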
<code style="display: none" id="bpr-guid-1535430">
{"data":{"mediaConfig":{"mprConfig":{"sizes":[{"width":60,"height":30,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":60,"height":36,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":90,"height":45,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":90,"height":54,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":100,"height":50,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":100,"height":60,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":100,"height":100,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":120,"height":60,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":120,"height":72,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":127,"height":30,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":127,"height":46,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":150,"height":75,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":150,"height":90,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":191,"height":45,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":191,"height":69,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":200,"height":100,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":200,"height":120,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":200,"height":200,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":254,"height":60,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":254,"height":92,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":337,"height":120,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":400,"height":400,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":506,"height":180,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":674,"height":240,"$type":"com.linkedin.voyager.common.MediaProcessorSize"},{"width":750,"height":750,"$type":"com.linkedin.voyager.common.MediaProcessorSize"}],"filters":{"cover":"https://media.licdn.com/mpr/mpr/shrinknp_{width}_{height}{+id}","contain":"https://media.licdn.com/mpr/mpr/shrinknp_{width}_{height}{+id}","original":"https://media.licdn.com/media{+id}","fill":"https://media.licdn.com/mpr/mpr/shrink_{width}_{height}{+id}","$type":"com.linkedin.voyager.common.MediaProcessorFilters"},"$type":"com.linkedin.voyager.common.MediaProcessorConfig"},"$type":"com.linkedin.voyager.common.MediaConfig"},"$type":"com.linkedin.voyager.common.Configuration"},"included":[]}
</code>
<img src="" style="display: none" class="datalet-bpr-guid-1535430"><code style="display: none" id="bpr-guid-1535431">
{"data":{"canBrowseProfiles":false,"reactivationFeaturesEligible":false,"canViewJobAnalytics":false,"canViewWVMP":false,"premiumFreeTrialEligible":true,"canViewCompanyInsights":false,"$type":"com.linkedin.voyager.premium.FeatureAccess"},"included":[]}
</code>
<code style="display: none" id="datalet-bpr-guid-1535431">
{"request":"/voyager/api/premium/featureAccess?name\u003DreactivationFeaturesEligible","status":200,"body":"bpr-guid-1535431"}
</code>
<img src="" style="display: none" class="datalet-bpr-guid-1535431"><code style="display: none" id="bpr-guid-1535432">
{"data":{"companies":[],"$deletedFields":["paidProducts","postJobsEnabled"],"memberGroup":"FREE","showStaticLearning":false,"$type":"com.linkedin.voyager.common.Nav","$id":"M8x5UY0Zt6eGdBCiy+iKhA==,root"},"included":[]}
</code>
<code style="display: none" id="datalet-bpr-guid-1535432">
{"request":"/voyager/api/nav","status":200,"body":"bpr-guid-1535432"}
</code>
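In the markup above, each <code id="bpr-guid-N"> carries a JSON payload, and the companion <code id="datalet-bpr-guid-N"> carries request metadata whose "body" field names that payload's id. Assuming that pairing holds for the rest of the page (it is only visible in these few tags), here is a minimal sketch that maps each request path to its decoded payload. It uses only the standard library's html.parser.HTMLParser, so it runs without bs4; CodeCollector and payloads_by_request are hypothetical names, not part of any library.

```python
import json
from html.parser import HTMLParser


class CodeCollector(HTMLParser):
    """Collect the text of every <code> element that has an id attribute."""

    def __init__(self):
        super().__init__()
        self.blocks = {}      # id -> raw text inside that <code>
        self._current = None  # id of the <code> we are currently inside

    def handle_starttag(self, tag, attrs):
        if tag == "code":
            self._current = dict(attrs).get("id")
            if self._current is not None:
                self.blocks[self._current] = ""

    def handle_endtag(self, tag):
        if tag == "code":
            self._current = None

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._current is not None:
            self.blocks[self._current] += data


def payloads_by_request(html):
    """Map each datalet's "request" path to the JSON payload its "body" id points at."""
    parser = CodeCollector()
    parser.feed(html)
    parser.close()
    result = {}
    for tag_id, text in parser.blocks.items():
        if tag_id.startswith("datalet-"):
            meta = json.loads(text)
            body = parser.blocks.get(meta.get("body"))
            if body is not None:
                result[meta["request"]] = json.loads(body)
    return result
```

Fed the last two <code> tags above, it should return a dict whose "/voyager/api/nav" entry has ["data"]["memberGroup"] equal to "FREE". Because the img tags are simply skipped, a <code> that follows an <img> on the same line is collected like any other.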
Below is the Python code I'm using:
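import sys

import requests
from bs4 import BeautifulSoup

companyname = sys.argv[1]
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:63.0) Gecko/20100101 Firefox/63.0',
}
# Let requests build and URL-encode the query string (handles spaces in company names)
url = 'https://www.linkedin.com/search/results/all/'
params = {'keywords': companyname, 'origin': 'GLOBAL_SEARCH_HEADER'}
req = requests.get(url, headers=headers, params=params)
finding = BeautifulSoup(req.content, 'lxml')
# find_all (findAll is the older alias) returns every <code> element in the document
for x in finding.find_all('code'):
    print(x)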
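A quick way to tell whether the parser or the response is at fault is to compare how many "<code" openings appear in the raw text with how many tags each parser returns. This is a minimal sketch, assuming bs4 is installed. count_code_tags and SAMPLE are hypothetical names, and SAMPLE is a shortened copy of the markup in the question with the JSON trimmed. Since <img> is a void element, the <code> that follows it on the same line should come back as an ordinary sibling.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical, shortened copy of the markup in the question (JSON trimmed);
# two of the <code> tags sit directly after an <img> on the same line.
SAMPLE = """
<code style="display: none" id="bpr-guid-1535430">{"data":{}}</code>
<img src="" style="display: none" class="datalet-bpr-guid-1535430"><code style="display: none" id="bpr-guid-1535431">{"data":{}}</code>
<code style="display: none" id="datalet-bpr-guid-1535431">{"request":"/voyager/api/premium/featureAccess","status":200,"body":"bpr-guid-1535431"}</code>
<img src="" style="display: none" class="datalet-bpr-guid-1535431"><code style="display: none" id="bpr-guid-1535432">{"data":{}}</code>
<code style="display: none" id="datalet-bpr-guid-1535432">{"request":"/voyager/api/nav","status":200,"body":"bpr-guid-1535432"}</code>
"""


def count_code_tags(html, parsers=("html.parser",)):
    """Return (raw '<code' count, {parser name: number of <code> tags found})."""
    # Rough count straight from the text; \b avoids matching e.g. "<codex".
    raw = len(re.findall(r"<code\b", html, flags=re.IGNORECASE))
    parsed = {name: len(BeautifulSoup(html, name).find_all("code")) for name in parsers}
    return raw, parsed


if __name__ == "__main__":
    print(count_code_tags(SAMPLE))  # (5, {'html.parser': 5})
    # On the live response, compare both parsers (needs lxml installed):
    # print(count_code_tags(req.text, parsers=("html.parser", "lxml")))
```

If the raw count is already lower than what the browser's view-source shows, the server returned different HTML to requests than to the browser. If the raw count is right but one parser reports fewer tags, switching parsers is the thing to try.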