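I am building a web-scraping script with Python 3.5. The problem I have run into is with this website.
Forbes.com forces users to view a splash page whenever they click an article link.
This is the splash page URL.
The prefix URL is embedded automatically, so I can't remove it. Also, I want to reach ... forbes.com/.../print/ so that I can scrape the whole article, but the site redirects to the page without "/print/".
When I try to extract the article with XPath or BeautifulSoup by specifying tags, it doesn't work, because the script is stuck on this welcome splash page.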
import lxml.html
from selenium import webdriver
target_url = 'http://www.forbes.com/sites/julianmitchell/2016/09/27/this-startup-uses-drones-to-map-and-manage-massive-construction-projects/print/'
driver = webdriver.PhantomJS()
driver.get(target_url)
root = lxml.html.fromstring(driver.page_source)
content = str(root.xpath('//div[@class="body_inner"]/p[position() >= 1 and position() <= last()]/text()'))
print(content)
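What is the best way to skip the welcome page? And what do I need to do to reach the .../print/ page?
Answer 0 (score: 1)
Note: I have only shown how to click the skip button once it becomes clickable; I have not verified the lxml logic in the code.
One approach is to use an explicit wait, which clicks the button as soon as it is clickable.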
import lxml.html
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
target_url = 'http://www.forbes.com/sites/julianmitchell/2016/09/27/this-startup-uses-drones-to-map-and-manage-massive-construction-projects/print/'
driver = webdriver.Chrome(r"pathtochromedriver\chromedriver.exe")  # raw string, so the backslash is not read as an escape
driver.get(target_url)
driver.maximize_window()
wait = WebDriverWait(driver, 5)
skipbutton = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="navigation"]/div/a')))
skipbutton.click()
root = lxml.html.fromstring(driver.page_source)
content = str(root.xpath('//div[@class="body_inner"]/p[position() >= 1 and position() <= last()]/text()'))
print(content)
Update
With this code you get the article's title and paragraphs. I am using BeautifulSoup here.
from selenium import webdriver
from selenium.webdriver.common.by import By
import time
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup
target_url = 'http://www.forbes.com/sites/julianmitchell/2016/09/27/this-startup-uses-drones-to-map-and-manage-massive-construction-projects/print/'
driver = webdriver.Chrome()
driver.get(target_url)
driver.maximize_window()
try:
    # Wait up to 5 seconds for the splash page's continue link, then click it.
    wait = WebDriverWait(driver, 5)
    skipbutton = wait.until(EC.element_to_be_clickable((By.XPATH, '//*[@id="navigation"]/div/a')))
    skipbutton.click()
    time.sleep(3)  # give the article page time to load
except Exception:
    print("Continue Button not present")
pSource = driver.page_source
soup = BeautifulSoup(pSource, "html.parser")
ArticleTitle = soup.find("h1", {"itemprop": "headline"})
print("The title of Article is : " + ArticleTitle.text)
Article = soup.find("div", {"itemprop": "articleBody"})
Articlebody = Article.findAll("p")
# Print every paragraph inside the article body.
for a in Articlebody:
    print(a.text)
Here is the output:
The title of Article is : This Startup Uses Self-Flying Drones To Map And Manage Construction sites
This is a modal window.
This is a modal window. This modal can be closed by pressing the Escape key or activating the close button.
This is a modal window. This modal can be closed by pressing the Escape key or activating the close button.
Robots replacing humans in the workforce is no longer a futuristic theory reserved for sci-fi thrillers and chats amongst alleged cyber geeks. Startups across industries are increasingly using automation and artificial intelligence to take on roles such as developing software, mining insights, curating content and managing marketing efforts.
Automation is proven to be an effective method of removing human workers from seemingly menial or repetitive tasks, empowering employees to focus on aspects of the business that require more specialized skills. This approach has shown to boost efficiency by allowing startups to scale quicker, trim budgets and consolidate their employee count.
A self-flying Boomerang Drone being used to map a construction site. (Photo courtesy of Identified Technologies)
Today’s tech-driven workforce is a direct reflection of a tech-driven culture. The digital era has swiftly transitioned from websites and mobile apps into high-tech hardware, wearable technology and the ability to experience alternative realities. One example of this integrated shift is the adoption of drones. Drones are expected to have an economic impact that exceeds $13.6 billion, while expecting to gross upward of $82 billion and create over 100,000 jobs by 2025.
Drones first surfaced as aerial combat machines acquired for military use. According to their 2017 Fiscal Year budget, the United States military will spend $4.61 billion on drones, aiming to purchase 31 unmanned aerial systems. Expanding beyond military use, drones quickly became free-range, high-definition cameras embraced by the new wave of photographers and filmmakers. Fast forward, drones are being used for everything from delivering burritos to flying into deadly storms. Now, a startup is using drones with a vision to revolutionize the construction industry.
This is a modal window.
This is a modal window. This modal can be closed by pressing the Escape key or activating the close button.
This is a modal window. This modal can be closed by pressing the Escape key or activating the close button.
Identified Technologies uses self-flying drones to map out and manage large-scale construction projects. Using physical labor, the mapping process commonly exceeds a month, from planning to execution. However, by using their advanced drone technology, the entire mapping process can be completed within minutes, allowing the process to be repeated and tracked daily without extending timelines or exhausting budgets. Other aspects of the mapping process include pre-flight planning, post-flight analysis and detailed reporting.
In addition to dramatically reducing the length of a project, drones massively reduce the likelihood of errors, producing physical maps that delivery thorough depictions of a space, capturing very intricate details. This allows Identified Technologies to complete projects with better precision than human workers. With the FAA recently modifying laws regarding drone piloting, thier technology will now become ten times more accessible, causing a rapid expansion of the drone mapping industry.
I spoke with Dick Zhang, CEO of Identified Technologies, about the vision behind his company, pioneering a new industry and how drones are continuing to disrupt the modern economy.
Every great company solves a problem or fills a void — what opportunity did you discover and how did the initial idea evolve into what Identified Technologies is today?
Dick Zhang: I came away from a drone demonstration at the University of Pennsylvania’s GRASP Lab with an immediate sense of the potential uses for this new technology. I attached a high-resolution camera to a drone and began gathering data and experimenting with business models. It soon became apparent that the construction industry was a particularly ripe target for continuous project tracking technology. Our customers had been stuck with antiquated methods of gathering data and making decisions that led to extremely painful surprises, reworking, costs, and delays. It wasn’t their fault, at the time, there was no alternative. You can’t make smart decisions without accurate data and you couldn’t cost-effectively capture the necessary data without new technologies like aerial mapping drones.
How much of the appeal or value of the technology is found in the actual drone versus the other key elements associated with it?
Dick Zhang: The actual drone is just a small part of generating business value with the technology. Pre-flight planning, camera and sensor settings, post-flight analysis and quality control, in addition to reporting and analytics are all needed to make the captured data useful. Until recently, there were only fragmented ways of cobbling that workflow together. That problem led to the development of our eeDaaS model, which stands for end-to-end drone as a Service, where we handle every aspect of the workflow with our fully integrated hardware/software system. Clients don’t want their technology vendors pointing fingers at each other when it breaks, they just want it to work. I believe the stress-free eeDaaS workflow is the biggest reason industry leaders choose us over alternatives.
What is the makeup of an operating team and how many drones does it usually take to fulfill the average construction project?
Dick Zhang: A typical operating team consists of just one certified user. Under the new FAA Part 107 Rules, there no longer needs to be a licensed commercial pilot, they only need an easily obtainable Remote Pilot Certificate. To do a manned survey of a 100-acre site using traditional methods might have taken a month. Beyond the time, and cost, it put people in danger on hazardous construction sites, and if anything with the project isn’t on track, you won’t find out until you get your results back, when a month of labor, time and money have already been sunk into it. In contrast, a 100-acre construction site requires just 9 minutes and a single Boomerang drone to capture the data automatically. The pre-flight preparations, take off, data capture flight, landing, post-flight processing, post-flight data analysis, reporting, and storage are all done by the system for you. That is the beauty of the eeDaaS model.
Since using drones for construction mapping is new and drone regulations continue changing — What have been the biggest challenges you’ve faced thus far and what obstacles do you expect to face going forward?
Dick Zhang: Our biggest challenge was balancing our aggressively advancing commercial drone technology with conservatively adopting regulatory changes. In 2014 and 2015, it was unclear what direction and pace the FAA would adopt. I give huge credit to the FAA, with the new Part 107 rules they have updated their policies to ensure that appropriate safety controls are in place, while removing the headaches and hurdles that were needlessly holding commercial adoption back. Now, with the 107, there is no reason for companies not to take advantage of drones and the savings, speed and safety they bring. We have wanted to bring transparency to construction industry workflows with Site IQ for years, but the market was not ready for it initially.
Your technology and approach have already proven to make the construction processes quicker, more efficient and more cost effective — What do you see your company, and this technique evolving into?
Dick Zhang: Our goal is to bring complete transparency to the traditionally opaque construction workflow process. We use big data to bring big insights to big jobs. Even if you are the world’s best project manager, it’s still going to be impossible to finish on time and on budget if you can’t see your progress. We have found drones to be a total game changer for construction site mapping and analysis. For the first time in history, managers are getting the accurate data they need to make fast informed decisions, that’s why our slogan is “Know when others guess”. Beyond saving time and money, builders can now see details, progress, and trends that were previously impossible to detect.
The only problem I see with this code is that the output contains three unwanted lines, namely:
This is a modal window.
This is a modal window. This modal can be closed by pressing the Escape key or activating the close button.
This is a modal window. This modal can be closed by pressing the Escape key or activating the close button.
These appear because they are also inside 'p' tags. This is a temporary solution to your problem; I will try to fix this and update here.
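One possible way to drop those lines is to filter the paragraph texts before printing them. The following is only a minimal sketch, not something verified against the live Forbes page: it assumes the unwanted paragraphs always begin with the exact text "This is a modal window." as in the output above, and the names `MODAL_PREFIX` and `drop_modal_paragraphs` are hypothetical helpers introduced here for illustration.

```python
# Assumption: the unwanted paragraphs start with this exact text,
# as seen in the output above.
MODAL_PREFIX = "This is a modal window."


def drop_modal_paragraphs(paragraphs, prefix=MODAL_PREFIX):
    """Return the paragraph texts that do not start with the modal boilerplate."""
    return [text for text in paragraphs if not text.strip().startswith(prefix)]
```

With the code above, this could be used as `for text in drop_modal_paragraphs(a.text for a in Articlebody): print(text)`. Filtering on the text keeps the sketch independent of the page markup; if the modal paragraphs turn out to live in a distinct container element, removing that element from the soup before calling `findAll("p")` may be cleaner.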