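Download a file from a button link to a specific folder on the C drive

Asked: 2019-11-28 14:01:46

Tags: c# selenium selenium-webdriver web-scraping

I am scraping a web page and can navigate to the right place, but as a newcomer to the whole C# world I am stuck on downloading the PDF file.

The link is hidden behind this element: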

var reportDownloadButton = driver.FindElementById("company_report_link");

It looks something like this: www.link.com/key/489498-654gjgh6-6g5h4jh/link.pdf

How can I download the file to C:\temp\?

Here is my code:

using System.Linq;
using OpenQA.Selenium.Chrome;

namespace WebDriverTest
{
    class Program
    {
        static void Main(string[] args)
        {

            var chromeOptions = new ChromeOptions();
            chromeOptions.AddArguments("headless");

            // Initialize the Chrome Driver // chromeOptions
            using (var driver = new ChromeDriver(chromeOptions))
            {
                // Go to the home page
                driver.Navigate().GoToUrl("www.link.com");
                driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);
                // Get the page elements
                var userNameField = driver.FindElementById("loginForm:username");
                var userPasswordField = driver.FindElementById("loginForm:password");
                var loginButton = driver.FindElementById("loginForm:loginButton");

                // Type user name and password
                userNameField.SendKeys("username");
                userPasswordField.SendKeys("password");

                // and click the login button
                loginButton.Click();

                driver.Navigate().GoToUrl("www.link2.com");
                driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);

                var reportSearchField = driver.FindElementByClassName("form-control");

                reportSearchField.SendKeys("Company");

                var reportSearchButton = driver.FindElementById("search_filter_button");
                reportSearchButton.Click();

                var reportDownloadButton = driver.FindElementById("company_report_link");
                reportDownloadButton.Click();
            }
        }
    }
}

Edit:

[image]


Edit 2:

I'm not the sharpest pencil in the Stack Overflow community, and I don't really know how to use Selenium. This is what I ended up with:

        var reportDownloadButton = driver.FindElementById("company_report_link");
        var text = reportDownloadButton.GetAttribute("href");
        // driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);

        WebClient client = new WebClient();
        // Save the file to desktop for debugging
        var desktop = System.Environment.GetFolderPath(System.Environment.SpecialFolder.Desktop);
        string fileName = desktop + "\\myfile.pdf";
        client.DownloadFile(text, fileName);

But the web page seems to be a bit tricky. I'm getting:

System.Net.WebException: 'The remote server returned an error: (401) Unauthorized.'

The debugger points to:

client.DownloadFile(text, fileName);

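I think it really needs to simulate a right-click followed by "Save link as", otherwise this download won't work. Also, if I just click the button, it opens the PDF in a new Chrome tab.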
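
One plausible explanation for the 401 is that WebClient starts its own HTTP session and does not carry the login cookies held by the Selenium-controlled browser. Below is a minimal sketch, assuming the site authorizes requests with session cookies (it may instead rely on tokens, headers, or something else): copy the driver's cookies into the request's Cookie header before downloading. The names CookieDownload, BuildCookieHeader, and DownloadWithSession are illustrative and not from the original post.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;
using OpenQA.Selenium;

static class CookieDownload
{
    // Joins name/value pairs into a single Cookie header value, e.g. "a=1; b=2".
    public static string BuildCookieHeader(IEnumerable<KeyValuePair<string, string>> cookies)
    {
        return string.Join("; ", cookies.Select(c => c.Key + "=" + c.Value));
    }

    // Downloads url to fileName, sending the cookies of the logged-in browser session.
    // Assumption: the server accepts the request based on these cookies alone.
    public static void DownloadWithSession(IWebDriver driver, string url, string fileName)
    {
        // Read every cookie the browser currently holds for the site.
        var pairs = driver.Manage().Cookies.AllCookies
            .Select(c => new KeyValuePair<string, string>(c.Name, c.Value));

        using (var client = new WebClient())
        {
            client.Headers.Add(HttpRequestHeader.Cookie, BuildCookieHeader(pairs));
            client.DownloadFile(url, fileName);
        }
    }
}
```

In the snippet from Edit 2, the call `client.DownloadFile(text, fileName);` would then become `CookieDownload.DownloadWithSession(driver, text, fileName);`, made while the driver is still logged in.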


Edit 3:

Should it look like this?

using System.Linq;
using OpenQA.Selenium.Chrome;

namespace WebDriverTest
{
    class Program
    {
        static void Main(string[] args)
        {
            // Folder the PDF should be saved to
            var downloadFilePath = @"C:\temp";

            // Declare Chrome options. ChromeOptionsWithPrefs is not a Selenium class,
            // so the built-in AddUserProfilePreference sets the preference instead.
            var options = new ChromeOptions();
            options.AddArguments("headless"); // we add headless here
            options.AddUserProfilePreference("download.default_directory", downloadFilePath);

            // Initialize the Chrome Driver with these options
            using (var driver = new ChromeDriver(options))
            {
                // Go to the home page
                driver.Navigate().GoToUrl("www.link.com");
                driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);
                // Get the page elements
                var userNameField = driver.FindElementById("loginForm:username");
                var userPasswordField = driver.FindElementById("loginForm:password");
                var loginButton = driver.FindElementById("loginForm:loginButton");

                // Type user name and password
                userNameField.SendKeys("username");
                userPasswordField.SendKeys("password");

                // and click the login button
                loginButton.Click();

                driver.Navigate().GoToUrl("www.link.com");
                driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);

                var reportSearchField = driver.FindElementByClassName("form-control");

                reportSearchField.SendKeys("company");

                var reportSearchButton = driver.FindElementById("search_filter_button");
                reportSearchButton.Click();

                driver.Manage().Timeouts().ImplicitWait = System.TimeSpan.FromSeconds(15);
                driver.Navigate().GoToUrl("www.link.com");

                // click the link to download
                var reportDownloadButton = driver.FindElementById("company_report_link");
                reportDownloadButton.Click();

                // if clicking does not work, get href attribute and call GoToUrl() -- this may trigger download
                var href = reportDownloadButton.GetAttribute("href");
                driver.Navigate().GoToUrl(href);
            }
        }
    }
}

2 Answers:

Answer 0 (score: 2)

You can use WebClient.DownloadFile.

Answer 1 (score: 2)

You can try setting the download.default_directory Chrome driver preference:
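
As a rough illustration of that suggestion, the sketch below saves a URL into a target folder with WebClient.DownloadFile. The helper names (SimpleDownload, FileNameFromUrl, DownloadTo), the https:// scheme in the usage line, and deriving the file name from the URL are all assumptions. As written, it only works when the URL is reachable without the browser's login session.

```csharp
using System;
using System.IO;
using System.Net;

static class SimpleDownload
{
    // Derives a local file name from the last segment of the URL path.
    public static string FileNameFromUrl(string url)
    {
        return Path.GetFileName(new Uri(url).LocalPath);
    }

    // Downloads url into targetFolder and returns the full path of the saved file.
    public static string DownloadTo(string url, string targetFolder)
    {
        Directory.CreateDirectory(targetFolder); // does nothing if the folder already exists
        string destination = Path.Combine(targetFolder, FileNameFromUrl(url));
        using (var client = new WebClient())
        {
            client.DownloadFile(url, destination);
        }
        return destination;
    }
}
```

A call such as `SimpleDownload.DownloadTo("https://www.link.com/key/489498-654gjgh6-6g5h4jh/link.pdf", @"C:\temp");` would then write the file under C:\temp, provided the server allows the request.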

// folder the file should be saved to (assumed here to be C:\temp, as in the question)
var downloadFilePath = @"C:\temp";

// declare chrome options and set the download directory preference
// (AddUserProfilePreference is built into ChromeOptions; ChromeOptionsWithPrefs is not a Selenium class)
var options = new ChromeOptions();
options.AddUserProfilePreference("download.default_directory", downloadFilePath);

// declare driver with these options
var driver = new ChromeDriver(options);

// ... run your code here ...

// click the link to download
var reportDownloadButton = driver.FindElementById("company_report_link");
reportDownloadButton.Click();

// if clicking does not work, get href attribute and call GoToUrl() -- this may trigger download
var href = reportDownloadButton.GetAttribute("href");
driver.Navigate().GoToUrl(href);

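If reportDownloadButton is a link that triggers a download, the file should be saved to the folder you set as download.default_directory (downloadFilePath).

Neither of these threads is in C#, but they cover a similar problem: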
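
Two details from the question may still get in the way: clicking the link opens the PDF in a Chrome tab instead of saving it, and the using block can dispose the driver before the file has finished writing. The sketch below sets the relevant preferences through ChromeOptions.AddUserProfilePreference and polls the folder until the file shows up. It assumes your Chrome version honors the plugins.always_open_pdf_externally preference. DownloadHelpers, CreateOptions, WaitForFile, and the polling interval are hypothetical choices, not part of the answer above.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Threading;
using OpenQA.Selenium.Chrome;

static class DownloadHelpers
{
    // Builds Chrome options that save downloads to downloadFolder without prompting.
    public static ChromeOptions CreateOptions(string downloadFolder)
    {
        var options = new ChromeOptions();
        options.AddUserProfilePreference("download.default_directory", downloadFolder);
        options.AddUserProfilePreference("download.prompt_for_download", false);
        // Assumption: this preference makes Chrome download PDFs instead of opening its viewer.
        options.AddUserProfilePreference("plugins.always_open_pdf_externally", true);
        return options;
    }

    // Polls folder until a file matching pattern exists, skipping Chrome's temporary
    // ".crdownload" files. Returns the file path, or null if the timeout expires first.
    public static string WaitForFile(string folder, string pattern, TimeSpan timeout)
    {
        DateTime deadline = DateTime.UtcNow + timeout;
        while (DateTime.UtcNow < deadline)
        {
            string found = Directory.GetFiles(folder, pattern)
                .FirstOrDefault(f => !f.EndsWith(".crdownload", StringComparison.OrdinalIgnoreCase));
            if (found != null)
            {
                return found;
            }
            Thread.Sleep(250);
        }
        return null;
    }
}
```

With this in place, the driver could be created as `new ChromeDriver(DownloadHelpers.CreateOptions(@"C:\temp"))`, and `DownloadHelpers.WaitForFile(@"C:\temp", "*.pdf", TimeSpan.FromSeconds(30))` called after the click and before the using block ends. Headless Chrome has historically restricted downloads, so if nothing arrives it may be worth testing without the headless argument first.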

How to control the download of files with Selenium + Python bindings in Chrome

How to use chrome webdriver in selenium to download files in python?