C#: logging in to Kaggle and downloading files from a script

Time: 2018-03-13 04:44:34

Tags: c# webclient kaggle

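I recently came across a Python script that downloads files directly from Kaggle: https://ramhiser.com/2012/11/23/how-to-download-kaggle-data-with-python-and-requests-dot-py/

I am trying to do something similar in C# with WebClient. I found the following answer on Stack Overflow: C# download file from the web with login

I tried it, but I only seem to download the login page rather than the actual file. Here is my main code: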

CookieContainer cookieJar = new CookieContainer();
CookieAwareWebClient http = new CookieAwareWebClient(cookieJar);

string postData = "name=<username>&password=<password>&submit=submit";
string response = http.UploadString("https://www.kaggle.com/account/login", postData);
Console.Write(response);

http.DownloadFile("https://www.kaggle.com/c/titanic/download/train.csv", "train.CSV");

I used the WebClient extension from the link above, slightly modified:

using System;
using System.Net;

// WebClient that shares one CookieContainer across all of its requests.
public class CookieAwareWebClient : WebClient
{
    public CookieContainer CookieContainer { get; set; }
    public Uri Uri { get; set; }

    public CookieAwareWebClient()
        : this(new CookieContainer())
    {
    }

    public CookieAwareWebClient(CookieContainer cookies)
    {
        this.CookieContainer = cookies;
    }

    protected override WebRequest GetWebRequest(Uri address)
    {
        this.Uri = address;
        WebRequest request = base.GetWebRequest(address);
        // Only HTTP(S) requests carry cookies; pass anything else through unchanged.
        var httpRequest = request as HttpWebRequest;
        if (httpRequest != null)
        {
            httpRequest.CookieContainer = this.CookieContainer;
            httpRequest.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate;
        }
        return request;
    }

    protected override WebResponse GetWebResponse(WebRequest request)
    {
        WebResponse r = base.GetWebResponse(request);
        var response = r as HttpWebResponse;
        if (response != null)
        {
            // Copy cookies set by this response into the shared container.
            this.CookieContainer.Add(response.Cookies);
        }
        // Return the original response so a non-HTTP response is not turned into null.
        return r;
    }
}

Can anyone point out where I am going wrong?

Thanks.

3 answers:

Answer 0 (score: 2)

We created a forum post to help with what you are trying to do: Accessing Kaggle API through C#. If you have further questions, feel free to post them here or on the forum.

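Answer 1 (score: 0)

Try opening https://www.kaggle.com/c/titanic/download/train.csv in a browser while logged out: the browser shows a web page instead of downloading the file. You need a direct link to the file, not to a web page.

Your code works fine; you just need either a direct link to the file or to make sure you are actually logged in before downloading it.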
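One way to tell these two cases apart is to inspect the Content-Type of the response before writing it to disk. The following is only a minimal sketch, assuming the login page is served as text/html and the data file is not; the names DownloadCheck, LooksLikeHtml and DownloadOrFail are made up for illustration and are not part of the question's code.

```csharp
using System;
using System.IO;
using System.Net;

static class DownloadCheck
{
    // True when the Content-Type header value suggests an HTML page
    // (for example a login page) rather than a data file.
    public static bool LooksLikeHtml(string contentType)
    {
        return contentType != null &&
               contentType.TrimStart().StartsWith("text/html", StringComparison.OrdinalIgnoreCase);
    }

    // Downloads url with the given client and saves it to path,
    // but refuses to save the body when the server answered with HTML.
    public static void DownloadOrFail(WebClient http, string url, string path)
    {
        byte[] body = http.DownloadData(url);
        string contentType = http.ResponseHeaders[HttpResponseHeader.ContentType];
        if (LooksLikeHtml(contentType))
        {
            throw new InvalidOperationException(
                "Received an HTML page (possibly the login page) instead of the file.");
        }
        File.WriteAllBytes(path, body);
    }
}
```

With the client from the question, the call would be DownloadCheck.DownloadOrFail(http, "https://www.kaggle.com/c/titanic/download/train.csv", "train.CSV"). If it throws, the login step did not establish a session, and the POST request is the thing to debug.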

Answer 2 (score: 0)

I know it is not exactly what you asked for, but Kaggle now has an official API that you can use to download data. It should be easier to use. :)
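For reference, a rough sketch of what calling that API from C# might look like. Everything Kaggle-specific here is an assumption to verify against the official API documentation: the base URL https://www.kaggle.com/api/v1, the path competitions/data/download/{competition}/{file}, and HTTP Basic authentication with your username and the API key from your account page. KaggleApiSketch and its members are illustrative names only.

```csharp
using System;
using System.Net;
using System.Text;

static class KaggleApiSketch
{
    // ASSUMPTION: base URL of Kaggle's public API; check the official docs.
    const string BaseUrl = "https://www.kaggle.com/api/v1";

    // ASSUMPTION: path layout for downloading one competition file.
    public static string BuildFileUrl(string competition, string fileName)
    {
        return BaseUrl + "/competitions/data/download/" +
               Uri.EscapeDataString(competition) + "/" +
               Uri.EscapeDataString(fileName);
    }

    // Builds a standard HTTP Basic Authorization header value.
    public static string BasicAuthHeader(string username, string key)
    {
        string raw = username + ":" + key;
        return "Basic " + Convert.ToBase64String(Encoding.UTF8.GetBytes(raw));
    }

    // Downloads one file to path using the assumed endpoint and credentials.
    public static void Download(string username, string key,
                                string competition, string fileName, string path)
    {
        using (var http = new WebClient())
        {
            http.Headers[HttpRequestHeader.Authorization] = BasicAuthHeader(username, key);
            http.DownloadFile(BuildFileUrl(competition, fileName), path);
        }
    }
}
```

A call such as KaggleApiSketch.Download("<username>", "<api-key>", "titanic", "train.csv", "train.csv") would then replace the form login entirely, so no cookie handling is needed.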