Question

I need a short code snippet to get a directory listing from an HTTP server.

Answer 1

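A few important considerations before the code:

1. The HTTP server must be configured to allow directory listing for the directories you want;
2. Because directory listings are ordinary HTML pages, no standard defines their format;
3. Because of consideration 2, you need server-specific code for each server.

My bet is to use regular expressions. This allows for fast parsing and customization. You can keep a specific regular expression pattern per site, which gives you a very modular approach. If you plan to extend the parsing module with support for new sites without changing the source code, use an external source that maps URLs to regular expression patterns.

Example that prints a directory listing from http://www.ibiblio.org/pub/: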

namespace Example
{
    using System;
    using System.Net;
    using System.IO;
    using System.Text.RegularExpressions;

    public class MyExample
    {
        public static string GetDirectoryListingRegexForUrl(string url)
        {
            if (url.Equals("http://www.ibiblio.org/pub/"))
            {
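                // Non-greedy quantifiers keep each match inside a single anchor tag,
                // even when several links share one line of HTML.
                return "<a href=\".*?\">(?<name>.*?)</a>";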
            }
            throw new NotSupportedException();
        }
        public static void Main(String[] args)
        {
            string url = "http://www.ibiblio.org/pub/";
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            {
                using (StreamReader reader = new StreamReader(response.GetResponseStream()))
                {
                    string html = reader.ReadToEnd();
                    Regex regex = new Regex(GetDirectoryListingRegexForUrl(url));
                    MatchCollection matches = regex.Matches(html);
                    if (matches.Count > 0)
                    {
                        foreach (Match match in matches)
                        {
                            if (match.Success)
                            {
                                Console.WriteLine(match.Groups["name"]);
                            }
                        }
                    }
                }
            }

            Console.ReadLine();
        }
    }
}
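
The "external source that maps URLs to regular expression patterns" idea could look roughly like the sketch below. The class name ListingPatterns, the Register method, and the prefix-matching rule are all assumptions of mine, not part of the code above; the table is filled in code here, but the same Register calls could just as well be driven by a config file.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: keeps per-site patterns in a table instead of an if-chain.
public static class ListingPatterns
{
    // URL prefix -> regex that exposes the entry name in a group called "name".
    private static readonly Dictionary<string, string> Patterns =
        new Dictionary<string, string>
        {
            { "http://www.ibiblio.org/pub/", "<a href=\"[^\"]*\">(?<name>[^<]*)</a>" },
        };

    // Adds or replaces the pattern for a site without touching the parsing code.
    public static void Register(string urlPrefix, string pattern)
    {
        Patterns[urlPrefix] = pattern;
    }

    // Returns the pattern whose prefix matches the URL, or throws for unknown sites.
    public static string GetPatternForUrl(string url)
    {
        foreach (KeyValuePair<string, string> entry in Patterns)
        {
            if (url.StartsWith(entry.Key, StringComparison.OrdinalIgnoreCase))
            {
                return entry.Value;
            }
        }
        throw new NotSupportedException("No listing pattern registered for " + url);
    }

    // Applies the site's pattern to already-downloaded HTML and collects the names.
    public static List<string> ExtractNames(string url, string html)
    {
        List<string> names = new List<string>();
        foreach (Match match in Regex.Matches(html, GetPatternForUrl(url)))
        {
            names.Add(match.Groups["name"].Value);
        }
        return names;
    }
}
```

ExtractNames takes the HTML as a string, so the download step (HttpWebRequest plus StreamReader, as in Main above) stays separate and the parsing can be tried against saved pages.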

Answer 2

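Basic understanding:

Directory listings are just HTML pages generated by a web server. Each web server generates these pages in its own way, because there is no standard way for a web server to list directories.

The best way to get a directory listing is to send an HTTP request to the URL whose listing you want, then parse the returned HTML and extract all of the links from it.

To parse the HTML links, try the HTML Agility Pack.

Directory browsing:

The web server you want to list directories from must have directory browsing turned on; otherwise it will not produce this HTML representation of the files in its directories. So you can only get a directory listing if the HTTP server wants you to be able to.

A quick example using the HTML Agility Pack: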

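// HtmlWeb downloads the page; HtmlDocument.Load(string) would read a local file instead.
HtmlWeb web = new HtmlWeb();
HtmlDocument doc = web.Load(strURL);
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a[@href]");
if (links != null) // SelectNodes returns null when nothing matches
{
    foreach (HtmlNode link in links)
    {
        HtmlAttribute att = link.Attributes["href"];
        // do something with att.Value
    }
}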

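Cleaner alternative:

If it is possible in your situation, a cleaner approach is to use a protocol that is intended for directory listings, such as the File Transfer Protocol (FTP), SFTP (an FTP-like protocol over SSH), or FTPS (FTP over SSL).

What if directory browsing is not turned on:

If the web server does not have directory browsing turned on, there is no easy way to get the directory listing.

The best you can do in that case is to start at a given URL, follow all of the HTML links on that page, and try to build a virtual directory listing yourself from the relative paths of the resources on those pages. This will not give you a complete listing of the files that are actually on the web server.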
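
A rough sketch of that "virtual listing" step, assuming the links have already been extracted from a page (for example with the HTML Agility Pack). The VirtualListing class and its Build method are hypothetical names; the code only resolves each link against the page URL and keeps the immediate children of the directory you care about.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: turns raw href values into a one-level directory view.
public static class VirtualListing
{
    // baseDirUrl is assumed to be absolute and to end with '/'.
    public static SortedSet<string> Build(string baseDirUrl, string pageUrl, IEnumerable<string> hrefs)
    {
        Uri baseDir = new Uri(baseDirUrl);
        Uri page = new Uri(pageUrl);
        string site = baseDir.GetLeftPart(UriPartial.Authority);
        SortedSet<string> entries = new SortedSet<string>(StringComparer.Ordinal);

        foreach (string href in hrefs)
        {
            Uri resolved;
            if (!Uri.TryCreate(page, href, out resolved))
            {
                continue; // skip links that cannot be parsed
            }

            // Keep only links on the same site and below the base directory.
            bool sameSite = string.Equals(
                resolved.GetLeftPart(UriPartial.Authority), site, StringComparison.OrdinalIgnoreCase);
            if (!sameSite || !resolved.AbsolutePath.StartsWith(baseDir.AbsolutePath, StringComparison.Ordinal))
            {
                continue;
            }

            string relative = resolved.AbsolutePath.Substring(baseDir.AbsolutePath.Length);
            if (relative.Length == 0)
            {
                continue; // the directory itself, e.g. a sort link such as "?C=N;O=D"
            }

            // A nested path contributes only its first segment, shown as a subdirectory.
            int slash = relative.IndexOf('/');
            entries.Add(slash >= 0 ? relative.Substring(0, slash + 1) : relative);
        }
        return entries;
    }
}
```

As noted above, this only sees resources that some page links to, so the result is a lower bound on what the server actually holds.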

Answer 3

I just modified the code above and found that this works best:

public static class  GetallFilesFromHttp
{
    public static string GetDirectoryListingRegexForUrl(string url)
    {
        if (url.Equals("http://ServerDirPath/"))
        {
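            // Pattern \"([^"]*)\" captures every double-quoted string in the page,
            // which includes the href values of the listing.
            return "\\\"([^\"]*)\\\"";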
        }
        throw new NotSupportedException();
    }
    public static void ListDirectory()
    {
        string url = "http://ServerDirPath/";
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        {
            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {
                string html = reader.ReadToEnd();

                Regex regex = new Regex(GetDirectoryListingRegexForUrl(url));
                MatchCollection matches = regex.Matches(html);
                if (matches.Count > 0)
                {
                    foreach (Match match in matches)
                    {
                        if (match.Success)
                        {
                            Console.WriteLine(match.ToString());
                        }
                    }
                }
            }
            Console.ReadLine();
        }
    }
}

Answer 4

Thanks for the great post. For me, the pattern below worked better.

<A HREF=\"\S+\">(?<name>\S+)</A>

I also tested it at http://regexhero.net/tester.

To use it in your C# code, you must add an extra backslash (\) before every backslash and double quote in the pattern. For instance, in the GetDirectoryListingRegexForUrl method you should use something like this:

return "<A HREF=\\\"\\S+\\\">(?<name>\\S+)</A>";
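
To make the escaping rule concrete, here is a small sketch that writes the same pattern two ways. The PatternEscaping class and FirstName method are names I made up for illustration, and the verbatim (@"...") form is an alternative I am adding, not something from the post above. It assumes the listing uses the uppercase `<A HREF="...">` form.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper showing two spellings of one regex pattern.
public static class PatternEscaping
{
    // Regular string literal: every backslash and double quote gets a leading backslash.
    public const string Escaped = "<A HREF=\\\"\\S+\\\">(?<name>\\S+)</A>";

    // Verbatim string literal: backslashes stay as they are, and a double quote is doubled.
    public const string Verbatim = @"<A HREF=\""\S+\"">(?<name>\S+)</A>";

    // Returns the "name" group of the first match, or null when nothing matches.
    public static string FirstName(string html)
    {
        Match match = Regex.Match(html, Verbatim);
        return match.Success ? match.Groups["name"].Value : null;
    }
}
```

Both constants hold exactly the same characters, so which one to use is purely a readability choice.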

Cheers!

Answer 5

The following code works for me when I do not have access to the FTP server:

public static string[] GetFiles(string url)
{
    List<string> files = new List<string>(500);
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            string html = reader.ReadToEnd();

            Regex regex = new Regex("<a href=\".*\">(?<name>.*)</a>");
            MatchCollection matches = regex.Matches(html);

            if (matches.Count > 0)
            {
                foreach (Match match in matches)
                {
                    if (match.Success)
                    {
                        string[] matchData = match.Groups[0].ToString().Split('\"');
                        files.Add(matchData[1]);
                    }
                }
            }
        }
    }
    return files.ToArray();
}

However, when I do have access to the FTP server, the following code runs much faster:

public static string[] getFtpFolderItems(string ftpURL)
{
    FtpWebRequest request = (FtpWebRequest)WebRequest.Create(ftpURL);
    request.Method = WebRequestMethods.Ftp.ListDirectory;

    //You could add Credentials, if needed 
    //request.Credentials = new NetworkCredential("anonymous", "password");

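    // Dispose the response and reader so the FTP connection is released promptly.
    using (FtpWebResponse response = (FtpWebResponse)request.GetResponse())
    using (StreamReader reader = new StreamReader(response.GetResponseStream()))
    {
        // The listing has one entry per line; split on CR/LF and drop empty entries.
        return reader.ReadToEnd().Split("\r\n".ToCharArray(), StringSplitOptions.RemoveEmptyEntries);
    }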
}

Answer 6

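You can't, unless the particular directory you want has directory listing enabled and has no default file (typically index.htm, index.html, or default.html, but this is always configurable). Only then will you be presented with a directory listing, which will usually be marked up as HTML and require parsing.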

Answer 7

Alternatively, you could set the server up for WebDAV.
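
If the server does speak WebDAV, a listing is requested with the PROPFIND method and a "Depth: 1" header, and the reply is an XML "multistatus" document with one href per entry. The sketch below is only an outline under those assumptions: WebDavListing, FetchMultiStatus, and ParseHrefs are hypothetical names, authentication is left out, and some servers may expect an explicit request body.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Xml.Linq;

// Hypothetical helper for listing a WebDAV collection.
public static class WebDavListing
{
    // WebDAV elements live in the "DAV:" XML namespace.
    private static readonly XNamespace Dav = "DAV:";

    // Sends PROPFIND with Depth: 1 (the collection plus its immediate children)
    // and returns the raw XML body of the response.
    public static string FetchMultiStatus(string url)
    {
        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        request.Method = "PROPFIND";
        request.Headers.Add("Depth", "1");
        using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
        using (StreamReader reader = new StreamReader(response.GetResponseStream()))
        {
            return reader.ReadToEnd();
        }
    }

    // Extracts the href of every response element in a multistatus document.
    public static List<string> ParseHrefs(string multiStatusXml)
    {
        List<string> hrefs = new List<string>();
        XDocument doc = XDocument.Parse(multiStatusXml);
        foreach (XElement response in doc.Descendants(Dav + "response"))
        {
            XElement href = response.Element(Dav + "href");
            if (href != null)
            {
                hrefs.Add(href.Value.Trim());
            }
        }
        return hrefs;
    }
}
```

With Depth: 1 the collection itself is normally part of the result, so callers may want to drop the entry that equals the requested path.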

C# HttpWebRequest command to get directory listing
