Question

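I have a mixed list of URLs that link to different file types. I want to use a regex to remove the URLs that end in .pdf, but I'm not sure how to do this without affecting the .html, .ppt, and .doc URLs.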

http://www.myurl.com/library/mydocument.doc
http://www.myurl.com/library/somefile.pdf

I tried several of the examples posted here, but they were written for Java and C#, so they did not work for me.

Thanks for your help.

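Edit

I am using Nintex RegEx, which is based on .NET. I have no experience with C#, Java, .NET, etc.

I am currently pulling library URLs from SharePoint that end in different file types. I was able to figure out how to remove the unwanted file types, but the result is still causing me problems.

Here is my current setup: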

pattern = `.*pdf.*|.*pptx.*|`

Replacement =

The problem is that the CRLFs of the removed lines are left behind as blank lines. I then tried the following:

pattern = `.*pdf.*|.*pptx.*|[\r\n]*`
Replacement =

The problem is that once I add the CRLF removal, all of the strings end up on a single line.

Answer 1

When filtering out file extensions in .NET, you can use Path.GetExtension.

Example:

using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        string[] files = new string[3]
        {
            "http://www.myurl.com/library/mydocument.doc",
            @"C:\files\somefile.pdf",
            "someotherfile.pdf",
        };

        // Keep every entry whose extension is not ".pdf" (this comparison is case-sensitive).
        List<string> filteredFiles = new List<string>();
        foreach (string file in files)
        {
            if (Path.GetExtension(file) != ".pdf")
            {
                filteredFiles.Add(file);
                Console.WriteLine(file);
            }
        }
        Console.Read(); // Keep the console window open until a key is pressed.
    }
}
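
One caveat with the example above: `!= ".pdf"` is a case-sensitive comparison, so a URL ending in `.PDF` would be kept, and a URL with a query string does not yield a clean extension. Below is a minimal sketch of one way to handle both, assuming every input is an absolute URL (`new Uri(...)` throws otherwise). The `UrlExtensionFilter` class and `WithoutExtension` method are hypothetical names used for illustration, not part of .NET.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class UrlExtensionFilter
{
    // Hypothetical helper: returns the URLs whose path does not end in
    // the given extension (for example ".pdf"), ignoring case.
    public static List<string> WithoutExtension(IEnumerable<string> urls, string extension)
    {
        return urls
            .Where(url =>
            {
                // Uri.AbsolutePath excludes the query string and fragment,
                // so only the path part is inspected.
                string path = new Uri(url).AbsolutePath;
                return !string.Equals(Path.GetExtension(path), extension,
                                      StringComparison.OrdinalIgnoreCase);
            })
            .ToList();
    }
}
```

Calling `UrlExtensionFilter.WithoutExtension(urls, ".pdf")` on a list of absolute URLs returns a new list and leaves the input untouched.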

Answer 2

This pattern only matches http(s) URLs that end in .pdf, optionally followed by whitespace.

string data = @"alphapdf
http://www.myurl.com/library/mydocument.doc
http://www.myurl.com/library/somefile.pdf
Gamma";

string pattern = @"http.+?\.pdf[\s\r\n]*";

Because the pattern also matches the trailing whitespace and CRLF, .NET's Regex.Replace removes the entire line.

Regex.Replace(data, pattern, string.Empty)

Result:

alphapdf
http://www.myurl.com/library/mydocument.doc
Gamma
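
The pattern in the question's edit fails for a related reason: `[\r\n]*` as its own alternative matches every line break in the input, not only the ones that follow a removed URL, so everything collapses onto one line. A possible variant is sketched below. It anchors the match to whole lines, covers both .pdf and .pptx regardless of case, and consumes only the matched line's own line break. The `PdfLineRemover` class and `RemoveLines` method are hypothetical names. The inline options `(?im)` are standard .NET regex syntax, but whether the Nintex action accepts them is an assumption that this thread does not confirm.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PdfLineRemover
{
    // (?im)          : ignore case; ^ and $ match at each line, not only at the ends of the input
    // ^[^\r\n]*      : everything on the line before the extension
    // \.(?:pdf|pptx) : the unwanted extensions
    // [ \t]*         : optional trailing spaces or tabs
    // (?:\r?\n|$)    : this line's own line break, or the end of the input
    public const string Pattern = @"(?im)^[^\r\n]*\.(?:pdf|pptx)[ \t]*(?:\r?\n|$)";

    // Hypothetical helper: deletes every line that ends in .pdf or .pptx.
    public static string RemoveLines(string text)
    {
        return Regex.Replace(text, Pattern, string.Empty);
    }
}
```

Because the extension must sit at the end of the line, a line such as `http://www.myurl.com/library/notes.pdf.html` is left alone, and `alphapdf` is kept because it contains no `.pdf`.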
