Question

I am trying to remove parts of a URL string (the protocol, the query string, etc.).

For example, the following input strings

https://www.example.com/xyz/page.html?id=10&name=smith
http://www.example.com/abc/index.html#
https://www.example.com/abc/
www.example.com/abc
example.com/abc
http://example.com/abc

would become

example.com/xyz/page.html
example.com/abc/index.html
example.com/abc
example.com/abc
example.com/abc
example.com/abc

This is what I have done so far:

    string CleanUrl(string urlString)
    {
        urlString = Regex.Replace(urlString, @"^https?://", "", RegexOptions.IgnoreCase);
        urlString = Regex.Replace(urlString, @"^www\.", "", RegexOptions.IgnoreCase);
        urlString = Regex.Replace(urlString, @"#$", "");
        urlString = Regex.Replace(urlString, @"/$", "");
        return urlString;
    }

I am looking for a better way to do this, perhaps with a single Regex.Replace or something similar.

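Edit: Sorry, my question was not clear. My input strings sometimes do not contain the protocol and/or the www. part, which causes a System.UriFormatException when using the Uri constructor. I have updated the example inputs.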

Answer 1

I would go with what I suggested in my comment on the question.

The code would look like this:

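    public static string ReplaceUrl(string input)
    {
        // Requires an absolute URL: new Uri(...) throws UriFormatException when the scheme is missing.
        Uri uri = new Uri(input);

        // Scheme + host + path, i.e. everything before the query string and fragment.
        string uriWithoutQueryParams = uri.GetLeftPart(UriPartial.Path);

        // Strip the leading scheme part, e.g. "https://".
        string uriWithoutScheme = uriWithoutQueryParams.Substring(uri.GetLeftPart(UriPartial.Scheme).Length);

        // Strip "www." only when it is the host prefix, not wherever it occurs in the path.
        string uriWithoutTripleW = uriWithoutScheme.StartsWith("www.", StringComparison.OrdinalIgnoreCase)
            ? uriWithoutScheme.Substring(4)
            : uriWithoutScheme;

        string uriWithoutTrailingSlash = uriWithoutTripleW.TrimEnd('/');

        return uriWithoutTrailingSlash;
    }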

Here is also a test method covering the cases you asked about (using xUnit):

    [Theory]
    [InlineData("https://www.example.com/xyz/page.html?id=10&name=smith", "example.com/xyz/page.html")]
    [InlineData("http://www.example.com/abc/index.html#", "example.com/abc/index.html")]
    [InlineData("https://www.example.com/abc/", "example.com/abc")]
    public void MyUrlConverterReplacesCorrectly(string inputUrl, string expectedUrl)
    {
        string actualUrl = MyUrlConverter.ReplaceUrl(inputUrl);

        Assert.Equal(expectedUrl, actualUrl);
    }

Answer 2

Don't use RegEx for this. Instead, use the Uri class to parse the URL string, then use the Host and AbsolutePath properties to build the final string:

var uri = new Uri("https://www.example.com/xyz/page.html?id=10&name=smith");
var result = uri.Host + uri.AbsolutePath;
if (result.EndsWith("/"))
    result = result.Remove(result.Length - 1, 1);
if (result.StartsWith("www."))
    result = result.Substring(4);
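
The question's edit points out that some inputs have no protocol, in which case new Uri(...) throws a System.UriFormatException. One possible way to keep the Uri-based approach is to prepend a placeholder scheme before parsing. The following is only a minimal sketch under that assumption: the names UrlCleaner and Clean, the choice of "http://" as the placeholder, and the decision to return malformed input unchanged are all illustrative and not part of the original answer.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlCleaner
{
    // Hypothetical helper: returns host + path without the scheme, "www.",
    // query string, fragment, or trailing slash.
    public static string Clean(string input)
    {
        string candidate = input.Trim();

        // Assumption: input that does not start with "<scheme>://" is a
        // scheme-less web address, so a placeholder scheme is added for parsing.
        if (!Regex.IsMatch(candidate, @"^[a-zA-Z][a-zA-Z0-9+.\-]*://"))
            candidate = "http://" + candidate;

        // TryCreate reports failure through its return value instead of throwing.
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out Uri uri))
            return input; // Assumption: leave unparseable input untouched.

        string result = uri.Host + uri.AbsolutePath;

        if (result.StartsWith("www.", StringComparison.OrdinalIgnoreCase))
            result = result.Substring(4);

        return result.TrimEnd('/');
    }
}
```

Note that Uri.Host does not include the port and is normalized to lower case, so this sketch may not suit inputs where either of those matters.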

Answer 3

Try this:

        static string CleanUrl(string urlString)
        {
            urlString = Regex.Replace(urlString, @"\s+", "");
            urlString = Regex.Replace(urlString, @"^https?://", "", RegexOptions.IgnoreCase);
            urlString = Regex.Replace(urlString, @"^www\.", "", RegexOptions.IgnoreCase);
            urlString = Regex.Replace(urlString, @"(#|\?).*$", "");
            urlString = Regex.Replace(urlString, @"/$", "");
            return urlString;
        }

Answer 4

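If all of your strings are URLs of this kind and you do not need to validate their structure, then for the example data you could use an alternation to match the parts you want to remove from the URL and replace them with an empty string.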

    ^(?:https?://www\.|https?://|www\.)?|(?:[#/]|\?.*)$

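Explanation

^(?:https?://www\.|https?://|www\.)? asserts the start of the string, followed by an optional non-capturing group that matches either http with an optional s followed by ://www., or only the http:// part, or only the www. part.
| means or.
(?:[#/]|\?.*)$ matches either a single # or /, or a question mark followed by any character zero or more times, and then asserts the end of the string.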
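
As a rough illustration of how the pattern explained above could be applied in C#, here is a minimal sketch. The names RegexUrlCleaner and Clean are hypothetical, and RegexOptions.IgnoreCase is an added assumption (so that inputs such as HTTP:// are also matched); neither is part of the pattern itself.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RegexUrlCleaner
{
    // The pattern from this answer: an optional scheme and/or "www." at the start,
    // or a single trailing "#" or "/", or a query string, at the end.
    // IgnoreCase is an assumption added for this sketch.
    private static readonly Regex Unwanted = new Regex(
        @"^(?:https?://www\.|https?://|www\.)?|(?:[#/]|\?.*)$",
        RegexOptions.IgnoreCase);

    // Hypothetical helper: removes every part of the input that the pattern matches.
    public static string Clean(string url)
    {
        return Unwanted.Replace(url, string.Empty);
    }
}
```

Because the trailing alternative removes only one character or a query string, an input that ends in "/" followed by a query string would keep its slash; for the example data in the question that case does not occur.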


Replace the part of a string that matches a pattern
