Question

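I have a string containing HTML source code. The source contains many URLs, but I'm finding it hard to separate them from the rest of the string. I've been trying to find a way to grab all the text between "http:" and ".jpg", but I haven't found one that works, at least not for more than one URL. As you might have guessed, I haven't been using C# for very long. Any help would be greatly appreciated.

Here is a sample of the source I'm trying to extract the URLs from: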

<td class="rad">
                <input type="hidden" name="filenames[]" value="1270000_12_2.jpg">
                <a href="http://xxxxxxxxx/files/orders/120000/127200/12700000/Originals/1200000_12_2.jpg" target="_blank"><img src="http://xxxxxxxxxxxx/files/orders/120000/127200/120000/Originals/127000_12_2_thumb.jpg" border="0"></a>
                <br />
                120000_12_2.jpg                </td>
            <td class="rad" width="300" valign="top">
                <label>Enter comment to photographer:</label><br />
                <textarea rows="7" cols="35" name="comment[]"></textarea>
            </td>
            <td class="rad" width="300" valign="top">
                <label for="comment_from_editor">Comment from editor</label><br />
                <textarea rows="4" cols="35" name="comment_from_editor[]" id="comment_from_editor">

Answer 1

Use an HTML parser such as CsQuery or Html Agility Pack to get the A elements and their HREF attributes.

D̻̻̤̜̪̜ơ͔ no͏̳̙t̸̳̤̭͓͍͍͈ ̵̬͚̤͔ú̟̜̹͈̰̞͇s̥͜e̴ ͚̹r̛̻͔̘̫̭̼é͚̼̹͎̞̯ge̢̤x
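As a rough illustration of the parser approach, here is a minimal sketch using Html Agility Pack. It assumes the HtmlAgilityPack NuGet package is installed; the class and method names (LinkExtractor, ExtractHrefs) are made up for this example and are not part of the library.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: the HtmlAgilityPack NuGet package is referenced

static class LinkExtractor // hypothetical helper, not part of the library
{
    // Returns the href value of every <a> element that has one.
    public static List<string> ExtractHrefs(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection anchors = doc.DocumentNode.SelectNodes("//a[@href]");
        if (anchors == null)
        {
            return hrefs;
        }

        foreach (HtmlNode anchor in anchors)
        {
            hrefs.Add(anchor.GetAttributeValue("href", string.Empty));
        }
        return hrefs;
    }
}
```

The thumbnail URLs in the sample sit in img elements rather than A elements; the same loop with the XPath "//img[@src]" and the attribute name "src" should pick those up.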

Answer 2

In C#:

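using System.Collections.Generic;
using System.Text.RegularExpressions;

    // Returns every "http:" link in str whose extension is "jpg".
    static string[] ParseLinkToJpg(string str)
    {
        // Group 1: the link, from "http:" up to (not including) the character
        //          that precedes the next whitespace (the closing quote in the sample).
        // Group 2: everything in the link after its first '.'.
        // Caveat: group 2 is only the extension when the link contains a single '.',
        // which holds for the sample but not for hosts such as www.example.com.
        Regex regex = new Regex(@"(http:.*?\.(.*?)).\s");
        List<string> result = new List<string>();
        for (Match match = regex.Match(str); match.Success; match = match.NextMatch())
        {
            if (match.Groups[2].Value == "jpg")
            {
                result.Add(match.Groups[1].Value);
            }
        }
        return result.ToArray();
    }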

This function returns an array of image links.

You can change the regular expression (http:.*?\.(.*?)).\s to whatever you need.
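For instance, the pattern (http:.*?\.(.*?)).\s takes everything after the first '.' as the extension, so it misses links whose host contains a dot. Below is one possible tighter variant, offered as a sketch rather than a definitive pattern. It assumes that URLs are delimited by whitespace, quotes, or angle brackets, and that https links should be accepted too; the names UrlExtractor and ExtractJpgUrls are invented for this example.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class UrlExtractor // hypothetical helper name
{
    // http:// or https://, then any run of characters that cannot appear
    // unescaped in a quoted attribute value, ending in ".jpg".
    // The lookahead requires a delimiter (or end of input) right after ".jpg",
    // so "photo.jpg.exe" is not reported as a .jpg link.
    private static readonly Regex JpgUrl = new Regex(
        @"https?://[^\s""'<>]+\.jpg(?=[\s""'<>]|$)",
        RegexOptions.IgnoreCase);

    // Returns every matching URL in document order.
    public static string[] ExtractJpgUrls(string html)
    {
        var result = new List<string>();
        foreach (Match match in JpgUrl.Matches(html))
        {
            result.Add(match.Value);
        }
        return result.ToArray();
    }
}
```

Calling UrlExtractor.ExtractJpgUrls(html) on the sample from the question should return both the href of the link and the src of the thumbnail, since both are absolute URLs ending in .jpg.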

https://www.debuggex.com/ is an excellent service for testing regular expressions.

Extracting multiple URLs from a string
