Question

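I have been digging through Stack Overflow all evening, but nothing has worked so far. What I am trying to do is extract a URL from a string (mostly HTML) that does not end with an image extension. So, if the given HTML string contains a URL ending in .jpg and then, a few lines further down, another URL without any image extension, the regex should pick up the second one and stop. Alternatively, it could return all the "good" URLs and simply omit the images.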

I know the image-detection part should go somewhere near the end of the pattern, but so far all I have managed to do is freeze the server.

Sample string to match:

'<tr> <td style="vertical-align: top; padding-right: 12px;"><img src="http://static01.nyt.com/images/2016/01/31/us/why-iowaALT/why-iowaALT-thumbStandard.jpg" /></td> <td> <h6 style="font-size: 10px; font-weight: normal; text-transform: uppercase; color: ##000000; margin: 0; margin-bottom: 2px"></h6> <h1 style="font-weight: normal; font-family: georgia,"times new roman",times,serif; font-size: 23px; margin: 0; margin-bottom: 4px"><a href="http://p.nytimes.com/email/re?location=InCMR7g4BCJTYuyKqXu41s2MxgEX9Okc&amp;user_id=7b8478da99b24f28abb9c2f1be86c807&amp;email_type=eta&amp;task_id=1454290534529254&amp;regi_id=0" style="color: ##004276; text-decoration: none !important;">'

Note: it should use the ColdFusion regex flavor, which can be a bit limited at times.

Thanks!

Answer 1

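Assuming your code already extracts valid HTML links correctly and stores them in an array, all you have to do is loop over that array and check whether each stored URL ends with an image extension. If it does not, return that value.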
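The loop described in this answer could be sketched roughly as follows. This is only an illustration in Java (which ColdFusion can call into), not the answerer's own code: the class name `NonImageUrlFilter`, its method names, and the list of image extensions are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class NonImageUrlFilter {
    // Assumed set of image extensions; extend it to whatever counts as an image for you.
    private static final String[] IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"};

    // True when the URL, ignoring any query string or fragment, ends with an image extension.
    static boolean isImageUrl(String url) {
        String path = url.split("[?#]", 2)[0].toLowerCase(Locale.ROOT);
        for (String ext : IMAGE_EXTENSIONS) {
            if (path.endsWith(ext)) {
                return true;
            }
        }
        return false;
    }

    // Returns the first URL that is not an image, or null when every URL is an image.
    static String firstNonImage(List<String> urls) {
        for (String url : urls) {
            if (!isImageUrl(url)) {
                return url;
            }
        }
        return null;
    }

    // Returns every URL that is not an image, in the original order.
    static List<String> allNonImages(List<String> urls) {
        List<String> result = new ArrayList<>();
        for (String url : urls) {
            if (!isImageUrl(url)) {
                result.add(url);
            }
        }
        return result;
    }
}
```

Keeping extraction and filtering as separate steps means the URL-matching regex can stay simple, which may also help avoid the heavy backtracking that tends to freeze a server.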


Answer 2

You can switch to Java's regex engine to achieve what you are after:

<!--- Java Regular Expression Pattern Object --->
<cfset local.objPattern = createObject(
    "java",
    "java.util.regex.Pattern"
    ).compile(
        javaCast( "string", '(?:https?:\/\/)[^"]*(?<!\.jpg)(?=\")' )
        )
    />

<!--- Get Pattern Matcher for your html content --->
<cfset local.objMatcher = local.objPattern.matcher(
    JavaCast( "string", local.htmlContent )
    ) />

<!--- Find Matching URLs --->
<cfloop condition="local.objMatcher.find()">
    <cfdump var="#local.objMatcher.group()# <br>">
</cfloop>
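The pattern above only rules out `.jpg`. As a hedged sketch of how the same negative-lookbehind idea might be extended to several image extensions, here is a plain Java version. The class name `NonImageUrlRegex`, the method names, and the extension list (`jpg`, `jpeg`, `png`, `gif`) are assumptions for illustration and are not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonImageUrlRegex {
    // Match a URL up to the closing double quote, but reject it when the text
    // immediately before that quote is one of the assumed image extensions.
    static final Pattern NON_IMAGE_URL = Pattern.compile(
        "https?://[^\"]*(?<!\\.(?:jpe?g|png|gif))(?=\")",
        Pattern.CASE_INSENSITIVE);

    // Returns the first non-image URL, or null when none is found.
    static String findFirst(String html) {
        Matcher matcher = NON_IMAGE_URL.matcher(html);
        return matcher.find() ? matcher.group() : null;
    }

    // Returns every non-image URL in document order.
    static List<String> findAll(String html) {
        List<String> urls = new ArrayList<>();
        Matcher matcher = NON_IMAGE_URL.matcher(html);
        while (matcher.find()) {
            urls.add(matcher.group());
        }
        return urls;
    }
}
```

Calling `find()` once corresponds to "get the first good URL and stop", while looping over `find()` returns all of them. Note that this sketch only inspects the characters directly before the closing quote, so an image URL followed by a query string (for example `photo.jpg?w=100`) would still be matched.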

ColdFusion regex to detect the first URL without an image extension
