Question

How can I grab only the names J.J. Abrams and Pippa Anderson with a regular expression?

 <header class="ipl-header">
        <div class="ipl-header__content">        
        <h4 name="producers" id="producers" class="ipl-header__content ipl-list-title">
            Produced by
        </h4>
</div>
        <a class="ipl-header__edit-link" href="https://contribute.imdb.com/updates?update=tt2527336:producers">Edit</a>
    </header>

    <table class="subpage_data spFirst crew_list">
        <tbody>
                    <tr class="even">
                        <td class="name">
                            <a href="/name/nm0009190/?ref_=tt_rv"
>J.J. Abrams</a>
                        </td>
                            <td>...</td>
                            <td>executive producer</td>
                    </tr>
                    <tr class="odd">
                        <td class="name">
                            <a href="/name/nm0027297/?ref_=tt_rv"
>Pippa Anderson</a>
                        </td>
                            <td>...</td>
                            <td>co-producer</td>
                    </tr>


                    </tbody>
                    </table>

I tried the following code, but it does not work. Please help me fix it. Thanks.

$arr['producers'] = $this->match_all_key_value('/<td class="name"><a.*?>(.*?)<\/a>/ms', $this->match('/Produced by<\/a><\/h4>(.*?)<\/table>/ms', $html, 1));
$arr['producers'] = array_slice($arr['producers'], 0, 5);

Answer 1

Here is one possible solution:

preg_match_all( "#<a href=\"/name/.*?>(.*?)</a>#is", $html, $results );
$arr['producers'] = array_pop( $results );
print_r( $arr['producers'] );

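It looks for links whose href starts with /name/ and captures everything between the opening and closing link tags. This assumes the page contains no other, unwanted links whose path starts with /name/. If it does, you may have to make that part of the expression more specific.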
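As a rough sketch of what "more specific" could look like, the pattern below only accepts a /name/ link that directly follows a <td class="name"> cell. The markup is a trimmed copy of the HTML from the question; the final "Someone Else" link is a hypothetical unwanted link added purely for illustration.

```php
<?php
// Trimmed copy of the question's table. The last link is hypothetical:
// it stands in for an unwanted /name/ link elsewhere on the page.
$html = <<<'HTML'
<table class="subpage_data spFirst crew_list">
<tbody>
<tr class="even">
<td class="name">
<a href="/name/nm0009190/?ref_=tt_rv"
>J.J. Abrams</a>
</td>
<td>...</td>
<td>executive producer</td>
</tr>
<tr class="odd">
<td class="name">
<a href="/name/nm0027297/?ref_=tt_rv"
>Pippa Anderson</a>
</td>
<td>...</td>
<td>co-producer</td>
</tr>
</tbody>
</table>
<p><a href="/name/nm0000000/">Someone Else</a></p>
HTML;

// Only accept a /name/ link that sits directly inside a <td class="name"> cell.
preg_match_all('#<td class="name">\s*<a href="/name/[^>]*>(.*?)</a>#is', $html, $results);

// Group 1 holds the link text; trim() guards against stray whitespace.
$producers = array_map('trim', $results[1]);
print_r($producers);
```

With this input, the link outside the table is skipped because it is not preceded by the name cell.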

Answer 2

Parsing HTML is really a job for a DOM parser such as PHP Simple HTML DOM Parser or DOMDocument. This answer explains why.
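For comparison, a minimal sketch of the DOMDocument route, assuming PHP's dom extension is available. The markup is a trimmed copy of the question's table, and the XPath expression //td[@class="name"]/a is an assumption based on that sample only, not on the full page.

```php
<?php
// Trimmed copy of the table from the question.
$html = <<<'HTML'
<table class="subpage_data spFirst crew_list">
<tbody>
<tr class="even">
<td class="name">
<a href="/name/nm0009190/?ref_=tt_rv"
>J.J. Abrams</a>
</td>
<td>...</td>
<td>executive producer</td>
</tr>
<tr class="odd">
<td class="name">
<a href="/name/nm0027297/?ref_=tt_rv"
>Pippa Anderson</a>
</td>
<td>...</td>
<td>co-producer</td>
</tr>
</tbody>
</table>
HTML;

// Collect libxml warnings internally instead of printing them;
// real-world pages are rarely valid HTML.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

// Select every <a> that is a direct child of <td class="name">.
$xpath = new DOMXPath($doc);
$names = array();
foreach ($xpath->query('//td[@class="name"]/a') as $link) {
    $names[] = trim($link->textContent);
}
print_r($names);
```

Unlike the regex variants, this does not depend on how the tags are spread across lines.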

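If you want to do this with a regular expression, another option (when running PHP 5.2.4 or later) is to use \K in the pattern.

What you can do is match the text that comes before the data you are looking for, then use \K to reset the starting point of the reported match, match the data you want, and use a positive lookahead to assert the closing anchor tag.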

<td class="name">\n\s+<a[^>]+>\K.*(?=<\/a>)

$pattern = "/<td class=\"name\">\n\s+<a[^>]+>\K.*(?=<\/a>)/";
preg_match_all($pattern, $html, $matches);

The names will then be in the array $matches[0].

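Explanation

Match <td class="name">
Match a newline \n
Match one or more whitespace characters \s+
Match <a
Match any character other than > one or more times [^>]+
Match >
Then use \K to reset the start of the reported match
Match any character zero or more times .*
Positive lookahead (?=
Assert that what follows is </a>, written as <\/a>
Close the positive lookahead )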
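The steps above can be tried on a trimmed copy of the question's markup. This is only a sketch: it assumes LF line endings and at least one whitespace character of indentation before each <a, since the pattern requires \n followed by \s+.

```php
<?php
// Trimmed copy of the question's rows, keeping the line break and
// indentation between <td class="name"> and <a that the pattern expects.
$html = <<<'HTML'
<tr class="even">
    <td class="name">
        <a href="/name/nm0009190/?ref_=tt_rv"
>J.J. Abrams</a>
    </td>
    <td>executive producer</td>
</tr>
<tr class="odd">
    <td class="name">
        <a href="/name/nm0027297/?ref_=tt_rv"
>Pippa Anderson</a>
    </td>
    <td>co-producer</td>
</tr>
HTML;

// Everything before \K must match but is dropped from the reported match;
// the lookahead checks for </a> without consuming it.
$pattern = '/<td class="name">\n\s+<a[^>]+>\K.*(?=<\/a>)/';
preg_match_all($pattern, $html, $matches);

// The full matches (index 0) are the names themselves.
print_r($matches[0]);
```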


Without \K, you can capture the value in a capturing group such as (.*).

The regex would then look like this:

<td class="name">\n\s+<a[^>]+>(.*)(?=<\/a>)

$pattern = "/<td class=\"name\">\n\s+<a[^>]+>(.*)(?=<\/a>)/";
preg_match_all($pattern, $html, $matches);

The names will then be in the array $matches[1].
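Both patterns hard-code \n followed by \s+, so they depend on the exact line breaks in the page. If the markup arrives with Windows line endings, or with no whitespace at all between the tags (which the pattern in the question also does not allow for in the other direction), they find nothing. A more tolerant variant, offered as a sketch with hypothetical input strings, uses \s* and a lazy capture group:

```php
<?php
// Hypothetical input: the first cell uses CRLF line endings,
// the second cell is written on a single line.
$html = "<td class=\"name\">\r\n    <a href=\"/name/nm0009190/?ref_=tt_rv\"\r\n>J.J. Abrams</a>\r\n</td>"
      . "<td class=\"name\"><a href=\"/name/nm0027297/?ref_=tt_rv\">Pippa Anderson</a></td>";

// \s* accepts any amount of whitespace, including none, between the tags.
// The lazy (.*?) stops at the first closing </a>.
$pattern = '/<td class="name">\s*<a[^>]+>(.*?)<\/a>/s';
preg_match_all($pattern, $html, $matches);

// Group 1 holds the captured names.
print_r($matches[1]);
```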

