Question

I got the following results using file_get_contents.

30049988.html" >Title1
297816.html" >Title2
2979922.html" >Title3
29736.html" >Title4
22833.html" >Title5

I want to remove the ugly part (number.html">) and keep only the titles. How can I do that?

Answer 1

You can use the preg_replace function.

preg_replace('~.*?>~', '', $string);


.*? performs a non-greedy match of zero or more characters, so the match stops at the first > on each line.

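Or match the prefix explicitly. The m modifier is needed so that ^ matches at the start of every line when $string holds several lines:

preg_replace('~^\d+\.html" >~m', '', $string);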
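For reference, here is a minimal sketch that runs both patterns against the sample data from the question. The variable names $lazy and $anchored are only illustrative, and the second pattern carries the m modifier so that ^ matches at each line start when the whole multi-line string is processed at once.

```php
<?php
// Sample input copied from the question.
$string = <<<EOF
30049988.html" >Title1
297816.html" >Title2
2979922.html" >Title3
29736.html" >Title4
22833.html" >Title5
EOF;

// Non-greedy: on each line, strip everything up to and including the first '>'.
$lazy = preg_replace('~.*?>~', '', $string);

// Anchored: strip the "<digits>.html" >" prefix at the start of each line.
$anchored = preg_replace('~^\d+\.html" >~m', '', $string);

echo $lazy, PHP_EOL;
```

Both calls return a new string and leave $string untouched, so the result has to be assigned or echoed.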

Answer 2

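The preg_replace approach works, but to answer the original question for anyone else wondering, here is how to extract the titles with preg_match_all: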

<?php
$string = <<<EOF
30049988.html" >Title1
297816.html" >Title2
2979922.html" >Title3
29736.html" >Title4
22833.html" >Title5
EOF;
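// Capture whatever follows the first '>' on each line.
// m: $ matches at every line end; U: quantifiers are lazy by default.
preg_match_all('~[^>]+>([^\\n]+)$~smU', $string, $matches);
// preg_match_all always sets $matches[1], so test for an empty list instead.
if (empty($matches[1])) {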
  echo 'No results found ..'. PHP_EOL;
  exit;
}

foreach ($matches[1] as $match) {
  echo $match.PHP_EOL;
}
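The s, m and U modifiers in the pattern above are easy to misread. As a rough sketch of an alternative, assuming every line contains a > directly before its title, the same titles can be captured with only the m modifier. The variable name $titles is just for illustration.

```php
<?php
// Sample input copied from the question.
$string = <<<EOF
30049988.html" >Title1
297816.html" >Title2
2979922.html" >Title3
29736.html" >Title4
22833.html" >Title5
EOF;

// '>' anchors the match, (.+) captures the rest of the line,
// and with the m modifier $ matches at the end of every line.
preg_match_all('~>(.+)$~m', $string, $matches);

$titles = $matches[1];
foreach ($titles as $title) {
  echo $title . PHP_EOL;
}
```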

Answer 3

Try this regular expression.

(?=T)(\w+)

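How it works

(?=T) - This is a positive lookahead. It checks that the next character is T, without consuming it, before the match proceeds.
(\w+) - Captures the run of word characters that starts at that T.

Output: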

Title1
Title2
Title3
Title4
Title5
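A minimal sketch of how this pattern could be applied in PHP is shown below, using two of the sample lines from the question. Note the assumptions: it only works when every title starts with an uppercase T and consists of word characters only, so a title such as "My Title" would yield just "Title".

```php
<?php
// Two sample lines from the question, joined by a newline.
$string = "30049988.html\" >Title1\n297816.html\" >Title2";

// (?=T) asserts that the next character is an uppercase T without consuming it;
// (\w+) then captures the word characters starting at that T.
// The lowercase "t" in "html" does not match because the pattern is case-sensitive.
preg_match_all('~(?=T)(\w+)~', $string, $matches);

print_r($matches[1]);
```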

