Question

152124687951<?xml version="1.0"><culo>Amazing</culo></Document>65464614

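I need to extract only the XML code inside this string. There may be several XML blocks, and I need to extract them one by one. Each block always ends with </Document>. Can someone help me? Thanks...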

Answer 1

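Given the context you describe, preg_match may not be the best approach. The following may meet your requirement more efficiently, assuming the XML sample you provided is stored in $sXml before it runs: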

$sXml = substr( $sXml, strpos( $sXml, '<?xml' ));
$sXml = substr( $sXml, 0,
  strpos( $sXml, '</Document>' ) + strlen( '</Document>' ));
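
Since the question mentions that the string may hold several XML blocks, the same strpos/substr idea can be repeated in a loop by passing an offset to strpos. This is only a minimal sketch, assuming every block starts with <?xml and ends with </Document>; the function name extractXmlBlocks and the second sample block are hypothetical:

```php
<?php
// Hypothetical helper: collects every block that runs from '<?xml' to '</Document>'.
function extractXmlBlocks(string $input): array
{
    $open   = '<?xml';
    $close  = '</Document>';
    $blocks = [];
    $offset = 0;

    // Look for the next opening marker, starting where the previous block ended.
    while (($start = strpos($input, $open, $offset)) !== false) {
        $end = strpos($input, $close, $start);
        if ($end === false) {
            break; // opening marker without a closing tag: stop here
        }
        $end += strlen($close);
        $blocks[] = substr($input, $start, $end - $start);
        $offset = $end;
    }

    return $blocks;
}

$sXml = '152124687951<?xml version="1.0"><culo>Amazing</culo></Document>65464614'
      . 'abc<?xml version="1.0"><culo>Again</culo></Document>abc';

print_r(extractXmlBlocks($sXml));
```

If no complete block is found, the function returns an empty array, so the caller can simply loop over the result.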

Answer 2

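If your string is large and contains a lot of data before and after the "XML" part, a good approach (performance-wise) is to find the start and end offsets with strpos and extract the substring between them, for example: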

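$start = strpos($str, '<?xml ');
$end = strrpos($str, '</Document>'); // offset of the last closing tag

// Use an explicit length: a negative length of 0 would yield an empty string
// when </Document> is the very end of $str.
if ($start !== false && $end !== false)
    $result = substr($str, $start, $end + strlen('</Document>') - $start);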

If your string is not too large, you can use preg_match:

if (preg_match('~\Q<?xml \E.+?</Document>~s', $str, $m))
    $result = $m[0];

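\Q...\E lets you write special characters (special in the regex sense) without escaping them, which is handy for literal strings when you don't want to think about which characters need escaping. Note, however, that in this example only the ? actually needs to be escaped.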
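
To illustrate the point about \Q...\E, the three patterns below should all match the same text. The sample string and variable names are only for illustration. preg_quote is a standard PHP function that escapes regex metacharacters; it is shown here as an alternative and is not something the answer above relies on:

```php
<?php
$str = '152124687951<?xml version="1.0"><culo>Amazing</culo></Document>65464614';

$patterns = [
    // Literal part wrapped in \Q...\E.
    '~\Q<?xml \E.+?</Document>~s',
    // Only the ? escaped by hand.
    '~<\?xml .+?</Document>~s',
    // Escaping delegated to preg_quote (second argument is the delimiter).
    '~' . preg_quote('<?xml ', '~') . '.+?</Document>~s',
];

$results = [];
foreach ($patterns as $pattern) {
    if (preg_match($pattern, $str, $m)) {
        $results[] = $m[0];
    }
}

// true when every pattern extracted the same substring
var_dump(count($results) === 3 && count(array_unique($results)) === 1);
```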

Answer 3

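You can use substr and strpos to get all the matches you need. Regular expressions generally perform worse than plain string functions for this kind of task, so if performance matters to you, consider those alternatives.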

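On the other hand, performance may not be an issue (side project, background process, etc.), in which case a regex is a clean way to do the job.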

From what I understand, you have something like this:

152124687951<?xml version="1.0"><culo>Amazing</culo></Document>65464614
abc<?xml version="1.0"><culo>Amazing</culo></Document>abc
abc<?xml version="1.0"><culo>Amazing</culo></Document>abc
abc<?xml version="1.0"><culo>Amazing</culo></Document>abc

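You want to extract all of the XML blocks. As long as each block sits on its own line (the quantifier is greedy and . does not match newlines by default), a regex that does this would be: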

@\<\?xml.+Document\>@

You can see a live result here: http://www.regexr.com/39p9q or test it online here: https://www.functions-online.com/preg_match_all.html

In the end, the $matches variable will contain something like this (depending on the flag you pass to preg_match_all):

array (
  0 => 
  array (
    0 => '<?xml version="1.0"><culo>Amazing</culo></Document>',
    1 => '<?xml version="1.0"><culo>Amazing</culo></Document>',
  ),
)

So you can iterate over it, and that's all.
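
Iterating over the result might look like the sketch below. Note that it swaps in a lazy variant of the pattern (.+? with the s modifier) instead of the regex given earlier in this answer; that is an assumption made so that several blocks on one line, or blocks spanning multiple lines, are still matched separately:

```php
<?php
$input = '152124687951<?xml version="1.0"><culo>Amazing</culo></Document>65464614' . "\n"
       . 'abc<?xml version="1.0"><culo>Amazing</culo></Document>abc';

// Lazy quantifier: each match stops at the first </Document> that follows <?xml.
preg_match_all('@<\?xml.+?</Document>@s', $input, $matches);

// With the default PREG_PATTERN_ORDER flag, $matches[0] holds the full matches.
foreach ($matches[0] as $i => $xml) {
    echo $i, ': ', $xml, PHP_EOL;
}
```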

As for performance, here is a quick test:

http://3v4l.org/B1t7h/perf#tabs
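
A comparable measurement can be run locally. The sketch below is not the linked benchmark; the padding sizes and the iteration count are arbitrary assumptions, and timings will vary with the PHP version and the machine, so no particular result is claimed:

```php
<?php
// Sample input: one XML block surrounded by a lot of unrelated data.
$str = str_repeat('x', 10000)
     . '<?xml version="1.0"><culo>Amazing</culo></Document>'
     . str_repeat('y', 10000);

// Approach 1: strpos + substr.
$t0 = hrtime(true);
for ($i = 0; $i < 10000; $i++) {
    $start = strpos($str, '<?xml');
    $end   = strpos($str, '</Document>', $start) + strlen('</Document>');
    $viaStrpos = substr($str, $start, $end - $start);
}
$strposNs = hrtime(true) - $t0;

// Approach 2: preg_match with a lazy quantifier.
$t0 = hrtime(true);
for ($i = 0; $i < 10000; $i++) {
    preg_match('~<\?xml.+?</Document>~s', $str, $m);
    $viaRegex = $m[0];
}
$regexNs = hrtime(true) - $t0;

// hrtime(true) returns nanoseconds; convert to milliseconds for display.
printf("strpos/substr: %.2f ms\n", $strposNs / 1e6);
printf("preg_match:    %.2f ms\n", $regexNs / 1e6);
```

Both loops extract the same substring, so the comparison only measures how each approach gets there.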

preg_match in PHP (XML extraction)
