Question

I am trying to clean up arrays of strings by removing fragments like this:

*<span class="exception">some text</span>

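Most of the items in these arrays are decimal numbers, but a few contain HTML tags/text like the fragment above. Here are some sample items from an array to give you an idea of what they look like: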

'1.5',
'3.7',
'8.0',
'4.2*<span class="exception">some text</span>'
'5.7*<span class="exception">some text</span>random text to keep'
'4.9*<span class="exception">some text</span>8.0'

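When I encounter an item containing *<span class="exception">some text</span>, I need to remove the asterisk, the opening and closing span tags, and the text inside the tags entirely. The text inside the tags is completely arbitrary. Other text may follow the span tag, in which case I need to keep that text.

I have looked through several posts, including the following (the most helpful so far), but with only partial success: Regex to remove span tags using php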

if (substr_count($value, '*<span') > 0) {
  $value = preg_replace('/<span[^>]+\>/', '', $value);
}

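This statement removes the opening span tag, but not the asterisk, the closing span tag, or the text between the tags.

I am still quite new to regex, so any help or suggestions would be appreciated.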

Answer 1

If everything follows this pattern, you don't need a regex at all: just explode() on * and take the first element.

foreach( $array as $key => $value ){
  $array[$key] = explode('*',$value)[0];
}

Sample result:

array(4) {
  [0]=>
  string(3) "1.5"
  [1]=>
  string(3) "3.7"
  [2]=>
  string(3) "8.0"
  [3]=>
  string(3) "4.2"
}

Edit: if there is "other stuff" after the tag, a little more work is needed:

$array = [
  '1.5',
  '3.7',
  '8.0*<span class="exception">some text</span>',
  '4.2*<span class="exception">some text</span>then other stuff'
];

foreach( $array as $key => $value ){
  $sub = explode('*',$value);
  $end = [];
  if(count($sub) > 1) {
    $end = explode('>',end($sub));
  }
  $array[$key] = trim($sub[0] . ' ' . end($end));
}

Result:

array(4) {
  [0]=>
  string(3) "1.5"
  [1]=>
  string(3) "3.7"
  [2]=>
  string(3) "8.0"
  [3]=>
  string(20) "4.2 then other stuff"
}

Answer 2

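It should be as follows. [*] matches the literal * character, and .*> matches everything up to a > character; because .* is greedy, that means everything through the last > in the string.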

if (substr_count($value, '*<span') > 0) {
  $value = preg_replace('/[*].*>/', '', $value);
}
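
Since the greedy pattern removes everything through the last >, trailing text that happens to contain a > would be lost as well. A narrower variant, offered only as a sketch and assuming the unwanted fragment is always an asterisk followed by one non-nested span, might look like this (the helper name stripExceptionSpan is made up for illustration and is not part of the original answer):

```php
<?php
// Hypothetical helper: removes "*<span ...>...</span>" and keeps
// whatever text follows the closing tag.
function stripExceptionSpan(string $value): string
{
    // \*             literal asterisk
    // <span\b[^>]*>  opening span tag with any attributes
    // .*?            tag contents, as few characters as possible
    // <\/span>       closing span tag
    // Fall back to the input if preg_replace() reports an error (null).
    return preg_replace('/\*<span\b[^>]*>.*?<\/span>/s', '', $value) ?? $value;
}

$items = [
    '1.5',
    '4.2*<span class="exception">some text</span>',
    '5.7*<span class="exception">some text</span>random text to keep',
];

foreach ($items as $item) {
    echo stripExceptionSpan($item), PHP_EOL;
}
```

With this replacement the kept text is joined directly to the number (for example 5.7random text to keep); replacing with a space and calling trim() afterwards would add a separator if one is wanted.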

Answer 3

You can simply capture every component of the unwanted HTML and then replace it with whatever you want, using a simple expression such as:

([0-9.]+)(.+?)<(.+?)>(.+?)<(\/.+?)>

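Here, ([0-9.]+) captures the number in $1, (.+?) captures the * in $2, <(.+?)> captures the opening tag in $3, (.+?) captures the textContent in $4, and <(\/.+?)> captures the closing tag in $5. The expression can be modified if we wish to capture anything else.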
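
As a rough illustration of how that expression could be applied in PHP, the sketch below keeps only group $1 and discards the rest of the match. The / delimiters, the '$1 ' replacement string, the trim() call, and the variable names are assumptions for this example, not part of the original answer.

```php
<?php
// Pattern copied from the answer; delimiters added for preg_replace().
$pattern = '/([0-9.]+)(.+?)<(.+?)>(.+?)<(\/.+?)>/';

$items = [
    '1.5',
    '4.2*<span class="exception">some text</span>',
    '5.7*<span class="exception">some text</span>random text to keep',
    '4.9*<span class="exception">some text</span>8.0',
];

$cleaned = [];
foreach ($items as $item) {
    // Keep group 1 (the number) plus a space as separator; text after the
    // closing tag is outside the match and therefore left untouched.
    $cleaned[] = trim(preg_replace($pattern, '$1 ', $item));
}

print_r($cleaned);
```

Items without a span do not match the pattern and pass through unchanged.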


Answer 4

Do not parse HTML with regex. In your case, use a proper HTML parser:

$arr = array(
    '1.5',
    '3.7',
    '8.0',
    '4.2*<span class="exception">some text</span>',
    '5.7*<span class="exception">some text</span>random text to keep',
    '4.9*<span class="exception">some text</span>8.0',
);

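foreach ($arr as &$tmp) {
    // loadHTML() is an instance method; calling it statically fails on PHP 8.
    $domd = new DOMDocument();
    // @ suppresses libxml warnings about the wrapper markup.
    @$domd->loadHTML('<?xml encoding="UTF-8"><main>' . $tmp . '</main>');
    $main = $domd->getElementsByTagName("main")->item(0);
    // Copy the live node list first so that removals do not skip elements.
    foreach (iterator_to_array($main->getElementsByTagName("*")) as $remove) {
        $remove->parentNode->removeChild($remove);
    }
    // Only the asterisk is left to deal with; turn it into a separator.
    $tmp = str_replace("*", " ", $main->textContent);
}
unset($tmp); // break the reference left over from the loop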

print_r($arr);

Yields:

Array
(
    [0] => 1.5
    [1] => 3.7
    [2] => 8.0
    [3] => 4.2 
    [4] => 5.7 random text to keep
    [5] => 4.9 8.0
)

Answer 5

$value = [
    '1.5',
    '3.7',
    '8.0',
    '4.2*<span class="exception">some text</span>',
    '5.7*<span class="exception">some text</span>random text to keep',
    '4.9*<span class="exception">some text</span>8.0',
];
foreach ($value as $k => $v) {
    $value[$k] = strip_tags($v);
}
print_r($value);
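
To make the effect of strip_tags() concrete, here is a small check on one of the sample items (the variable names are chosen for this example only). It suggests that strip_tags() on its own removes the markup but leaves both the asterisk and the text that was inside the span, so it would need to be combined with another step to meet the requirement in the question.

```php
<?php
$raw = '5.7*<span class="exception">some text</span>random text to keep';

// strip_tags() drops the tags only; the asterisk and the span's inner
// text remain in the string.
$stripped = strip_tags($raw);

echo $stripped, PHP_EOL; // 5.7*some textrandom text to keep
```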
