Get string between - Find all occurrences PHP

Time: 2014-11-22 14:06:26

Tags: php

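I found this function, which finds the data between two strings in text, HTML, or any other content.

How can I change it so that it finds all occurrences, that is, the data between every $start [some-random-data] $end pair? I want every [some-random-data] in the document (the data is different each time).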

function getStringBetween($string, $start, $end) {
    $string = " ".$string;
    $ini = strpos($string,$start);
    if ($ini == 0) return "";
    $ini += strlen($start);
    $len = strpos($string,$end,$ini) - $ini;
    return substr($string,$ini,$len);
}

6 answers:

Answer 0: (score: 30)

One possible approach:

function getContents($str, $startDelimiter, $endDelimiter) {
  $contents = array();
  $startDelimiterLength = strlen($startDelimiter);
  $endDelimiterLength = strlen($endDelimiter);
  $startFrom = $contentStart = $contentEnd = 0;
  while (false !== ($contentStart = strpos($str, $startDelimiter, $startFrom))) {
    $contentStart += $startDelimiterLength;
    $contentEnd = strpos($str, $endDelimiter, $contentStart);
    if (false === $contentEnd) {
      break;
    }
    $contents[] = substr($str, $contentStart, $contentEnd - $contentStart);
    $startFrom = $contentEnd + $endDelimiterLength;
  }

  return $contents;
}

Usage:

$sample = '<start>One<end>aaa<start>TwoTwo<end>Three<start>Four<end><start>Five<end>';
print_r( getContents($sample, '<start>', '<end>') );
/*
Array
(
    [0] => One
    [1] => TwoTwo
    [2] => Four
    [3] => Five
)
*/ 


Answer 1: (score: 7)

You can do this with a regular expression:

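function getStringsBetween($string, $start, $end)
{
    // Pass '/' to preg_quote() so a delimiter containing a slash
    // (e.g. '</p>') does not terminate the pattern early.
    $pattern = sprintf(
        '/%s(.*?)%s/',
        preg_quote($start, '/'),
        preg_quote($end, '/')
    );
    preg_match_all($pattern, $string, $matches);

    // $matches[1] holds the text captured between each pair of delimiters.
    return $matches[1];
}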
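
As a quick illustration of the regex approach, here is a minimal sketch run against a sample string similar to the one in the first answer. The function name getStringsBetweenMultiline and the added "s" modifier (which lets "." match newlines as well) are assumptions made for this example and are not part of the original answer.

```php
<?php
// Hypothetical variant for illustration: same idea, plus the "s" modifier.
function getStringsBetweenMultiline($string, $start, $end)
{
    // preg_quote() with '/' as second argument also escapes the pattern delimiter.
    $pattern = sprintf(
        '/%s(.*?)%s/s',
        preg_quote($start, '/'),
        preg_quote($end, '/')
    );
    preg_match_all($pattern, $string, $matches);

    return $matches[1];
}

// The second value spans two lines on purpose.
$sample = "<start>One<end>aaa<start>Two\nTwo<end>Three<start>Four<end>";
print_r(getStringsBetweenMultiline($sample, '<start>', '<end>'));
```

Without the "s" modifier, the value that spans a line break would not be matched, so whether to add it depends on the input you expect.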

Answer 2: (score: 2)

I like to use explode to get the string between two strings. This function also works for multiple occurrences.

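function GetIn($str,$start,$end){
    // Initialise the result so an empty array is returned when nothing matches.
    $p = array();
    $p1 = explode($start,$str);
    // $p1[0] is the text before the first $start, so begin at index 1.
    for($i=1;$i<count($p1);$i++){
        $p2 = explode($end,$p1[$i]);
        // Keep the piece only if $end was actually found after $start.
        if(count($p2) > 1){
            $p[] = $p2[0];
        }
    }
    return $p;
}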
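
To make the explode idea easier to follow, the sketch below prints the intermediate pieces for a short sample. The sample string and the variable names are assumptions chosen for illustration only.

```php
<?php
$sample = '<start>One<end>aaa<start>TwoTwo<end>Three';

// Splitting on the start delimiter: element 0 is the text before the first
// delimiter, every later element begins with one value to extract.
$parts = explode('<start>', $sample);
print_r($parts);

// Splitting each later element on the end delimiter keeps only the value.
$values = array();
foreach (array_slice($parts, 1) as $part) {
    $pieces = explode('<end>', $part);
    $values[] = $pieces[0];
}
print_r($values);
```

Here $parts is array('', 'One<end>aaa', 'TwoTwo<end>Three') and $values ends up as array('One', 'TwoTwo').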

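Answer 3: (score: 1)

I needed to find all of these occurrences between a specific start and end tag, change them in some way, and get the modified string back.

So I added this small piece of code after the function from raina77ow's approach.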

        $sample = '<start>One<end> aaa <start>TwoTwo<end> Three <start>Four<end> aaaaa <start>Five<end>';
        $sample_temp = getContents($sample, '<start>', '<end>');
        $i = 1;
        foreach($sample_temp as $value) {
            $value2 = $value.'-'.$i; // this is where you can change the value
            $sample=str_replace('<start>'.$value.'<end>',$value2,$sample);
            $i++;
        }
        echo $sample;

The output sample now has the tags removed, and every string that was between them has a number appended, like this:

One-1 aaa TwoTwo-2 Three Four-3 aaaaa Five-4

But you can do anything else you like with them. Maybe this will be helpful to someone.
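
If a single pass is preferred, the same kind of replacement can be sketched with preg_replace_callback. The helper name replaceBetween and its callback signature are assumptions made for this example; it is an alternative technique, not the code from the answer above.

```php
<?php
// Hypothetical helper: rewrites every $start...$end span via a callback.
function replaceBetween($string, $start, $end, callable $callback)
{
    $pattern = '/' . preg_quote($start, '/') . '(.*?)' . preg_quote($end, '/') . '/s';
    $i = 0;

    return preg_replace_callback(
        $pattern,
        function (array $m) use ($callback, &$i) {
            // $m[1] is the text between the delimiters; $i counts matches from 1.
            return $callback($m[1], ++$i);
        },
        $string
    );
}

$sample = '<start>One<end> aaa <start>TwoTwo<end> Three <start>Four<end> aaaaa <start>Five<end>';
echo replaceBetween($sample, '<start>', '<end>', function ($value, $i) {
    return $value . '-' . $i;
});
// One-1 aaa TwoTwo-2 Three Four-3 aaaaa Five-4
```

Because each match is replaced where it is found, two tagged values with identical text still receive different numbers, which a str_replace on the whole string would not guarantee.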

Answer 4: (score: 0)

I also needed the text outside the pattern, so I changed raina77ow's answer to:

function get_delimited_strings($str, $startDelimiter, $endDelimiter) {
    $contents = array('in' => array(), 'out' => array());
    $startDelimiterLength = strlen($startDelimiter);
    $endDelimiterLength = strlen($endDelimiter);
    $startFrom = $contentStart = $contentEnd = $outStart = $outEnd = 0;
    while (false !== ($contentStart = strpos($str, $startDelimiter, $startFrom))) {
        // The outside text ends where the start delimiter begins.
        $outEnd = $contentStart;
        $contentStart += $startDelimiterLength;
        $contentEnd = strpos($str, $endDelimiter, $contentStart);
        if (false === $contentEnd) {
            break;
        }
        $contents['in'][] = substr($str, $contentStart, $contentEnd - $contentStart);
        $contents['out'][] = substr($str, $outStart, $outEnd - $outStart);
        $startFrom = $contentEnd + $endDelimiterLength;
        $outStart = $startFrom;
    }
    // Everything after the last complete match, up to the end of the string.
    $contents['out'][] = substr($str, $outStart);
    return $contents;
}

Usage:

    $str = "Bore layer thickness [2 mm] instead of [1,25 mm] with [0,1 mm] deviation.";
    $cas = get_delimited_strings($str, "[", "]");
    var_dump($cas);

gives:

array(2) {
    ["in"]=> array(3) {
        [0]=> string(4) "2 mm"
        [1]=> string(7) "1,25 mm"
        [2]=> string(6) "0,1 mm"
    }
    ["out"]=> array(4) {
        [0]=> string(21) "Bore layer thickness "
        [1]=> string(12) " instead of "
        [2]=> string(6) " with "
        [3]=> string(11) " deviation."
    }
}
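
For comparison, a similar inside/outside split can be sketched with preg_split and the PREG_SPLIT_DELIM_CAPTURE flag. This is a different technique from the loop above, and the hard-coded "[" and "]" delimiters and the variable names are assumptions made for this example.

```php
<?php
$str = "Bore layer thickness [2 mm] instead of [1,25 mm] with [0,1 mm] deviation.";

// The pattern matches "[...]" and captures what is inside the brackets;
// with PREG_SPLIT_DELIM_CAPTURE the captured parts are returned in between
// the outside pieces.
$parts = preg_split('/\[(.*?)\]/', $str, -1, PREG_SPLIT_DELIM_CAPTURE);

$result = array('in' => array(), 'out' => array());
foreach ($parts as $index => $part) {
    // Even indexes hold outside text, odd indexes hold captured inside text.
    $result[$index % 2 === 0 ? 'out' : 'in'][] = $part;
}
print_r($result);
```

This relies on the pattern having exactly one capture group, so that outside and inside pieces strictly alternate in the returned array.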

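Answer 5: (score: 0)

There are some great solutions here, but they are not quite suitable for extracting parts of code from, say, HTML, which is my problem right now: I need to take the script blocks out of the HTML before compressing it. So, building on @raina77ow's original solution as extended by @Cas Tuyn, I ended up with this: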

$test_strings = [
    '0<p>a</p>1<p>b</p>2<p>c</p>3',
    '0<p>a</p>1<p>b</p>2<p>c</p>',
    '<p>a</p>1<p>b</p>2<p>c</p>3',
    '<p>a</p>1<p>b</p>2<p>c</p>',
    '<p></p>1<p>b'
];

/**
* Separate a block of code into sub-blocks. Example: removing all <script>...</script> tags from HTML code
* 
* @param string $str, text block
* @param string $startDelimiter, string to match for start of block to be extracted
* @param string $endDelimiter, string to match for ending the block to be extracted
* @return array [all full blocks, what's left of the string]
*/
function getDelimitedStrings($str, $startDelimiter, $endDelimiter) {
    $contents = array();
    $startDelimiterLength = strlen($startDelimiter);
    $endDelimiterLength = strlen($endDelimiter);
    $startFrom = $outStart = 0;
    while (false !== ($blockStart = strpos($str, $startDelimiter, $startFrom))) {
        $contentEnd = strpos($str, $endDelimiter, $blockStart + $startDelimiterLength);
        if (false === $contentEnd) {
            break;
        }
        // Use the real end delimiter length instead of assuming it is
        // one character longer than the start delimiter.
        $blockEnd = $contentEnd + $endDelimiterLength;
        // Full block, delimiters included.
        $contents['in'][] = substr($str, $blockStart, $blockEnd - $blockStart);
        // Text between the previous block and this one, skipped when empty.
        if ($blockStart > $outStart) {
            $contents['out'][] = substr($str, $outStart, $blockStart - $outStart);
        }
        $startFrom = $outStart = $blockEnd;
    }
    // Whatever follows the last complete block.
    if ($outStart < strlen($str)) {
        $contents['out'][] = substr($str, $outStart);
    }

    return $contents;
}

foreach($test_strings AS $string){
    var_dump( getDelimitedStrings($string, '<p>', '</p>') );
}

This also extracts all <p> elements together with their possible innerHTML, giving the following result:

array (size=2)
'in' => array (size=3)
    0 => string '<p>a</p>' (length=8)
    1 => string '<p>b</p>' (length=8)
    2 => string '<p>c</p>' (length=8)
'out' => array (size=4)
    0 => string '0' (length=1)
    1 => string '1' (length=1)
    2 => string '2' (length=1)
    3 => string '3' (length=1)

array (size=2)
'in' => array (size=3)
    0 => string '<p>a</p>' (length=8)
    1 => string '<p>b</p>' (length=8)
    2 => string '<p>c</p>' (length=8)
'out' => array (size=3)
    0 => string '0' (length=1)
    1 => string '1' (length=1)
    2 => string '2' (length=1)

array (size=2)
'in' => array (size=3)
    0 => string '<p>a</p>' (length=8)
    1 => string '<p>b</p>' (length=8)
    2 => string '<p>c</p>' (length=8)
'out' => array (size=3)
    0 => string '1' (length=1)
    1 => string '2' (length=1)
    2 => string '3' (length=1)

array (size=2)
'in' => array (size=3)
    0 => string '<p>a</p>' (length=8)
    1 => string '<p>b</p>' (length=8)
    2 => string '<p>c</p>' (length=8)
'out' => array (size=2)
    0 => string '1' (length=1)
    1 => string '2' (length=1)

array (size=2)
'in' => array (size=1)
    0 => string '<p></p>' (length=7)
'out' => array (size=1)
    0 => string '1<p>b' (length=5)

You can see a demo here: 3v4l.org/TQLmn