How do I remove certain lines from a multi-line string?

Posted: 2014-12-16 22:00:30

Tags: php html

My code receives a string that I have no control over; I'll call it $my_string. The string is the contents of a transcript. If I echo the string, like this:

echo $my_string;

I can see the contents, which look like this:

1
00:00:00.000 --> 00:00:04.980
[MUSIC]

2
00:00:04.980 --> 00:00:08.120
Hi, my name is holl and I am here
to write some PHP.

3
00:00:08.120 --> 00:00:10.277
You can see my screen, here.

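What I want to do is run it through a function that keeps only the words actually spoken and removes every line that gives a timestamp or a sequence number, so the result looks like this: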

[MUSIC]
Hi, my name is holl and I am here
to write some PHP.
You can see my screen, here.

My idea was to explode the whole string on line breaks and try to detect which lines are empty or start with a number, like this:

      $lines = explode("\n", $my_string);
      foreach ($lines as $line) {
        if (is_numeric(line[0]) || empty(line[0]) ) {
          continue;
        }
        $exclude[] = $line;
      }
      $transcript = implode("\n", $exclude);

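But the result is exactly the same: the output still has the numbers and the blank lines. I'm obviously misunderstanding something, but what? Is there a better way to achieve my goal?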

Thanks!

Edit: removed an echo that I don't actually use in my code.

4 Answers

Answer 0 (score: 3)

The problem is the way you index into $line:

$lines = explode("\n", $my_string);
foreach ($lines as $line) {
    if (is_numeric(line[0]) || empty(line[0]) ) {//index usage?
        continue;
    }
    $exclude[] = $line;
}
$transcript = echo implode("\n", $exclude); //remove echo

Replace it with:

$lines = explode("\n", $my_string);
$exclude = array(); // start empty so implode() always receives an array
foreach ($lines as $line) {
    if (is_numeric($line) || empty($line) ) {//here: test the whole line
        continue;
    }
    $exclude[] = $line;
}
$transcript = implode("\n", $exclude);
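A quick check of the corrected loop on the question's sample (hard-coded here as a double-quoted string so the snippet runs on its own). is_numeric() is false for a timing line such as 00:00:00.000 --> 00:00:04.980, so those lines still come through; only the cue numbers and blank lines are dropped.

```php
<?php
// The question's sample transcript, hard-coded for this check.
$my_string = "1\n00:00:00.000 --> 00:00:04.980\n[MUSIC]\n\n"
    . "2\n00:00:04.980 --> 00:00:08.120\nHi, my name is holl and I am here\nto write some PHP.\n\n"
    . "3\n00:00:08.120 --> 00:00:10.277\nYou can see my screen, here.";

$exclude = array();
foreach (explode("\n", $my_string) as $line) {
    // Drops cue numbers ("1", "2", ...) and blank lines only.
    if (is_numeric($line) || empty($line)) {
        continue;
    }
    $exclude[] = $line;
}
$transcript = implode("\n", $exclude);

// The three timing lines are still in the output.
echo $transcript, "\n";
```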

You also need a regular expression match to remove the 00:00:00.000 --> 00:00:04.980 lines.

You can combine all of these checks into one:

if(preg_match('/^(|\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)$/',$line)) { //regex 
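The combined pattern is dense, so here it is again written with PCRE's x (extended) modifier, which ignores whitespace in the pattern and treats # up to the end of the line as a comment. It should match exactly the same lines as the one-line version; the variable name $metadataLine is just for illustration.

```php
<?php
// Same alternatives as the one-line pattern: empty line | cue number | timing line.
// With the x modifier, spaces and newlines inside the pattern are ignored and
// everything from # to the end of a line is a comment.
$metadataLine = '/^(
                                                      # first alternative: empty line
    | \d+                                             # cue number, e.g. 2
    | \d+:\d+:\d+\.\d+ \s+ --> \s+ \d+:\d+:\d+\.\d+   # timing line
)$/x';

foreach (array('', '2', '00:00:04.980 --> 00:00:08.120', '[MUSIC]') as $line) {
    echo var_export($line, true), ' => ', preg_match($metadataLine, $line) ? 'skip' : 'keep', "\n";
}
```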

Putting it all together:

$lines = explode("\n", $my_string);
$exclude = array(); // start empty so implode() always receives an array
foreach ($lines as $line) {
    if(preg_match('/^(|\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)$/',$line)) {
        continue;
    }
    $exclude[] = $line;
}
$transcript = implode("\n", $exclude);
echo $transcript;

Example (using php -a):

$ php -a
php > $my_string='1
php ' 00:00:00.000 --> 00:00:04.980
php ' [MUSIC]
php ' 
php ' 2
php ' 00:00:04.980 --> 00:00:08.120
php ' Hi, my name is holl and I am here
php ' to write some PHP.
php ' 
php ' 3
php ' 00:00:08.120 --> 00:00:10.277
php ' You can see my screen, here.';
php > $lines = explode("\n", $my_string);
php > foreach ($lines as $line) {
php {     if(preg_match('/^(|\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)$/',$line)) {
php {         continue;
php {     }
php {     $exclude[] = $line;
php { }
php > $transcript = implode("\n", $exclude);
php > echo $transcript;
[MUSIC]
Hi, my name is holl and I am here
to write some PHP.
You can see my screen, here.
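One thing the loop above relies on is that lines end in \n. If the string ever arrives with Windows \r\n line endings (an assumption; the question doesn't say), explode("\n", ...) leaves a trailing \r on every line, so a line like "1\r" no longer matches ^\d+$ and the metadata slips through. A sketch that splits on any line ending with preg_split('/\R/', ...) instead; the function name strip_transcript_metadata is hypothetical.

```php
<?php
// Hypothetical helper: same regex filter as above, but splits on any line
// ending (\n, \r\n or \r) so a stray "\r" cannot defeat the $ anchor.
function strip_transcript_metadata(string $text): string
{
    $metadata = '/^(|\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)$/';
    $kept = array();
    foreach (preg_split('/\R/', $text) as $line) {
        if (preg_match($metadata, $line)) {
            continue; // empty line, cue number or timing line
        }
        $kept[] = $line;
    }
    return implode("\n", $kept);
}

// Sample with Windows line endings.
$sample = "1\r\n00:00:00.000 --> 00:00:04.980\r\n[MUSIC]\r\n\r\n"
    . "2\r\n00:00:04.980 --> 00:00:08.120\r\nHi, my name is holl and I am here\r\nto write some PHP.\r\n";
echo strip_transcript_metadata($sample), "\n";
```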

Answer 1 (score: 1)

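It looks like there's a pattern: the first and second lines contain metadata, the third line is the text, and the fourth is empty. If that's really the case, this should be trivial. You don't need to check the content at all; just grab the third line of every group of four: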

$lines = explode("\n", $my_string);
$texts = array();
for ($i = 0; $i < count($lines); $i++) {
  if ($i % 4 == 2) { // The third line of each group of four has index 2.
    $texts[] = $lines[$i];
  }
}

$transcript = implode("\n", $texts);
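To see where the fixed four-line rule stops working, here is the same "third line of every four" selection, written with array_filter() and ARRAY_FILTER_USE_KEY, run on the question's sample. Because the second cue has two lines of text, the third pick lands on a timing line and the last two lines of text are lost, which is why a rule based on blank-line-separated blocks is more robust.

```php
<?php
// The question's sample, where cue 2 has two lines of text.
$my_string = "1\n00:00:00.000 --> 00:00:04.980\n[MUSIC]\n\n"
    . "2\n00:00:04.980 --> 00:00:08.120\nHi, my name is holl and I am here\nto write some PHP.\n\n"
    . "3\n00:00:08.120 --> 00:00:10.277\nYou can see my screen, here.";

$lines = explode("\n", $my_string);

// "Third line of every group of four": keep lines whose index % 4 == 2.
// ARRAY_FILTER_USE_KEY passes the array key (the line index) to the callback.
$texts = array_filter($lines, function ($i) {
    return $i % 4 == 2;
}, ARRAY_FILTER_USE_KEY);

// Prints [MUSIC], the first text line of cue 2, and then cue 3's timing line.
echo implode("\n", $texts), "\n";
```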

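For alternative logic, since, as you rightly mentioned, there can be multiple lines of text: you could say that the blocks (entries, or whatever you call them) are separated by empty lines. Each block starts with two lines of metadata, followed by one (or possibly zero) or more lines of text. With that logic you can parse it like this: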

$lines = explode("\n", $my_string);
$texts = array();
$linenr = 0;
foreach ($lines as $line) {
  // Track the position of this line within its block (reset on a blank line).
  if ($line === '')
    $linenr = 0;
  else
    $linenr++;

  // Skip the first two lines of a block (the metadata).
  if ($linenr > 2)
    $texts[] = $line;
}

$transcript = implode("\n", $texts);
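The same block idea can also be expressed by splitting the whole string on blank lines first and then dropping the first two lines of each block. A sketch, assuming blank lines are truly empty (a line holding only spaces would not count as a separator here); transcript_from_blocks is a hypothetical name.

```php
<?php
// Hypothetical helper: split on blank lines into blocks, then keep every line
// after the first two (the metadata) of each block.
function transcript_from_blocks(string $text): string
{
    $texts = array();
    // \R matches any line ending; two or more in a row mean at least one blank line.
    foreach (preg_split('/\R{2,}/', trim($text)) as $block) {
        $lines = preg_split('/\R/', $block);
        // array_slice() returns an empty array for a block with no text lines.
        foreach (array_slice($lines, 2) as $line) {
            $texts[] = $line;
        }
    }
    return implode("\n", $texts);
}

$my_string = "1\n00:00:00.000 --> 00:00:04.980\n[MUSIC]\n\n"
    . "2\n00:00:04.980 --> 00:00:08.120\nHi, my name is holl and I am here\nto write some PHP.\n\n"
    . "3\n00:00:08.120 --> 00:00:10.277\nYou can see my screen, here.";
echo transcript_from_blocks($my_string), "\n";
```

Unlike the line-by-line counter, this version also copes with \r\n line endings, since both splits use \R.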

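I don't know this particular format, but if I were doing this, I would look for a structural pattern like this rather than parse the contents of the lines themselves. It looks like a script or subtitle file, and if you want to turn it into a transcript, it would be a shame if someone shouted "300" and it didn't get transcribed.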
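One way to act on the "300" concern while still using patterns: treat a line of digits as a cue number only when the very next line is a timing line. This is a sketch, assuming cues are laid out exactly as in the question (number line, then a "00:00:04.980 --> 00:00:08.120" style timing line); the function name and the sample cue containing a spoken "300" are made up for illustration.

```php
<?php
// Hypothetical helper: drops a line of digits only when the next line is a
// timing line, so a spoken "300" survives.
function transcript_keep_spoken_numbers(string $text): string
{
    $timing = '/^\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+$/';
    $lines = preg_split('/\R/', $text);
    $count = count($lines);
    $kept = array();
    for ($i = 0; $i < $count; $i++) {
        $line = $lines[$i];
        $next = ($i + 1 < $count) ? $lines[$i + 1] : '';
        if (trim($line) === '' || preg_match($timing, $line)) {
            continue; // blank separator or timing line
        }
        if (preg_match('/^\d+$/', $line) && preg_match($timing, $next)) {
            continue; // cue number: digits directly followed by a timing line
        }
        $kept[] = $line;
    }
    return implode("\n", $kept);
}

// Made-up cue in which someone actually says "300".
$sample = "1\n00:00:00.000 --> 00:00:04.980\nThe crowd shouted\n300\n\n"
    . "2\n00:00:04.980 --> 00:00:08.120\n[MUSIC]";
echo transcript_keep_spoken_numbers($sample), "\n";
```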

Answer 2 (score: 1)

Your code almost works. You just forgot the $ in line[0], and " " is not empty().

$my_string = <<< EOF
1
00:00:00.000 --> 00:00:04.980
[MUSIC]

2
00:00:04.980 --> 00:00:08.120
Hi, my name is holl and I am here
to write some PHP.

3
00:00:08.120 --> 00:00:10.277
You can see my screen, here.
EOF;

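$lines = explode("\n", $my_string);
$exclude = array();
foreach ($lines as $line) {
    // Trim the whole line first, so blank or whitespace-only lines give ''
    // instead of an "Uninitialized string offset" warning from $line[0].
    $temp = trim($line);
    if ($temp === '' || is_numeric($temp[0])) {
        continue;
    }
    $exclude[] = $line;
}
$transcript = implode("\n", $exclude);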

echo $transcript;

Result:

[MUSIC]
Hi, my name is holl and I am here
to write some PHP.
You can see my screen, here.

Answer 3 (score: 0)

To remove those lines, try preg_replace with a regex.

PHP manual: http://php.net/manual/en/function.preg-replace.php
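A sketch of what that could look like. With the m (multiline) modifier, ^ matches at the start of every line, so one preg_replace() call can delete each cue-number or timing line together with the line break that follows it, and a second call removes the leftover blank lines. Like the other pattern-based answers, this would also drop a spoken line consisting only of digits.

```php
<?php
$my_string = "1\n00:00:00.000 --> 00:00:04.980\n[MUSIC]\n\n"
    . "2\n00:00:04.980 --> 00:00:08.120\nHi, my name is holl and I am here\nto write some PHP.\n\n"
    . "3\n00:00:08.120 --> 00:00:10.277\nYou can see my screen, here.";

// Remove cue-number lines and timing lines, each with its trailing line break.
$transcript = preg_replace(
    '/^(\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)\R/m',
    '',
    $my_string
);
// Remove the empty lines that separated the cues.
$transcript = preg_replace('/^\R/m', '', $transcript);

echo $transcript, "\n";
```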