Question

My code receives a string that I have no control over, which I'll call $my_string. The string is the contents of a transcript. If I echo the string, like this:

echo $my_string;

I can see the contents, which look like this:

1
00:00:00.000 --> 00:00:04.980
[MUSIC]

2
00:00:04.980 --> 00:00:08.120
Hi, my name is holl and I am here
to write some PHP.

3
00:00:08.120 --> 00:00:10.277
You can see my screen, here.

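What I'd like to do is run it through a function that keeps only the words actually spoken and removes every line that denotes timing or order: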

[MUSIC]
Hi, my name is holl and I am here
to write some PHP.
You can see my screen, here.

My idea was to explode the whole string on line breaks and try to detect which lines are empty or start with a number, like so:

      $lines = explode("\n", $my_string);
      foreach ($lines as $line) {
        if (is_numeric(line[0]) || empty(line[0]) ) {
          continue;
        }
        $exclude[] = $line;
      }
      $transcript = implode("\n", $exclude);

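But the result is exactly the same: the output still has the numbers and the blank lines. I'm obviously misunderstanding something, but what is it? And is there a better way to achieve my goal?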

Thanks!

Edit: removed an echo that I wasn't actually using in my code.

Answer 1

The problem is the way you are indexing into $line:

$lines = explode("\n", $my_string);
foreach ($lines as $line) {
    if (is_numeric(line[0]) || empty(line[0]) ) {//index usage?
        continue;
    }
    $exclude[] = $line;
}
$transcript = echo implode("\n", $exclude); //remove echo

Replace it with:

$lines = explode("\n", $my_string);
foreach ($lines as $line) {
    if (is_numeric($line) || empty($line) ) {//here
        continue;
    }
    $exclude[] = $line;
}
$transcript = implode("\n", $exclude);

You also need a regex match to remove the 00:00:00.000 --> 00:00:04.980 timing lines.

You can combine all of the checks into one like this:

if(preg_match('/^(|\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)$/',$line)) { //regex

Taking all the cases into account:

$lines = explode("\n", $my_string);
foreach ($lines as $line) {
    if(preg_match('/^(|\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)$/',$line)) {
        continue;
    }
    $exclude[] = $line;
}
$transcript = implode("\n", $exclude);
echo $transcript;

Example (using php -a):

$ php -a
php > $my_string='1
php ' 00:00:00.000 --> 00:00:04.980
php ' [MUSIC]
php ' 
php ' 2
php ' 00:00:04.980 --> 00:00:08.120
php ' Hi, my name is holl and I am here
php ' to write some PHP.
php ' 
php ' 3
php ' 00:00:08.120 --> 00:00:10.277
php ' You can see my screen, here.';
php > $lines = explode("\n", $my_string);
php > foreach ($lines as $line) {
php {     if(preg_match('/^(|\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)$/',$line)) {
php {         continue;
php {     }
php {     $exclude[] = $line;
php { }
php > $transcript = implode("\n", $exclude);
php > echo $transcript;
[MUSIC]
Hi, my name is holl and I am here
to write some PHP.
You can see my screen, here.
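
One caveat about the loop above, offered as a hedged aside: if the incoming string uses Windows (\r\n) line endings, explode("\n", ...) leaves a trailing "\r" on every line and the anchored regex no longer matches. A minimal sketch that splits on any newline sequence using preg_split() with \R instead; the function name strip_cue_metadata is made up for illustration and is not part of the original answer:

```php
<?php
// Hypothetical helper (not from the original answer): the same filter as
// above, but tolerant of \n, \r\n and \r line endings.
function strip_cue_metadata(string $text): string
{
    $kept = [];
    // \R matches any newline sequence, so no stray "\r" is left on a line.
    foreach (preg_split('/\R/', $text) as $line) {
        // Skip empty lines, bare counters, and "start --> end" timing lines.
        if (preg_match('/^(|\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)$/', $line)) {
            continue;
        }
        $kept[] = $line;
    }
    return implode("\n", $kept);
}

// Usage with CRLF input: only the text lines should remain.
$sample = "1\r\n00:00:00.000 --> 00:00:04.980\r\n[MUSIC]\r\n\r\n2\r\n00:00:04.980 --> 00:00:08.120\r\nHi there";
echo strip_cue_metadata($sample);
```

Collecting into a local array also avoids relying on $exclude being unset before the loop runs.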

Answer 2

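It looks like there is a pattern here: the first and second lines contain metadata, the third line is the text, and the fourth is empty. If that is indeed the case, this should be trivial. You don't need to inspect the content at all; just grab the third line of every group of four: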

$lines = explode("\n", $my_string);
$texts = array();
for ($i = 0; $i < count($lines); $i++) {
  if ($i % 4 == 2) { // Index of third line is 2, of course.
    $texts[] = $lines[$i];
  }
}

$transcript = implode("\n", $texts);

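Alternatively, since (as you correctly pointed out) there can be multiple lines of text, you could say that the blocks, or entries, or whatever you call them, are separated by empty lines. Each block starts with two lines of metadata, followed by one or more (or possibly zero) lines of text. With that logic, you can parse it like this: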

$lines = explode("\n", $my_string);
$texts = array();
$linenr = 0;
foreach ($lines as $line) {
  // Track the position of this line within the current block (reset on an empty line).
  if ($line === '')
    $linenr = 0;
  else
    $linenr++;

  // Skip the first two lines of a block. 
  if ($linenr > 2)
    $texts[] = $line;
}

$transcript = implode("\n", $texts);

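I don't know this particular format, but if I had to do this, I would look for a structural pattern like this rather than parse the lines themselves. It looks like a script or subtitle file, and if you want to turn it into a transcript, it would be a shame if someone shouted "300" and it did not get transcribed.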
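
The block-based idea can also be sketched by splitting on blank lines first. This is only a sketch under two assumptions: blocks are separated by one or more truly empty lines, and every block starts with exactly two metadata lines. The function name extract_cue_text is hypothetical:

```php
<?php
// Hypothetical helper illustrating the block-based approach.
// Assumes: blocks are separated by empty lines, and each block starts with
// exactly two metadata lines (the counter, then the timing line).
function extract_cue_text(string $text): string
{
    $texts = [];
    // Split into blocks on runs of two or more newlines (any newline style).
    $blocks = preg_split('/\R{2,}/', trim($text));
    foreach ($blocks as $block) {
        $lines = preg_split('/\R/', $block);
        // Drop the two metadata lines and keep whatever text follows.
        foreach (array_slice($lines, 2) as $line) {
            $texts[] = $line;
        }
    }
    return implode("\n", $texts);
}

// A spoken "300" survives because only the position within a block matters.
$sample = "1\n00:00:00.000 --> 00:00:01.000\n300\n\n2\n00:00:01.000 --> 00:00:02.000\nline a\nline b";
echo extract_cue_text($sample);
```

Because nothing is decided from a line's content, a text line that happens to be purely numeric is kept, which a content-based filter would drop.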

Answer 3

Your code almost works. You just forgot the $ in line[0], and a whitespace-only string such as " " is not empty().

$my_string = <<< EOF
1
00:00:00.000 --> 00:00:04.980
[MUSIC]

2
00:00:04.980 --> 00:00:08.120
Hi, my name is holl and I am here
to write some PHP.

3
00:00:08.120 --> 00:00:10.277
You can see my screen, here.
EOF;

$lines = explode("\n", $my_string);
foreach ($lines as $line) {
    $temp = trim($line); // trim first so whitespace-only lines count as empty
    if ($temp === '' || is_numeric($temp[0])) { // check emptiness before indexing
        continue;
    }
    $exclude[] = $line;
}
$transcript = implode("\n", $exclude);

echo $transcript;

Result:

[MUSIC]
Hi, my name is holl and I am here
to write some PHP.
You can see my screen, here.

Answer 4

To remove those lines, try using preg_replace with a regex.

PHP manual [1]: http://php.net/manual/en/function.preg-replace.php
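
As a rough sketch of what that could look like: the pattern below is an assumption based on the sample input in the question, not something given in this answer. It removes, together with its line break, every line that is empty, a bare counter, or a timing line:

```php
<?php
// Hypothetical sketch of the preg_replace idea; the pattern is inferred
// from the sample transcript and may need adjusting for other inputs.
$my_string = "1\n00:00:00.000 --> 00:00:04.980\n[MUSIC]\n\n2\n00:00:04.980 --> 00:00:08.120\nHi, my name is holl and I am here\nto write some PHP.";

// With the /m flag, ^ anchors at the start of every line. The optional group
// matches a counter or a timing line; \R consumes the line break after it.
$pattern = '/^(?:\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)?\R/m';
$transcript = preg_replace($pattern, '', $my_string);

echo $transcript;
```

Note that a metadata line at the very end of the string with no trailing line break would not be removed by this pattern, and a text line consisting only of digits would be removed along with the counters.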

How do I remove certain lines from a multi-line string?
