My code receives a string that I have no control over; I'll call it $my_string. The string is the contents of a transcript. If I echo the string, like this:
echo $my_string;
I can see the contents, which look like this:
1
00:00:00.000 --> 00:00:04.980
[MUSIC]

2
00:00:04.980 --> 00:00:08.120
Hi, my name is holl and I am here
to write some PHP.

3
00:00:08.120 --> 00:00:10.277
You can see my screen, here.
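What I want to do is run it through a function that keeps only the words actually spoken, removing every line that represents a timestamp or a sequence number:
[MUSIC]
Hi, my name is holl and I am here
to write some PHP.
You can see my screen, here.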
My idea was to explode the whole string on line breaks and try to detect which lines are empty or start with a number, like so...
$lines = explode("\n", $my_string);
foreach ($lines as $line) {
if (is_numeric(line[0]) || empty(line[0]) ) {
continue;
}
$exclude[] = $line;
}
$transcript = implode("\n", $exclude);
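But the result is exactly the same as the input: the output still contains the numbers and the blank lines. I'm clearly misunderstanding something, but what? And is there a better way to achieve my goal?
Thanks!
Edit: removed an echo that I'm not actually using in my code.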
Answer 0 (score: 3)
The problem is how you use the index on $line:
$lines = explode("\n", $my_string);
foreach ($lines as $line) {
if (is_numeric(line[0]) || empty(line[0]) ) {//index usage?
continue;
}
$exclude[] = $line;
}
$transcript = echo implode("\n", $exclude); //remove echo
Replace it with:
$lines = explode("\n", $my_string);
foreach ($lines as $line) {
if (is_numeric($line) || empty($line) ) {//here
continue;
}
$exclude[] = $line;
}
$transcript = implode("\n", $exclude);
You also need a regular expression match to remove the 00:00:00.000 --> 00:00:04.980 lines.
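To see why the corrected loop still lets the timestamp lines through, here is a minimal sketch that wraps the same condition in a hypothetical helper, is_skipped(), and runs it on lines from the question's sample. A bare cue number and an empty line are skipped, but a timestamp line is not numeric, so it is kept.

```php
<?php
// is_skipped() is a hypothetical helper wrapping the corrected condition.
function is_skipped(string $line): bool
{
    return is_numeric($line) || empty($line);
}

// Lines taken from the question's sample.
$samples = ['1', '00:00:00.000 --> 00:00:04.980', '[MUSIC]', ''];

foreach ($samples as $line) {
    // Prints e.g. '1' => skipped
    echo var_export($line, true), ' => ', is_skipped($line) ? 'skipped' : 'kept', "\n";
}
```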
You can combine all of these checks into a single condition like this:
if(preg_match('/^(|\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)$/',$line)) { //regex
Putting it all together:
$lines = explode("\n", $my_string);
foreach ($lines as $line) {
if(preg_match('/^(|\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)$/',$line)) {
continue;
}
$exclude[] = $line;
}
$transcript = implode("\n", $exclude);
echo $transcript;
Example (using php -a):
$ php -a
php > $my_string='1
php ' 00:00:00.000 --> 00:00:04.980
php ' [MUSIC]
php '
php ' 2
php ' 00:00:04.980 --> 00:00:08.120
php ' Hi, my name is holl and I am here
php ' to write some PHP.
php '
php ' 3
php ' 00:00:08.120 --> 00:00:10.277
php ' You can see my screen, here.';
php > $lines = explode("\n", $my_string);
php > foreach ($lines as $line) {
php { if(preg_match('/^(|\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)$/',$line)) {
php { continue;
php { }
php { $exclude[] = $line;
php { }
php > $transcript = implode("\n", $exclude);
php > echo $transcript;
[MUSIC]
Hi, my name is holl and I am here
to write some PHP.
You can see my screen, here.
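The same loop can also be wrapped in a function. This is a sketch rather than part of the original answer: the name extract_spoken_text() is hypothetical, and it splits on \R instead of "\n" on the assumption that the input might use Windows \r\n line endings, in which case explode("\n", ...) would leave a trailing \r that stops the regex from matching.

```php
<?php
/**
 * Sketch: keep only the spoken lines by dropping lines that are empty,
 * a bare cue number, or a "start --> end" timestamp (same regex as above).
 * extract_spoken_text() is a hypothetical name.
 */
function extract_spoken_text(string $transcript): string
{
    // \R matches \r\n, \n or \r, so Windows line endings are handled too.
    $lines = preg_split('/\R/', $transcript);
    $kept = [];
    foreach ($lines as $line) {
        if (preg_match('/^(|\d+|\d+:\d+:\d+\.\d+\s+-->\s+\d+:\d+:\d+\.\d+)$/', $line)) {
            continue; // metadata or blank line
        }
        $kept[] = $line;
    }
    return implode("\n", $kept);
}

// Sample with \r\n line endings (assumed, for illustration).
$my_string = "1\r\n00:00:00.000 --> 00:00:04.980\r\n[MUSIC]\r\n\r\n"
    . "2\r\n00:00:04.980 --> 00:00:08.120\r\nHi, my name is holl and I am here\r\nto write some PHP.";

echo extract_spoken_text($my_string), "\n";
```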
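Answer 1 (score: 1)
This looks like a pattern: the first and second lines contain metadata, the third line is the text, and the fourth line is empty. If that is really the case, the task is trivial. You don't need to check the contents at all; just take the third line of every group of four: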
$lines = explode("\n", $my_string);
$texts = array();
for ($i = 0; $i < count($lines); $i++) {
    if ($i % 4 == 2) { // Index of the third line is 2, of course.
        $texts[] = $lines[$i];
    }
}
$transcript = implode("\n", $texts);
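The every-fourth-line idea can also be written with array_filter() on the array keys. This is a sketch that assumes each cue is exactly a number, a timestamp, one text line and a blank line; the sample string below is adapted accordingly (one text line per cue), because the two-line cue in the question would shift every following group.

```php
<?php
// Sketch: assumes every cue is exactly number, timestamp, ONE text line, blank.
// The sample is adapted from the question so that each cue has one text line.
$my_string = "1\n00:00:00.000 --> 00:00:04.980\n[MUSIC]\n\n"
    . "2\n00:00:04.980 --> 00:00:08.120\nHi, my name is holl and I am here\n\n"
    . "3\n00:00:08.120 --> 00:00:10.277\nYou can see my screen, here.";

$lines = explode("\n", $my_string);

// Keep indexes 2, 6, 10, ...: the third line of each group of four.
$texts = array_filter(
    $lines,
    fn (int $i): bool => $i % 4 === 2,
    ARRAY_FILTER_USE_KEY
);

$transcript = implode("\n", $texts);
echo $transcript, "\n";
```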
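Since, as you rightly point out, there can be multiple lines of text, you can use different logic: the blocks (entries, or whatever you call them) are separated by empty lines, and each block starts with two lines of metadata followed by zero or more lines of text. With that logic, you can parse it like this: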
$lines = explode("\n", $my_string);
$texts = array();
$linenr = 0;
foreach ($lines as $line) {
    // Track the position of this line within its block; an empty line resets it.
    if ($line === '') {
        $linenr = 0;
    } else {
        $linenr++;
    }
    // Skip the first two lines (cue number and timestamp) of each block.
    if ($linenr > 2) {
        $texts[] = $line;
    }
}
$transcript = implode("\n", $texts);
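A variant of the same block-based logic splits on blank lines first and then drops the first two lines of each block with array_slice(). transcript_text() is a hypothetical name, and the sketch assumes every block starts with exactly two metadata lines.

```php
<?php
// Sketch: transcript_text() is a hypothetical name. It assumes blocks are
// separated by blank lines and each starts with two metadata lines
// (cue number and timestamp).
function transcript_text(string $subtitles): string
{
    $texts = [];
    // A line break, an optional whitespace-only line, and another line break
    // mark the boundary between two blocks.
    $blocks = preg_split('/\R\s*\R/', trim($subtitles));
    foreach ($blocks as $block) {
        $lines = preg_split('/\R/', $block);
        // Everything after the first two lines is spoken text
        // (array_slice() returns [] for a block without text).
        foreach (array_slice($lines, 2) as $line) {
            $texts[] = $line;
        }
    }
    return implode("\n", $texts);
}

$my_string = "1\n00:00:00.000 --> 00:00:04.980\n[MUSIC]\n\n"
    . "2\n00:00:04.980 --> 00:00:08.120\nHi, my name is holl and I am here\nto write some PHP.\n\n"
    . "3\n00:00:08.120 --> 00:00:10.277\nYou can see my screen, here.";

echo transcript_text($my_string), "\n";
```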
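I'm not familiar with this particular format, but if I had to do this, I would look for a structural pattern like this rather than parse the contents of each line. It looks like a script or subtitle file, and if you want to turn it into a transcript, it would be a shame if someone shouted "300" and it was left out of the transcript.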
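One way to act on that suggestion, sketched under the assumption that timestamps always look like hh:mm:ss.mmm --> hh:mm:ss.mmm as in the question: remove a number line only when it is immediately followed by a timestamp line, so a spoken "300" survives. strip_cue_headers() is a hypothetical name, and the sample replaces [MUSIC] with "300" purely to illustrate the point made above.

```php
<?php
// Sketch: strip_cue_headers() is a hypothetical name. It assumes timestamps
// always look like hh:mm:ss.mmm --> hh:mm:ss.mmm, as in the question.
function strip_cue_headers(string $subtitles): string
{
    // A cue header is a digits-only line immediately followed by a timestamp
    // line. Only that pair is removed, so a lone spoken "300" is kept.
    $header = '/^\d+[ \t]*\R\d{2}:\d{2}:\d{2}\.\d{3}[ \t]+-->[ \t]+\d{2}:\d{2}:\d{2}\.\d{3}[ \t]*(?:\R|$)/m';
    $text = preg_replace($header, '', $subtitles);

    // Drop the blank separator lines that remain.
    $lines = array_filter(
        preg_split('/\R/', $text),
        fn (string $line): bool => trim($line) !== ''
    );
    return implode("\n", $lines);
}

$my_string = "1\n00:00:00.000 --> 00:00:04.980\n300\n\n"
    . "2\n00:00:04.980 --> 00:00:08.120\nHi, my name is holl and I am here\nto write some PHP.";

echo strip_cue_headers($my_string), "\n";
```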
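Answer 2 (score: 1)
Your code almost works. You only forgot the $ in line[0], and a line that merely looks blank (for example " ") is not empty().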
$my_string = <<< EOF
1
00:00:00.000 --> 00:00:04.980
[MUSIC]
2
00:00:04.980 --> 00:00:08.120
Hi, my name is holl and I am here
to write some PHP.
3
00:00:08.120 --> 00:00:10.277
You can see my screen, here.
EOF;
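$lines = explode("\n", $my_string);
$exclude = array();
foreach ($lines as $line) {
    // Trim the whole line first, so whitespace-only lines count as empty
    // (this also avoids reading $line[0] on an empty string).
    $temp = trim($line);
    // Skip blank lines and lines starting with a digit (cue numbers, timestamps).
    if ($temp === '' || is_numeric($temp[0])) {
        continue;
    }
    $exclude[] = $line;
}
$transcript = implode("\n", $exclude);
echo $transcript;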
Result:
[MUSIC]
Hi, my name is holl and I am here
to write some PHP.
You can see my screen, here.
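A quick check of why the trim() matters, assuming the input may have Windows (\r\n) line endings: explode("\n", ...) then leaves a \r at the end of every line, so a visually empty line is "\r", which empty() treats as non-empty. The same holds for a line containing only a space or a tab.

```php
<?php
// Assumes Windows (\r\n) line endings: explode("\n", ...) keeps the "\r".
$lines = explode("\n", "1\r\n\r\n[MUSIC]");
var_dump($lines); // the middle element is "\r", not ""

foreach ([' ', "\r", "\t"] as $blank) {
    // empty() sees a one-character string; after trim() it is "".
    var_dump(empty($blank), empty(trim($blank)));
}
```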
Answer 3 (score: 0)
To remove those lines, try preg_replace with a regular expression.
PHP manual: http://php.net/manual/en/function.preg-replace.php
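A minimal preg_replace() sketch of this suggestion. The patterns are assumptions based on the sample in the question, and strip_metadata_lines() is a hypothetical name. Like the line-based answers above, it would also delete a spoken line consisting only of digits.

```php
<?php
// Sketch: strip_metadata_lines() is a hypothetical name; the patterns
// assume the cue-number and timestamp layout shown in the question.
function strip_metadata_lines(string $subtitles): string
{
    // /m: ^ and $ match at every line. Each match also consumes the line break.
    $withoutMeta = preg_replace(
        '/^(?:\d+|\d+:\d+:\d+\.\d+[ \t]+-->[ \t]+\d+:\d+:\d+\.\d+)[ \t]*(?:\R|$)/m',
        '',
        $subtitles
    );
    // Remove the now-empty separator lines.
    $withoutBlank = preg_replace('/^[ \t]*\R/m', '', $withoutMeta);
    return rtrim($withoutBlank);
}

$my_string = "1\n00:00:00.000 --> 00:00:04.980\n[MUSIC]\n\n"
    . "2\n00:00:04.980 --> 00:00:08.120\nHi, my name is holl and I am here\nto write some PHP.\n\n"
    . "3\n00:00:08.120 --> 00:00:10.277\nYou can see my screen, here.";

echo strip_metadata_lines($my_string), "\n";
```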