Question

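I'm having some trouble using PHP to delete part of a string from a text file.

I have a large file and I need to delete a part of it.

The thing is that the line is not always the same. It keeps the same format, but the numbers change. Here is an example: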

< /td >This is the line< /td >and this< /td >is < /td >the < /td >part< /td >want to remove< /td >Name< /td > after it keeps going < /td > a loong way < /td >

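I want to remove everything from the < /td > after "this" up to the < /td > after "Name".

I was wondering whether there is a way to make PHP delete backwards from Name until the Xth occurrence of < /td >, something like:

delete from Name back to the 4th occurrence of < /td >

Hope someone can help me.

Both answers below did the trick for the sample text, but they don't work on my real code. So here is part of the real code: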

...< /td >< /tr >< tr >< td onmouseover="dm.v(this,1);" onmouseout="dm.u(this);" id="mnFE0BBC45_i8" onclick="dm.ItClk(this,\'\');cmn.href(\'indexall.php\',\'\');" class="mn31BBMainMenuItemTD" >< table border="0" cellspacing="0" cellpadding="0" >< tr >< td class="mn31BBIconTD" >< font class="MG_Icons" >&#xe746;< /font >< /td >< td class="mn31BBTitleTD" id="mnFE0BBC45_i8-tl" >Other_Name< /td >< td class="mn31BBArrowTD" >< /td >< /tr >< /table >< /td >< /tr >< tr >< td onmouseover="dm.v(this,1);" onmouseout="dm.u(this);" id="mnFE0BBC45_i3" onclick="dm.ItClk(this,\'\');cmn.href(\'index.php\',\'\');" class="mn31BBMainMenuItemTD" >< table border="0" cellspacing="0" cellpadding="0" >< tr >< td class="mn31BBIconTD" >< font class="MG_Icons" >&#xe746;< /font >< /td >< td class="mn31BBTitleTD" id="mnFE0BBC45_i3-tl" >Name< /td >< td class="mn31BBArrowTD" >< /td >< /tr >< /table >< /td >< /tr >< tr >< td onmouseover="dm.v(this,1);" onmouseout="dm.u(this);" id="mnFE0BBC45_i5" onclick="dm.ItClk(this,\'\');cmn.href(\'indexd2.php\',\'\');" class...

This is just a small part of the code (it is a JavaScript menu). All the tags (< tr >) have spaces in them so that they are visible.

The text I want to remove is:

< /td >< td class="mn31BBArrowTD" >< /td >< /tr >< /table >< /td >< /tr >< tr >< td onmouseover="dm.v(this,1);" onmouseout="dm.u(this);" id="mnFE0BBC45_i3" onclick="dm.ItClk(this,\'\');cmn.href(\'index.php\',\'\');" class="mn31BBMainMenuItemTD" >< table border="0" cellspacing="0" cellpadding="0" >< tr >< td class="mn31BBIconTD" >< font class="MG_Icons" >&#xe746;< /font >< /td >< td class="mn31BBTitleTD" id="mnFE0BBC45_i3-tl" >Name

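mnFE0BBC45_i3-tl and mnFE0BBC45_i3 are not always the same; the numbers change depending on the name.

That is why I want to do it this way: delete everything from Name back to the fourth occurrence of < /td >.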

Answer 1

Try this:

ALGO: 1) find the first position of Name; 2) walk backwards from there to the 3rd preceding < /td >; 3) cut out everything between those two positions, including Name itself.

$text_string = '< /td >This is the line< /td >and this< /td >is the part< /td >want to remove< /td >Name< /td > after it keeps going < /td > a loong way < /td >';
$needle = '< /td >';
$name_pos = strpos($text_string, 'Name');
// Walk backwards from Name to the 3rd preceding '< /td >'.
$start = $name_pos;
for ($i = 0; $i < 3; $i++) {
    $start = strrpos(substr($text_string, 0, $start), $needle);
}
// Remove everything from that '< /td >' up to and including Name.
$result = substr_replace($text_string, '', $start, $name_pos + strlen('Name') - $start);
var_dump($result);

Output:

string(94) "< /td >This is the line< /td >and this< /td > after it keeps going < /td > a loong way < /td >"
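
The same steps can be wrapped in a small helper so that the anchor text, the delimiter, and the number of delimiters to walk back become parameters. This is only a sketch: the function name removeBackFrom and its parameters are made up for illustration, and it assumes the delimiter is spelled exactly the same way every time (same spaces, same case).

```php
<?php
// Hypothetical helper (not part of the answer above): delete everything from
// the $count-th $delimiter before $anchor up to and including $anchor itself.
function removeBackFrom(string $text, string $anchor, string $delimiter, int $count): string
{
    $anchorPos = strpos($text, $anchor);
    if ($anchorPos === false) {
        return $text; // anchor not present: nothing to remove
    }
    $start = $anchorPos;
    for ($i = 0; $i < $count; $i++) {
        // Last delimiter that lies entirely before the current position.
        $start = strrpos(substr($text, 0, $start), $delimiter);
        if ($start === false) {
            return $text; // fewer than $count delimiters before the anchor
        }
    }
    return substr($text, 0, $start) . substr($text, $anchorPos + strlen($anchor));
}

$line = '< /td >This is the line< /td >and this< /td >is the part< /td >want to remove< /td >Name< /td > after it';
echo removeBackFrom($line, 'Name', '< /td >', 3), "\n";
// prints: < /td >This is the line< /td >and this< /td > after it
```

Returning the input unchanged when the anchor or enough delimiters cannot be found is a design choice of this sketch; throwing an exception would be just as reasonable.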

Answer 2

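I misread the requirement at first; this is a corrected version that looks for the appropriate matches before "Name".

Between the other occurrences of "< /td >" I only look for alphanumeric characters and spaces. You may need to add more to this character class, such as dashes or underscores ([[:alnum:]\ ]+)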

<?php
$txt = '< /td >This is the line< /td >and this< /td >is the part< /td >want to remove< /td >Name< /td > after it keeps going < /td > a loong way < /td >';

$replacement = preg_replace('/([[:alnum:]\ ]+<\s*\/td\s*>){2,2}Name<\s*\/td\s*>/', '', $txt);
echo "$replacement \n";
?>

Output:

< /td >This is the line< /td >and this< /td > after it keeps going < /td > a loong way < /td >
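
For markup where the cells contain attributes, quotes, or nested tags, the character class above is too narrow. One possible variation, sketched here under assumptions, lets a cell be any text that contains no closing td, and removes everything from the Nth closing td before the name up to and including the name. The function name removeCellsBefore and its parameters are hypothetical; on a very large input the pattern may hit PCRE's backtrack limit, in which case this sketch returns the input unchanged.

```php
<?php
// Hypothetical helper: remove everything from the $count-th closing td before
// $name up to and including $name. Spaces inside the closing tag are tolerated.
function removeCellsBefore(string $html, string $name, int $count): string
{
    $close = '<\s*\/td\s*>';              // "</td>", "< /td >", ...
    $cell  = '(?:(?!' . $close . ').)*';  // any run of text with no closing td in it
    $pattern = '/' . $close
        . '(?:' . $cell . $close . '){' . ($count - 1) . '}' // the closing tds in between
        . $cell . '(?<=>)\s*' . preg_quote($name, '/')       // $name right after a tag
        . '(?=\s*' . $close . ')/s';                         // and right before a closing td
    $result = preg_replace($pattern, '', $html);
    return $result ?? $html; // preg_replace returns null on error
}

$txt = '< /td >This is the line< /td >and this< /td >is the part< /td >want to remove< /td >Name< /td > after it';
echo removeCellsBefore($txt, 'Name', 3), "\n";
// prints: < /td >This is the line< /td >and this< /td > after it
```

Because the name must directly follow a ">" (plus optional spaces), a cell such as Other_Name is not treated as a match for Name.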

Edit:

Here is a small Perl script that does what you need:

#!/usr/bin/perl
#

use strict;
use warnings;

open(my $fh, "<", "input.txt")
                   or die "cannot open < input.txt: $!";
# slurp the whole file: undefine the record separator, then read once
my $content = do { local $/; <$fh> };
close($fh);

my $anchor = ">Name<";
my $position = 0;
# find occurrences of the anchor in the text
while ( ($position = index($content, $anchor, $position)) != -1 ) {
    print "anchor $anchor is at $position \n";
    # go backwards to the start tag of the anchor (has to be a td element)
    my $starttag_position = rindex($content, "< td", $position);
    die("no start tag found before $anchor") if $starttag_position == -1;
    print "starttag of anchor is at $starttag_position \n";
    my $start = $starttag_position;
    # look backwards past four closing tds
    for (my $i = 0; $i < 4; $i++) {
        $start = rindex($content, "< /td >", $start - 1);
        if ($start == -1) {
            die("fewer than 4 closing tds found before $anchor");
        }
    }
    print "first td is at $start \n";
    # delete the text in between
    my $removed = $starttag_position - $start;
    substr($content, $start, $removed, "");
    # the text shifted left by $removed; resume the search just after this anchor
    $position = $position - $removed + length($anchor);
}

open(my $fout, ">", "input.new")
                   or die "cannot open > input.new: $!";
print $fout $content;
close $fout;
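
Since the question asks for PHP, the Perl script could be ported roughly as follows. This is a sketch under assumptions: the function name stripBeforeAnchors is made up, the file names in the usage comment mirror the Perl script (input.txt, input.new), and like the Perl version it works on the whole file content in memory and keeps the start tag of the anchor cell.

```php
<?php
// Hypothetical PHP port of the Perl script: for every occurrence of $anchor,
// delete the text between the $tds-th closing td before the anchor's start
// tag and that start tag.
function stripBeforeAnchors(string $content, string $anchor = '>Name<', int $tds = 4): string
{
    $position = 0;
    while (($position = strpos($content, $anchor, $position)) !== false) {
        // Start tag of the cell that holds the anchor.
        $startTag = strrpos(substr($content, 0, $position), '< td');
        if ($startTag === false) {
            throw new RuntimeException("no start tag found before $anchor");
        }
        $start = $startTag;
        for ($i = 0; $i < $tds; $i++) {
            $start = strrpos(substr($content, 0, $start), '< /td >');
            if ($start === false) {
                throw new RuntimeException("fewer than $tds closing tds before $anchor");
            }
        }
        $content = substr($content, 0, $start) . substr($content, $startTag);
        // The anchor moved left by the number of bytes removed.
        $position = $position - ($startTag - $start) + strlen($anchor);
    }
    return $content;
}

// Usage, assuming input.txt exists in the current directory:
// file_put_contents('input.new', stripBeforeAnchors(file_get_contents('input.txt')));
```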

PHP: remove a string from a larger string using a variable
