Compare two strings and highlight duplicate words in PHP

Time: 2016-09-28 08:26:24

Tags: php duplicates

See image: (image not available)

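I would really like to know the best way to compare two strings (long text files) for duplicate words and then highlight those words in the second string, just like Copyscape does. It is for our internal content database.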

Am I missing a simple PHP function? Can someone point me in the right direction?

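All I know is to build two arrays and compare them with a foreach loop. But that does not seem sensible: my script is already 40 lines long without any highlighting.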
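For comparison, the two-array idea from the question can stay short if the first word list is turned into a lookup set, so there is no nested loop. This is a minimal sketch, not the asker's script: `commonWords()` is a hypothetical helper, a "word" is assumed to be a run of letters, digits, apostrophes or hyphens, and matching is assumed to be case-insensitive via `strtolower()` (which only folds ASCII letters reliably).

```php
<?php
// Hypothetical helper: returns the distinct words (lower-cased) that occur in both
// texts, in order of first appearance in $text2.
// Assumption: a "word" is a run of letters, digits, apostrophes or hyphens.
function commonWords(string $text1, string $text2): array
{
    $pattern = '/[\p{L}\p{N}\'-]+/u';
    preg_match_all($pattern, strtolower($text1), $m1);
    preg_match_all($pattern, strtolower($text2), $m2);

    // array_flip() turns the first word list into a lookup set, so each word of
    // the second text is checked with isset() instead of a nested foreach
    $seen = array_flip($m1[0]);

    $added = [];
    $common = [];
    foreach ($m2[0] as $word) {
        if (isset($seen[$word]) && !isset($added[$word])) {
            $added[$word] = true; // remember it so repeated words are listed once
            $common[] = $word;
        }
    }
    return $common;
}

print_r(commonWords(
    'PHP is a popular general-purpose scripting language.',
    'PHP powers the most popular websites in the world.'
));
```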

2 answers:

Answer 0 (score: 0)

I think https://github.com/gorhill/PHP-FineDiff might work. It can compare text at whatever granularity you need, down to the character level.

You can actually find identical duplicate phrases, provided they appear in the same order in both texts, by adding
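// Fragments found in both texts; public so callers can read FineDiff::$commons
public static $commons = array();

// Walks the opcodes and collects every fragment copied unchanged from $from
public static function renderCommonsFromOpcodes($from, $opcodes)
{
    self::$commons = array(); // reset results left over from a previous call
    FineDiff::renderFromOpcodes($from, $opcodes, array('FineDiff', 'renderCommonsFromOpcode'));
}

// Callback for renderFromOpcodes(): the 'c' (copy) opcode marks text present in both strings
private static function renderCommonsFromOpcode($opcode, $from, $from_offset, $from_len)
{
    if ($opcode === 'c') {
        self::$commons[] = substr($from, $from_offset, $from_len);
    }
}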

to the FineDiff class in finediff.php.

Usage:

include 'finediff.php';

$from_text = "PHP FPM is a popular general-purpose scripting language that is especially suited to web development.";
$to_text = "Fast, flexible and pragmatic, PHP FPM powers everything from your blog to the most popular websites in the world";

$opcodes = FineDiff::getDiffOpcodes($from_text, $to_text, FineDiff::wordDelimiters);

FineDiff::renderCommonsFromOpcodes($from_text, $opcodes);

print_r(FineDiff::$commons);

/*
Array
(
    [0] => PHP FPM
    [1] => popular
)
*/
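If patching finediff.php is not an option, the idea of collecting phrases that appear in both texts in the same order can also be sketched with a word-level longest common subsequence (LCS). This swaps in a textbook LCS rather than FineDiff's own algorithm, so results may differ from FineDiff's. `commonPhrases()` is a hypothetical helper, words are assumed to be separated by whitespace (punctuation stays attached), and the (n+1) by (m+1) table makes it practical only for texts of moderate length.

```php
<?php
// Hypothetical helper: returns runs of consecutive words that appear in both texts
// in the same relative order, based on a word-level longest common subsequence.
function commonPhrases(string $from, string $to): array
{
    // Assumption: words are separated by whitespace; punctuation stays attached
    $a = preg_split('/\s+/', trim($from), -1, PREG_SPLIT_NO_EMPTY);
    $b = preg_split('/\s+/', trim($to), -1, PREG_SPLIT_NO_EMPTY);
    $n = count($a);
    $m = count($b);

    // $len[$i][$j] = LCS length of the suffixes $a[$i..] and $b[$j..]
    $len = array_fill(0, $n + 1, array_fill(0, $m + 1, 0));
    for ($i = $n - 1; $i >= 0; $i--) {
        for ($j = $m - 1; $j >= 0; $j--) {
            $len[$i][$j] = $a[$i] === $b[$j]
                ? $len[$i + 1][$j + 1] + 1
                : max($len[$i + 1][$j], $len[$i][$j + 1]);
        }
    }

    // Walk the table forwards; matches that are adjacent in both texts form one phrase
    $phrases = [];
    $run = [];
    $i = $j = 0;
    while ($i < $n && $j < $m) {
        if ($a[$i] === $b[$j]) {
            $run[] = $a[$i];
            $i++;
            $j++;
            continue;
        }
        if ($run) { // skipping a word in either text ends the current phrase
            $phrases[] = implode(' ', $run);
            $run = [];
        }
        if ($len[$i + 1][$j] >= $len[$i][$j + 1]) {
            $i++;
        } else {
            $j++;
        }
    }
    if ($run) {
        $phrases[] = implode(' ', $run);
    }
    return $phrases;
}

print_r(commonPhrases(
    'PHP FPM is a popular general-purpose scripting language that is especially suited to web development.',
    'Fast, flexible and pragmatic, PHP FPM powers everything from your blog to the most popular websites in the world'
));
```

When two matches conflict in order (a word before a shared phrase in one text and after it in the other), an LCS keeps only one of them, which matches the same-order restriction of the FineDiff approach.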

Answer 1 (score: 0)

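One approach is to build two arrays from the two strings you want to compare, find the common words with array_intersect, and then use a string replacement function to highlight them.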

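$str1='PHP is a popular general-purpose scripting language that is especially suited to web development.';
$str2='Fast, flexible and pragmatic, PHP powers everything from your blog to the most popular websites in the world.';

// Split both strings into words (punctuation stays attached, e.g. "pragmatic,")
$a1=explode(' ',$str1);
$a2=explode(' ',$str2);

// Keep only words longer than 3 characters to skip "a", "is", "the", ...
function longenough($word){
    return strlen( $word ) > 3;
}

$a1=array_filter($a1,'longenough');
$a2=array_filter($a2,'longenough');

// Words that occur in both strings, each listed once
$common=array_unique( array_intersect( $a1, $a2 ) );

if ( $common ) {
    // Escape regex metacharacters (e.g. "." in "world.") and match whole words only;
    // a single combined pattern means the inserted <span> markup is never matched again
    $quoted=array_map( function( $word ){ return preg_quote( $word, '@' ); }, $common );
    $str2=preg_replace( '@(?<!\w)(' . implode( '|', $quoted ) . ')(?!\w)@i', '<span style="color:red">$1</span>', $str2 );
}

echo $str2;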
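A variant of the same idea tokenises the second string once, so punctuation ("pragmatic,", "world.") and letter case do not prevent a match, and every piece of text is HTML-escaped before the highlight markup is added. This is a minimal sketch under stated assumptions: `highlightCommon()` is a hypothetical helper, a "word" is assumed to be a run of letters, digits, apostrophes or hyphens, the `<span style="color:red">` markup and the length cut-off are carried over from the answer above, and the output is assumed to be HTML.

```php
<?php
// Hypothetical helper: returns $text as HTML in which every word that also occurs
// in $reference (ignoring case and surrounding punctuation) is wrapped in a span.
// Assumptions: a "word" is a run of letters, digits, apostrophes or hyphens, and only
// words longer than $minLength are highlighted (strlen() counts bytes, not characters).
function highlightCommon(string $reference, string $text, int $minLength = 3): string
{
    $wordPattern = '/([\p{L}\p{N}\'-]+)/u';

    // Lookup set of the lower-cased words of the reference text
    preg_match_all($wordPattern, strtolower($reference), $m);
    $known = array_flip($m[1]);

    // The capture group makes preg_split() return the words too:
    // even indexes hold the text between words, odd indexes hold the words
    $tokens = preg_split($wordPattern, $text, -1, PREG_SPLIT_DELIM_CAPTURE);

    $out = '';
    foreach ($tokens as $k => $token) {
        // Escape every token, so the span markup added below is the only raw HTML
        $safe = htmlspecialchars($token, ENT_QUOTES, 'UTF-8');
        $isWord = $k % 2 === 1;
        if ($isWord && strlen($token) > $minLength && isset($known[strtolower($token)])) {
            $out .= '<span style="color:red">' . $safe . '</span>';
        } else {
            $out .= $safe;
        }
    }
    return $out;
}

echo highlightCommon(
    'PHP is a popular general-purpose scripting language that is especially suited to web development.',
    'Fast, flexible and pragmatic, PHP powers everything from your blog to the most popular websites in the world.'
);
// prints: Fast, flexible and pragmatic, PHP powers everything from your blog to the most <span style="color:red">popular</span> websites in the world.
```

Because the text is split once and reassembled, a common word such as "color" or "span" can never be matched inside markup that was inserted for an earlier word.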