Question

I have a very large .txt file that contains many occurrences of text like this:

data-domain="googledotcom"

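So I want to extract whatever is inside the quotes (googledotcom in this example) into a new file. The results should be separated by newlines (or at least by tabs).

I have looked online for a simple way to do this. I may have tagged this question incorrectly, simply because I'm not sure how to accomplish it. Thanks for your help.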

Answer 1

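<?php
// Read the file into an array, one element per line.
$text = file('file.txt');
$domains = []; // initialize so implode() works even when nothing matches
foreach ($text as $value) {
    // Capture the first double-quoted value on the line.
    if (preg_match('/"([^"]+)"/', $value, $match)) {
        $domains[] = $match[1];
    }
}
// Write one extracted value per line.
file_put_contents("domains.txt", implode("\n", $domains));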

Answer 2

As mentioned in the comments, you can use preg_match_all() with a regular expression:

<?php
header('Content-Type: text/plain; charset=utf-8');

$test = <<<STR
xxx
data-domain="test1"
yyy data-domain="test2"
zzz
data-domain="test3"
STR;

// [^"]+ stops at the closing quote, so several attributes on one line are captured separately.
$results = preg_match_all('/data-domain="([^"]+)"/', $test, $matches);

print_r($matches[1]);
?>

Result:

Array
(
    [0] => test1
    [1] => test2
    [2] => test3
)

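Depending on the file size, you should read the file either with fopen() + fgets() (line by line) or with file_get_contents() (the whole file at once, if it is relatively small). Then parse it with a regular expression and write the results to a new file.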
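
For a large file, the line-by-line approach described above might look roughly like the sketch below. The function name extractDomains and both path parameters are made up for illustration and are not from the question. The sketch assumes every value sits inside a data-domain="..." attribute and that an attribute never spans more than one line.

```php
<?php
// Hypothetical helper: streams $inputPath line by line and writes every
// data-domain value to $outputPath, one per line. Returns the number of
// values written.
function extractDomains(string $inputPath, string $outputPath): int
{
    $in = fopen($inputPath, 'r');
    if ($in === false) {
        throw new RuntimeException("Cannot open $inputPath");
    }
    $out = fopen($outputPath, 'w');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot open $outputPath");
    }

    $count = 0;
    // fgets() returns one line per call, so memory use stays small
    // no matter how large the input file is.
    while (($line = fgets($in)) !== false) {
        // A line may hold several attributes, so collect all of them.
        if (preg_match_all('/data-domain="([^"]+)"/', $line, $matches)) {
            foreach ($matches[1] as $domain) {
                fwrite($out, $domain . "\n");
                $count++;
            }
        }
    }

    fclose($in);
    fclose($out);
    return $count;
}
```

Calling extractDomains('file.txt', 'domains.txt') would then produce the new file with one value per line. If tabs are preferred as the separator, replacing "\n" with "\t" in the fwrite() call should be enough.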

Extracting data with a specific pattern from a .txt file
