Question

I have a fairly large txt file (3.5 MB) with the following structure:

sweep#1 expanse#1   0.375
loftiness#1 highness#2  0.375
lockstep#1  0.25
laziness#2  0.25
treponema#1 0.25
rhizopodan#1 rhizopod#1 0.25
plumy#3 feathery#3 feathered#1  -0.125
ruffled#2 frilly#1 frilled#1    -0.125
fringed#2   -0.125
inflamed#3  -0.125
inlaid#1    -0.125

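Each word is followed by a #, an integer, and then the "score". Tabs separate the words from the score. As of now, the text file is loaded into a string using file_get_contents().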

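Given an array of strings, each a single lowercase word with special characters stripped, I need to look up each value, find the corresponding score, and add it to a running total.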

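I'm guessing I need some form of regular expression to first find the word, continue to the next \t, and then add the score that follows to the running total. What is the best way to go about this?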

Answer 1

Yes, there are probably better ways to do this. But this is dead simple:

<?php

$wordlist = file_get_contents("wordlist.txt");

//strip string of invalid chars and make it lowercase
$string = "This is the best sentence ever! Winning!";
$string = strtolower($string);
$string = preg_replace('/[^\w\d_ -]/si', '', $string);
$words = explode(" ", $string);

$lines = explode("\n", $wordlist);
$scores = array();
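foreach ($lines as $line) {
    $line = trim($line);
    if ($line === '') continue; //skip blank lines
    $split = preg_split('/\s+/', $line); //split on spaces or tabs
    $score = doubleval(array_pop($split)); //array_pop (last element) contains score
    foreach ($split as $entry) {
        //each remaining element looks like word#N; keep only the word
        $scores[explode('#', $entry)[0]] = $score;
    }
}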

$total = 0;
foreach($words as $word) {
    if (isset($scores[$word])) $total += $scores[$word];
}

echo $total;
?>

Answer 2

If you just need to look up a single word, it is as simple as:

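//the optional - also matches negative scores; [^\t\n]* skips any further words on the line
if (preg_match("/^$word#\d+[^\t\n]*\t+(-?\d+\.\d+)/m", $textfile, $match)) {
    $sum += floatval($match[1]);
}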

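The ^ looks for the start of a line in /m mode, # and \t are literal separators, and \d+ matches a run of digits. The resulting group [1] will be your float.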

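$word needs to be escaped (preg_quote), since it might itself contain a / forward slash. To search for several words at once, implode them into a list of alternatives $word1|$word2|$word3, add a capture group, and then use preg_match_all.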
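The multi-word variant described above could look roughly like the following. This is only a sketch: the function name sumScores and the sample data are made up for illustration, and it assumes that each searched word is the first entry on its line and that a tab precedes the score.

```php
<?php
// Hypothetical helper (not part of the original answer): sums the scores of
// all given words with a single preg_match_all() call.
function sumScores(array $words, string $textfile): float
{
    if ($words === []) {
        return 0.0; // an empty alternation would match every line
    }

    // Escape each word, including the "/" delimiter, then join as alternatives.
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $words);
    $alternatives = implode('|', $quoted);

    // Group 1: the matched word. Group 2: the score, optionally negative.
    // [^\t\n]* skips any further words that share the same line.
    $pattern = '/^(' . $alternatives . ')#\d+[^\t\n]*\t+(-?\d+(?:\.\d+)?)/m';

    $sum = 0.0;
    if (preg_match_all($pattern, $textfile, $matches, PREG_SET_ORDER)) {
        foreach ($matches as $m) {
            $sum += floatval($m[2]);
        }
    }
    return $sum;
}

// Made-up sample data in the format from the question (tab before the score).
$textfile = "sweep#1 expanse#1\t0.375\nlockstep#1\t0.25\nfringed#2\t-0.125\n";
echo sumScores(['sweep', 'fringed', 'missing'], $textfile), "\n";
```

Note that a word listed in the file under several sense numbers is counted once per matching line, and a word repeated in the input array is still counted only once per matching line. If either matters, building a lookup array as in Answer 1 is probably the easier route.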

From an array, look up values and their corresponding keys in a text file in PHP
