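I need to extract the hashtags from a long text and count them. I know this can be done with a regular expression, but I haven't managed to write one. I would appreciate any help. Here is my sample text: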
#paris #love #spring #outdoor #life #istanbul #par #sacrecoeur #paris #france #latex #dog Thats what the world is, paris after all, an endless battle of contrasting memories. I can see clearly now the rain is gone. #music I can see all obstacles in my way. #paris #queenstreet #foreveronvocationNever felt more glamorous. #ski #music #skiing #skier #terrainpark #paris #snowboard #snowboarding #snowboarder #longboard #longboarding #longboarder #skateboard #skateboarder #skateboarding #winter #just my voice and my good friend Danny Marin will dj for our auditory exploration. #stack #over #flow to be or not to be #poem #music #paris
I need to get hashtags like "#paris", count each one, and finally sort the hashtags by how many times each one occurs, e.g.:
#paris(6)
#music(3)
#...(2)
#...(2)
#(1)
#(1)
#...(1)
Answer 0 (score: 0)
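Split the string on '#' into an array.
Split each element of that array on ' ' and keep only the first word.
Count how many times each token occurs and store the counts in a parallel array.
Sort using the parallel array.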
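The four steps above could be sketched in PHP roughly as follows. This is a minimal sketch, not the answerer's own code: the function name `countTagsBySplitting` is hypothetical, words are split on any whitespace rather than only ' ', and a single associative array (tag => count) stands in for the two parallel arrays, since PHP arrays can be keyed by string.

```php
<?php
// Minimal sketch of the split-based approach.
// countTagsBySplitting is a hypothetical helper name used for illustration.
function countTagsBySplitting(string $text): array
{
    $counts = [];

    // Step 1: split the string on '#'. Everything before the first '#'
    // is ordinary text, so drop that first piece.
    $pieces = explode('#', $text);
    array_shift($pieces);

    foreach ($pieces as $piece) {
        // Step 2: keep only the first whitespace-delimited word of each piece.
        $word = preg_split('/\s+/', trim($piece))[0];
        if ($word === '') {
            continue; // a '#' with nothing after it (e.g. '##' or a trailing '#')
        }

        // Step 3: count each tag; the associative array replaces the
        // parallel "tags" and "counts" arrays.
        $tag = '#' . $word;
        $counts[$tag] = ($counts[$tag] ?? 0) + 1;
    }

    // Step 4: sort by count, highest first, keeping the tag keys.
    arsort($counts);
    return $counts;
}

foreach (countTagsBySplitting('#paris #love some text #music more text #paris') as $tag => $n) {
    echo "$tag ($n)\n";
}
```

Note that this approach keeps any punctuation glued to a tag, so "#paris," and "#paris" would be counted separately.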
Answer 1 (score: 0)
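// $string holds the text to search (e.g. the sample text above)
preg_match_all("/(#\w+)/", $string, $matches);
// $matches[1] contains every captured hashtag
$counts = array_count_values($matches[1]);
// arsort keeps the keys and puts the most frequent tag first
arsort($counts);
foreach ($counts as $tag => $count) {
    echo "$tag ($count)<br>\n";
}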
This should give you what you need.
Edit: sorry, I forgot the array index.
Working example:
http://sandbox.onlinephpfunctions.com/code/d1fe24cbc8deedd24f7825ea4e48eaa691b8d401
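A self-contained variant of the code above, for readers who also want tags with equal counts in a predictable order (before PHP 8.0, the sort functions do not guarantee the relative order of equal values). This is a sketch under my own assumptions: the helper name `countHashtags` is hypothetical, and breaking ties alphabetically is not something the question asks for. `uksort` with the spaceship operator compares [count, tag] pairs, so higher counts come first and equal counts fall back to A-Z order.

```php
<?php
// Sketch: count hashtags with a regex, then sort by count (descending)
// and alphabetically for equal counts. countHashtags is a hypothetical name.
function countHashtags(string $text): array
{
    // '#\w+' matches '#' followed by letters, digits or underscores.
    preg_match_all('/#\w+/', $text, $matches);
    $counts = array_count_values($matches[0]);

    // Compare [count, tag] pairs: higher count first, then tag A-Z.
    uksort($counts, function (string $a, string $b) use ($counts): int {
        return [$counts[$b], $a] <=> [$counts[$a], $b];
    });

    return $counts;
}

$text = '#paris #love #music #paris text #music #paris';
foreach (countHashtags($text) as $tag => $count) {
    echo "$tag ($count)\n"; // prints #paris (3), #music (2), #love (1), one per line
}
```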
Answer 2 (score: 0)
You can use array_count_values; here is an example:
<?php
$html = <<< EOF
#paris #love #spring #outdoor #life #istanbul #par #sacrecoeur #paris #france #latex #dog Thats what the world is, paris after all, an endless battle of contrasting memories. I can see clearly now the rain is gone. #music I can see all obstacles in my way. #paris #queenstreet #foreveronvocationNever felt more glamorous. #ski #music #skiing #skier #terrainpark #paris #snowboard #snowboarding #snowboarder #longboard #longboarding #longboarder #skateboard #skateboarder #skateboarding #winter #just my voice and my good friend Danny Marin will dj for our auditory exploration. #stack #over #flow to be or not to be #poem #music #paris
EOF;
preg_match_all('/(#.*?\S+)/im', $html, $hTags, PREG_PATTERN_ORDER);
print_r(array_count_values($hTags[1]));
Output:
Array
(
[#paris] => 5
[#love] => 1
[#spring] => 1
[#outdoor] => 1
[#life] => 1
[#istanbul] => 1
[#par] => 1
[#sacrecoeur] => 1
[#france] => 1
[#latex] => 1
[#dog] => 1
[#music] => 3
[#queenstreet] => 1
[#foreveronvocationNever] => 1
[#ski] => 1
[#skiing] => 1
[#skier] => 1
[#terrainpark] => 1
[#snowboard] => 1
[#snowboarding] => 1
[#snowboarder] => 1
[#longboard] => 1
[#longboarding] => 1
[#longboarder] => 1
[#skateboard] => 1
[#skateboarder] => 1
[#skateboarding] => 1
[#winter] => 1
[#just] => 1
[#stack] => 1
[#over] => 1
[#flow] => 1
[#poem] => 1
)
Regex explanation:
(#.*?\S+)
Match the regex below and capture its match into backreference number 1 «(#.*?\S+)»
Match the character “#” literally «#»
Match any single character that is NOT a line break character (line feed) «.*?»
Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Match a single character that is NOT a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «\S+»
Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
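To see what the lazy `.*?` followed by `\S+` means in practice, the sketch below runs the pattern from this answer next to the stricter `#\w+` from Answer 1 on a made-up sentence (the test string is my own, not from the question). Because `\S+` accepts any non-whitespace character, trailing punctuation stays attached to the tag, and `.*?` can expand across a space after a bare '#'.

```php
<?php
// Compare this answer's pattern with '#\w+' on an illustrative string.
$text = 'I love #paris, and #love. # nope';

preg_match_all('/(#.*?\S+)/im', $text, $loose);
preg_match_all('/#\w+/', $text, $strict);

// $loose[1]:  '#paris,', '#love.', '# nope'  (punctuation and the space after a bare '#' are kept)
// $strict[0]: '#paris', '#love'              (only word characters after '#')
print_r($loose[1]);
print_r($strict[0]);
```

On the sample text from the question both patterns happen to find the same tags, because every tag there is followed by a space or the end of the string.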
Answer 3 (score: 0)
If you prefer, you can do it in plain PHP without a regular expression:
$tagString = "#paris #love #spring #outdoor #life #istanbul #par #sacrecoeur #paris #france #latex #dog Thats what the world is, paris after all, an endless battle of contrasting memories. I can see clearly now the rain is gone. #music I can see all obstacles in my way. #paris #queenstreet #foreveronvocationNever felt more glamorous. #ski #music #skiing #skier #terrainpark #paris #snowboard #snowboarding #snowboarder #longboard #longboarding #longboarder #skateboard #skateboarder #skateboarding #winter #just my voice and my good friend Danny Marin will dj for our auditory exploration. #stack #over #flow to be or not to be #poem #music #paris";
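$countArray = array();
// Strip leading/trailing '#', then split the text at every '#'.
// Each piece runs up to the next '#', so any text after a tag stays in the key.
foreach (explode("#", trim($tagString, '#')) as $tag) {
    $tag = trim($tag);
    if (array_key_exists($tag, $countArray)) {
        $countArray[$tag] = (int) $countArray[$tag] + 1;
    } else {
        $countArray[$tag] = 1;
    }
}
// Sort by count, highest first
arsort($countArray);
var_dump($countArray);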
This gives:
array(34) {
["paris"]=>
int(5)
["music"]=>
int(2)
["skateboard"]=>
int(1)
["snowboarding"]=>
int(1)
["snowboarder"]=>
int(1)
["longboard"]=>
int(1)
["longboarding"]=>
int(1)
["longboarder"]=>
int(1)
["skateboarder"]=>
int(1)
["terrainpark"]=>
int(1)
["skateboarding"]=>
int(1)
["winter"]=>
int(1)
["just my voice and my good friend Danny Marin will dj for our auditory exploration."]=>
int(1)
["stack"]=>
int(1)
["over"]=>
int(1)
["flow to be or not to be"]=>
int(1)
["snowboard"]=>
int(1)
["skier"]=>
int(1)
["love"]=>
int(1)
["skiing"]=>
int(1)
["ski"]=>
int(1)
["foreveronvocationNever felt more glamorous."]=>
int(1)
["queenstreet"]=>
int(1)
["music I can see all obstacles in my way."]=>
int(1)
["dog Thats what the world is, paris after all, an endless battle of contrasting memories. I can see clearly now the rain is gone."]=>
int(1)
["latex"]=>
int(1)
["france"]=>
int(1)
["sacrecoeur"]=>
int(1)
["par"]=>
int(1)
["istanbul"]=>
int(1)
["life"]=>
int(1)
["outdoor"]=>
int(1)
["spring"]=>
int(1)
["poem"]=>
int(1)
}
You can test it online here: http://sandbox.onlinephpfunctions.com/code/3058b887590845e33685b25e14e21df9959e94e7
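In the output above, any text that follows a tag up to the next '#' stays in the key (for example "music I can see all obstacles in my way."), so "music" is counted only twice although it appears three times as a tag. One possible adjustment, sketched here as an assumption rather than the answerer's code, is to keep only the first word of each piece with `strtok`. The name `countTagsWithoutRegex` is hypothetical.

```php
<?php
// Sketch: the explode-based loop, keeping only the first word after each '#'.
// countTagsWithoutRegex is a hypothetical helper name.
function countTagsWithoutRegex(string $tagString): array
{
    $countArray = [];
    foreach (explode('#', $tagString) as $i => $piece) {
        if ($i === 0) {
            continue; // text before the first '#' is not a tag
        }
        // strtok returns the first whitespace-delimited token, or false if there is none.
        $tag = strtok($piece, " \t\r\n");
        if ($tag === false) {
            continue;
        }
        $countArray[$tag] = ($countArray[$tag] ?? 0) + 1;
    }
    arsort($countArray); // highest count first
    return $countArray;
}

var_dump(countTagsWithoutRegex('#paris #music I can see. #paris #love'));
```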