Selecting words from a long text and counting them (PHP)

Posted: 2016-05-05 18:48:05

Tags: php regex select count

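I need to extract the hashtags from a long text and count them. I know this can be done with a regular expression, but I can't get it to work. I'd appreciate any help. Here is my sample text: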

  

#paris #love #spring #outdoor #life #istanbul #par #sacrecoeur #paris #france #latex #dog Thats what the world is, paris after all, an endless battle of contrasting memories. I can see clearly now the rain is gone. #music I can see all obstacles in my way. #paris #queenstreet #foreveronvocationNever felt more glamorous. #ski #music #skiing #skier #terrainpark #paris #snowboard #snowboarding #snowboarder #longboard #longboarding #longboarder #skateboard #skateboarder #skateboarding #winter #just my voice and my good friend Danny Marin will dj for our auditory exploration. #stack #over #flow to be or not to be #poem #music #paris

I need to extract hashtags such as "#paris", count how often each one occurs, and finally sort the hashtags by that count, e.g.:

  

#paris(6)
#music(3)
#...(2)
#...(2)
#(1)
#(1)
#...(1)

4 answers:

Answer 0 (score: 0)

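1. Split the string into an array on '#'.
2. Split each element of that array on ' ' (a space) and keep only the first word.
3. Count each token and store the counts in a parallel array.
4. Sort using the parallel arrays.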
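A minimal PHP sketch of these four steps, assuming a tag ends at the first whitespace character; the function name `countTagsParallel` is hypothetical. PHP can sort two parallel arrays together with `array_multisort()`, which is what step 4 uses here:

```php
<?php
// Hypothetical helper that follows the four steps literally, keeping
// tags and counts in two index-aligned (parallel) arrays.
function countTagsParallel(string $text): array
{
    $tags = [];    // $tags[$i] is a hashtag (without the '#') ...
    $counts = [];  // ... and $counts[$i] is how often it occurs

    // Step 1: split on '#'; the first piece is the text before the first '#'.
    $pieces = explode('#', $text);
    array_shift($pieces);

    foreach ($pieces as $piece) {
        // Step 2: keep only the first word of each piece.
        $word = preg_split('/\s+/', $piece, 2)[0];
        if ($word === '') {
            continue; // a '#' followed by whitespace or nothing
        }
        // Step 3: count the token in the parallel arrays.
        $i = array_search($word, $tags, true);
        if ($i === false) {
            $tags[] = $word;
            $counts[] = 1;
        } else {
            $counts[$i]++;
        }
    }

    // Step 4: sort $counts descending and reorder $tags to match.
    array_multisort($counts, SORT_DESC, $tags);
    return [$tags, $counts];
}
```

Usage: `[$tags, $counts] = countTagsParallel($text);`. In PHP an associative array (`tag => count`) is usually simpler than parallel arrays, and `array_search()` makes this version quadratic in the number of distinct tags; the parallel arrays are kept here only to mirror the steps above.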

Answer 1 (score: 0)

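// Collect every "#word" match; $array[1] holds the captured hashtags
preg_match_all("/(\#\w+)/", $string, $array);
// Map each hashtag to how often it occurs
$array = array_count_values($array[1]);
// Sort by count, highest first
arsort($array);

foreach($array as $key => $value) {
    echo "$key ($value)<br>\n";
}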

That should give you what you need.

Edit: sorry, I forgot the array index.

Working example:
http://sandbox.onlinephpfunctions.com/code/d1fe24cbc8deedd24f7825ea4e48eaa691b8d401
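One caveat with `\w`: without the `u` flag, PHP's PCRE works byte by byte, and in the usual C locale `\w` matches only ASCII letters, digits and underscore, so a tag such as `#café` would typically be cut to `#caf`. Below is a minimal sketch of the same `preg_match_all()` + `array_count_values()` approach with a Unicode-aware pattern; the function name `countHashtagsUnicode` and the choice of `[\p{L}\p{N}_]` as "hashtag characters" are assumptions, not part of the answer:

```php
<?php
// Hypothetical helper: counts hashtags, treating any Unicode letter,
// digit or underscore as part of a tag. The /u flag makes PCRE read
// the subject as UTF-8, so multi-byte letters like "é" are matched.
function countHashtagsUnicode(string $text): array
{
    if (preg_match_all('/#[\p{L}\p{N}_]+/u', $text, $matches) === false) {
        return []; // e.g. $text is not valid UTF-8
    }
    $counts = array_count_values($matches[0]); // "#tag" => count
    arsort($counts);                           // highest count first
    return $counts;
}

// Print in the "#tag(count)" shape used in the question.
foreach (countHashtagsUnicode('#café #paris, #café!') as $tag => $n) {
    echo "$tag($n)\n";
}
```

Because the character class excludes punctuation, `#paris,` is counted as `#paris`.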

Answer 2 (score: 0)

You can use array_count_values(). Here is an example:

<?php

$html = <<< EOF
#paris #love #spring #outdoor #life #istanbul #par #sacrecoeur #paris #france #latex #dog Thats what the world is, paris after all, an endless battle of contrasting memories. I can see clearly now the rain is gone. #music I can see all obstacles in my way. #paris #queenstreet #foreveronvocationNever felt more glamorous. #ski #music #skiing #skier #terrainpark #paris #snowboard #snowboarding #snowboarder #longboard #longboarding #longboarder #skateboard #skateboarder #skateboarding #winter #just my voice and my good friend Danny Marin will dj for our auditory exploration. #stack #over #flow to be or not to be #poem #music #paris
EOF;
preg_match_all('/(#.*?\S+)/im', $html, $hTags, PREG_PATTERN_ORDER);
print_r(array_count_values($hTags[1]));

Output:

Array
(
    [#paris] => 5
    [#love] => 1
    [#spring] => 1
    [#outdoor] => 1
    [#life] => 1
    [#istanbul] => 1
    [#par] => 1
    [#sacrecoeur] => 1
    [#france] => 1
    [#latex] => 1
    [#dog] => 1
    [#music] => 3
    [#queenstreet] => 1
    [#foreveronvocationNever] => 1
    [#ski] => 1
    [#skiing] => 1
    [#skier] => 1
    [#terrainpark] => 1
    [#snowboard] => 1
    [#snowboarding] => 1
    [#snowboarder] => 1
    [#longboard] => 1
    [#longboarding] => 1
    [#longboarder] => 1
    [#skateboard] => 1
    [#skateboarder] => 1
    [#skateboarding] => 1
    [#winter] => 1
    [#just] => 1
    [#stack] => 1
    [#over] => 1
    [#flow] => 1
    [#poem] => 1
)

Regex explanation:

(#.*?\S+)

Match the regex below and capture its match into backreference number 1 «(#.*?\S+)»
   Match the character “#” literally «#»
   Match any single character that is NOT a line break character (line feed) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
   Match a single character that is NOT a “whitespace character” (any Unicode separator, tab, line feed, carriage return, vertical tab, form feed, next line) «\S+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»

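To see what `(#.*?\S+)` does in practice: since `.*?` is lazy it first tries to match nothing, so on ordinary tags the pattern matches the same text as `#\S+`. The lazy part only expands when the `#` is directly followed by whitespace, and `\S+` keeps trailing punctuation, so `#paris,` and `#paris` would be counted as different tags. A small comparison on a made-up sample string (the helper `allMatches` is hypothetical):

```php
<?php
// Made-up sample: a tag with a trailing comma and a lone '#'.
$sample = '#paris, # lone #music';

// Hypothetical helper: returns all full-pattern matches of $pattern.
function allMatches(string $pattern, string $subject): array
{
    preg_match_all($pattern, $subject, $m);
    return $m[0];
}

print_r(allMatches('/#.*?\S+/', $sample)); // ['#paris,', '# lone', '#music']
print_r(allMatches('/#\S+/', $sample));    // ['#paris,', '#music']
print_r(allMatches('/#\w+/', $sample));    // ['#paris', '#music']
```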

Answer 3 (score: 0)

If you'd rather not use a regex, you can do it in plain PHP:

$tagString = "#paris #love #spring #outdoor #life #istanbul #par #sacrecoeur #paris #france #latex #dog Thats what the world is, paris after all, an endless battle of contrasting memories. I can see clearly now the rain is gone. #music I can see all obstacles in my way. #paris #queenstreet #foreveronvocationNever felt more glamorous. #ski #music #skiing #skier #terrainpark #paris #snowboard #snowboarding #snowboarder #longboard #longboarding #longboarder #skateboard #skateboarder #skateboarding #winter #just my voice and my good friend Danny Marin will dj for our auditory exploration. #stack #over #flow to be or not to be #poem #music #paris";

$countArray = array();

foreach (explode("#", trim($tagString, '#')) as $tag) {

    $tag = trim($tag);

    if (array_key_exists($tag, $countArray)) {

        $countArray[$tag] = (int) $countArray[$tag] + 1;

    } else {

        $countArray[$tag] = 1;
    }
}

arsort($countArray);

var_dump($countArray);

Output:

array(34) {
  ["paris"]=>
  int(5)
  ["music"]=>
  int(2)
  ["skateboard"]=>
  int(1)
  ["snowboarding"]=>
  int(1)
  ["snowboarder"]=>
  int(1)
  ["longboard"]=>
  int(1)
  ["longboarding"]=>
  int(1)
  ["longboarder"]=>
  int(1)
  ["skateboarder"]=>
  int(1)
  ["terrainpark"]=>
  int(1)
  ["skateboarding"]=>
  int(1)
  ["winter"]=>
  int(1)
  ["just my voice and my good friend Danny Marin will dj for our auditory exploration."]=>
  int(1)
  ["stack"]=>
  int(1)
  ["over"]=>
  int(1)
  ["flow to be or not to be"]=>
  int(1)
  ["snowboard"]=>
  int(1)
  ["skier"]=>
  int(1)
  ["love"]=>
  int(1)
  ["skiing"]=>
  int(1)
  ["ski"]=>
  int(1)
  ["foreveronvocationNever felt more glamorous."]=>
  int(1)
  ["queenstreet"]=>
  int(1)
  ["music I can see all obstacles in my way."]=>
  int(1)
  ["dog Thats what the world is, paris after all, an endless battle of contrasting memories. I can see clearly now the rain is gone."]=>
  int(1)
  ["latex"]=>
  int(1)
  ["france"]=>
  int(1)
  ["sacrecoeur"]=>
  int(1)
  ["par"]=>
  int(1)
  ["istanbul"]=>
  int(1)
  ["life"]=>
  int(1)
  ["outdoor"]=>
  int(1)
  ["spring"]=>
  int(1)
  ["poem"]=>
  int(1)
}

You can test it online here: http://sandbox.onlinephpfunctions.com/code/3058b887590845e33685b25e14e21df9959e94e7
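The output above shows the drawback of splitting on '#' alone: everything up to the next '#' ends up in the key, so "dog Thats what the world is, ..." is counted as a single tag and `music` is reported 2 times instead of 3. A hedged fix, assuming a tag ends at the first space, tab or line break, is to keep only the first word of each piece, for example with `strtok()`. The function name `countTagsFirstWord` is hypothetical:

```php
<?php
// Hypothetical helper: the loop from the answer above, changed so that
// only the first whitespace-delimited word after each '#' becomes the key.
function countTagsFirstWord(string $tagString): array
{
    $countArray = [];
    foreach (explode('#', $tagString) as $i => $piece) {
        // Skip the text before the first '#', and a '#' that is
        // followed by whitespace or by nothing at all.
        if ($i === 0 || $piece === '' || ctype_space($piece[0])) {
            continue;
        }
        // First token up to a space, tab or line break, e.g. "dog" from
        // "dog Thats what the world is, ...".
        $tag = strtok($piece, " \t\r\n");
        $countArray[$tag] = ($countArray[$tag] ?? 0) + 1;
    }
    arsort($countArray); // highest count first, as in the answer
    return $countArray;
}
```

Run on the question's sample text, this should report `paris` 5 times and `music` 3 times, the same counts that Answer 2's regex produces.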