Splitting tags with JavaScript or another language

Posted: 2016-11-30 14:30:30

Tags: javascript php

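I have a large set of tags that I need to analyze for my thesis. Because it is a lot of work, I would like to know whether the process can be automated.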

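I want to separate the tags using the number shown in parentheses after each one. That number is the tag's frequency, and it is also what makes splitting the tags by hand manageable.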

Can anyone help me with a JavaScript or PHP script that puts each tag on its own line?

Here is a sample of the tags:

1001-import (1) 1001-must-read-2008-edition (1) 1001-must-read-books (2) 1001-must-reads (1) 1001-read (1) 1010 Challenge (1) 10B (1) 10th (1) 11 in 11 (1) 11 in 11 - Read (1) 11 in 11 - Travel (1) 11-22-2011 (1) 11-22-2011take2 (1) 111 Science Fiction (1) 11JAN10 (1) 11th (1) 11th century (1) 12 in 12 (1) 12 år (1) 12/12/13 Tyler Deal - letter sent (1) 12/20/2011 (1) 12th (1) 13 (1) 13 år (1) 131 (1) 14 år (1) 14-15 (1) 15 år (1) 16 år (1) 1659 (1) 168 sidor (1) 17 år (1) 18S (1) 18x11 (1) 1900's (late) (1) 1945-1999 (1) 1950-1999 (1) 1960s-'70s novels (1) 1970 (1) 1970'erne (1) 1970s (36) 1970s authorship (1) 1970s fiction (1) 1979 (27) 1979 pub (1) 1979AD (1) 1980 (2) 1980s (7) 1981 (3) 1981/02 (1) 1982 (3) 1983 (1) 1984 (1) 1986 (1) 1988 (2) 1988-1991 (1) 1989 (1) 1989 reading (1) 1990 (1)

3 answers:

Answer 0 (score: 3)

This JavaScript should do the trick:



var str = "1001-import (1) 1001-must-read-2008-edition (1) 1001-must-read-books (2) 1001-must-reads (1) 1001-read (1) 1010 Challenge (1) 10B (1) 10th (1) 11 in 11 (1) 11 in 11 - Read (1) 11 in 11 - Travel (1) 11-22-2011 (1) 11-22-2011take2 (1) 111 Science Fiction (1) 11JAN10 (1) 11th (1) 11th century (1) 12 in 12 (1) 12 år (1) 12/12/13 Tyler Deal - letter sent (1) 12/20/2011 (1) 12th (1) 13 (1) 13 år (1) 131 (1) 14 år (1) 14-15 (1) 15 år (1) 16 år (1) 1659 (1) 168 sidor (1) 17 år (1) 18S (1) 18x11 (1) 1900's (late) (1) 1945-1999 (1) 1950-1999 (1) 1960s-'70s novels (1) 1970 (1) 1970'erne (1) 1970s (36) 1970s authorship (1) 1970s fiction (1) 1979 (27) 1979 pub (1) 1979AD (1) 1980 (2) 1980s (7) 1981 (3) 1981/02 (1) 1982 (3) 1983 (1) 1984 (1) 1986 (1) 1988 (2) 1988-1991 (1) 1989 (1) 1989 reading (1) 1990 (1)"

var tags = str.split(/\) (?=\w)/)      // Split at every ") " followed by a word character (letter, digit or _),
  .map(function(pair){                 // so "1900's (late) (1)" is not split after "(late)".
    var values = pair.split(' (');

    var result = {
        amount: parseInt(values.pop(), 10) // The last value in `values` is the count ("1" or "1)")
    };
    result.tag = values.join(' (');    // Restore the remaining string, e.g. "1900's (late)".

    return result;
  });

console.log(tags);
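Since the question asks for one tag per line, here is a minimal sketch of the same idea with a single regex, assuming every tag is followed by whitespace and a parenthesized integer count, which is then followed by whitespace or the end of the string. splitTags is a hypothetical helper name, and String.prototype.matchAll requires an ES2020 engine (for example Node 12 or later).

```javascript
// Hypothetical helper: parse "tag (n) tag (n) ..." into {tag, amount} objects.
// Assumes every tag is followed by " (<digits>)" and then whitespace or end of input.
function splitTags(str) {
  // \s*          skip the separator left over from the previous match
  // (.+?)        lazily capture the tag text (it may itself contain "(late)")
  // \s\((\d+)\)  the count in parentheses, preceded by whitespace
  // (?=\s|$)     the count must be followed by whitespace or end of string
  const re = /\s*(.+?)\s\((\d+)\)(?=\s|$)/g;
  return Array.from(str.matchAll(re), (m) => ({
    tag: m[1],
    amount: Number(m[2]),
  }));
}

const sample = "1900's (late) (1) 1970s (36) 11 in 11 - Read (1) 12 år (1)";
const parsed = splitTags(sample);

// One tag per line as "tag<TAB>count", e.g. "1970s\t36", easy to paste into a spreadsheet
console.log(parsed.map((t) => `${t.tag}\t${t.amount}`).join("\n"));
```

The lazy (.+?) combined with the digits-only count is what keeps "1900's (late)" together: "(late)" cannot satisfy \((\d+)\), so the match keeps extending until it reaches "(1)".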




Answer 1 (score: 2)

You can use a regex replace.

Update: the regex now drops the space after each count. (\(\d+\)) captures something like (<number>), \s matches the space that follows it, and $1 in the replacement reinserts only the captured count followed by a line break, so the space is not carried over. In case anyone is wondering how all of this works.

var input = "1001-import (1) 1001-must-read-2008-edition (1) 1001-must-read-books (2) 1001-must-reads (1) 1001-read (1) 1010 Challenge (1) 10B (1) 10th (1) 11 in 11 (1) 11 in 11 - Read (1) 11 in 11 - Travel (1) 11-22-2011 (1) 11-22-2011take2 (1) 111 Science Fiction (1) 11JAN10 (1) 11th (1) 11th century (1) 12 in 12 (1) 12 år (1) 12/12/13 Tyler Deal - letter sent (1) 12/20/2011 (1) 12th (1) 13 (1) 13 år (1) 131 (1) 14 år (1) 14-15 (1) 15 år (1) 16 år (1) 1659 (1) 168 sidor (1) 17 år (1) 18S (1) 18x11 (1) 1900's (late) (1) 1945-1999 (1) 1950-1999 (1) 1960s-'70s novels (1) 1970 (1) 1970'erne (1) 1970s (36) 1970s authorship (1) 1970s fiction (1) 1979 (27) 1979 pub (1) 1979AD (1) 1980 (2) 1980s (7) 1981 (3) 1981/02 (1) 1982 (3) 1983 (1) 1984 (1) 1986 (1) 1988 (2) 1988-1991 (1) 1989 (1) 1989 reading (1) 1990 (1)";

console.log(input.replace(/(\(\d+\))\s/g, '$1\r\n'));
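To see why $& keeps the space while a capturing group with $1 drops it, here is a small comparison on a shortened sample. $& always inserts the entire match, and a non-capturing group (?:\s) is still part of that match; only a capturing group lets the replacement pick out the count alone.

```javascript
const input = "1979 (27) 1980s (7) 1990 (1)";

// $& = the entire match, i.e. "(27) " including the space matched by (?:\s)
const withAmp = input.replace(/\(\d+\)(?:\s)/g, "$&\n");

// $1 = only the first capturing group, i.e. "(27)" without the space
const withGroup = input.replace(/(\(\d+\))\s/g, "$1\n");

console.log(JSON.stringify(withAmp));   // "1979 (27) \n1980s (7) \n1990 (1)"
console.log(JSON.stringify(withGroup)); // "1979 (27)\n1980s (7)\n1990 (1)"
```

Using \d+ rather than \d* also avoids treating an empty "()" as a count.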

Answer 2 (score: 0)

Here is a PHP solution:

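$t = array();
$tags = "your tags here...";
// Split only at ")" + whitespace followed by a word character, so "1900's (late) (1)" stays in one piece
$pieces = preg_split('/\)\s+(?=\w)/', trim($tags));
foreach($pieces as $piece){
    // The count is whatever follows the last " (" in the piece
    $pos = strrpos($piece, ' (');
    if ($pos === false) continue;      // skip fragments without a count
    $t[] = array("tag"=>trim(substr($piece, 0, $pos)), "count"=>(int)trim(substr($piece, $pos + 2), " )"));
}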

Here is a fiddle: https://3v4l.org/U2j0k