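I have a large set of tags that I need to analyze for my thesis. Because this is a lot of work, I would like to know whether the process can be automated.
I want to separate the tags using the number shown in parentheses after each one. This number is the frequency of the tag, but it also makes a convenient delimiter when splitting the tags by hand.
Can anyone help me with a JavaScript or PHP script that splits the tags onto separate lines?
Here is a sample of the tags: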
1001-import (1) 1001-must-read-2008-edition (1) 1001-must-read-books (2) 1001-must-reads (1) 1001-read (1) 1010 Challenge (1) 10B (1) 10th (1) 11 in 11 (1) 11 in 11 - Read (1) 11 in 11 - Travel (1) 11-22-2011 (1) 11-22-2011take2 (1) 111 Science Fiction (1) 11JAN10 (1) 11th (1) 11th century (1) 12 in 12 (1) 12 år (1) 12/12/13 Tyler Deal - letter sent (1) 12/20/2011 (1) 12th (1) 13 (1) 13 år (1) 131 (1) 14 år (1) 14-15 (1) 15 år (1) 16 år (1) 1659 (1) 168 sidor (1) 17 år (1) 18S (1) 18x11 (1) 1900's (late) (1) 1945-1999 (1) 1950-1999 (1) 1960s-'70s novels (1) 1970 (1) 1970'erne (1) 1970s (36) 1970s authorship (1) 1970s fiction (1) 1979 (27) 1979 pub (1) 1979AD (1) 1980 (2) 1980s (7) 1981 (3) 1981/02 (1) 1982 (3) 1983 (1) 1984 (1) 1986 (1) 1988 (2) 1988-1991 (1) 1989 (1) 1989 reading (1) 1990 (1)
Answer 0 (score: 3)
This JavaScript should do the trick:
var str = "1001-import (1) 1001-must-read-2008-edition (1) 1001-must-read-books (2) 1001-must-reads (1) 1001-read (1) 1010 Challenge (1) 10B (1) 10th (1) 11 in 11 (1) 11 in 11 - Read (1) 11 in 11 - Travel (1) 11-22-2011 (1) 11-22-2011take2 (1) 111 Science Fiction (1) 11JAN10 (1) 11th (1) 11th century (1) 12 in 12 (1) 12 år (1) 12/12/13 Tyler Deal - letter sent (1) 12/20/2011 (1) 12th (1) 13 (1) 13 år (1) 131 (1) 14 år (1) 14-15 (1) 15 år (1) 16 år (1) 1659 (1) 168 sidor (1) 17 år (1) 18S (1) 18x11 (1) 1900's (late) (1) 1945-1999 (1) 1950-1999 (1) 1960s-'70s novels (1) 1970 (1) 1970'erne (1) 1970s (36) 1970s authorship (1) 1970s fiction (1) 1979 (27) 1979 pub (1) 1979AD (1) 1980 (2) 1980s (7) 1981 (3) 1981/02 (1) 1982 (3) 1983 (1) 1984 (1) 1986 (1) 1988 (2) 1988-1991 (1) 1989 (1) 1989 reading (1) 1990 (1)"
var tags = str.split(/\) (?=\w)/g) // Split the list at every ") " that is followed by a word character (letter, digit, or underscore),
  .map(function (pair) {           // so "1900's (late) (1)" is not split after "(late)".
    var values = pair.split(' (');
    var result = {
      amount: parseInt(values.pop(), 10) // The last value in `values` is the count
    };
    result.tag = values.join(' ('); // Restore the remaining string.
    return result;
  });
console.log(tags);
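
If the goal is simply one tag per line, as the question asks, the parsed objects can be turned back into text. The following is a minimal sketch that assumes the same splitting rule as the answer above. The helper names parseTags and toLines are hypothetical, and the short sample string is a subset of the question's data chosen for illustration.

```javascript
// Hypothetical helper: parse "tag (count) tag (count) ..." into objects.
// Assumes every tag starts with a word character, as in the sample data.
function parseTags(str) {
  return str.split(/\) (?=\w)/).map(function (pair) {
    var values = pair.split(' (');
    // parseInt ignores the trailing ")" that remains on the last pair.
    var amount = parseInt(values.pop(), 10);
    return { tag: values.join(' ('), amount: amount };
  });
}

// Hypothetical helper: build one "tag<TAB>count" line per tag.
function toLines(tags) {
  return tags.map(function (t) {
    return t.tag + '\t' + t.amount;
  }).join('\n');
}

var sample = "1900's (late) (1) 1970s (36) 1979 pub (1)";
var parsed = parseTags(sample);
console.log(toLines(parsed));
```

A tab is used as the separator here so the output can be pasted into a spreadsheet; any other separator would work the same way.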

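Answer 1 (score: 2)
You can use a regex replace.
Update: I just added a non-capturing group, (?:\s), to the regex so that the space after each count becomes part of the match and the next line no longer starts with a blank. The \(\d*\) part finds things like (<number>), and $& then re-inserts the whole matched text in front of the line break. That is how it all works, in case anyone is wondering.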
var input = "1001-import (1) 1001-must-read-2008-edition (1) 1001-must-read-books (2) 1001-must-reads (1) 1001-read (1) 1010 Challenge (1) 10B (1) 10th (1) 11 in 11 (1) 11 in 11 - Read (1) 11 in 11 - Travel (1) 11-22-2011 (1) 11-22-2011take2 (1) 111 Science Fiction (1) 11JAN10 (1) 11th (1) 11th century (1) 12 in 12 (1) 12 år (1) 12/12/13 Tyler Deal - letter sent (1) 12/20/2011 (1) 12th (1) 13 (1) 13 år (1) 131 (1) 14 år (1) 14-15 (1) 15 år (1) 16 år (1) 1659 (1) 168 sidor (1) 17 år (1) 18S (1) 18x11 (1) 1900's (late) (1) 1945-1999 (1) 1950-1999 (1) 1960s-'70s novels (1) 1970 (1) 1970'erne (1) 1970s (36) 1970s authorship (1) 1970s fiction (1) 1979 (27) 1979 pub (1) 1979AD (1) 1980 (2) 1980s (7) 1981 (3) 1981/02 (1) 1982 (3) 1983 (1) 1984 (1) 1986 (1) 1988 (2) 1988-1991 (1) 1989 (1) 1989 reading (1) 1990 (1)";
console.log(input.replace(/\(\d*\)(?:\s)/g,'$&\r\n'));
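
As a variant, offered only as a sketch: if the space left at the end of each line is unwanted, a capturing group can keep just the (<number>) part and drop the whitespace that follows it. The function name splitTagsToLines and the short sample string are assumptions for illustration. The sketch uses \d+ rather than \d* so that empty parentheses are not treated as a count, and it emits \n instead of \r\n.

```javascript
// Hypothetical helper: put each "tag (count)" pair on its own line.
function splitTagsToLines(input) {
  // $1 is the captured "(<number>)"; the whitespace after it becomes a newline.
  return input.replace(/(\(\d+\))\s+/g, '$1\n');
}

var sample = "1900's (late) (1) 1970s (36) 1979 pub (1)";
var lines = splitTagsToLines(sample);
console.log(lines);
```

Because "(late)" contains no digits, it is not treated as a count and the tag stays on one line.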
Answer 2 (score: 0)
Here is a PHP solution:
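$t = array();
$tags = "your tags here...";
// Split only after a "(<number>)" group, so tags that contain their own
// parentheses, such as "1900's (late) (1)", stay in one piece.
$pairs = preg_split('/(?<=\d\))\s+/', trim($tags));
foreach ($pairs as $pair) {
    $open = strrpos($pair, "("); // the last "(" starts the count
    if ($open === false) {
        continue; // no count found, skip this piece
    }
    $tag = trim(substr($pair, 0, $open));
    $t[] = array("tag" => $tag, "count" => (int) substr($pair, $open + 1));
}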
Here is a fiddle: https://3v4l.org/U2j0k