Extracting a word from a String using Regex

Time: 2017-07-26 05:21:11

Tags: regex scala

I have the following input:

 abc_account2621_activity_20161116_20161117_030627_311999667.csv
 xyx_account2622_click_2016111606_20161116_221031_311735299.csv
 sed_account2623_impression_2016111605_20161116_221808_311685411.csv
 abc_account2621_rich_media_2016111606_20161116_192542_311735300.csv
 vbc_account2622_match_table_activity_cats_20161116_20161117_0311_31.csv.gz  
 sbc_account2622_match_table_activity_types_20161116_20161117_0342_31.csv.gz

Expected output:

activity
click
impression
rich_media
match_table_activity_cats
match_table_activity_types

Code so far:

I want to extract the word(s) that sit between [number + underscore] and [underscore + number].

 val x = "abc_account2621_match_table_activity_types_20161116_20161117_0342_31.csv.gz"

 val pattern3 = "(_([A-Za-z]+_[0-9]))".r
 val word = pattern3.findFirstIn(x).getOrElse("no match")
 // word: String = _types_2

4 Answers:

Answer 0 (score: 2)

Use a regex that captures the run of non-digit characters:

abc_account2621_([\D]+)_

For the general case, where the prefix varies (xyx_abc_...), use:

([^_]+_[^_]+_)([\D]+)_
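As a minimal Scala sketch of the second pattern, assuming the file names from the question are held in a list: the object name AnswerZeroSketch and the values files, pattern and words are illustrative and not part of the answer. The word ends up in capture group 2, because [\D]+ first runs up to the first digit and then gives back one character so that the trailing underscore can match.

```scala
object AnswerZeroSketch {
  // Assumption: the six file names from the question, collected in a List.
  val files: List[String] = List(
    "abc_account2621_activity_20161116_20161117_030627_311999667.csv",
    "xyx_account2622_click_2016111606_20161116_221031_311735299.csv",
    "sed_account2623_impression_2016111605_20161116_221808_311685411.csv",
    "abc_account2621_rich_media_2016111606_20161116_192542_311735300.csv",
    "vbc_account2622_match_table_activity_cats_20161116_20161117_0311_31.csv.gz",
    "sbc_account2622_match_table_activity_types_20161116_20161117_0342_31.csv.gz"
  )

  // Group 1: the two leading underscore-separated fields.
  // Group 2: the following run of non-digits, minus its trailing underscore.
  val pattern = """([^_]+_[^_]+_)([\D]+)_""".r

  // findFirstMatchIn returns an Option[Match]; files without a match are dropped.
  val words: List[String] =
    files.flatMap(f => pattern.findFirstMatchIn(f).map(_.group(2)))

  def main(args: Array[String]): Unit = words.foreach(println)
}
```

Using flatMap over the Option avoids a sentinel value such as "no match" when a file name does not fit the expected layout.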

Answer 1 (score: 2)

val x = "abc_account2621_match_table_activity_types_20161116_20161117_0342_31.csv.gz"
val pattern3 = """_([a-zA-Z_]+)_\d+""".r
pattern3.findAllIn(x).matchData.map(_.group(1)).toList

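The pattern _([a-zA-Z_]+)_\d+ matches this segment, and matchData gives access to capture group 1, which holds the word.

See the demo on regex101.

Answer 2 (score: 1)

I often find it convenient to use a regex in a pattern match.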

val pattern = """\d_(\D+)_\d""".r.unanchored
input.collect{case pattern(x) => x}
// res0: List(activity, click, impression, rich_media, match_table_activity_cats, match_table_activity_types)
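The snippet above relies on an input collection that the answer does not show. A self-contained sketch, assuming input is a List[String] of the file names from the question (the object name PatternMatchSketch and the value words are illustrative), might look like this:

```scala
object PatternMatchSketch {
  // Assumption: `input` holds the six file names from the question.
  val input: List[String] = List(
    "abc_account2621_activity_20161116_20161117_030627_311999667.csv",
    "xyx_account2622_click_2016111606_20161116_221031_311735299.csv",
    "sed_account2623_impression_2016111605_20161116_221808_311685411.csv",
    "abc_account2621_rich_media_2016111606_20161116_192542_311735300.csv",
    "vbc_account2622_match_table_activity_cats_20161116_20161117_0311_31.csv.gz",
    "sbc_account2622_match_table_activity_types_20161116_20161117_0342_31.csv.gz"
  )

  // unanchored lets the extractor match anywhere inside the string
  // instead of requiring the whole string to match.
  val pattern = """\d_(\D+)_\d""".r.unanchored

  // collect keeps only the elements for which the extractor succeeds
  // and binds the single capture group to x.
  val words: List[String] = input.collect { case pattern(x) => x }

  def main(args: Array[String]): Unit = words.foreach(println)
}
```

Without .unanchored, a Regex used as an extractor has to match the entire string, so none of these file names would match and collect would return an empty list.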

Answer 3 (score: 0)

You can try this: \d_([a-zA-Z_]+)_[0-9]+_ with the global flag set. Try it here.
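Scala's Regex has no global flag as such; a rough equivalent is findAllMatchIn, which iterates over every match in the text. The sketch below assumes the file names arrive as one newline-separated string (the object name GlobalMatchSketch and the values text, pattern and words are illustrative):

```scala
object GlobalMatchSketch {
  // Assumption: all file names from the question in one newline-separated string.
  val text: String = List(
    "abc_account2621_activity_20161116_20161117_030627_311999667.csv",
    "xyx_account2622_click_2016111606_20161116_221031_311735299.csv",
    "sed_account2623_impression_2016111605_20161116_221808_311685411.csv",
    "abc_account2621_rich_media_2016111606_20161116_192542_311735300.csv",
    "vbc_account2622_match_table_activity_cats_20161116_20161117_0311_31.csv.gz",
    "sbc_account2622_match_table_activity_types_20161116_20161117_0342_31.csv.gz"
  ).mkString("\n")

  // A digit and underscore, the word (letters and underscores),
  // then an underscore, a run of digits and another underscore.
  val pattern = """\d_([a-zA-Z_]+)_[0-9]+_""".r

  // findAllMatchIn yields every non-overlapping match; group(1) is the word.
  val words: List[String] = pattern.findAllMatchIn(text).map(_.group(1)).toList

  def main(args: Array[String]): Unit = words.foreach(println)
}
```

Because [a-zA-Z_]+ cannot cross a digit or a newline, each file name contributes exactly one match with this input.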