I have the following input:
abc_account2621_activity_20161116_20161117_030627_311999667.csv
xyx_account2622_click_2016111606_20161116_221031_311735299.csv
sed_account2623_impression_2016111605_20161116_221808_311685411.csv
abc_account2621_rich_media_2016111606_20161116_192542_311735300.csv
vbc_account2622_match_table_activity_cats_20161116_20161117_0311_31.csv.gz
sbc_account2622_match_table_activity_types_20161116_20161117_0342_31.csv.gz
Expected output:
activity
click
impression
rich_media
match_table_activity_cats
match_table_activity_types
I want to extract the words that sit between a number followed by an underscore and an underscore followed by a number.
My code so far:
val x = "abc_account2621_match_table_activity_types_20161116_20161117_0342_31.csv.gz"
val pattern3 = "(_([A-Za-z]+_[0-9]))".r
var word=pattern3.findFirstIn(x).getOrElse("no match")
word: String = _types_2
Answer 0 (score: 2)
Use a regex that matches non-digits:
abc_account2621_([\D]+)_
For an arbitrary two-part prefix (xyx_abc_...), use the following; the name is then in the second capture group:
([^_]+_[^_]+_)([\D]+)_
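A minimal Scala sketch of how the second pattern could be applied to the file names from the question. The names `fileNames`, `prefixThenName`, and `reportTypes` are illustrative and not part of the original answer, and `[\D]` is written as the equivalent `\D`.

```scala
// File names copied from the question.
val fileNames = List(
  "abc_account2621_activity_20161116_20161117_030627_311999667.csv",
  "xyx_account2622_click_2016111606_20161116_221031_311735299.csv",
  "sed_account2623_impression_2016111605_20161116_221808_311685411.csv",
  "abc_account2621_rich_media_2016111606_20161116_192542_311735300.csv",
  "vbc_account2622_match_table_activity_cats_20161116_20161117_0311_31.csv.gz",
  "sbc_account2622_match_table_activity_types_20161116_20161117_0342_31.csv.gz"
)

// Group 1 is the two-part prefix, group 2 the run of non-digits that follows it.
val prefixThenName = """([^_]+_[^_]+_)(\D+)_""".r

// findFirstMatchIn returns an Option[Match]; names without a match are dropped.
val reportTypes = fileNames.flatMap { name =>
  prefixThenName.findFirstMatchIn(name).map(_.group(2))
}

reportTypes.foreach(println)
```

The greedy `\D+` first swallows the trailing underscore and then backtracks by one character, which is why the captured name does not end in `_`.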
Answer 1 (score: 2)
val x = "abc_account2621_match_table_activity_types_20161116_20161117_0342_31.csv.gz"
val pattern3 = """_([a-zA-Z_]+)_\d+""".r
pattern3.findAllIn(x).matchData.map(_.group(1)).toList
The pattern _([a-zA-Z_]+)_\d+ can capture this; use matchData to read the capture group.
See regex101.
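If only the first match per file name is needed, the same pattern could be wrapped in a small helper that returns an Option. This is a sketch; `namePattern` and `reportType` are hypothetical names introduced here for illustration.

```scala
import scala.util.matching.Regex

// Same pattern as above: letters and underscores between "_" and "_<digits>".
val namePattern: Regex = """_([a-zA-Z_]+)_\d+""".r

// Hypothetical helper: Some(name) for the first match, None when nothing matches.
def reportType(fileName: String): Option[String] =
  namePattern.findFirstMatchIn(fileName).map(_.group(1))

println(reportType("xyx_account2622_click_2016111606_20161116_221031_311735299.csv"))
println(reportType("readme.txt"))
```

Returning an Option leaves it to the caller to decide what a missing match means, instead of hard-coding a placeholder string.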
Answer 2 (score: 1)
I often find pattern matching against a regex convenient.
// input is assumed to be a List[String] holding the file names from the question
val pattern = """\d_(\D+)_\d""".r.unanchored
input.collect{case pattern(x) => x}
// res0: List(activity, click, impression, rich_media, match_table_activity_cats, match_table_activity_types)
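For a single file name, the same unanchored extractor can be used in a plain match expression with a fallback case. A rough sketch, where `reportTypePattern` and `describe` are hypothetical names and the "no match" fallback mirrors the one used in the question:

```scala
// Unanchored, so the pattern may match anywhere inside the string.
val reportTypePattern = """\d_(\D+)_\d""".r.unanchored

// Hypothetical helper: the captured name, or "no match" as in the question.
def describe(fileName: String): String = fileName match {
  case reportTypePattern(kind) => kind
  case _                       => "no match"
}

println(describe("sed_account2623_impression_2016111605_20161116_221808_311685411.csv"))
println(describe("notes.txt"))
```

Without `.unanchored`, the extractor would require the whole string to match the pattern, and every file name here would fall through to the fallback case.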
Answer 3 (score: 0)
You can try this:
\d_([a-zA-Z_]+)_[0-9]+_
Set the global flag.
Try it here.
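The global flag is a setting of online regex testers and of JavaScript; Scala's Regex has no such flag. A rough equivalent, assuming the file names arrive as one newline-separated string, is findAllMatchIn. The names `listing`, `globalPattern`, and `allNames` are illustrative only.

```scala
// Three of the file names from the question, joined into one block of text.
val listing = List(
  "abc_account2621_activity_20161116_20161117_030627_311999667.csv",
  "abc_account2621_rich_media_2016111606_20161116_192542_311735300.csv",
  "sbc_account2622_match_table_activity_types_20161116_20161117_0342_31.csv.gz"
).mkString("\n")

val globalPattern = """\d_([a-zA-Z_]+)_[0-9]+_""".r

// findAllMatchIn walks the whole text, much like a global search would.
val allNames = globalPattern.findAllMatchIn(listing).map(_.group(1)).toList

println(allNames)
```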