Test string:
str = "#www #SoulMusic #50_shades_of_Blue # ##WorldWideWeb
#okie_dokkie #fr!ends #!alPacino #wonderfulRide
#good#club #rhônealpes #trèsbon #øypålandet http://example.com/#comment
#moreTags #www nobody #h3y!boy #EMAIL"
This is my attempt:
String.split(str, ~r/\B(#[á-úÁ-Úä-üÄ-Üa-zA-Z0-9_]+)/, trim: true,
include_captures: true)
But it does not exclude the hashtag inside the URL, and it also returns the text between the matches. This is what I get:
["#www", " ", "#SoulMusic", " ", "#50_shades_of_Blue", " # #", "#WorldWideWeb", " ", "#okie_dokkie", " ", "#fr", "!ends #!alPacino ", "#wonderfulRide", " ", "#good", "#club ", "#rhônealpes", " ", "#trèsbon", " ", "#øypålandet", " http://example.com/", "#comment", " ", "#moreTags", " ", "#www", " nobody ", "#h3y", "!boy ", "#EMAIL"]
My goal is:
["#www", "#SoulMusic", "#50_shades_of_Blue", "#WorldWide",
"#okie_dokkie", "#fr", "#wonderfulRide", "#good",
"#rhônealpes", "#trèsbon", "#øypålandet", "#moreTags", "#www",
"#h3y", "#EMAIL"]
Any help with this would be greatly appreciated.
Answer 0 (score: 2)
If you only need the matches, you are looking for Regex.scan/2:
iex(1)> str = "#www #SoulMusic #50_shades_of_Blue # ##WorldWideWeb
...(1)> #okie_dokkie #fr!ends #!alPacino #wonderfulRide
...(1)> #good#club #rhônealpes #trèsbon #gøypålandet http://example.com/#comment
...(1)> #moreTags #www nobody #EMAIL"
"#www #SoulMusic #50_shades_of_Blue # ##WorldWideWeb \n #okie_dokkie #fr!ends #!alPacino #wonderfulRide \n #good#club #rhônealpes #trèsbon #gøypålandet http://example.com/#comment \n #moreTags #www nobody #EMAIL"
iex(2)> Regex.scan(~r/\B#[á-úÁ-Úä-üÄ-Üa-zA-Z0-9_]+/, str)
[["#www"], ["#SoulMusic"], ["#50_shades_of_Blue"], ["#WorldWideWeb"],
["#okie_dokkie"], ["#fr"], ["#wonderfulRide"], ["#good"], ["#rhônealpes"],
["#trèsbon"], ["#gøypålandet"], ["#comment"], ["#moreTags"], ["#www"],
["#EMAIL"]]
This returns a list of lists, one inner list per match. You can flatten it with Enum.concat/1 to get a plain list of strings:
iex(3)> Regex.scan(~r/\B#[á-úÁ-Úä-üÄ-Üa-zA-Z0-9_]+/, str) |> Enum.concat
["#www", "#SoulMusic", "#50_shades_of_Blue", "#WorldWideWeb", "#okie_dokkie",
"#fr", "#wonderfulRide", "#good", "#rhônealpes", "#trèsbon",
"#gøypålandet", "#comment", "#moreTags", "#www", "#EMAIL"]