I'm trying to tokenize a string. Everything works fine as long as there are no quote characters:
string:tokens ("abc def ghi", " ").
["abc","def","ghi"]
But string:tokens/2 doesn't help much with quoted strings. It behaves exactly as documented and splits inside the quotes too:
string:tokens ("abc \"def xyz\" ghi", " ").
["abc","\"def","xyz\"","ghi"]
What I need is a function that tokenizes a string given a separator and a quote character. Something like:
tokens ("abc \"def xyz\" ghi", " ", "\"").
["abc","def xyz","ghi"]
Before I start reinventing the wheel, my question is:
Is there such a function, or something similar, in the standard library?
EDIT
OK, I wrote my own implementation, but I'm still very interested in an answer to the original question. Here is my code so far:
tokens(String) -> tokens(String, [], []).

%% End of input: flush a non-empty buffer, then strip the surrounding quotes.
tokens([], Tokens, Buffer) ->
    All = case Buffer of
              [] -> Tokens;
              _ -> Tokens ++ [Buffer]
          end,
    lists:map(fun(Token) -> string:strip(Token, both, $") end, All);
tokens([Character | String], Tokens, Buffer) ->
    case {Character, Buffer} of
        {$\s, []} -> tokens(String, Tokens, Buffer);                       % skip repeated spaces
        {$\s, [$" | _]} -> tokens(String, Tokens, Buffer ++ [Character]);  % space inside quotes
        {$\s, _} -> tokens(String, Tokens ++ [Buffer], []);                % space ends a token
        {$", []} -> tokens(String, Tokens, "\"");                          % opening quote
        {$", [$" | _]} -> tokens(String, Tokens ++ [Buffer ++ "\""], []);  % closing quote
        {$", _} -> tokens(String, Tokens ++ [Buffer], "\"");               % quote right after a word
        _ -> tokens(String, Tokens, Buffer ++ [Character])
    end.
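The code above hard-codes the space and the double quote. For comparison, here is a minimal sketch of the three-argument tokens/3 from the question, assuming Separators and Quotes are both lists of characters and that a quoted section is closed by the same character that opened it. The module name quoted_tokens and the {unclosed_quote, Q} error term are hypothetical; this is not a standard library function.

```erlang
%% Hypothetical module: a sketch of tokens(String, Separators, Quotes).
-module(quoted_tokens).
-export([tokens/3]).

tokens(String, Separators, Quotes) ->
    lists:reverse(outside(String, Separators, Quotes, [], [])).

%% Outside quotes. Cur is the current token (built in reverse),
%% Acc the finished tokens (also in reverse order).
outside([], _Seps, _Quotes, Cur, Acc) ->
    flush(Cur, Acc);
outside([C | Rest], Seps, Quotes, Cur, Acc) ->
    case {lists:member(C, Seps), lists:member(C, Quotes)} of
        {true, _} ->
            %% separator: finish the current token, if any
            outside(Rest, Seps, Quotes, [], flush(Cur, Acc));
        {false, true} ->
            %% opening quote: finish the current token, then read literally
            inside(Rest, C, Seps, Quotes, [], flush(Cur, Acc));
        {false, false} ->
            outside(Rest, Seps, Quotes, [C | Cur], Acc)
    end.

%% Inside quotes: everything up to the same quote character Q is literal.
inside([], Q, _Seps, _Quotes, _Cur, _Acc) ->
    erlang:error({unclosed_quote, Q});
inside([Q | Rest], Q, Seps, Quotes, Cur, Acc) ->
    outside(Rest, Seps, Quotes, [], [lists:reverse(Cur) | Acc]);
inside([C | Rest], Q, Seps, Quotes, Cur, Acc) ->
    inside(Rest, Q, Seps, Quotes, [C | Cur], Acc).

%% Push the current token onto Acc, skipping empty ones.
flush([], Acc) -> Acc;
flush(Cur, Acc) -> [lists:reverse(Cur) | Acc].
```

With this sketch, quoted_tokens:tokens("abc \"def xyz\" ghi", " ", "\"") gives ["abc","def xyz","ghi"]. Empty tokens from repeated separators are dropped, while an explicitly quoted empty string ("") is kept as "".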
Answer 0 (score: 5)
If regular expressions are acceptable in general, you can use:
> re:split("abc \"def xyz\" ghi", " \"|\" ", [{return, list}]).
["abc","def xyz","ghi"]
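Note that this split pattern only breaks at a space that touches a quote, so a string such as "abc def \"x y\"" would come back with "abc def" as a single element. A sketch of a more general regex approach, using re:run/3 to match each token instead of splitting on separators: a token is either a double-quoted run or a run of characters that are neither whitespace nor a double quote. The module name re_tokens is hypothetical.

```erlang
%% Hypothetical module name; a sketch, not a stdlib function.
-module(re_tokens).
-export([tokens/1]).

%% Match every token: a double-quoted run "..." or a run of characters
%% that are neither whitespace nor a double quote. "\\s" in the Erlang
%% string becomes the regex class \s.
tokens(String) ->
    case re:run(String, "\"[^\"]*\"|[^\\s\"]+", [global, {capture, first, list}]) of
        {match, Matches} -> [unquote(M) || [M] <- Matches];
        nomatch -> []
    end.

%% Drop the surrounding quotes from a quoted match.
unquote([$" | Rest]) -> lists:droplast(Rest);
unquote(Token) -> Token.
```

One design consequence worth knowing: an unclosed quote is not an error here. The stray quote character simply matches neither alternative and is skipped, so re_tokens:tokens("a \"b c") returns ["a","b","c"].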
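If you want to split on any whitespace rather than only on a space, you can use "\\s\"|\"\\s" instead. The backslash must be doubled: in an Erlang string literal "\s" is just an escaped space character, so the regex only sees \s if you write "\\s".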
If you are parsing this from an input file, you may want to use strip_split/2 from estring.
Answer 1 (score: 2)
string:tokens ("abc \"def ghi\" foo.bla", " .\"").
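This tokenizes the string on spaces, dots and double quotes. Result: ["abc", "def", "ghi", "foo", "bla"]. If you want to keep the quoted parts together, you might want to consider writing a proper tokenizer/lexer, since regular expressions aren't very good at this job.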
Answer 2 (score: 1)
You can use the re module. It comes with a split/3 function. For example:
re:split("abc \"def xyz \"ghi", "[(\s\")\s\"]", [{return, list}]).
["abc",[],"def","xyz",[],"ghi"]
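The second argument is the regular expression. Inside [...] the parentheses are literal characters, so this class matches a space, a double quote or a parenthesis. Note that it splits "def xyz" apart and leaves an empty list wherever two separators are adjacent (you may need to tweak my example to remove the empty lists...).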
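One way to get rid of the empty lists is to split on runs of separators and filter out whatever empty strings remain (for example at the start of the input). A minimal sketch, with the hypothetical module name split_clean; like the example above, it still splits the quoted "def xyz" into two tokens.

```erlang
%% Hypothetical module name. Splits on runs of whitespace and double
%% quotes, then drops the empty strings re:split/3 can leave behind.
-module(split_clean).
-export([split/1]).

split(String) ->
    Parts = re:split(String, "[\\s\"]+", [{return, list}]),
    [P || P <- Parts, P =/= []].
```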
Answer 3 (score: 1)
This is roughly how I would write it (untested!):
tokens(String) -> lists:reverse(tokens(String, outside_quotes, [])).

%% Prepend a token unless it is empty (repeated separators produce empty ones).
push([], Tokens) -> Tokens;
push(Token, Tokens) -> [Token | Tokens].

tokens([], outside_quotes, Tokens) ->
    Tokens;
tokens(String, outside_quotes, Tokens) ->
    {Token, Rest0} = lists:splitwith(fun(C) -> C =/= $\s andalso C =/= $" end, String),
    case Rest0 of
        [] -> push(Token, Tokens);
        [$\s | Rest] -> tokens(Rest, outside_quotes, push(Token, Tokens));
        [$" | Rest] -> tokens(Rest, inside_quotes, push(Token, Tokens))
    end;
tokens(String, inside_quotes, Tokens) ->
    %% badmatch exception on an unclosed quote
    {Token, [$" | Rest]} = lists:splitwith(fun(C) -> C =/= $" end, String),
    tokens(Rest, outside_quotes, [Token | Tokens]).