How do I filter special characters out of a long binary in Erlang?

Asked: 2015-10-14 17:03:06

Tags: binary erlang

Goal: remove every "\r\n....\r\n" sequence from a long binary.

I have a very long binary (size about 10000), such as:

  

<<"authtoken1,authtoken2...authtoken1000,\r\n1000\r\n,authoken1001,authoken1002...authken2000,\r\n15df\r\nauthoken2001,authoken2002..authoken7600....authoken10100">>

What I want:

  

<<"authtoken1,authtoken2...authtoken1000,authoken1001,authoken1002...authken2000,authoken2001,authoken2002..authoken7600....authoken10100">>

My temporary solution is:

13> Bin = <<"authtoken1,authtoken2,\r\n1\r\nauthoken3,\r\n2\r\nauthtoken4,authtoken5,\r\n3\r\nauthtoken6,authtoken7,authtoken8,\r\n2\r\nauthtoken9,authtoken10">>.
<<"authtoken1,authtoken2,\r\n1\r\nauthoken3,\r\n2\r\nauthtoken4,authtoken5,\r\n3\r\nauthtoken6,authtoken7,authtoken8,\r\n2\r\nauthtoken"...>>
14>  Bin2 = binary:split(Bin,[<<"\r\n">>],[global,trim]).
[<<"authtoken1,authtoken2,">>,<<"1">>,<<"authoken3,">>,
 <<"2">>,<<"authtoken4,authtoken5,">>,<<"3">>,
 <<"authtoken6,authtoken7,authtoken8,">>,<<"2">>,
 <<"authtoken9,authtoken10">>]
15> lists:foldl(fun(AuthToken,Acc) -> case erlang:size(AuthToken) >4 of true -> <<Acc/binary,AuthToken/binary>>; false -> Acc end end, <<>>, Bin2).
<<"authtoken1,authtoken2,authoken3,authtoken4,authtoken5,authtoken6,authtoken7,authtoken8,authtoken9,authtoken10">>

It works, but it is not efficient.

2 answers:

Answer 0 (score: 4)

I take it you want the resulting binary to contain only the comma-separated authtoken data, with everything else removed? If so, try this:

1> {ok,Pattern} = re:compile("authtoken\\d+").
{ok,{re_pattern,0,0,0,
                <<69,82,67,80,91,0,0,0,0,0,0,0,81,0,0,0,255,255,255,255,
                  255,255,...>>}}
2> {match,Found} = re:run(InputBinary,Pattern,[global,{capture,all,binary}]).
{match,[[<<"authtoken1">>],
        [<<"authtoken2">>],
        [<<"authtoken1000">>]]}
3> lists:foldl(fun([V],<<>>) -> <<V/binary>>;
                  ([V],Acc) -> <<Acc/binary,$,,V/binary>> end, <<>>, Found).
<<"authtoken1,authtoken2,authtoken1000">>
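The same idea can be wrapped in a function. This is only a sketch: the module name `token_extract` and the function `tokens/1` are made up for illustration, and it assumes OTP 19 or later for `lists:join/2`. It builds the result as an iolist and flattens it once, rather than appending to a binary in a fold, and it returns an empty binary when nothing matches.

```erlang
-module(token_extract).
-export([tokens/1]).

%% Hypothetical helper: collect every "authtoken<digits>" occurrence in Bin
%% and join the matches with commas.
tokens(Bin) when is_binary(Bin) ->
    case re:run(Bin, <<"authtoken\\d+">>, [global, {capture, all, binary}]) of
        {match, Found} ->
            %% Found is a list of single-element lists, e.g. [[<<"authtoken1">>], ...].
            %% lists:join/2 puts the comma character between the matches;
            %% the result is a valid iolist, flattened once at the end.
            iolist_to_binary(lists:join($,, [V || [V] <- Found]));
        nomatch ->
            <<>>
    end.
```

Note that the pattern matches only the literal prefix `authtoken`. The sample input in the question also contains the spelling `authoken3`, which this pattern would silently drop, so adjust the pattern if such tokens must be kept.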

Answer 1 (score: 2)

For reference:

1> Bin = <<"authtoken1,authtoken2,\r\n1\r\nauthoken3,\r\n2\r\nauthtoken4,authtoken5,\r\n3\r\nauthtoken6,authtoken7,authtoken8,\r\n2\r\nauthtoken9,authtoken10">>.
<<"authtoken1,authtoken2,\r\n1\r\nauthoken3,\r\n2\r\nauthtoken4,authtoken5,\r\n3\r\nauthtoken6,authtoken7,authtoken8,\r\n2\r\nauthtoken"...>>
2> re:replace(Bin, <<"\r\n\\d+\r\n">>, <<"">>, [global, {return, binary} ]).
<<"authtoken1,authtoken2,authoken3,authtoken4,authtoken5,authtoken6,authtoken7,authtoken8,authtoken9,authtoken10">>
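As a function, this might look like the sketch below. The module name `strip_chunks` and both function names are illustrative, not from any library. `strip_hex/1` rests on a guess: the `15df` marker in the question looks like a hexadecimal length, so that variant accepts hex digits as well. If the markers are always decimal, `strip/1` is enough.

```erlang
-module(strip_chunks).
-export([strip/1, strip_hex/1]).

%% Remove every "\r\n<decimal digits>\r\n" marker from Bin.
%% The backslash before d is doubled on purpose: in an Erlang string
%% literal "\\d" yields the regex class \d, whereas a single "\d" is
%% the DEL character (127) and would not match digits.
strip(Bin) when is_binary(Bin) ->
    re:replace(Bin, <<"\r\n\\d+\r\n">>, <<>>, [global, {return, binary}]).

%% Assumption: markers may be hexadecimal (e.g. "15df"), so accept
%% hex digits between the two CRLF pairs.
strip_hex(Bin) when is_binary(Bin) ->
    re:replace(Bin, <<"\r\n[0-9a-fA-F]+\r\n">>, <<>>, [global, {return, binary}]).
```

Unlike the `re:run` approach, this keeps everything outside the markers untouched, including tokens with unexpected spellings.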