While using re:run, I noticed something interesting: with the dotall option set, matching is very slow.
Source code:
main3() ->
Sdp =
"v=0\r\no=- 1001 11112 IN IP4 10.10.121.7\r\ns=-\r\nt=0 0\r\nm=audio 52363 RTP/AVPF 0 8\r\nc=IN IP4 10.10.121.7\r\na=rtcp:52369 IN IP4 138.85.151.208\r\na=candidate:1783138469 1 udp 2113937151 138.85.151.208 52363 typ host generation 0\r\na=candidate:4012290674 1 udp 2113937151 192.168.125.1 52364 typ host generation 0\r\na=candidate:1760259326 1 udp 2113937151 192.168.2.12 52367 typ host generation 0\r\na=candidate:2294684747 1 udp 2113937151 192.168.58.1 52368 typ host generation 0\r\na=candidate:1783138469 2 udp 2113937150 138.85.151.208 52369 typ host generation 0\r\na=candidate:4012290674 2 udp 2113937150 192.168.125.1 52370 typ host generation 0\r\na=candidate:1760259326 2 udp 2113937150 192.168.2.12 52371 typ host generation 0\r\na=candidate:2294684747 2 udp 2113937150 192.168.58.1 52372 typ host generation 0\r\na=candidate:617313365 1 tcp 1509957375 138.85.151.208 52530 typ host generation 0\r\na=candidate:2711965314 1 tcp 1509957375 192.168.125.1 52531 typ host generation 0\r\na=candidate:644386830 1 tcp 1509957375 192.168.2.12 52532 typ host generation 0\r\na=candidate:3326468283 1 tcp 1509957375 192.168.58.1 52533 typ host generation 0\r\na=candidate:617313365 2 tcp 1509957374 138.85.151.208 52534 typ host generation 0\r\na=candidate:2711965314 2 tcp 1509957374 192.168.125.1 52535 typ host generation 0\r\na=candidate:644386830 2 tcp 1509957374 192.168.2.12 52536 typ host generation 0\r\na=candidate:3326468283 2 tcp 1509957374 192.168.58.1 52537 typ host generation 0\r\na=ice-ufrag:root\r\na=ice-pwd:myreallysecretpassword\r\na=sendrecv\r\na=rtpmap:0 PCMU/8000\r\na=rtpmap:8 PCMA/8000\r\na=ssrc:1947760130 cname:OCGE4NpwFpLE/BFW\r\na=ssrc:1947760130 mslabel:oBAkRgSOpLdfl7u1JWdnMyUytcGGD4COvttP\r\na=ssrc:1947760130 label:oBAkRgSOpLdfl7u1JWdnMyUytcGGD4COvttP00\r\nm=video 52373 RTP/AVPF 126\r\nc=IN IP4 10.10.121.7\r\na=rtcp:52377 IN IP4 138.85.151.208\r\na=candidate:1783138469 1 udp 2113937151 138.85.151.208 52373 typ host generation 0\r\na=candidate:4012290674 
1 udp 2113937151 192.168.125.1 52374 typ host generation 0\r\na=candidate:1760259326 1 udp 2113937151 192.168.2.12 52375 typ host generation 0\r\na=candidate:2294684747 1 udp 2113937151 192.168.58.1 52376 typ host generation 0\r\na=candidate:1783138469 2 udp 2113937150 138.85.151.208 52377 typ host generation 0\r\na=candidate:4012290674 2 udp 2113937150 192.168.125.1 52378 typ host generation 0\r\na=candidate:1760259326 2 udp 2113937150 192.168.2.12 52379 typ host generation 0\r\na=candidate:2294684747 2 udp 2113937150 192.168.58.1 52380 typ host generation 0\r\na=candidate:617313365 1 tcp 1509957375 138.85.151.208 52538 typ host generation 0\r\na=candidate:2711965314 1 tcp 1509957375 192.168.125.1 52539 typ host generation 0\r\na=candidate:644386830 1 tcp 1509957375 192.168.2.12 52540 typ host generation 0\r\na=candidate:3326468283 1 tcp 1509957375 192.168.58.1 52541 typ host generation 0\r\na=candidate:617313365 2 tcp 1509957374 138.85.151.208 52542 typ host generation 0\r\na=candidate:2711965314 2 tcp 1509957374 192.168.125.1 52543 typ host generation 0\r\na=candidate:644386830 2 tcp 1509957374 192.168.2.12 52544 typ host generation 0\r\na=candidate:3326468283 2 tcp 1509957374 192.168.58.1 52545 typ host generation 0\r\na=ice-ufrag:root\r\na=ice-pwd:myreallysecretpassword\r\na=sendrecv\r\na=rtpmap:126 H264/90000\r\n",
ReStr = "(.*)a=candidate.*host.*a=candidate.*host(.*)a=ice-ufrag.*a=setup:active(.*)a=mid:audio(.*)a=candidate.*host.*a=candidate.*host(.*)a=ice-ufrag.*a=setup:active(.*)a=mid:video(.*)",
%% First run: "." does not match a newline, so each ".*" stays within one line.
{ok, Pattern1} = re:compile(ReStr, [{newline, crlf}]),
%% timer:tc/3 returns {ElapsedMicroseconds, Result}.
{Time1, _} = timer:tc(re, run, [ Sdp, Pattern1, [{capture,all_but_first,list}] ]),
io:format("not using dotall, time is ~p~n", [Time1]),
%% Second run: with dotall, "." also matches newlines, so each ".*" can span the whole input.
{ok, Pattern2} = re:compile(ReStr, [{newline, crlf}, dotall]),
{Time2, _} = timer:tc(re, run, [ Sdp, Pattern2, [{capture,all_but_first,list}] ]),
io:format("using dotall, time is ~p~n", [Time2]).
Output:
101> tt:main3().
not using dotall, time is 4499
using dotall, time is 2760364
ok
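The difference is huge: timer:tc reports microseconds, so the run takes about 4.5 ms without dotall and about 2.76 s with it, more than 600 times slower.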
Answer 0 (score: 1)
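By default, when dotall is not set, the . pattern does not match a newline, so a .* search extends only to the end of the current line. With dotall set, . matches every character, so .* can run all the way to the end of the input string. This matters in your case because your input string contains many lines.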
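To make the difference concrete, here is a minimal sketch that runs the same greedy .* with and without dotall on a two-line string. The module name dotall_demo, the function run/0, and the sample subject are made up for illustration; only re:compile/2 and re:run/3 come from the standard library.

```erlang
-module(dotall_demo).
-export([run/0]).

%% Compare how far a greedy ".*" reaches with and without dotall.
%% The subject is a hypothetical two-line string using CRLF line endings.
run() ->
    Subject = "first line\r\nsecond line\r\n",
    Opts = [{capture, first, list}],
    {ok, PlainRe} = re:compile(".*", [{newline, crlf}]),
    {ok, DotallRe} = re:compile(".*", [{newline, crlf}, dotall]),
    %% Without dotall the match should stop at the first CRLF.
    {match, [Plain]} = re:run(Subject, PlainRe, Opts),
    %% With dotall the match should cover the whole subject.
    {match, [Dotall]} = re:run(Subject, DotallRe, Opts),
    {Plain, Dotall}.
```

Calling dotall_demo:run() should return the first line alone as the first element and the entire subject, line endings included, as the second. Every .* in the original pattern gets the same extended reach once dotall is set.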
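The thing to remember is that re is based on PCRE, which follows Perl regular expressions. One characteristic of these engines is that they are implemented with a backtracking algorithm, which means that when the pattern contains alternatives, such as .*, finding a match can require a great deal of searching. This is a property of Perl regular expressions, not the result of a poor implementation.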
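Note that the sample input contains neither a=setup:active nor a=mid:audio, so the pattern can never match, and a failing match is the worst case for a backtracking engine because every alternative has to be tried before it gives up. One way to keep such a search bounded is the match_limit option of re:run/3, which caps the number of internal match-function calls. The sketch below is only an illustration: the module name re_limit_demo, the function bounded_run/3, and the atom limit_reached are hypothetical, and a suitable limit has to be chosen by experiment.

```erlang
-module(re_limit_demo).
-export([bounded_run/3]).

%% Run Pattern against Subject with dotall, but give up once the engine
%% has made more than Limit internal match-function calls.
%% report_errors makes re:run/3 return {error, match_limit} in that case
%% instead of reporting it as an ordinary nomatch.
bounded_run(Subject, Pattern, Limit) ->
    {ok, Re} = re:compile(Pattern, [{newline, crlf}, dotall]),
    Opts = [{capture, all_but_first, list},
            {match_limit, Limit},
            report_errors],
    case re:run(Subject, Re, Opts) of
        {match, Groups}      -> {match, Groups};
        nomatch              -> nomatch;
        {error, match_limit} -> limit_reached;   % hypothetical marker atom
        {error, Other}       -> {error, Other}
    end.
```

For example, re_limit_demo:bounded_run(Sdp, ReStr, 100000) would run the pattern from the question under a cap. A limit does not make the pattern any faster; it only turns a very long search into an early, explicit failure.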
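For a longer discussion of the various ways of implementing regular expressions, see the Wikipedia article Regular Expression (the third algorithm), Implementing Regular Expressions by Russ Cox (the first paper), or Friedl's {{{ 3}}} (although he gives the three algorithms the wrong names).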