I am trying to use the following grep command:
grep '(.*)(?=(png|html|jpg|js|css)(?:\s*))(png|html|jpg|js|css.*\s)' file
The file contains the following:
http://manage.bostonglobe.com/GiftTheGlobe/LandingPage.html
https://manage.bostonglobe.com/cs/mc/login.aspx?p1=BGFooter
https://www.bostonglobe.com/bgcs
/newsletters?p1=BGFooter_Newsletters
https://bostonglobe.custhelp.com/app/home?p1=BGFooter
https://bostonglobe.custhelp.com/app/answers/list?p1=BGFooter
/tools/help/stafflist?p1=BGFooter
https://www.bostonglobemedia.com/
https://manage.bostonglobe.com/Order/newspaper/Newspaper.aspx
https://www.facebook.com/globe
https://twitter.com/#!/BostonGlobe
https://plus.google.com/108227564341535363126/about
https://epaper.bostonglobe.com/launch.aspx?pbid=2c60291d-c20c-4780-9829- b3d9a12687cf
http://nieonline.com/bostonglobe/
https://secure.pqarchiver.com/boston-sub/no_default.html?ss=1&url=%2Fboston-sub%2Fadvancedsearch.html
/tools/help/privacy?p1=BGFooter
/tools/help/terms-service?p1=BGFooter
/termsofpurchase?p1=BGFooter
https://www.bostonglobemedia.com/careers
/css/globe-print.css?v=19256I1935
//meter.bostonglobe.com/css/style.css
/css/globe-print.css?v=19256I1935
//cdn.blueconic.net/bostonglobemedia.js
/js/lib/rwd-images.js,lib/respond.min.js,lib/modernizr.custom.min.js,globe- define.js,globe-controller.js?v=19256I1935
data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///ywAAAAAAQABAAACAUwAOw==
/js/lib/jquery.js,lib/lo-dash-custom-2.4.1.js,lib/a9.js,lib/pb.js,dist/ad- init.js,globe-newsletter.js,globe-profile-page.js,dist/globe-topic-nav.js,dist/rakuten.js?v=19256I1935
//dc8xl0ndzn2cb.cloudfront.net/js/bostonglobe/v0/keywee.min.js
For some reason it does not select anything from this file. I have tried different flags, but I cannot figure out what the problem is.
Answer 0 (score: 0)
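You are using a PCRE regex with the POSIX BRE engine (grep's default engine). In BRE, an unescaped ( ) or | is a literal character, and constructs such as (?=...) and (?:...) have no special meaning, so the pattern only matches lines that literally contain those characters. For such patterns to work, use the -P option (available in GNU grep):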
grep -P 'YOUR_PCRE_PATTERN'
     ^^
To develop and test PCRE patterns, the well-known regex101.com is commonly recommended.
Note that on Mac OS you can install GNU grep via brew.
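As a quick check of the difference between the two engines, here is a minimal sketch. It assumes GNU grep built with PCRE support. The three sample lines are copied from the file in the question, and the variable names (sample, pattern, bre_count, pcre_count) are made up for illustration.

```shell
# Write three of the URLs from the question to a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
http://manage.bostonglobe.com/GiftTheGlobe/LandingPage.html
https://www.bostonglobe.com/bgcs
//cdn.blueconic.net/bostonglobemedia.js
EOF

# The pattern from the question, unchanged.
pattern='(.*)(?=(png|html|jpg|js|css)(?:\s*))(png|html|jpg|js|css.*\s)'

# Default engine (BRE): parentheses, ? and | are treated as literal
# characters, so no line in the sample can match.
bre_count=$(grep -c "$pattern" "$sample")

# PCRE engine (-P): the lookahead and groups are interpreted as intended.
pcre_count=$(grep -cP "$pattern" "$sample")

echo "BRE matching lines:  $bre_count"
echo "PCRE matching lines: $pcre_count"

rm -f "$sample"
```

With GNU grep this should report 0 matching lines for the default engine and 2 for -P (the .html and .js lines). If -P is rejected as an unknown or unsupported option, the installed grep is probably not GNU grep, or was built without PCRE support.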