grep -i -o '<a[^>]\+href[ ]*=[ \t]*"\(ht\|f\)tps\?:[^"]\+"' | sed -e 's/^.*"\([^"]\+\)".*$/\1/g'
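After trawling the net for an answer to my homework question, I finally ended up with the above. But I don't fully understand what the two regular expressions used by sed and grep mean. Can someone make it clear for me? Thanks in advance.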
Answer (score: 2)
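The grep command (-i: ignore case; -o: print only the matched part of each line, one match per line) looks for text matching
'<a[^>]\+href[ ]*=[ \t]*"\(ht\|f\)tps\?:[^"]\+"'
which breaks down as: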
<a the characters <a
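[^>] followed by any one character that is not a close '>'
\+ the last thing one or more times, so [^>]\+ means "one or more characters that are not '>'".
This is needed: it lets other attributes such as class="x" sit between <a and href, as long as the tag has not been closed. Without \+, exactly one character (typically the space) would be allowed there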
href followed by the string 'href'
[ ]* followed by zero or more spaces (you don't really need the [], just ' *' would be enough)
= followed by the equals sign
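[ \t]* followed by zero or more spaces or tabs ("white space"). That is the intent, but inside [...] grep treats \t as the two characters '\' and 't', not as a tab; [[:blank:]]* is the reliable way to say "spaces or tabs"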
" followed by an opening quote (but only a double quote, so href='...' is not matched)
\( start of a group
ht characters 'ht'
\| or
f character f
\) close group (of the either-or)
tp characters 'tp'
s\? optionally followed by s
Note: taken together, the last few lines mean 'http or https or ftp or ftps'
: character :
[^"]\+ one or more characters that are not a double quote
this is "everything up to the next quote"
" followed by the closing double quote
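To see the grep half in action, here is a small sketch. It assumes GNU grep (\+, \? and \| are GNU extensions to basic regex syntax); the function name find_links and the sample HTML are made up for illustration.

```shell
#!/bin/sh
# The grep half of the pipeline from the question, unchanged.
# -i: ignore case, so <A ... HREF=...> also matches
# -o: print only the matched text, each match on its own line
find_links() {
    grep -i -o '<a[^>]\+href[ ]*=[ \t]*"\(ht\|f\)tps\?:[^"]\+"'
}

# Made-up sample input, one <a> tag per line:
#   1. plain http link
#   2. upper case, extra attribute before HREF, spaces around '='
#   3. mailto: link (not http/https/ftp/ftps)
#   4. single-quoted href
find_links <<'EOF'
<a href="http://example.com/a">A</a>
<A class="x" HREF = "https://example.com/b">B</A>
<a href="mailto:me@example.com">mail</a>
<a href='ftp://example.com/c'>C</a>
EOF
```

Only the first two lines match, and each is printed up to and including the quote after the URL: `<a href="http://example.com/a"` and `<A class="x" HREF = "https://example.com/b"`. The mailto: link fails the `\(ht\|f\)tps\?:` part, and the single-quoted link fails the `"` part.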
Does that get you started? You can do the same for the next bit (the sed expression)...
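The sed expression can be tried the same way. Again this is a sketch assuming GNU sed (for \+ in basic syntax); strip_to_url is a made-up name, and the input lines are shaped like what grep -o prints for the pattern in the question.

```shell
#!/bin/sh
# The sed half of the pipeline from the question, unchanged.
#   ^.*"          from the start of the line, as much as possible, up to a "
#   \([^"]\+\)    capture one or more non-" characters (group 1)
#   ".*$          a closing " and whatever is left on the line
# The whole line is replaced by group 1 (\1).
strip_to_url() {
    sed -e 's/^.*"\([^"]\+\)".*$/\1/g'
}

# Made-up input in the shape grep -o produces.
strip_to_url <<'EOF'
<a href="http://example.com/a"
<A class="x" HREF = "https://example.com/b"
EOF
```

This prints `http://example.com/a` and `https://example.com/b`. Because `^.*"` is greedy, group 1 is the text between the last two double quotes on the line, which for grep's output is always the URL. The `^` and `$` anchors mean the pattern can match at most once per line, so the trailing `g` flag makes no difference here. A line with no quoted text is left unchanged.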
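A note on what may be confusing you: the backslash is used to change the meaning of certain special characters such as ( ) + ? |. Just to keep everyone on their toes, whether these characters are special with or without a backslash is not something defined by "regex syntax" as such, but by the command you use it with (and its options). In the default basic syntax of grep and sed, \( \) \+ \? \| are the operators and the bare characters are literal (\+, \? and \| being GNU extensions); in extended syntax it is the other way round. For example, sed changes the meaning of these characters depending on whether you use the -E flag.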
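To make the backslash point concrete, here is the same pipeline written twice: once in basic syntax as in the question, and once with -E (extended syntax), where the backslashes before ( ) | + ? are dropped. This is a sketch assuming GNU grep and a sed that accepts -E (GNU or BSD); links_bre, links_ere and the sample line are made up for illustration.

```shell
#!/bin/sh
# Basic syntax (BRE), exactly as in the question: \( \) \| \+ \? are
# operators, while bare ( ) | + ? would be literal characters.
links_bre() {
    grep -i -o '<a[^>]\+href[ ]*=[ \t]*"\(ht\|f\)tps\?:[^"]\+"' |
        sed -e 's/^.*"\([^"]\+\)".*$/\1/g'
}

# Extended syntax (ERE) via -E: the same operators are written
# without backslashes. The bracket expressions are unchanged.
links_ere() {
    grep -i -o -E '<a[^>]+href[ ]*=[ \t]*"(ht|f)tps?:[^"]+"' |
        sed -E -e 's/^.*"([^"]+)".*$/\1/g'
}

# Made-up sample: two links on a single line.
sample='<p><a href="http://example.com/a">A</a> and <A HREF="ftp://example.com/c">C</A></p>'

printf '%s\n' "$sample" | links_bre
printf '%s\n' "$sample" | links_ere
```

Both functions print `http://example.com/a` followed by `ftp://example.com/c`. Because grep -o reports each match separately, two links on one input line come out as two lines, which is what lets the anchored sed expression handle one URL per line.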