I'm using the regex.h library for my C program.
I need to download all the files whose links are stored in <a> tags in some HTML data, so my first task is to extract the contents of each tag's "href" attribute.
I use this address to practice: http://students.iitk.ac.in/programmingclub/course/lectures/
Its HTML content contains many tags such as:
<a href="1.%20Introduction%20to%20C%20language%20and%20Linux.pdf">
<a href="1.%20Introduction%20to%20C%20language%20and%20Linux.ppt">
<a href="1.%20Introduction%20to%20C%20language%20and%20Linux.pptx">
...
I wrote a regex string to extract the contents of the "href" attribute:
char regex[] = "href=\"([a-zA-Z0-9%.,]*\\.[a-zA-Z0-9]*{1,4})\"";
What I expect from the regex (I can handle the full match and the group match myself):
1.%20Introduction%20to%20C%20language%20and%20Linux.pdf
1.%20Introduction%20to%20C%20language%20and%20Linux.ppt
1.%20Introduction%20to%20C%20language%20and%20Linux.pptx
...
What I actually get is only the first link (I only care about the group match):
1.%20Introduction%20to%20C%20language%20and%20Linux.pdf
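For context, a single regexec() call reports only the first match in the string it is given, which may be why only one link comes back. A minimal sketch of searching repeatedly could look like the following. The helper name extract_hrefs, the HREF_MAX_LEN limit, and the simplified pattern href="([^"]*)" are assumptions made for illustration; they are not the code or the regex from my program.

```c
#include <assert.h>
#include <regex.h>
#include <string.h>

#define HREF_MAX_LEN 256   /* hypothetical upper bound on one href value */

/* Hypothetical helper: copies up to max_out href values found in html into
 * out[]. Returns the number copied, or -1 if the pattern does not compile. */
int extract_hrefs(const char *html, char out[][HREF_MAX_LEN], int max_out)
{
    /* Simplified pattern (an assumption): capture everything between the
     * double quotes that follow href=. */
    const char *pattern = "href=\"([^\"]*)\"";
    regex_t re;
    regmatch_t m[2];            /* m[0] = full match, m[1] = group 1 */
    const char *cursor = html;
    int count = 0;

    assert(html != NULL && out != NULL);

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;

    /* regexec() finds only the first match at or after cursor, so move
     * cursor past each full match and search again. */
    while (count < max_out && regexec(&re, cursor, 2, m, 0) == 0) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len >= HREF_MAX_LEN)
            len = HREF_MAX_LEN - 1;     /* truncate overly long values */
        memcpy(out[count], cursor + m[1].rm_so, len);
        out[count][len] = '\0';
        count++;
        cursor += m[0].rm_eo;           /* offsets are relative to cursor */
    }

    regfree(&re);
    return count;
}
```

The offsets in regmatch_t are relative to the pointer passed to regexec(), which is why the sketch adds them to cursor and not to the start of the buffer.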
Have a nice day, and thank you very much.
P.S.: I use REG_EXTENDED for regcomp().