Finding hard-to-match URLs in a text file

Time: 2015-09-30 10:16:27

Tags: python beautifulsoup urllib

My text file contains:

http://www.makemytrip.com/
http://www.makemytrip.com/blog/dil-toh-roaming-hai?intid=Blog_HPHeader_Logo   //how do i remove /dil-toh-roaming-hai?intid=Blog_HPHeader_Logo 
http://www.makemytrip.com/rewards/?intid=New_ch_mtr_na
javascript:void(0)       //how do i remove this 
javascript:void(0)
javascript:void(0)
http://www.makemytrip.com/rewards/?intid=new_ch_mtr_dropdwn
https://support.makemytrip.com/MyAccount/MyTripReward/DashBoard
https://support.makemytrip.com/MyAccount/User/User
https://support.makemytrip.com/MyAccount/MyBookings/BookingSummary/
https://support.makemytrip.com/customersupports.aspx?actiontype=PRINTETICKET

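How can I pick out only the URLs and save them to another file, so that I can parse them one at a time? I tried this Python code, but it only matches and opens the first URL.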

 import urllib

 with open("s.txt", "r") as file:
     for line in file:
         url = urllib.urlopen(line)
         read = url.read()
         print read
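
One likely reason the loop above stops early is that entries such as javascript:void(0) are not fetchable URLs, and each line also still carries its trailing newline. The following is a minimal sketch of filtering the file first, not a definitive fix. It assumes Python 3 (the code above is Python 2), assumes one entry per line with any trailing note separated by whitespace, and uses hypothetical names (`extract_urls`, `filter_file`, and the output file `urls.txt`) that do not come from the original post.

```python
from urllib.parse import urlsplit


def extract_urls(lines):
    """Return the http(s) URLs that start each line, in order, without duplicates."""
    seen = set()
    urls = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        candidate = parts[0]  # drop anything after the first token, e.g. a trailing note
        try:
            pieces = urlsplit(candidate)
        except ValueError:
            continue  # malformed entry, not a usable URL
        # Keep only web addresses; this rejects entries like javascript:void(0)
        if pieces.scheme in ("http", "https") and pieces.netloc:
            if candidate not in seen:
                seen.add(candidate)
                urls.append(candidate)
    return urls


def filter_file(src_path, dst_path):
    """Copy only the valid http(s) URLs from src_path to dst_path, one per line."""
    with open(src_path, "r") as src:
        urls = extract_urls(src)
    with open(dst_path, "w") as dst:
        for url in urls:
            dst.write(url + "\n")
    return urls
```

Calling `filter_file("s.txt", "urls.txt")` would then leave a file that can be read back line by line, with each stripped line passed to whatever fetches and parses the page. Duplicates are dropped here on the assumption that each page only needs to be parsed once.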

0 answers:

No answers