I am trying to scrape a web page purely for learning purposes. The page contains several "a" tags. Consider the following markup:
<a href='\abc\def\jkl'> Something </a>
<a href ='http://www.google.com'> Something</a>
Now I want to read only the href attributes that contain http. My current code is:
for link in soup.find_all("a"):
    print(link.get("href"))
I want to change it so that it reads only the "http" links.
Answer 0 (score: 2)
You can use a regular expression like this:
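import re
from bs4 import BeautifulSoup

# Raw string so the backslashes in the first href stay literal
res = r"""<a href="\abc\def\jkl">Something</a>
<a href="http://www.google.com">something</a>"""
soup = BeautifulSoup(res, "html.parser")  # name the parser explicitly
# Keep only the anchors whose href starts with "http:"
print(soup.find_all('a', {'href': re.compile('^http:.*')}))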
Output:
[<a href="http://www.google.com">something</a>]
Answer 1 (score: 2)
You can also use the "starts with" CSS selector:
print([a["href"] for a in soup.select('a[href^=http]')])
Demo:
In [1]: from bs4 import BeautifulSoup
In [2]: res = """
...: <a href="\abc\def\jkl">Something</a>
...: <a href="http://www.google.com">something</a>
...: """
In [3]: soup = BeautifulSoup(res, "html.parser")
In [4]: print([a["href"] for a in soup.select('a[href^=http]')])
[u'http://www.google.com']
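Note that a[href^=http] matches any href that merely begins with the letters "http", which includes https links but also a relative path such as httpfoo/relative. If only absolute http and https URLs are wanted, a stricter selector group may help. The following is a minimal sketch under that assumption; the two extra anchors in the sample markup are hypothetical and not part of the question.

```python
from bs4 import BeautifulSoup

# Sample markup; the last two anchors are made up for illustration.
# The raw string keeps the backslashes in the first href literal.
html = r"""
<a href="\abc\def\jkl">Something</a>
<a href="http://www.google.com">something</a>
<a href="https://example.com/page">secure</a>
<a href="httpfoo/relative">not absolute</a>
"""

soup = BeautifulSoup(html, "html.parser")

# Selector group: href must start with "http://" or with "https://".
selector = 'a[href^="http://"], a[href^="https://"]'
links = [a["href"] for a in soup.select(selector)]
print(links)
```

Quoting the attribute value is needed here because it contains characters such as ":" and "/" that are not allowed in an unquoted CSS identifier.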
Answer 2 (score: 1)
Just run this simple test to check whether the link contains the string http. You need one extra line in your code to do it:
for link in soup.find_all('a', href=True):  # href=True skips anchors without an href
    if 'http' in link.get('href'):
        print(link.get('href'))
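A substring test such as 'http' in href also accepts relative links that merely contain those letters, for example /docs/http-basics. A stricter check, sketched here on the assumption that only absolute http and https URLs are wanted, parses the scheme with the standard library's urllib.parse.urlsplit. The helper name is_http_url and the sample hrefs are made up for illustration.

```python
from urllib.parse import urlsplit


def is_http_url(href):
    """Return True only when href is an absolute http or https URL."""
    if not href:  # covers None (tag without href) and empty strings
        return False
    return urlsplit(href).scheme in ("http", "https")


# Hypothetical hrefs, including one that a plain substring test would accept.
candidates = [
    "http://www.google.com",
    "https://example.com/page",
    "/docs/http-basics",
    r"\abc\def\jkl",
    None,
]
print([is_http_url(h) for h in candidates])
```

Inside the loop, the condition would then read if is_http_url(link.get('href')).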
Answer 3 (score: 0)
Another approach:
Here, link['href'] returns the full value of the href attribute.
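One way link['href'] could be used for this task is sketched below. This is only an assumption about the kind of approach meant here, not a reconstruction of the answer's own code. Unlike link.get('href'), subscript access raises KeyError when the attribute is missing, so the sketch passes href=True to find_all to skip such anchors. The third anchor in the sample markup is hypothetical.

```python
from bs4 import BeautifulSoup

# Sample markup; the last anchor (no href attribute) is made up for illustration.
html = r"""
<a href="\abc\def\jkl">Something</a>
<a href="http://www.google.com">something</a>
<a name="top">no href here</a>
"""

soup = BeautifulSoup(html, "html.parser")

# href=True keeps only anchors that actually carry an href attribute,
# so link["href"] below cannot raise KeyError.
http_links = [
    link["href"]
    for link in soup.find_all("a", href=True)
    if link["href"].startswith("http")
]
print(http_links)
```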