How can I use Beautiful Soup to extract the string returned by "function getData()"?
In a given .html page, I have a script tag like this:



<script>
function getData()
{
return "zip,city,state,MedianIncome,MedianIncomeRank,CostOfLivingIndex,CostOfLivingRank\n10452,Bronx,NY,20606,2,147.7,74";
}

function getResultsCount()
{
return "1";
}

</script>

Answer 0 (score: 1)
Arguably one of the simplest approaches is to use a regular expression to locate the script element and extract the desired string:
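import re

from bs4 import BeautifulSoup

# Sample page; in this non-raw string the \n becomes a real newline.
data = """
<script>
function getData()
{
return "zip,city,state,MedianIncome,MedianIncomeRank,CostOfLivingIndex,CostOfLivingRank\n10452,Bronx,NY,20606,2,147.7,74";
}
function getResultsCount()
{
return "1";
}
</script>
"""

soup = BeautifulSoup(data, "html.parser")

# Capture the text between 'return "' and '";' at the end of a line.
pattern = re.compile(r'return "(.*?)";$', re.MULTILINE | re.DOTALL)

# find() returns the first <script> whose contents match the pattern
# (string= is the current name of the older text= argument).
script = soup.find("script", string=pattern)
print(pattern.search(script.string).group(1))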
Prints:
zip,city,state,MedianIncome,MedianIncomeRank,CostOfLivingIndex,CostOfLivingRank
10452,Bronx,NY,20606,2,147.7,74
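
The pattern above returns whichever `return "..."` statement comes first in the script. As a rough sketch of a variant, the regular expression can be anchored on the function name so the order of the functions does not matter, and the result can be parsed as CSV. The helper names `extract_get_data` and `parse_rows` are made up for illustration, and the sample page is an assumption: it uses a raw string so that the `\n` stays a literal backslash-n, as it would appear in the real page source.

```python
import csv
import io
import re

from bs4 import BeautifulSoup

# Raw string: the JS string literal keeps its backslash-n escape, as in a real page.
# The functions are deliberately listed in the opposite order from the question.
html = r"""
<script>
function getResultsCount()
{
return "1";
}
function getData()
{
return "zip,city,state,MedianIncome,MedianIncomeRank,CostOfLivingIndex,CostOfLivingRank\n10452,Bronx,NY,20606,2,147.7,74";
}
</script>
"""

# Anchor on the function name instead of relying on getData() coming first.
GET_DATA = re.compile(
    r'function\s+getData\s*\(\s*\)\s*\{\s*return\s+"(.*?)"\s*;', re.DOTALL
)


def extract_get_data(page_source):
    """Return the string literal returned by getData(), or None if not found."""
    soup = BeautifulSoup(page_source, "html.parser")
    for script in soup.find_all("script"):
        match = GET_DATA.search(script.string or "")
        if match:
            # Turn the JavaScript \n escape into a real newline.
            return match.group(1).replace("\\n", "\n")
    return None


def parse_rows(csv_text):
    """Parse the extracted CSV text into dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


rows = parse_rows(extract_get_data(html))
print(rows[0]["city"], rows[0]["MedianIncome"])
```

This is still plain pattern matching rather than JavaScript parsing, so it assumes the returned string literal contains no escaped double quotes.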