Python: passing an argument that contains quotes

Time: 2016-11-08 21:18:53

Tags: python function arguments

I am learning to scrape text from the web. I wrote the following function:

from bs4 import BeautifulSoup
import requests

def get_url(source_url):
    r = requests.get(source_url)
    data = r.text
    #extract HTML for parsing
    soup = BeautifulSoup(data, 'html.parser')
    #get H3 tags with class ...
    h3list = soup.findAll("h3", { "class" : "entry-title td-module-title" })
    #create data structure to store links in
    ulist = []
    #pull links from each article heading
    for href in h3list:
        ulist.append(href.a['href'])
    return ulist

I call it from a separate file:

from print1 import get_url 

ulist = get_url("http://www.startupsmart.com.au/")

print(ulist[3]) 

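The problem is that the CSS selector I am using is very specific to the site I am parsing, so the function is somewhat brittle. I would like to pass the CSS selector to the function as an argument.

If I add a parameter to the function definition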

def get_url(source_url, css_tag):

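and try to pass "h3", { "class" : "entry-title td-module-title" }

it fails with:

TypeError: get_url() takes exactly 1 argument (2 given)

I tried escaping all the quotes, but it still does not work.

I would really appreciate some help. I have not been able to find an answer to this.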

1 answer:

Answer 0: (score: 0)

Here is a working version:

from bs4 import BeautifulSoup
import requests

def get_url(source_url, tag_name, attrs):
    r = requests.get(source_url)
    data = r.text
    # extract HTML for parsing
    soup = BeautifulSoup(data, 'html.parser')
    # get all tags that match the given tag name and attributes
    h3list = soup.findAll(tag_name, attrs)
    # create data structure to store links in
    ulist = []
    # pull links from each article heading
    for href in h3list:
        ulist.append(href.a['href'])
    return ulist

ulist = get_url("http://www.startupsmart.com.au/", "h3", {"class": "entry-title td-module-title"})

print(ulist[3])
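
The key point in the version above is that "h3" and {"class": "entry-title td-module-title"} are two separate values, so the function needs one parameter for each; escaping the quotes does not change that. If you would rather keep the selector together as a single value, one option is to store it in a tuple and unpack it with * at the call site. The sketch below illustrates this under some assumptions: it swaps BeautifulSoup and requests for the standard library's html.parser so that it runs offline, and the names LinkCollector, get_links and SAMPLE_HTML are made up for the illustration. It also compares attribute values by exact string equality, which is stricter than BeautifulSoup's class matching.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values of <a> tags nested inside matching tags.

    Hypothetical helper for illustration; not part of BeautifulSoup.
    """

    def __init__(self, tag_name, attrs):
        super().__init__()
        self.tag_name = tag_name
        self.wanted = attrs   # e.g. {"class": "entry-title td-module-title"}
        self.depth = 0        # greater than 0 while inside a matching tag
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self.depth:
            if tag == self.tag_name:
                # nested tag of the same name: track it so the end tags balance
                self.depth += 1
            elif tag == "a" and "href" in attrs:
                self.links.append(attrs["href"])
        elif tag == self.tag_name and all(
            attrs.get(key) == value for key, value in self.wanted.items()
        ):
            # exact string comparison of every requested attribute
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag_name:
            self.depth -= 1


def get_links(html, tag_name, attrs):
    """Return the links found inside tags matching tag_name and attrs."""
    parser = LinkCollector(tag_name, attrs)
    parser.feed(html)
    return parser.links


# Made-up sample markup standing in for a downloaded page.
SAMPLE_HTML = """
<h3 class="entry-title td-module-title"><a href="/first">First</a></h3>
<h3 class="other"><a href="/skip">Skip</a></h3>
<h3 class="entry-title td-module-title"><a href="/second">Second</a></h3>
"""

# Keep the selector as one value and unpack it into two arguments with *.
selector = ("h3", {"class": "entry-title td-module-title"})
links = get_links(SAMPLE_HTML, *selector)
print(links)  # prints ['/first', '/second']
```

The same unpacking should work with the get_url function above, for example get_url("http://www.startupsmart.com.au/", *selector), since it also takes the tag name and the attribute dictionary as two separate parameters.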