AttributeError: 'function' object has no attribute 'urljoin'

Time: 2017-12-26 15:30:08

Tags: python python-2.7 beautifulsoup

I have been scraping with BeautifulSoup in Python 2.7 and I ran into this error:

AttributeError: 'function' object has no attribute 'urljoin'

It actually occurs on this line:

first_link = urlparse.urljoin('https://en.wikipedia.org/', article_link)

I imported urljoin from urlparse:

from urlparse import urljoin

2 Answers:

Answer 0 (score: 5)

You imported two things:

from urlparse import urlparse
from urlparse import urljoin

As a result, the name urlparse is bound to a function, not to the module. Just use urljoin as a global, not as an attribute:

first_link = urljoin('https://en.wikipedia.org/', article_link)

Answer 1 (score: 0)

I am running urlparse 1.1.1 on Python 2.7.18 and had problems with urljoin. As far as I can tell it is no longer supported, but I was able to extract every URL correctly with the approach below. Hope this helps anyone with a similar problem.

Extracted links before parsing:

/intl/en/ads/
https://google.com/intl/en/ads/
/services/
https://google.com/services/
/intl/en/about.html

Extracted links after parsing:

https://google.com/intl/en/ads/
https://google.com/services/
https://google.com/intl/en/about.html
https://google.com/intl/en/policies/privacy/
https://google.com/intl/en/policies/terms/

Code to extract and join the links (a terminal script on Linux):

#!/usr/bin/env python
import requests
import re


target_url = raw_input("Enter the target url\n>")


class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'


def request(url):
    # Fetch the page over HTTP; return None if the connection fails
    try:
        return requests.get("http://" + url)
    except requests.exceptions.ConnectionError:
        pass


def extract_links_from(url):
    # Pull every href="..." value out of the raw page source
    response = request(url)
    return re.findall('(?:href=")(.*?)"', response.content)


def urljoin(href_links):
    # Print links to other https sites as-is (warning colour), join everything
    # not already on the target domain onto it (green), and print the rest unchanged (blue)
    for link in href_links:
        if "https://" + target_url not in link and "https://" in link:
            print(bcolors.WARNING + link + bcolors.ENDC)
        elif "https://www." + target_url not in link:
            print(bcolors.OKGREEN + "https://" + target_url + link + bcolors.ENDC)
        else:
            print(bcolors.OKBLUE + link + bcolors.ENDC)


href_links = extract_links_from(target_url)
urljoin(href_links)
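For comparison, the urljoin from the standard urlparse module (the one the question imports) resolves the same mixture of relative and absolute links without a hand-rolled helper. A minimal sketch, where base_url and the hrefs list are illustrative stand-ins for the values extracted above:

from urlparse import urljoin

base_url = "https://google.com/"
hrefs = ["/intl/en/ads/", "https://google.com/services/", "/intl/en/about.html"]  # illustrative sample

for href in hrefs:
    # Relative paths are joined onto base_url; absolute URLs pass through unchanged
    print(urljoin(base_url, href))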