How can I optimize this parser code?

Time: 2016-09-07 08:41:08

Tags: python python-3.x html-parsing bs4

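I'm new to Python and have only just started learning the principles of object-oriented programming, so please don't judge too harshly. This code does what I need, but a few things about it confuse me:

1) I don't understand what __init__ is supposed to return.
2) I don't like the self. prefix on every line of def get_parse():

Originally I wrote it this way so that I could simply access Parse.id or any other attribute I need.

But now I first have to create an instance of the class, then call the get_parse function, which takes self and the row number k, and only after that can I access Parse.title and the other fields.

I tried to do the same thing when declaring __init__, but got nowhere, apparently because the function is not yet available at that stage (is that a peculiarity of the language? I don't see this in IPython, or am I wrong?)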
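
On the first point: __init__ is not expected to return anything. Python requires it to return None, and its only job is to set attributes on the new instance. Methods defined later in the class body can already be called on self inside it. A minimal sketch, in which ParseSketch, read_link and the hard-coded list are hypothetical stand-ins for the Parse class, read_csv and the CSV file:

```python
class ParseSketch:
    def __init__(self, k):
        # __init__ returns nothing (None); it only fills in the new instance.
        self.k = k
        # Methods of the class are already usable here through self.
        self.link = self.read_link(k)

    def read_link(self, k):
        # Hypothetical stand-in for read_csv(): one link per row number.
        links = ['http://example.com/a', 'http://example.com/b']
        return links[k]


item = ParseSketch(1)
print(item.link)  # the attribute is available right after construction
```

With this layout the attributes exist as soon as the instance is created, so no separate get_parse call is needed before reading them.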

from bs4 import BeautifulSoup
import urllib.error
import urllib.request
import csv

class Parse:

    k = 1

    def __init__(self,k):
        pass


    @staticmethod
    def read_csv(k):
        with open('/home/narnikgamarnik/PycharmProjects/my_phyton3_projects/products_links2.csv') as f:
            r = csv.reader(f)
            cont = [row for row in r]
            d = (cont[k])[0]
            return d


    @staticmethod
    def get_url(d):
        try:
            url = urllib.request.urlopen(d)
        except urllib.error.HTTPError as err:
            if err.code == 404:
                return False
            else:
                raise
        return url


    @staticmethod
    def get_title(url):
        try:
            soup = BeautifulSoup(url, 'html.parser')
            ol = soup.find('ol', 'breadcrumb')
            title = ol.find_all('li')[-1].string
        except AttributeError:
            return False
        return title

    @staticmethod
    def get_gender(url):
        try:
            soup = BeautifulSoup(url, 'html.parser')
            ol = soup.find('ol', 'breadcrumb')
            gender = ol.find_all('a')[0].string
        except AttributeError:
            return False
        return gender

    @staticmethod
    def get_category(url):
        try:
            soup = BeautifulSoup(url, 'html.parser')
            ol = soup.find('ol', 'breadcrumb')
            # The second breadcrumb link holds the category.
            category = ol.find_all('a')[1].string
        except AttributeError:
            return False
        return category


    @staticmethod
    def get_model(url):
        try:
            soup = BeautifulSoup(url, 'html.parser')
            ol = soup.find('ol', 'breadcrumb')
            model = ol.find_all('a')[2].string
        except AttributeError:
            return False
        return model


    @staticmethod
    def get_article(url):
        try:
            soup = BeautifulSoup(url, 'html.parser')
            product_code = soup.find('p', 'product__code')
            article = product_code.find_all('span')[0].string
        except AttributeError:
            return False
        return article


    @staticmethod
    def get_article_2(url):
        try:
            soup = BeautifulSoup(url, 'html.parser')
            geth1 = soup.find('h1')
            article_2 = geth1.find_all('span')[0].string
        except AttributeError:
            return False
        return article_2


    @staticmethod
    def get_prices(url):
        try:
            soup = BeautifulSoup(url, 'html.parser')
            product_price = soup.find_all('span', 'select_currency currency hide')
        except AttributeError:
            return False
        return product_price


    @staticmethod
    def get_img(url):
        # Start with an empty list so the return below works even when
        # the page has no fotorama gallery.
        images = []
        try:
            soup = BeautifulSoup(url, 'html.parser')
            div = soup.find_all('div', 'fotorama fotorama-primary')
            for a in div:
                for c in a.find_all('a'):
                    images.append(c['data-full'])
        except AttributeError:
            return False
        return images


    def get_parse(self,k):
        self.d = self.read_csv(k)
        self.url = self.get_url(self.d)
        self.title = self.get_title(self.url)
        self.url = self.get_url(self.d)
        self.gender = self.get_gender(self.url)
        self.url = self.get_url(self.d)
        self.category = self.get_category(self.url)
        self.url = self.get_url(self.d)
        self.model = self.get_model(self.url)
        self.url = self.get_url(self.d)
        self.article = self.get_article(self.url)
        self.url = self.get_url(self.d)
        self.article_2 = self.get_article_2(self.url)
        self.url = self.get_url(self.d)
        self.prices = self.get_prices(self.url)
        self.price_pln = self.prices[0].string[3:6]
        self.price_usd = self.prices[1].string[3:6]
        self.price_eur = self.prices[2].string[3:6]
        self.price_gbp = self.prices[3].string[3:6]
        self.price_rub = self.prices[4].string[3:7]
        self.url = self.get_url(self.d)
        self.images = self.get_img(self.url)
        return self.d, self.title, self.gender, self.category, self.model, self.article, self.article_2, self.images, self.price_pln, self.price_usd, self.price_eur, self.price_gbp, self.price_rub
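
The repeated get_url calls are presumably needed because the object returned by urlopen is a stream that can be read only once, so every BeautifulSoup(url, ...) call exhausts it. One possible way around this is to read and parse the page a single time and take every field from the same soup. This is only a sketch under assumptions: the Product class, the text_at helper and the sample HTML are invented for illustration, the markup merely mimics what the selectors above expect, and io.BytesIO stands in for the HTTP response.

```python
import io

from bs4 import BeautifulSoup

# Assumed markup: it only imitates the structure the selectors look for.
SAMPLE_HTML = b"""
<ol class="breadcrumb">
  <li><a href="/men">Men</a></li>
  <li><a href="/men/shoes">Shoes</a></li>
  <li><a href="/men/shoes/runner">Runner</a></li>
  <li>Runner X1</li>
</ol>
<p class="product__code"><span>AB-123</span></p>
"""


def text_at(tags, index):
    """Return the stripped text of tags[index], or None if it is missing."""
    try:
        return tags[index].get_text(strip=True)
    except IndexError:
        return None


class Product:
    """Hypothetical replacement for Parse: one download, one parse."""

    def __init__(self, response):
        # Read the stream once and build a single soup for all fields.
        soup = BeautifulSoup(response.read(), 'html.parser')

        breadcrumb = soup.find('ol', 'breadcrumb')
        items = breadcrumb.find_all('li') if breadcrumb is not None else []
        links = breadcrumb.find_all('a') if breadcrumb is not None else []
        self.title = text_at(items, -1)
        self.gender = text_at(links, 0)
        self.category = text_at(links, 1)
        self.model = text_at(links, 2)

        code = soup.find('p', 'product__code')
        spans = code.find_all('span') if code is not None else []
        self.article = text_at(spans, 0)


# io.BytesIO plays the role of the object returned by urlopen().
product = Product(io.BytesIO(SAMPLE_HTML))
print(product.title, product.gender, product.category, product.model, product.article)
```

In real use the argument would be the result of urllib.request.urlopen(link), and missing elements show up as None instead of False. The sketch uses get_text(strip=True) rather than .string, which may differ slightly for tags with nested markup.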

In any case, this is not a "do it for me" question; I am only asking experienced programmers for advice. Thank you!

0 answers:

No answers