Python package and module usage/importing: hierarchy and best practices (new to external files)

Date: 2017-09-01 16:32:47

Tags: python import module theory python-3.6

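I'm not new to programming in general, but I am new to larger programs that require creating and importing my own modules. I've done this in C before, but that was years ago... and this is Python.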

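I'm looking for guidance on organization. I finally figured out HOW to import a .py file into my project (and what that should look like), as well as how to add the path to the Windows environment variables, but now I'm curious whether I'm doing things 'correctly' and what the best practices are. Further down, I've listed a series of links that I've already read and that did not answer my question, but I figured I'd try to make this post a one-stop shop, since I've seen this has been a hot topic over the years.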
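For reference, here is a minimal sketch of the kind of layout being asked about. It builds a throwaway package in a temporary directory and imports a module from it. The names `demo_tools`, `text_utils`, and `shout` are hypothetical and exist only for this illustration; the point is that the directory which contains the package, not the package directory itself, has to be on the import path.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway package on disk so the layout is explicit:
#   <tmp>/demo_tools/__init__.py
#   <tmp>/demo_tools/text_utils.py
# (demo_tools and text_utils are hypothetical names for this sketch.)
project_root = Path(tempfile.mkdtemp())
package_dir = project_root / "demo_tools"
package_dir.mkdir()
(package_dir / "__init__.py").write_text("")  # marks the directory as a regular package
(package_dir / "text_utils.py").write_text(
    "def shout(text):\n"
    "    return text.upper() + '!'\n"
)

# The directory that CONTAINS the package must be importable.
sys.path.insert(0, str(project_root))
importlib.invalidate_caches()

from demo_tools import text_utils

print(text_utils.shout("hello"))
```

Editing `sys.path` at runtime is only done here to keep the sketch self-contained. Setting the `PYTHONPATH` environment variable, or installing the package into the environment, are common alternatives that achieve the same effect.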

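I'm trying to build a multi-purpose module full of scraping functions, so that, as in the test file, I only have to write ONE line to do what I need, e.g. pass in a URL and get back a sorted list of all HTML tags on the page and their frequencies. (This is just an experiment while I learn about organization and external files.) It's painful right now, because whenever something goes wrong I have to change several files.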
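To make the goal concrete, here is a minimal sketch of the "sorted list of tags and their frequencies" step on its own. It deliberately swaps in the standard library's `html.parser` and `collections.Counter` in place of requests and BeautifulSoup so that it runs without a network connection; `TagCounter`, `count_tags`, and `sample_html` are hypothetical names, not part of the code in this post.

```python
from collections import Counter
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts every opening tag seen while parsing (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # html.parser passes the tag name in lower case.
        self.counts[tag] += 1


def count_tags(html_text):
    """Return (tag, count) pairs, most frequent first."""
    parser = TagCounter()
    parser.feed(html_text)
    parser.close()
    return parser.counts.most_common()


# Placeholder HTML standing in for a downloaded page.
sample_html = "<html><body><p>one</p><p>two</p><a href='#'>x</a></body></html>"
print(count_tags(sample_html))
```

Keeping the counting logic in one function that takes HTML text, separate from the code that downloads the page, means a failure in either part can be fixed in one place.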

The error I'm getting is: "request = scraper_tools.get_request(url, data=None, headers=scraper.reg_header) NameError: name 'scraper' is not defined"
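A likely reading of that traceback, sketched under assumptions: the import binds the name `scraper_tools`, so a reference to `scraper` has nothing to resolve to. The snippet below uses a stand-in module object and a placeholder header value so it can run without the real `stevens_tools` package.

```python
import types

# Stand-in for the module that `from stevens_tools import scraper_tools`
# would bind; the header value is a placeholder, not the real one.
scraper_tools = types.ModuleType("scraper_tools")
scraper_tools.reg_header = {"user-agent": "example-agent"}

# Works: the name bound by the import is `scraper_tools`.
headers = scraper_tools.reg_header

# Fails: nothing in this namespace is called `scraper`.
try:
    headers = scraper.reg_header
except NameError as error:
    message = str(error)
    print(message)
```

Module attributes are always reached through the exact name that the import statement introduced, or through an alias given with `import ... as ...`.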

Am I doing this wrong, and is there a better way? (I assume there is) :)

My code looks like this:

scraper_tools.py

#!my_modules/python
# Filename: scraper_tools.py

import os  # needed by open_file_write below

import requests
import bs4 as bs

phone_header = {'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 9_2 like mac OS X)'}
reg_header = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0'}

def make_soup(request, parser):
    # Make soup
    return bs.BeautifulSoup(request.text, parser)

def get_request(url, data=None, headers=None, **kwargs):
    # Fetch the URL and return the Response, or None if the request fails.
    # Any extra keyword arguments are passed through to requests.get.
    try:
        return requests.get(url, data=data, headers=headers, **kwargs)
    except requests.RequestException as e:
        print(e)
        return None


def get_all_items(soup, tag):
    return soup.find_all(tag)

def open_file_write(path, filename):
    save_path = path
    return open(os.path.join(save_path, filename), 'w')

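def get_all_links(soup):
    # Collect the href of every tag whose link starts with 'http'.
    # (No 'self' parameter: this is a module-level function, not a method.)
    href_tags = soup.find_all(href=True)
    link_list = []

    for tag in href_tags:
        if tag['href'].startswith('http'):
            link_list.append(tag['href'])

    return link_list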

get_all_tags.py

from stevens_tools import scraper_tools
import operator
import requests

'''
Author: Steven Smith
Email: StevenSmithCIS@gmail.com
Date: 9/1/2017
Description: This file uses an online resource website to dynamically get all 
common HTML tags in a list to be used to count list elements inside a specific
web page (and therefore know something about the quantity of each particular tag).
'''

html_tag_website_url = 'https://www.quackit.com/html/tags'
soup = None
tag_qnty_dict = None
request_object = None

def get_html_tags():
    # Get all the HTML tags currently listed on the reference website.
    # The page is fetched here because the module-level 'soup' is never set.
    request = scraper_tools.get_request(html_tag_website_url, data=None, headers=scraper_tools.reg_header)
    tag_soup = scraper_tools.make_soup(request, 'lxml')
    all_tags = []
    ul_lists = tag_soup.find_all('ul', {'class': 'col-3 taglist'})
    for ul in ul_lists:
        for item in ul.find_all('a'):
            all_tags.append(item.text)
    return all_tags

def get_all_tags_from(url):
    # Returns a dictionary mapping each known HTML tag found at the
    # passed-in URL to the number of times it occurs
    request = scraper_tools.get_request(url, data=None, headers=scraper_tools.reg_header)
    soup = scraper_tools.make_soup(request, 'lxml')
    tag_qnty_dict = {}
    # get_html_tags_from_file() is not defined in this file, so call get_html_tags()
    tags = get_html_tags()
    if tags:
        for tag in tags:
            # If there is more than 0 items, add to the dictionary
            item_qnty = len(scraper_tools.get_all_items(soup, tag))
            if item_qnty > 0:
                tag_qnty_dict.update({tag: item_qnty})
    return tag_qnty_dict

def sort_items(tag_qnty_dict, reverse):
    # Sorts items in the given tag dictionary by quantity, largest first
    # if reverse is True
    return sorted(tag_qnty_dict.items(), key=operator.itemgetter(1), reverse=reverse)

def print_all(tag_qnty_dict):
    # Takes the dictionary as an argument, matching the call in test_tag_counter.py
    for item in sort_items(tag_qnty_dict, True):
        print('Tag = ' + item[0] + " Quantity: = " + str(item[1]))

test_tag_counter.py

from stevens_tools import get_all_tags

get_all_tags.print_all(get_all_tags.get_all_tags_from('https://www.goodreads.com/list/tag/best'))

^^^^^^^^^^^^^^^^^^ I'm not too crazy about these names, but... they are descriptive! lol

Other threads I've looked at:

Python Packages and Modules (.. importing modules/packages in Python)
http://mikegrouchy.com/blog/2012/05/be-pythonic-init__py.html (using __init__.py as the module/package identifier)
create Python package and import modules (importing each file vs. importing once)
Why installing package and module not same in Python? (import version issues, Python 3.4 vs 2.x)
What's the difference between a Python module and a Python package? (<- see the title, lol)
What's the difference between "package" and "module" (<- see the title)
Remove package and module name from sphinx function (removing module names)
importing package and modules from another directory in python (<- uses sys)
Best practices when importing in IPython
http://docs.python-guide.org/en/latest/writing/structure/#modules

0 answers