I'm not unfamiliar with programming in general, but I am new to larger programs where I need to create/import my own modules. I've done it before in C, but that was years ago... and this is Python.
I'm looking for guidance on organization. I finally figured out HOW to import a .py file into my project (and what it should look like) and how to add the path to the Windows variables, but now I'm curious whether I'm doing things 'correctly', or what the best practices are. Below I've listed a bunch of links I've already read that didn't answer my question, but I figured I'd try to make this post a one-stop shop, since I can see this has been a hot topic over the years.
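For context, here is a minimal sketch of the layout I mean (the stevens_tools name matches the imports further down; the other folder and path names are only placeholders):

# my_modules/                    <- the folder added to the Windows path variable
#     stevens_tools/
#         __init__.py            <- empty; just marks the folder as a package
#         scraper_tools.py
#         get_all_tags.py
#     test_tag_counter.py

# The same thing can be done per script instead of through the Windows
# variables; the path below is only a placeholder for wherever my_modules is:
import sys
sys.path.append(r'C:\my_modules')

from stevens_tools import scraper_tools   # now resolvable from anywhere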
I'm trying to make a multi-purpose module full of scraping functions, so that in a test file I can do what I need with just ONE line, i.e. pass in a URL and get back a sorted list of every HTML tag on the page and its frequency. (This is just an experiment while trying to learn about organization and external files.) It's been painful, because if something goes wrong I have to change several different files.
The error I'm getting is: "request = scraper_tools.get_request(url, data=None, headers=scraper.reg_header) NameError: name 'scraper' is not defined"
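For reference, the traceback mentions the name scraper while the file below imports the module as scraper_tools; whichever name the module is bound to at import time is the one that has to appear in front of reg_header. A minimal sketch of the two spellings (the alias form is purely illustrative, not something from my files):

url = 'https://www.quackit.com/html/tags'

# plain import: the full module name is the prefix
from stevens_tools import scraper_tools
request = scraper_tools.get_request(url, data=None, headers=scraper_tools.reg_header)

# aliased import: the alias becomes the prefix instead
from stevens_tools import scraper_tools as scraper
request = scraper.get_request(url, data=None, headers=scraper.reg_header)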
Am I doing something wrong, and is there a better way? (I'm assuming there is) :)
My code looks like this:
scraper_tools.py
#!my_modules/python
# Filename: scraper_tools.py

import os
import requests
import bs4 as bs

phone_header = {'user-agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 9_2 like mac OS X)'}
reg_header = {'user-agent': 'Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0'}


def make_soup(request, parser):
    # Make soup from the response body using the given parser
    return bs.BeautifulSoup(request.text, parser)


def get_request(url, data, headers, **kwargs):
    # Fetch the URL; only send custom headers when some were supplied
    try:
        if headers:
            return requests.get(url, data=data, headers=headers, **kwargs)
        return requests.get(url, data=data, **kwargs)
    except Exception as e:
        print(e)


def get_all_items(soup, tag):
    # Return every occurrence of the given tag in the soup
    return soup.find_all(tag)


def open_file_write(path, filename):
    # Open a file for writing inside the given directory
    save_path = path
    return open(os.path.join(save_path, filename), 'w')


def get_all_links(soup):
    # Collect every absolute (http/https) link on the page
    href_tags = soup.find_all(href=True)
    link_list = []
    for tag in href_tags:
        if 'http' in tag['href'][0:4]:
            link_list.append(tag['href'])
    return link_list
get_all_tags.py
'''
Author: Steven Smith
Email: StevenSmithCIS@gmail.com
Date: 9/1/2017
Description: This file uses an online resource website to dynamically get all
common HTML tags in a list, which is then used to count the elements inside a
specific web page (and therefore know something about the quantity of each
particular tag).
'''
from stevens_tools import scraper_tools
import operator
import requests

html_tag_website_url = 'https://www.quackit.com/html/tags'
soup = None
tag_qnty_dict = None
request_object = None


def get_html_tags():
    # Get all the HTML tag names currently listed in the soup
    all_tags = []
    ul_lists = soup.find_all('ul', {'class': 'col-3 taglist'})
    for li in ul_lists:
        for item in li.find_all('a'):
            all_tags.append(item.text)
    return all_tags


def get_all_tags_from(url):
    # Returns a dictionary of all tags found in the passed-in URL,
    # mapping each tag to its quantity
    global soup, tag_qnty_dict
    request = scraper_tools.get_request(url, data=None, headers=scraper_tools.reg_header)
    soup = scraper_tools.make_soup(request, 'lxml')
    tag_qnty_dict = {}
    tags = get_html_tags()
    if tags:
        for tag in tags:
            # If there is more than 0 items, add it to the dictionary
            item_qnty = len(scraper_tools.get_all_items(soup, tag))
            if item_qnty > 0:
                tag_qnty_dict.update({tag: item_qnty})
    return tag_qnty_dict


def sort_items(reverse):
    # Sorts items in the tag dictionary by quantity, largest first
    # when reverse is True
    return sorted(tag_qnty_dict.items(), key=operator.itemgetter(1), reverse=reverse)


def print_all(tag_dict):
    # Print each tag and its quantity, largest first
    for item in sorted(tag_dict.items(), key=operator.itemgetter(1), reverse=True):
        print('Tag = ' + item[0] + " Quantity: = " + str(item[1]))
test_tag_counter.py
from stevens_tools import get_all_tags
get_all_tags.print_all(get_all_tags.get_all_tags_from('https://www.goodreads.com/list/tag/best'))
^^^^^^^^^^^^^^^^^^ These names aren't too crazy, but... they're descriptive! lol
Other threads I've already been through:

Python Packages and Modules (importing modules/packages in Python)
http://mikegrouchy.com/blog/2012/05/be-pythonic-init__py.html (using __init__.py as the module/package identifier)
create Python package and import modules (importing each file vs. importing once)
Why installing package and module not same in Python? (import version issues, Python 3.4 vs. 2.x)
What's the difference between a Python module and a Python package? (<- see the name lol)
What's the difference between "package" and "module" (<- see the name)
Remove package and module name from sphinx function (removing the module name)
importing package and modules from another directory in python (<- uses sys)
Best practices when importing in IPython
http://docs.python-guide.org/en/latest/writing/structure/#modules
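Since a couple of the links above are about __init__.py, here is my rough sketch of what that file would hold for a layout like mine (the re-export lines are optional; an empty file is enough to mark the package):

# stevens_tools/__init__.py
# Marks the folder as a package; the two imports below are optional and only
# let callers write `import stevens_tools` and reach the submodules as
# attributes, e.g. stevens_tools.scraper_tools.reg_header
from . import scraper_tools
from . import get_all_tags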