I'm trying to download the raw source code from all the tutorials I've been watching, so I wrote this:
import requests
from urllib import request
from bs4 import BeautifulSoup

page_url = 'https://github.com/buckyroberts/Source-Code-from-Tutorials/tree/master/Python'

def page(main_url):
    code = requests.get(main_url)
    text = code.text
    soup = BeautifulSoup(text, "html.parser")
    for link in soup.findAll('a', {'class': 'js-navigation-open'}):
        code_url = 'https://github.com' + link.get('href')
        codelist(code_url)

def codelist(sec_url):
    code = requests.get(sec_url)
    text = code.text
    soup = BeautifulSoup(text, "html.parser")
    for link in soup.findAll('a', {'id': 'raw-url'}):
        raw_url = 'https://github.com' + link.get('href')
        rawcode(raw_url)

def rawcode(third_url):
    response = request.urlopen(third_url)
    txt = response.read()
    lines = txt.split("\\n")
    dest_url = r'go.py'
    fx = open(dest_url, "w")
    for line in lines:
        fx.write(line + "\n")
    fx.close()

page(page_url)
When I run this code, I expect it to create 40 .py files containing the 40 different programs from here - https://github.com/buckyroberts/Source-Code-from-Tutorials/tree/master/Python - but it doesn't work. Each time, it ends up downloading only one of the 40 files, seemingly at random.
The first two functions work well together until the third one is called. But the third function works fine on its own.
I started learning Python only 4 days ago, so any help would be much appreciated. Thanks, guys!
Answer 0 (score: 0)
[After the comments] To change the file name easily, you can add a global variable (here cp), like this:
def rawcode(third_url):
    global cp
    dest_url = r'go_%s.py' % cp
    cp += 1
    print(dest_url)

cp = 0
page(page_url)
The files will be named "go_X.py", with X running from 0 up to the number of files.
EDIT: with your code:
def rawcode(third_url):
    response = request.urlopen(third_url)
    txt = response.read().decode('utf-8')  # urlopen returns bytes; decode before splitting
    lines = txt.split("\n")  # split on the newline character, not the literal "\n"
    global cp  # We say that we will use the global cp and not a local one
    dest_url = r'go_%s.py' % cp
    cp += 1  # We increment it for further calls
    fx = open(dest_url, "w")  # We can keep 'w' since we generate a new file on each call
    for line in lines:
        fx.write(line + "\n")
    fx.close()

cp = 0  # Initialisation
page(page_url)
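As an alternative to the numeric counter (this is not part of the answer above, just a sketch): each raw URL already ends in the original file name, so the destination name can be recovered from the URL itself with the standard library. The helper name `filename_from_url` is hypothetical:

```python
from urllib.parse import urlsplit
import posixpath

def filename_from_url(raw_url):
    # Take the last path segment of the URL,
    # e.g. '.../raw/master/Python/bacon.py' -> 'bacon.py'
    path = urlsplit(raw_url).path
    return posixpath.basename(path)
```

Inside rawcode(), `dest_url = filename_from_url(third_url)` would then keep the original tutorial file names instead of go_0.py, go_1.py, ...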