I'm trying to download the raw source code from all the tutorials I've been watching, so I wrote this:
import requests
from urllib import request
from bs4 import BeautifulSoup

page_url = 'https://github.com/buckyroberts/Source-Code-from-Tutorials/tree/master/Python'

def page(main_url):
    code = requests.get(main_url)
    text = code.text
    soup = BeautifulSoup(text, "html.parser")
    for link in soup.findAll('a', {'class': 'js-navigation-open'}):
        code_url = 'https://github.com' + link.get('href')
        codelist(code_url)

def codelist(sec_url):
    code = requests.get(sec_url)
    text = code.text
    soup = BeautifulSoup(text, "html.parser")
    for link in soup.findAll('a', {'id': 'raw-url'}):
        raw_url = 'https://github.com' + link.get('href')
        rawcode(raw_url)

def rawcode(third_url):
    response = request.urlopen(third_url)
    txt = response.read()
    lines = txt.split("\\n")
    dest_url = r'go.py'
    fx = open(dest_url, "w")
    for line in lines:
        fx.write(line + "\n")
    fx.close()

page(page_url)
When I run this code, I expect it to create 40 .py files containing the 40 different programs from here - https://github.com/buckyroberts/Source-Code-from-Tutorials/tree/master/Python - but it doesn't work. Each time, it ends up downloading only one of the 40 files, seemingly at random.
The first two functions work well together until the third one is called. But the third function works fine on its own.
I started learning Python only 4 days ago, so any help would be much appreciated. Thanks, guys!
Answer 0 (score: 0)
[After the comments] To change the file name easily, you can add a global variable (here cp), like this:
def rawcode(third_url):
    global cp
    dest_url = r'go_%s.py' % cp
    cp += 1
    print(dest_url)

cp = 0
page(page_url)
The files will be named "go_X.py", with X running from 0 up to the number of files.
EDIT: with your code:
def rawcode(third_url):
    response = request.urlopen(third_url)
    txt = response.read().decode('utf-8')  # urlopen returns bytes; decode before splitting
    lines = txt.split("\n")  # split on the newline character, not the literal "\n"
    global cp  # We say that we will use the global cp and not a local one
    dest_url = r'go_%s.py' % cp
    cp += 1  # We increment it for further calls
    fx = open(dest_url, "w")  # We can keep 'w' since we generate a new file on each call
    for line in lines:
        fx.write(line + "\n")
    fx.close()

cp = 0  # Initialisation
page(page_url)
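As an alternative to the numeric counter (this is not part of the answer above, just a sketch): each raw URL already ends in the original file name, so the destination name can be recovered from the URL itself with the standard library. The helper name `filename_from_url` is hypothetical:

```python
from urllib.parse import urlsplit
import posixpath

def filename_from_url(raw_url):
    # Take the last path segment of the URL,
    # e.g. '.../raw/master/Python/bacon.py' -> 'bacon.py'
    path = urlsplit(raw_url).path
    return posixpath.basename(path)
```

Inside rawcode(), `dest_url = filename_from_url(third_url)` would then keep the original tutorial file names instead of go_0.py, go_1.py, ...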