Question

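I am trying to clone all the repositories in a private GitHub Enterprise organisation. So far I have managed to collect every repo link into a Python list by web scraping with Selenium, and now I just need to run git clone on each element of the list.

I thought there might be a way to either:

1) convert the list into an environment variable and write a bash loop of git clone commands that I run in my Jupyter Lab notebook, or

2) use a Python git library to clone each repo in the list into a specified directory.

Please let me know how I can achieve this.

Here is my code: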

from scrapy.selector import Selector
from selenium import webdriver

def get_repo_link(html):
    """
    Use scrapy and xpath to scrape repository anchor tags from a page's HTML
    """
    xpath_str = '//div[@class="d-inline-block mb-1"]/h3/a'
    return Selector(text=html).xpath(xpath_str).extract()

# I am running this from a Jupyter Lab notebook, and need to log in
# to the organisation after the Selenium Chrome window comes up
driver = webdriver.Chrome(executable_path='/anaconda/chromedriver')
driver.get(url='https://git.enterprise.name/org-name')

links = []
for num in range(1, 7):
    # Page 1 is already loaded above; later pages are fetched explicitly
    if num > 1:
        driver.get(url='https://git.enterprise.name/org-name?page=' + str(num))
    # Each extracted element is an <a ...> tag; the code takes its first
    # quoted attribute value as the repo path
    for link in get_repo_link(driver.page_source):
        links.append("https://git.enterprise.name" + link.split('"')[1])

# example of output of links:
['https://git.enterprise.name/org-name/some-repo-name',
 'https://git.enterprise.name/org-name/another-repo-name']

Answer 1

GitPython is a well-documented library for this: https://gitpython.readthedocs.io/en/stable/reference.html#git.repo.base.Repo.clone

import git
git.Repo.clone_from(url, to_path)

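Create a clone from the given URL.
Parameters:
  url - a valid git URL, see http://www.kernel.org/pub/software/scm/git/docs/git-clone.html#URLS
  to_path - the path to which the repository should be cloned
  progress - see 'git.remote.Remote.push'
  env - optional dictionary containing the desired environment variables
  kwargs - see the clone method
Returns:
  Repo instance pointing to the cloned directory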
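
Applied to the links list from the question, the call above could be wrapped in a loop along these lines. This is only a minimal sketch: target_dir and clone_all are hypothetical helper names, not part of GitPython, and it assumes git on your machine can already authenticate against the enterprise server (for example through a credential helper), since the repositories are private.

```python
import os
from urllib.parse import urlparse


def target_dir(url, base_dir):
    """Return base_dir/<repo-name> for a repository URL (hypothetical helper)."""
    name = os.path.basename(urlparse(url).path.rstrip("/"))
    if name.endswith(".git"):
        name = name[:-4]
    return os.path.join(base_dir, name)


def clone_all(links, base_dir):
    """Clone every URL in links into its own sub-directory of base_dir."""
    # Imported here so target_dir stays usable without GitPython installed.
    import git  # provided by the GitPython package

    os.makedirs(base_dir, exist_ok=True)
    for url in links:
        dest = target_dir(url, base_dir)
        if os.path.exists(dest):
            # Skip anything already present, e.g. from an earlier run.
            continue
        try:
            git.Repo.clone_from(url, dest)
        except git.GitCommandError as err:
            # Usually an authentication or network failure; carry on with the rest.
            print("failed to clone", url, err)
```

You would then call clone_all(links, '/path/to/repos') from the notebook, where the directory is whatever location you choose. Each repository gets its own sub-directory because git will not clone into an existing directory that is not empty.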

Edit: +1 @phd
