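I am writing a program in Python that scans my friend's and my own GitHub pages and displays the names of all the uploaded files. I have managed to do that much. All of the file names are under the <a> tag. The problem is that there is other random text under the <a> tag as well, such as "Add files via upload". I don't want that text to show up. Any help would be appreciated. Kind regards, Eric
I tried stripping the string when printing the final result, but that still did not work.
Here is my code: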
import bs4
import requests
from bs4 import BeautifulSoup as soup
import lxml
import re
import time
import os
import webbrowser

def webscrape():
    res = requests.get('https://github.com/Dukesan7/jerichson')
    type(res)
    soup = bs4.BeautifulSoup(res.text, 'lxml')
    type(soup)
    file = soup.select('a')  # selects every <a> element on the page
    file[1].getText()
    time.sleep(1)
    files = str(file)
    clean = re.compile('<.*?>')  # removes the tags but keeps all of the link text
    files = re.sub(clean, '', files)
    print(files)
    time.sleep(1)
    print("1. Main Menu: 1")
    print("2. exit?: 2")
    op = input(":")
    if op == "2":
        exit()
    else:
        MainMenu()  # MainMenu is not shown in the question
Answer 0 (score: 0)
A simplified version of your code:
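from bs4 import BeautifulSoup as bs
import requests

res = requests.get('https://github.com/Dukesan7/jerichson')
soup = bs(res.text, 'lxml')
# Select only the links that carry the js-navigation-open class
file = soup.find_all('a', class_="js-navigation-open")
for i in file:
    # Keep only link text that contains a dot, i.e. looks like a file name
    if '.' in i.text:
        print(i.text)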
This gives the following output:
21s.py
BVVVVV.exe
Calling Casino.py
Game Download Link.txt
Homework.py
Password Username System.py
Puzzle.txt
StopWatch.py
Voting ligitimacy system.py
Vowl counter.py
agenotage.py
coin.py
dice.py
explorer reset.bat
name and age dukesan.py
notification.pyw
reminder.py
win 21 game.py
Is this what you were looking for?
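The `'.' in i.text` check above is a heuristic, and it could also match link text that merely contains a dot somewhere, such as a commit message ending in a period. As a rough sketch of a slightly stricter variant, the filter can be pulled into a helper that strips whitespace and uses `os.path.splitext` to test for an extension. The helper name `keep_file_names` and the sample list are made up for illustration and are not part of the answer's code; the sketch works on plain strings, so it runs without a network request.

```python
import os


def keep_file_names(link_texts):
    """Return only the link texts that look like file names (hypothetical helper)."""
    names = []
    for text in link_texts:
        name = text.strip()  # drop surrounding spaces and newlines
        # splitext returns (root, ext); ext is '' when there is no extension
        _, ext = os.path.splitext(name)
        if ext:
            names.append(name)
    return names


# Made-up sample standing in for [i.text for i in file]
sample = ["  21s.py\n", "Add files via upload", "Puzzle.txt", "explorer reset.bat", "README"]
print(keep_file_names(sample))
```

In the loop above this could be called as `keep_file_names(i.text for i in file)`. Note that `os.path.splitext` treats a leading dot as part of the name, so an entry such as `.gitignore` would be dropped, and text like `Update v1.2` would still slip through because `.2` counts as an extension; whether that matters depends on the repository.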
Answer 1 (score: 0)