I have a list of about 5000 links. The first 2 of the 5000 links:
https://racevietnam.com/runner/buiducninh/ecopark-marathon-2019
https://racevietnam.com/runner/drtungnguyen83/ecopark-marathon-2019
...
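For each link, I want to get the value in the Time of Day column of the Finish row.
For example:
09:51:07 AM - https://racevietnam.com/runner/buiducninh/ecopark-marathon-2019
07:50:55 AM - https://racevietnam.com/runner/ngocsondknb/ecopark-marathon-2019
I have already scraped user information from another website whose elements have an id and a class. But the table at https://racevietnam.com/runner/ngocsondknb/ecopark-marathon-2019 has no id and no class, so I can't use the same approach.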
#!/usr/bin/python
from urllib.request import urlopen
from bs4 import BeautifulSoup

list_user = []
for userID in range(1, 100000):
    link = "https://example.com/member.php?u=" + str(userID)
    html = urlopen(link)
    bsObj = BeautifulSoup(html, "lxml")
    # On this site the user name sits inside a div with a known id.
    user_name = bsObj.find("div", {"id":"main_userinfo"}).h1.get_text()
    list_user.append(user_name)
    print("username", userID, "is: ", user_name)
    with open("result.txt", "a") as myfile:
        myfile.write(user_name)
Please help me.
Thanks.
Answer 0 (score: 1)
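This uses bs4 4.7.1.
There is only one table, and you want the second column (td) of the last row. You can use :last-child to select the last row; it should be combined with the tbody type selector and the > child combinator so that you do not get the header row. You can use :nth-of-type to specify which td cell to return.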
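To see what that selector does in isolation, here is a minimal sketch run against an inline HTML string. The markup is hypothetical (the column names and the non-Finish values are made up for illustration), and it uses html.parser only to avoid the lxml dependency; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a runner page; the real page may differ.
html = """
<table>
  <thead>
    <tr><th>Checkpoint</th><th>Time of Day</th></tr>
  </thead>
  <tbody>
    <tr><td>Start</td><td>05:00:03 AM</td></tr>
    <tr><td>Finish</td><td>09:51:07 AM</td></tr>
  </tbody>
</table>
"""

soup = BeautifulSoup(html, "html.parser")

# tbody > tr:last-child  -> the last row that is a direct child of tbody
# td:nth-of-type(2)      -> the second td inside that row
cell = soup.select_one("tbody > tr:last-child td:nth-of-type(2)")
finish_time = cell.get_text(strip=True)
print(finish_time)
```

Because the header row lives in thead here, restricting the match to tbody keeps it out of the result.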
Now you may wish to develop this further, for example by handling cases where an element is not present:
name = getattr(soup.select_one('title'), 'text', 'N/A')
timing = getattr(soup.select_one('tbody > tr:last-child td:nth-of-type(2)'), 'text', 'N/A')
Python:
import requests
from bs4 import BeautifulSoup as bs

urls = ['https://racevietnam.com/runner/buiducninh/ecopark-marathon-2019', 'https://racevietnam.com/runner/drtungnguyen83/ecopark-marathon-2019']

with requests.Session() as s:
    for url in urls:
        r = s.get(url)
        soup = bs(r.content, 'lxml')
        name = soup.select_one('title').text
        timing = soup.select_one('tbody > tr:last-child td:nth-of-type(2)').text
        print(name, timing)
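For the full list of 5000 links, you would probably want the loop to survive a failed request or a page without the expected table. The following is only a sketch under assumptions: extract_timing and scrape are hypothetical helper names, the 10 second timeout is an arbitrary choice, and html.parser is used instead of lxml just to drop one dependency.

```python
import requests
from bs4 import BeautifulSoup

SELECTOR = "tbody > tr:last-child td:nth-of-type(2)"


def extract_timing(html, default="N/A"):
    """Return the text of the selected cell, or `default` if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    cell = soup.select_one(SELECTOR)
    return cell.get_text(strip=True) if cell is not None else default


def scrape(urls, timeout=10):
    """Yield (timing, url) pairs; a failed request yields "ERROR" as timing."""
    with requests.Session() as s:
        for url in urls:
            try:
                r = s.get(url, timeout=timeout)
                r.raise_for_status()
            except requests.RequestException:
                # Keep going so one bad link does not stop the whole run.
                yield "ERROR", url
                continue
            yield extract_timing(r.text), url
```

To get the output format from the question, you could read the links one per line from a file, pass the list to scrape(urls), and print f"{timing} - {url}" for each pair it yields.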
Answer 1 (score: 0)
Here is my code. It works fine.
import requests
from bs4 import BeautifulSoup

# input.ecopark holds one URL per line
with open("input.ecopark", "r") as f:
    f_content = f.readlines()

for url in f_content:
    r = requests.get(url.rstrip())
    soup = BeautifulSoup(r.text, 'html.parser')
    result = soup.select("table tbody tr td")
    found_finish = False
    for i in result:
        if found_finish:
            # the cell right after "Finish" holds the wanted value
            print(url.rstrip() + " " + i.get_text())
            break
        if i.get_text() == "Finish":
            found_finish = True
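A variation on the same idea, in case the Finish row is not always last or the columns move: look the cell up by its row and column labels. This is only a sketch. cell_by_labels is a hypothetical helper, and it assumes that the first row of the table holds the column headers and that the first cell of each row names the checkpoint, which I have not verified against the real pages.

```python
from bs4 import BeautifulSoup


def cell_by_labels(html, row_label="Finish", column_label="Time of Day"):
    """Return the text at (row_label, column_label), or None if not found."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        return None
    rows = table.find_all("tr")
    if not rows:
        return None
    # Assumption: the first row holds the column headers (th or td cells).
    headers = [c.get_text(strip=True) for c in rows[0].find_all(["th", "td"])]
    if column_label not in headers:
        return None
    col = headers.index(column_label)
    for row in rows[1:]:
        cells = [c.get_text(strip=True) for c in row.find_all(["th", "td"])]
        # Assumption: the first cell of each row names the checkpoint.
        if cells and cells[0] == row_label and col < len(cells):
            return cells[col]
    return None
```

It returns None instead of raising when the table, the column, or the row is missing, so the caller can decide what to print for that link.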