Maybe you can give me some advice?
I have a web page, clarity-project.info/tenders/…, from which I need to extract every data-id="<some number>" value
and write it to a new file.
Here is my code:
from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
import numpy as np
url = 'https://clarity-project.info/tenders/?entiy=38163425&offset=100'
agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3)AppleWebKit/537.36\
(KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'
request = Request(url, headers={'User-Agent': agent})
html = urlopen(request).read().decode()
soup = BeautifulSoup(html, 'html.parser')
tags = soup.findAll(lambda tag: tag.get('data-id', None) is not None)
with open('/Users/tinasosiak/Documents/number.txt', 'a') as f:
    for tag in tags:
        print(tag['data-id'])
        np.savetxt(f, 'data-id')
But when I run my code, it prints the first value and then fails with this error:
1f1d2745f1b641c6bd6831288b49d54e
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-5-556a89a7507f> in <module>()
15 for tag in tags:
16 print(tag['data-id'])
---> 17 np.savetxt(f, 'data-id')
18
/Users/tinasosiak/anaconda/lib/python3.6/site-packages/numpy/lib/npyio.py in savetxt(fname, X, fmt, delimiter, newline, header, footer, comments)
1212 ncol = len(X.dtype.descr)
1213 else:
-> 1214 ncol = X.shape[1]
1215
1216 iscomplex_X = np.iscomplexobj(X)
IndexError: tuple index out of range
Answer 0 (score: 0)
Is this what you want? It gives you every "data-id" value and also writes those values to a text file.
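import requests
from bs4 import BeautifulSoup

res = requests.get('https://clarity-project.info/tenders/?entiy=38163425&offset=100').text
soup = BeautifulSoup(res, "lxml")  # the "lxml" parser needs the lxml package installed
# Open the output file in a with-block so it is closed even if an error occurs
with open("testfile.txt", "w") as file:
    for item in soup.find_all(class_="table-row"):
        data_id = item.get('data-id')
        if data_id is None:
            # Skip rows that carry no data-id attribute
            continue
        file.write(data_id + '\n')
        print(data_id)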
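
As for the original error: judging from the traceback, np.savetxt(f, 'data-id') passes the literal string 'data-id' rather than the tag's value, and NumPy treats that string as an array with no dimensions, so X.shape[1] has nothing to index. np.savetxt is meant for numeric arrays, so plain file writes are likely enough here. Below is a minimal offline sketch of the same idea. It swaps in the standard library's html.parser instead of BeautifulSoup so it runs without extra installs, and SAMPLE_HTML, DataIdCollector, extract_data_ids and save_ids are hypothetical names invented for illustration; the real page's markup may differ.

```python
from html.parser import HTMLParser
from pathlib import Path
import tempfile


class DataIdCollector(HTMLParser):
    """Collect the value of every data-id attribute found on a start tag."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes
        for name, value in attrs:
            if name == "data-id" and value is not None:
                self.ids.append(value)


def extract_data_ids(html):
    """Return all data-id values in document order."""
    collector = DataIdCollector()
    collector.feed(html)
    collector.close()
    return collector.ids


def save_ids(ids, path):
    """Write one id per line using ordinary text output (no NumPy needed)."""
    Path(path).write_text("".join(f"{value}\n" for value in ids), encoding="utf-8")


# Hypothetical markup standing in for the real page (assumption, not the site's HTML)
SAMPLE_HTML = """
<div class="table-row" data-id="1f1d2745f1b641c6bd6831288b49d54e">first</div>
<div class="table-row" data-id="abc123">second</div>
<div class="table-row">row without an id</div>
"""

ids = extract_data_ids(SAMPLE_HTML)
out_path = Path(tempfile.gettempdir()) / "number.txt"
save_ids(ids, out_path)
print(ids)
```

To use it on the live page, the downloaded HTML string (html in the question, res in the answer) would be passed to extract_data_ids in place of SAMPLE_HTML.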