Question

Maybe you can give me some advice?

I have a web page, clarity-project.info/tenders/…, and I need to extract every data-id="<some number>" value and write it to a new file.

Here is my code:

from urllib.request import urlopen, Request
from bs4 import BeautifulSoup
import numpy as np
url = 'https://clarity-project.info/tenders/?entiy=38163425&offset=100'
agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3)AppleWebKit/537.36\
(KHTML, like Gecko) Chrome/35.0.1916.47 Safari/537.36'

request = Request(url, headers={'User-Agent': agent})

html = urlopen(request).read().decode()

soup = BeautifulSoup(html, 'html.parser')

tags = soup.findAll(lambda tag: tag.get('data-id', None) is not None)
with open('/Users/tinasosiak/Documents/number.txt', 'a') as f:
    for tag in tags:
       print(tag['data-id'])
       np.savetxt(f, 'data-id')

But when I run my code, I get this error:

1f1d2745f1b641c6bd6831288b49d54e
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-5-556a89a7507f> in <module>()
     15     for tag in tags:
     16         print(tag['data-id'])
---> 17         np.savetxt(f, 'data-id')
     18 

/Users/tinasosiak/anaconda/lib/python3.6/site-packages/numpy/lib/npyio.py in savetxt(fname, X, fmt, delimiter, newline, header, footer, comments)
   1212                 ncol = len(X.dtype.descr)
   1213         else:
-> 1214             ncol = X.shape[1]
   1215 
   1216         iscomplex_X = np.iscomplexobj(X)

IndexError: tuple index out of range

Answer 1

Is this what you wanted? It prints every "data-id" value and also writes them to a text file.

import requests
from bs4 import BeautifulSoup

url = 'https://clarity-project.info/tenders/?entiy=38163425&offset=100'
res = requests.get(url).text
soup = BeautifulSoup(res, "lxml")

# The with-block closes the file even if an error occurs part-way through.
with open("testfile.txt", "w") as file:
    for item in soup.find_all(class_="table-row"):
        data_id = item.get('data-id')
        # Skip rows that carry no data-id attribute.
        if data_id is None:
            continue
        file.write(data_id + '\n')
        print(data_id)
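
As for the error itself: in the question, np.savetxt(f, 'data-id') is handed the literal string 'data-id' rather than the tag's value. NumPy appears to treat that string as a zero-dimensional array whose shape is an empty tuple, so the X.shape[1] lookup shown in the traceback fails with "tuple index out of range". For plain strings an ordinary file write is enough. Below is a minimal sketch of the same extract-and-write idea that can be tried offline. It swaps in the standard library's html.parser for BeautifulSoup so that it runs without extra packages, and SAMPLE_HTML, DataIdCollector, extract_data_ids, write_ids and the output file name are made up for illustration; only the first ID comes from the output in the question.

```python
import os
import tempfile
from html.parser import HTMLParser


class DataIdCollector(HTMLParser):
    """Collects the value of every data-id attribute seen while parsing."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare attributes.
        for name, value in attrs:
            if name == "data-id" and value:
                self.ids.append(value)


def extract_data_ids(html):
    """Return all data-id values found in the given HTML string, in order."""
    parser = DataIdCollector()
    parser.feed(html)
    parser.close()
    return parser.ids


def write_ids(ids, path):
    """Write one ID per line using a plain text write (no NumPy needed)."""
    with open(path, "w") as f:
        for data_id in ids:
            f.write(data_id + "\n")


# Hypothetical markup standing in for the real page.
SAMPLE_HTML = """
<div class="table-row" data-id="1f1d2745f1b641c6bd6831288b49d54e">first</div>
<div class="table-row">row without an id</div>
<div class="table-row" data-id="abc123">second</div>
"""

ids = extract_data_ids(SAMPLE_HTML)
out_path = os.path.join(tempfile.gettempdir(), "data_ids_example.txt")
write_ids(ids, out_path)
print(ids)
```

To run it against the live page, the HTML string fetched with requests or urlopen could be passed to extract_data_ids in place of SAMPLE_HTML.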
