How do I scrape a table and find the entries corresponding to the maximum value in a specific column?

Time: 2018-06-13 12:37:58

Tags: python web-scraping

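How do I scrape the table from "https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=-9999&symbol=BANKNIFTY&symbol=BANKNIFTY&instrument=OPTIDX&date=-&segmentLink=17&segmentLink=17",

then find the maximum "OI" under "PUTS", and finally get the corresponding entries in the row that has that maximum OI?

I have got as far as printing the rows: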

import urllib2  # Python 2 only; in Python 3 the equivalent module is urllib.request
import bs4 as bs

url = 'https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=-9999&symbol=BANKNIFTY&symbol=BANKNIFTY&instrument=OPTIDX&date=-&segmentLink=17&segmentLink=17'

html = urllib2.urlopen(url).read()
soup = bs.BeautifulSoup(html,'lxml')
table = soup.find('div',id='octable')
rows = table.find_all('tr')
for row in rows:
    print row.text

1 answer:

Answer 0 (score: 0)

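You have to iterate over all the <td> elements inside each <tr>. You could do this with a bunch of for loops, but a list comprehension is more straightforward. Using only this: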

# `tables` is the <table id="octable"> element (see the full code)
oi_column = [
    float(cells[21].text.strip().replace('-', '0').replace(',', ''))
    for cells in (row.find_all('td') for row in tables.find_all('tr'))
    if len(cells) > 20
]

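This iterates over all the <td> elements in every <tr> of the table, selects only the rows that contain more than 20 cells (which excludes the last row), and applies text replacements, or whatever else you need, to get the value into the required form. Here the text is converted to a float.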
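For comparison, the same filtering can be written as an explicit loop. The sketch below is only an illustration: it runs against a small hypothetical HTML snippet rather than the live page, and `SAMPLE_HTML`, `parse_number` and `extract_column` are names made up for this example. The column index and minimum cell count are parameters here because the toy table has only three columns.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature table: two header rows (<th> only),
# two data rows, and a shorter "totals" row at the end.
SAMPLE_HTML = """
<table id="octable">
  <tr><th>CALLS</th><th>Strike</th><th>PUTS</th></tr>
  <tr><th>OI</th><th>Strike Price</th><th>OI</th></tr>
  <tr><td>1,200</td><td>26000.00</td><td>3,450</td></tr>
  <tr><td>-</td><td>26100.00</td><td>-</td></tr>
  <tr><td>Total</td><td>1,200</td></tr>
</table>
"""


def parse_number(text):
    """Convert a cell such as '3,450' or '-' to a float ('-' and '' become 0.0)."""
    cleaned = text.strip().replace(',', '')
    return 0.0 if cleaned in ('', '-') else float(cleaned)


def extract_column(table, column_index, min_cells):
    """Collect one column as floats, skipping rows with fewer than min_cells <td>."""
    values = []
    for row in table.find_all('tr'):
        cells = row.find_all('td')
        if len(cells) < min_cells:
            continue  # header rows (no <td>) and the short totals row
        values.append(parse_number(cells[column_index].text))
    return values


table = BeautifulSoup(SAMPLE_HTML, 'html.parser').find('table', {'id': 'octable'})
puts_oi = extract_column(table, column_index=2, min_cells=3)
print(puts_oi)
```

Unlike the chained `replace` calls, `parse_number` only treats a cell that is exactly `-` as zero, so a negative number would keep its sign.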

The whole code would be:

from bs4 import BeautifulSoup
import requests

url = 'https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=-9999&symbol=BANKNIFTY&symbol=BANKNIFTY&instrument=OPTIDX&date=-&segmentLink=17&segmentLink=17'

response = requests.get(url)
soup = BeautifulSoup(response.content, "html.parser")

tables = soup.find("table", {"id":"octable"})

# Values of the column to check (cell index 21), one per data row
oi_column = [
    float(cells[21].text.strip().replace('-', '0').replace(',', ''))
    for cells in (row.find_all('td') for row in tables.find_all('tr'))
    if len(cells) > 20
]
print(oi_column)

# Compute the maximum and its position once and reuse them
max_oi = max(oi_column)
max_index = oi_column.index(max_oi)
print("max value : {}".format(max_oi))
print("index of max value : {}".format(max_index))

# The row at that index; 2 is added on the assumption that the table
# starts with two header rows that are not part of oi_column
root = tables.find_all('tr')[2 + max_index].find_all('td')
row_items = [
    (
        root[1].text.strip(),
        root[2].text.strip()
        # etc... select the indexes you want to extract from the corresponding row
    )
]
print(row_items)
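One way to avoid the hard-coded row offset is to keep each row's cells together with its parsed value and take the maximum over those pairs. This is a sketch under assumptions, not part of the answer above: it uses a hypothetical sample table, and `SAMPLE_HTML`, `to_float` and `row_with_max` are illustrative names. For the live page, the indices in the answer suggest something like `column_index=21` and `min_cells=22`, but that depends on the page layout.

```python
from bs4 import BeautifulSoup

# Hypothetical miniature table standing in for the real option chain.
SAMPLE_HTML = """
<table id="octable">
  <tr><th>CALLS OI</th><th>Strike Price</th><th>PUTS OI</th></tr>
  <tr><td>1,200</td><td>26000.00</td><td>3,450</td></tr>
  <tr><td>900</td><td>26100.00</td><td>7,800</td></tr>
  <tr><td>-</td><td>26200.00</td><td>-</td></tr>
  <tr><td>Total</td><td>2,100</td></tr>
</table>
"""


def to_float(text):
    """Convert '7,800' to 7800.0; treat '-' or an empty cell as 0.0."""
    cleaned = text.strip().replace(',', '')
    return 0.0 if cleaned in ('', '-') else float(cleaned)


def row_with_max(table, column_index, min_cells):
    """Return (value, cell_texts) for the row with the largest value in column_index."""
    candidates = []
    for row in table.find_all('tr'):
        cells = [td.text.strip() for td in row.find_all('td')]
        if len(cells) >= min_cells:
            candidates.append((to_float(cells[column_index]), cells))
    if not candidates:
        raise ValueError('no data rows found')
    # Compare on the numeric value only; the first row wins on ties.
    return max(candidates, key=lambda pair: pair[0])


table = BeautifulSoup(SAMPLE_HTML, 'html.parser').find('table', {'id': 'octable'})
max_oi, cells = row_with_max(table, column_index=2, min_cells=3)
print(max_oi)
print(cells)
```

Because the value and the cells travel together, the result does not depend on how many header rows precede the data.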

You can find another example of scraping a table like this one here