I have a web link:
url = "https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=1270&symbol=RELCAPITAL&symbol=RELCAPITAL&instrument=-&date=-&segmentLink=17&symbolCount=2&segmentLink=17"
I need to load the table data from that page into a pandas DataFrame.
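Answer 0 (score: 4)
You can use urllib and send a browser-like User-Agent header, so the request looks like it comes from a browser: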
import urllib.request
import pandas as pd
user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'
url = "https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=1270&symbol=RELCAPITAL&symbol=RELCAPITAL&instrument=-&date=-&segmentLink=17&symbolCount=2&segmentLink=17"
headers={'User-Agent':user_agent,}
request=urllib.request.Request(url,None,headers)
response = urllib.request.urlopen(request)
data = response.read()
df=pd.read_html(data)[1]
print(df.head())
   CALLS                                                                       \
   Chart     OI  Chng in OI  Volume     IV    LTP  Net Chng  BidQty  BidPrice
0    NaN      -           -       -      -      -         -   37500     32.45
1    NaN      -           -       -      -      -         -   37500     23.90
2    NaN      -           -       -      -      -         -   37500     15.35
3    NaN  15000           -       -      -  24.00         -   37500      6.65
4    NaN  46500           -       5  10.59   4.00     -8.00    1500      4.00

             ...      PUTS                                                    \
   AskPrice  ...  BidPrice  AskPrice  AskQty  Net Chng   LTP      IV  Volume
0     52.55  ...         -      1.20    3000         -     -       -       -
1     51.75  ...         -      1.20    3000         -     -       -       -
2     40.20  ...      1.00      1.10    1500      0.60  1.10  168.46      21
3     20.25  ...      2.50      2.55    1500      0.60  2.00  150.32      47
4      5.00  ...      5.35      6.00    9000      3.30  6.10  147.49     115

   Chng in OI      OI  Chart
0           -       -    NaN
1           -       -    NaN
2      -10500  135000    NaN
3      -22500  192000    NaN
4      -34500  292500    NaN

[5 rows x 23 columns]
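The output above shows many cells holding a "-" placeholder rather than a number, so those columns would presumably not be numeric. A minimal sketch of one way to clean that up, using a small hand-made frame (the `raw` sample below is hypothetical and only mimics a few of the scraped columns):

```python
import pandas as pd

# Hypothetical sample that mimics a few scraped columns;
# in practice this frame would come from pd.read_html.
raw = pd.DataFrame({
    "OI": ["-", "15000", "46500"],
    "LTP": ["-", "24.00", "4.00"],
    "Net Chng": ["-", "-", "-8.00"],
})

# errors="coerce" turns anything that is not a number (here "-") into NaN.
clean = raw.apply(pd.to_numeric, errors="coerce")

print(clean)
print(clean.dtypes)
```

Whether every column of the real table should be coerced this way is an assumption; text columns would need to be excluded first.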
Answer 1 (score: 1)
Using requests, you can do it like this:
import pandas as pd
from requests import Session
s = Session()
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/75.0.3770.80 Safari/537.36'}
# Add headers
s.headers.update(headers)
URL = 'https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp'
params = {'symbolCode': 940,
          'symbol': 'DHFL',
          'instrument': 'OPTSTK',
          'date': '-',
          'segmentLink': 17}
res = s.get(URL, params=params)
df = pd.read_html(res.content)[1]
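The URL in the question repeats `symbol` and `segmentLink`, while this answer passes a base URL plus a `params` dict. A rough sketch, using only the standard library, of how the question's URL could be split into those two pieces. Whether the server treats the de-duplicated query the same way as the original one is an assumption:

```python
from urllib.parse import parse_qsl, urlencode, urlsplit

url = ("https://www.nseindia.com/live_market/dynaContent/live_watch/"
       "option_chain/optionKeys.jsp"
       "?symbolCode=1270&symbol=RELCAPITAL&symbol=RELCAPITAL&instrument=-"
       "&date=-&segmentLink=17&symbolCount=2&segmentLink=17")

parts = urlsplit(url)

# Base URL without the query string.
base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"

# dict() keeps one value per key, so repeated keys collapse to a single entry.
params = dict(parse_qsl(parts.query))

print(base_url)
print(params)
print(urlencode(params))
```

The resulting `base_url` and `params` have the same shape as the `URL` and `params` passed to `s.get` above.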
Answer 2 (score: 0)
import pandas as pd
from requests import Session
#############################################
pd.set_option('display.max_rows', 500000)
pd.set_option('display.max_columns', 100)
pd.set_option('display.width', 50000)
#############################################
s = Session()
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) '
                         'AppleWebKit/537.36 (KHTML, like Gecko) '
                         'Chrome/75.0.3770.80 Safari/537.36'}
# Add headers
s.headers.update(headers)
URL = 'https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp'
params = {'symbolCode':940,'symbol':'DHFL','instrument': 'OPTSTK','date': '-','segmentLink': 17}
res = s.get(URL, params=params)
df = pd.read_html(res.content)[1]
df.columns = df.columns.droplevel(-1)
df = df.iloc[2:len(df)-1].reset_index(drop=True)
df.columns = ['C_Chart','C_OI','C_Chng_in_OI','C_Volume','C_IV','C_LTP','C_Net_Chng','C_BidQty','C_BidPrice','C_AskPrice','C_AskQty','Strike_Price','P_BidQty','P_BidPrice','P_AskPrice','P_AskQty','P_Net_Chng','P_LTP','P_IV','P_Volume','P_Chng_in_OI','P_OI','P_Chart']
df = df[['C_LTP','C_BidQty','C_BidPrice','Strike_Price']]
print(df)
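The code above names all 23 columns by hand with a `C_` or `P_` prefix. As an alternative sketch, the prefixed names could be derived from a two-level column header. The frame below is a hypothetical miniature, and the assumption that the parsed table has a top level of "CALLS" / "PUTS" may not hold for the real page:

```python
import pandas as pd

# Hypothetical miniature of a two-level header (group, column name).
columns = pd.MultiIndex.from_tuples([
    ("CALLS", "OI"), ("CALLS", "LTP"),
    ("", "Strike Price"),
    ("PUTS", "LTP"), ("PUTS", "OI"),
])
df = pd.DataFrame([[15000, 24.0, 100.0, 2.0, 192000]], columns=columns)

def flatten(col):
    """Turn ('CALLS', 'Chng in OI') into 'C_Chng_in_OI'; leave ungrouped columns unprefixed."""
    group, name = col
    name = name.replace(" ", "_")
    if group in ("CALLS", "PUTS"):
        return f"{group[0]}_{name}"
    return name

df.columns = [flatten(c) for c in df.columns]
print(df.columns.tolist())
```

This keeps the naming scheme of the answer but avoids maintaining the long list manually.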
Answer 3 (score: 0)
Here is one way to extract the table data.
import requests
import pandas as pd
url = 'https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=1270&symbol=RELCAPITAL&symbol=RELCAPITAL&instrument=-&date=-&segmentLink=17&symbolCount=2&segmentLink=17'
r = requests.get(url)
data = pd.read_html(r.content, header=0)
df = pd.DataFrame(data[1])
print(df)
Another approach is:
import requests
import pandas as pd
from bs4 import BeautifulSoup
url = 'https://www.nseindia.com/live_market/dynaContent/live_watch/option_chain/optionKeys.jsp?symbolCode=1270&symbol=RELCAPITAL&symbol=RELCAPITAL&instrument=-&date=-&segmentLink=17&symbolCount=2&segmentLink=17'
r = requests.get(url)
soup = BeautifulSoup(r.content, 'lxml')
data = []
table = soup.find('table', attrs=dict(id="octable"))
rows = table.find_all('tr')
for row in rows:
    cols = row.find_all('td')
    if cols:
        # Body row: keep every cell, including empty ones, so columns stay aligned
        data.append([ele.text.strip() for ele in cols])
    else:
        # Header row: pad cells that span several columns with empty strings
        cols_2 = []
        for ele in row.find_all('th'):
            cols_2.append(ele.text.strip())
            colspan = int(ele.attrs.get('colspan', 0))
            for i in range(1, colspan):
                cols_2.append('')
        data.append(cols_2)
print(data)
df = pd.DataFrame(data)
print(df)
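With this approach the header rows end up as ordinary rows inside `data`. A possible follow-up, sketched on a hypothetical miniature of that list, is to combine the header rows into column names. It assumes two header rows and that each blank cell in the top row is colspan padding belonging to the group on its left:

```python
import pandas as pd

# Hypothetical miniature of the `data` list: two header rows, then body rows.
data = [
    ["CALLS", "", "PUTS", ""],
    ["OI", "LTP", "LTP", "OI"],
    ["15000", "24.00", "2.00", "192000"],
    ["46500", "4.00", "6.10", "292500"],
]

n_header = 2  # assumption: the first two rows are header rows

# Carry each group name forward across the blank padding cells.
groups = []
current = ""
for cell in data[0]:
    current = cell or current
    groups.append(current)

columns = [f"{group}_{name}" for group, name in zip(groups, data[1])]
df = pd.DataFrame(data[n_header:], columns=columns)
print(df)
```

The cell values are still strings at this point and would need a numeric conversion before any calculation.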