Why do I get the same data back when I POST to different URLs?

Time: 2020-06-27 13:54:28

Tags: python-3.x web-scraping beautifulsoup python-requests

I am trying to scrape http://www.moneycontrol.com/stocks/histstock.php?sc_id=BPC&mycomp=BPCL to get price data, so I followed these steps:

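  1. Open the link and enter the dates (daily frequency).
  2. In Chrome, use Inspect -> Network to get the form details and find the URL the form POSTs to.
  3. Fill in the form data and send the POST.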
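
The form fields from step 3 can be generated from Python dates instead of being typed by hand. A minimal sketch, assuming the field names used in the test code further down (frm_dy, frm_mth, frm_yr, to_dy, to_mth, to_yr, hdn); build_form_data is a hypothetical helper name, not part of the site or of requests:

```python
from datetime import date


def build_form_data(start, end, frequency='daily'):
    """Return the POST form fields for a date range (hypothetical helper)."""
    return {
        # Day and month are zero-padded to match the values in the test code.
        'frm_dy': '{:02d}'.format(start.day),
        'frm_mth': '{:02d}'.format(start.month),
        'frm_yr': str(start.year),
        'to_dy': '{:02d}'.format(end.day),
        'to_mth': '{:02d}'.format(end.month),
        'to_yr': str(end.year),
        'hdn': frequency,
    }


data = build_form_data(date(2016, 1, 1), date(2016, 12, 31))
print(data)
```

The resulting dict can be passed as the data= argument of requests.post, in the same way as the hand-written dict.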

I have several tickers that I need this data for.

Eg:
    'AXISBANK': 'http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=AXISBANK',
    'BAJAJ-AUTO': 'http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=BPCL',

But when I run the POST, I get the same output even though the URLs I post to are different. What might I be missing?

Output:

running for http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=AXISBANK
           Date   Open    High    Low   Close   Volume
244  05-01-2016  881.3  905.00  881.3  900.65  1372748
245  04-01-2016  876.2  892.45  871.7  880.80   709103
246  01-01-2016  882.0  885.60  876.9  878.75   294006
running for http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=BPCL
           Date   Open    High    Low   Close   Volume
244  05-01-2016  881.3  905.00  881.3  900.65  1372748
245  04-01-2016  876.2  892.45  871.7  880.80   709103
246  01-01-2016  882.0  885.60  876.9  878.75   294006

Here is the code I wrote to test this.

url='http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=AXISBANK'
url2='http://www.moneycontrol.com/stocks/hist_stock_result.php?ex=N&sc_id=API&mycomp=BPCL'
import requests
import pandas as pd
from bs4 import BeautifulSoup as bs
data = {
    'frm_dy':'01',
    'frm_mth':'01',
    'frm_yr':'2016',
    'to_dy':'31',
    'to_mth':'12',
    'to_yr':'2016',
    'hdn':'daily'
    # 'x':'15',
    # 'y':'14'
}
print('running for {}'.format(url))
test = requests.post(url,data=data) # Post the data
doc = bs(test.text,'html.parser')
tables = doc.find('table',{'class':'tblchart'})
tData = pd.read_html(str(tables),header=1) #You get a list

#Convert it to dataFrame
tData = tData[0].drop(columns=['(High-Low)','(Open-Close)'])
print(tData.tail(3))

import time
time.sleep(20) # Hopefully sleep works?
url = url2 # test only 
print('running for {}'.format(url))
test = requests.post(url,data=data)
doc = bs(test.text,'html.parser')
tables = doc.find('table',{'class':'tblchart'})
tData = pd.read_html(str(tables),header=1) #You get a list

#Convert it to dataFrame
tData = tData[0].drop(columns=['(High-Low)','(Open-Close)'])
print(tData.tail(3))

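When I open the URL directly, I notice that sc_id is different from the value I see in Inspect. I don't know what sc_id is (a session ID?). I am completely new to web scraping, so I would not recognize a pitfall even if I had hit one. What might I be missing?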

1 Answer:

Answer 0 (score: 1)

You must set the sc_id= parameter in the URL correctly.

For Axis Bank, it is UTI10.

For Bajaj Auto, it is BA06.
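
A quick way to see the problem is to parse the query strings of the two URLs from the question: both carry sc_id=API, and only mycomp differs. The sketch below uses only the standard library. SC_IDS and build_url are hypothetical names introduced for illustration, and the two codes are the ones quoted above.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = 'http://www.moneycontrol.com/stocks/hist_stock_result.php'

# The two URLs from the question.
question_urls = [
    BASE + '?ex=N&sc_id=API&mycomp=AXISBANK',
    BASE + '?ex=N&sc_id=API&mycomp=BPCL',
]

# Collect the sc_id sent with each URL; a single value in the set
# means both requests named the same code.
sent_ids = {parse_qs(urlsplit(u).query)['sc_id'][0] for u in question_urls}
print(sent_ids)

# Hypothetical mapping from company name to the sc_id values given above.
SC_IDS = {'Axis Bank': 'UTI10', 'Bajaj Auto': 'BA06'}


def build_url(company, exchange):
    """Build a result URL that carries the company's own sc_id."""
    query = urlencode({'ex': exchange, 'sc_id': SC_IDS[company], 'mycomp': company})
    return BASE + '?' + query


for company in SC_IDS:
    print(build_url(company, 'B'))
```

Passing the parameters through urlencode (or through the params= argument of requests) also takes care of encoding the space in the company name.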

For example:

import re
import requests
import pandas as pd
from bs4 import BeautifulSoup


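def get_sc_id(name, full_name):
    # Look up the site's internal stock code (sc_id) via the autosuggest endpoint.
    url = 'https://www.moneycontrol.com/stocks/autosuggest.php'
    params = {'str': name}
    # re.escape keeps special characters in full_name from being read as regex syntax.
    pattern = r'set_val\(\'{}\',\'(.*?)\'\)'.format(re.escape(full_name))
    return re.search(pattern, requests.get(url, params=params).text, flags=re.I)[1]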

def get_table(sc_id, mycomp):
    url = 'https://www.moneycontrol.com/stocks/hist_stock_result.php'
    params = {
        'ex':'B',
        'sc_id': sc_id,
        'mycomp': mycomp
    }
    data = {
        'frm_dy':'01',
        'frm_mth':'01',
        'frm_yr':'2016',
        'to_dy':'31',
        'to_mth':'12',
        'to_yr':'2016',
        'hdn':'daily'
    }

    soup = BeautifulSoup(requests.post(url, data=data, params=params).content, 'html.parser')
    return pd.read_html( str(soup.select_one('.tblchart')) )[0].droplevel(0, axis=1)

code = get_sc_id('AXIS', 'Axis Bank')
print('Axis Bank code: ', code)
print(get_table(code, 'Axis Bank'))

code = get_sc_id('BAJAJ', 'Bajaj Auto')
print('Bajaj Auto code:', code )
print(get_table(code, 'Bajaj Auto'))

Prints:

Axis Bank code:  UTI10

           Date    Open    High     Low   Close   Volume  (High-Low)  (Open-Close)
0    30-12-2016  446.00  451.80  443.45  450.00   234037        8.35         -4.00
1    29-12-2016  447.00  447.00  437.80  444.15   267677        9.20          2.85
2    28-12-2016  437.45  447.85  436.00  439.50   251149       11.85         -2.05
3    27-12-2016  430.00  438.55  430.00  437.45   210857        8.55         -7.45
4    26-12-2016  432.15  436.00  427.00  431.75   405044        9.00          0.40
..          ...     ...     ...     ...     ...      ...         ...           ...
242  07-01-2016  424.25  425.00  407.30  409.35  1441934       17.70         14.90
243  06-01-2016  439.70  439.70  429.80  430.80   730512        9.90          8.90
244  05-01-2016  439.00  440.00  433.65  436.35   726947        6.35          2.65
245  04-01-2016  448.85  448.85  437.40  439.25   743518       11.45          9.60
246  01-01-2016  450.00  452.70  445.80  449.80   433052        6.90          0.20

[247 rows x 8 columns]

Bajaj Auto code: BA06

           Date     Open     High      Low    Close  Volume  (High-Low)  (Open-Close)
0    30-12-2016  2655.55  2667.00  2627.25  2633.85   10377       39.75         21.70
1    29-12-2016  2621.00  2665.65  2611.50  2655.45    8704       54.15        -34.45
2    28-12-2016  2629.35  2653.00  2624.55  2631.60    6475       28.45         -2.25
3    27-12-2016  2563.00  2642.00  2563.00  2633.60   15491       79.00        -70.60
4    26-12-2016  2618.00  2618.35  2578.00  2596.70    7205       40.35         21.30
..          ...      ...      ...      ...      ...     ...         ...           ...
242  07-01-2016  2470.00  2481.80  2407.25  2419.25   15962       74.55         50.75
243  06-01-2016  2495.00  2513.70  2475.00  2485.50   11975       38.70          9.50
244  05-01-2016  2518.00  2520.00  2480.00  2497.05   11967       40.00         20.95
245  04-01-2016  2507.90  2545.85  2480.65  2488.15   23077       65.20         19.75
246  01-01-2016  2530.00  2530.00  2512.15  2520.05    9055       17.85          9.95

[247 rows x 8 columns]
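
Note that get_sc_id raises a TypeError when the company name is not found, because re.search returns None in that case. When looping over a list of tickers it may be more convenient to return None and skip the entry. A minimal sketch, assuming the autosuggest response contains fragments of the form set_val('<name>','<code>'), as the regular expression in get_sc_id implies. sample_text is made up for illustration and is not a captured server response, and extract_sc_id is a hypothetical helper:

```python
import re


def extract_sc_id(text, full_name):
    """Return the code paired with full_name in text, or None if absent."""
    pattern = r"set_val\('{}','(.*?)'\)".format(re.escape(full_name))
    match = re.search(pattern, text, flags=re.I)
    return match.group(1) if match else None


# Made-up text in the shape the regex expects (NOT a real server response).
sample_text = "set_val('Axis Bank','UTI10') set_val('Bajaj Auto','BA06')"

for name in ['Axis Bank', 'Bajaj Auto', 'No Such Company']:
    code = extract_sc_id(sample_text, name)
    if code is None:
        print('no sc_id found for', name)
    else:
        print(name, '->', code)
```

In real use, text would be the body returned by the autosuggest request, and the code found for each company would then be passed to get_table.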