Question

我想在日历中获取数据：

http://www.purebhakti.com/component/panjika

我考虑过使用Python和beautifulsoap，但我接受了建议。

我想参加当天的活动：

2017年4月22日：Ekādaśī，K，06：09，Śatabhiṣā

+ŚUDDHAEKĀDAŚĪVRATA：为VarūthinīEKADASI的快餐

如何让程序到达日历（自动进行时区和城市选择后）？例如：时区= -3：00布宜诺斯艾利斯 city =里约热内卢

from bs4 import BeautifulSoup
import requests

url = 'http://www.purebhakti.com/component/panjika'
header = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                        'AppleWebKit/537.36 (KHTML, like Gecko) '
                        'Chrome/51.0.2704.103 Safari/537.36'}



req = requests.get(url,headers= header)

html = req.text

soup = BeautifulSoup(html,'html.parser')

Answer 1

有很多方法可以解决这个问题：

- 您可以使用Selenium WebDriver点击按钮并选择时区和城市。

- Selenium的另一个选择是使用pyautogui（“pyautogui.locateOnScreen”函数）。

- 下载日历（网页来源）使用urllib2

- 要从日历中获取必要的数据，请使用Beautiful Soap

Answer 2

import requests, bs4
from urllib.parse import parse_qsl

qs = 'action=2&timezone=23&location=Rio+de+Janeiro%2C+Brazil++++++++043W15+22S54+++++-3.00&button=Get+Calendar'
payload = dict(parse_qsl(qs))
r = requests.post('http://www.purebhakti.com/component/panjika', data=payload)

当您单击按钮时，您将数据发布到服务器，您可以在chrome开发工具中找到数据。

我们可以通过requests.post()

模仿此行为

我还使用parse_qsl将编码的url转换为python dict：

{'action': '2',
'button': 'Get Calendar',
 'location': 'Rio de Janeiro, Brazil        043W15 22S54     -3.00',
 'timezone': '23'}

使用Python和Beautifulsoup从日历中提取数据

2 个答案: