Getting td text using select

Time: 2019-05-18 18:44:46

Tags: python select html-table beautifulsoup

I'm trying to get the odds from the link, but I get an error. Do you know what I'm doing wrong?

Thanks

I expect to get 10 rows of 3 columns, one column per odds value. However, I get an error with the following code:

import requests
from bs4 import BeautifulSoup as bs

url = 'https://www.oddsportal.com/soccer/spain/laliga'
r = requests.get(url, headers = {'User-Agent' : 'Mozilla/5.0'})
soup = bs(r.content, 'lxml')

##print([a.text for a in soup.select('#tournamentTable tr[xeid] [href*=soccer]')])

print([b.text for b in soup.select('#tournamentTable td[xodd]')])

1 Answer:

Answer 0: (score: 0)

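You seem to have the wrong character between #tournamentTable and td[xodd]. It may look like a space, but it has the code \x1b. Try deleting that character and typing a space again.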
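A quick way to confirm an invisible character like this is to scan the selector string for control characters. The sketch below is only an illustration: find_control_chars and clean_selector are hypothetical helpers (not part of BeautifulSoup), and the bad selector is built by hand to mimic the problem described above.

```python
import unicodedata

def find_control_chars(selector):
    """Return (index, repr) pairs for every control character in the string."""
    return [(i, repr(ch)) for i, ch in enumerate(selector)
            if unicodedata.category(ch) == "Cc"]

def clean_selector(selector):
    """Replace every control character (ESC, tab, newline, ...) with a space."""
    return "".join(" " if unicodedata.category(ch) == "Cc" else ch
                   for ch in selector)

# Hand-made example: \x1b (ESC) sits where the space should be.
bad = "#tournamentTable\x1btd[xodd]"
print(find_control_chars(bad))    # position and code of the stray character
print(repr(clean_selector(bad)))  # selector with an ordinary space instead
```

Printing repr(selector) on its own is often enough to spot the problem; the helper just makes the position explicit.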

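I can run your code without this error. However, this page uses JavaScript to load its data, and BeautifulSoup cannot run JavaScript. You may need Selenium to control a web browser that can run JavaScript, so that you can get the HTML that contains the data.

Alternatively, you can use DevTools in Chrome/Firefox to check whether JavaScript reads the data from some URL, and then read the data from that same URL yourself.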

I found this URL:

https://fb.oddsportal.com/ajax-sport-country-tournament/1/YLO7JZEA/X0/1/?_=1558215347943

The last part is the current date and time as a timestamp multiplied by 1000 (that is, in milliseconds):

import datetime

print(datetime.datetime.fromtimestamp(1558215347943/1000)) 

# 2019-05-18 23:35:47.943000

dt = datetime.datetime.now()
print(int(dt.timestamp()*1000))

# 1558216525573
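Putting the two pieces together, the request URL can be built with a fresh millisecond timestamp instead of the hard-coded one. This is a minimal sketch: build_url is a hypothetical helper, the path is copied from the URL above and is specific to this tournament, and whether the server actually checks the value is not something established here.

```python
import time

# Endpoint copied from the URL above; the path segments belong to this tournament.
BASE = "https://fb.oddsportal.com/ajax-sport-country-tournament/1/YLO7JZEA/X0/1/"

def build_url(now=None):
    """Append a time in milliseconds as the `_` query parameter.

    `now` is a Unix time in seconds; the current time is used when omitted.
    """
    if now is None:
        now = time.time()
    return f"{BASE}?_={int(now * 1000)}"

print(build_url())
```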

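I can read this URL using requests.Session() and better headers. It returns the data as JavaScript code, but after cutting off part of it I get data in JSON format, which can be converted to a Python dictionary.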

import requests
from bs4 import BeautifulSoup as bs
import json

s = requests.Session()

headers = {
    'User-Agent' : 'Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0'
}

url = 'https://www.oddsportal.com/soccer/spain/laliga'
r = s.get(url, headers=headers)

soup = bs(r.content, 'lxml')
print(r.text.find('xodd'))
print([b.text for b in soup.select('#tournamentTable td[xodd]')])

headers = {
    'User-Agent' : 'Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0',
    'Referer': 'https://www.oddsportal.com/soccer/spain/laliga/',
}

r = s.get('https://fb.oddsportal.com/ajax-sport-country-tournament/1/YLO7JZEA/X0/1/?_=1558215347943', headers=headers)
text = r.text[len("globals.jsonpCallback('/ajax-sport-country-tournament/1/YLO7JZEA/X0/1/', "):-2]

data = json.loads(text)
for key, val in data['d']['oddsData'].items():
    print('xeid:', key)
    print('xoid:', val['odds'][0]['oid'], 'avg:', val['odds'][0]['avg'])
    print('xoid:', val['odds'][0]['oid'], 'avg:', val['odds'][1]['avg'])
    print('xoid:', val['odds'][0]['oid'], 'avg:', val['odds'][2]['avg'])
    print('---')

Result:

xeid: ltB92yKu
xoid: 35vjqxv464x0x7qrck avg: 2.16
xoid: 35vjqxv464x0x7qrck avg: 3.5
xoid: 35vjqxv464x0x7qrck avg: 3.44
---
xeid: SW9D1eZo
xoid: 35vjrxv464x0x7qrcm avg: 1.33
xoid: 35vjrxv464x0x7qrcm avg: 5.71
xoid: 35vjrxv464x0x7qrcm avg: 8.83
---
xeid: Mg9H0Flh
xoid: 35vjsxv464x0x7qrco avg: 1.99
xoid: 35vjsxv464x0x7qrco avg: 3.79
xoid: 35vjsxv464x0x7qrco avg: 3.68
---
xeid: zcDLaZ3b
xoid: 35vjtxv464x0x7qrcq avg: 1.57
xoid: 35vjtxv464x0x7qrcq avg: 4.38
xoid: 35vjtxv464x0x7qrcq avg: 5.95
---
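The fixed-length slice used to cut off the JavaScript wrapper depends on the exact callback text. A slightly more tolerant variant, sketched below, takes everything between the first "{" and the last "}". strip_jsonp is a hypothetical helper, the sample string is hand-made in the same shape as the response (not real data), and the approach assumes the callback's path argument contains no braces.

```python
import json

def strip_jsonp(text):
    """Extract and parse the JSON object passed to a JSONP-style callback."""
    start = text.index("{")      # first brace opens the JSON payload
    end = text.rindex("}") + 1   # last brace closes it
    return json.loads(text[start:end])

# Hand-made sample: callback('/path/', {...}); as described in the answer.
sample = ("globals.jsonpCallback('/some/path/', "
          + '{"d": {"oddsData": {"abc": {"odds": [{"avg": 2.16}]}}}}'
          + ");")

data = strip_jsonp(sample)
print(data["d"]["oddsData"]["abc"]["odds"][0]["avg"])
```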

EDIT: using Selenium

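import selenium.webdriver
from selenium.webdriver.common.by import By

url = 'https://www.oddsportal.com/soccer/spain/laliga'

driver = selenium.webdriver.Firefox()
driver.get(url)

# find_elements_by_css_selector was removed in newer Selenium 4 releases;
# find_elements with By.CSS_SELECTOR is the equivalent call
items = driver.find_elements(By.CSS_SELECTOR, "#tournamentTable td[xodd]")

print([x.text for x in items])

driver.quit()  # close the browser when done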

Result:

['4.26', '4.07', '1.80', '1.99', '3.79', '3.68', '1.57', '4.38', '5.95', '2.13', '3.19', '3.94', '7.82', '5.00', '1.41', '2.16', '3.50', '3.44', '1.33', '5.71', '8.83', '2.58', '3.52', '2.73', '1.49', '5.31', '5.66', '4.03', '4.21', '1.82']
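The question asked for 10 rows of 3 columns, and the flat list above holds 30 values. A minimal sketch of regrouping it follows, assuming every three consecutive cells belong to the same match (an assumption about the table's layout, not something verified against the page):

```python
# Values copied from the Selenium result above.
odds = ['4.26', '4.07', '1.80', '1.99', '3.79', '3.68', '1.57', '4.38', '5.95',
        '2.13', '3.19', '3.94', '7.82', '5.00', '1.41', '2.16', '3.50', '3.44',
        '1.33', '5.71', '8.83', '2.58', '3.52', '2.73', '1.49', '5.31', '5.66',
        '4.03', '4.21', '1.82']

# Slice the flat list into chunks of three and convert the strings to floats.
rows = [[float(x) for x in odds[i:i + 3]] for i in range(0, len(odds), 3)]

for row in rows:
    print(row)
```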