BeautifulSoup selector cannot match arbitrary attributes?

Time: 2015-06-23 13:14:41

Tags: python css beautifulsoup

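I want to get the tags that have a data-a attribute. What I believe is the correct selector returns an empty list.

How can I use a CSS selector to select tags that have data-a?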

In [53]: s = BeautifulSoup("<div data-a='12'></div>")

In [54]: s
Out[54]: <html><body><div data-a="12"></div></body></html>


In [55]: s.select('div')
Out[55]: [<div data-a="12"></div>]

In [56]: s.select('[data-a]')
Out[56]: []

1 answer:

Answer 0 (score: 2)

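This is a known limitation of the BeautifulSoup CSS selector implementation: it only matches attribute names made of letters, digits, and underscores, not dashes. See issue #1304007.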
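The character-set point can be checked with the standard library alone. This sketch assumes the attribute-name part of the selector regex is \w+, the regex class for letters, digits, and underscore; the helper is_word_name() and the names tested are made up for illustration.

```python
import re

# \w covers letters, digits and underscore (the set the answer says the
# attribute-name part of the selector accepts); it never covers "-".
WORD_CHARS = re.compile(r"\w+")


def is_word_name(name):
    """Hypothetical helper: True if every character is a letter, digit or underscore."""
    return WORD_CHARS.fullmatch(name) is not None


for name in ("data_a", "dataA1", "data-a"):
    print(name, is_word_name(name))
```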

You can still select these elements with find_all():

>>> s.find_all(**{'data-a': True})
[<div data-a="12"></div>]

find_all() treats arbitrary keyword arguments as attribute filters. Because data-a is not a valid Python identifier, it cannot be written as a keyword argument directly, so we need the **{..} unpacking as a workaround. The value True matches any element that has this attribute, whatever its value.
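The **{..} workaround is ordinary Python argument unpacking, not anything specific to BeautifulSoup. As a rough sketch, the hypothetical filter_by_attrs() helper below (not part of bs4) models elements as plain dicts of attributes and treats keyword arguments as filters, with True meaning "attribute present, any value". It only mimics the idea; it is not how find_all() is implemented.

```python
def filter_by_attrs(elements, **filters):
    """Hypothetical helper (not part of bs4): keep elements matching all filters.

    Each element is a dict of attribute names to values. A filter value of
    True only requires the attribute to be present; any other value must be
    equal to the attribute's value.
    """
    matches = []
    for attrs in elements:
        if all(
            name in attrs and (wanted is True or attrs[name] == wanted)
            for name, wanted in filters.items()
        ):
            matches.append(attrs)
    return matches


# Writing the dashed name as a keyword is rejected by the parser, because
# data-a is read as the expression "data - a", not as an argument name.
try:
    compile("filter_by_attrs(elements, data-a=True)", "<example>", "eval")
except SyntaxError:
    print("data-a=True is a SyntaxError")

elements = [{"data-a": "12"}, {"class": "x"}, {"data-a": "7", "class": "x"}]

# Dict unpacking passes the dashed key through **filters unchanged.
print(filter_by_attrs(elements, **{"data-a": True}))
print(filter_by_attrs(elements, **{"data-a": "7"}))
```

The BeautifulSoup documentation also describes passing such names through the attrs dictionary argument of find_all(), for example s.find_all(attrs={'data-a': True}).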

You can patch the code to accept dashes in attribute names. The updated expression below also matches attribute names that contain dashes:

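import re
from bs4 import PageElement

# In the BeautifulSoup release this answer targets, select() parses attribute
# selectors such as [data-a] with this class-level regex. Its attribute group
# used \w+, which excludes dashes; [\w-]+ also accepts names like data-a.
PageElement.attribselect_re = re.compile(
    r'^(?P<tag>\w+)?\[(?P<attribute>[\w-]+)(?P<operator>[=~\|\^\$\*]?)' +
    r'=?"?(?P<value>[^\]"]*)"?\]$'
    )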
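To see what the patch changes, the sketch below compiles both patterns standalone (no bs4 import) and prints the named groups each extracts from two selector tokens. It assumes the unpatched pattern is identical to the patched one except that the attribute group is \w+; the parse() helper is hypothetical.

```python
import re

# Assumption: the unpatched pattern differs only in the attribute group (\w+).
OLD_RE = re.compile(
    r'^(?P<tag>\w+)?\[(?P<attribute>\w+)(?P<operator>[=~\|\^\$\*]?)'
    r'=?"?(?P<value>[^\]"]*)"?\]$'
)
# Same pattern as the patch above.
PATCHED_RE = re.compile(
    r'^(?P<tag>\w+)?\[(?P<attribute>[\w-]+)(?P<operator>[=~\|\^\$\*]?)'
    r'=?"?(?P<value>[^\]"]*)"?\]$'
)


def parse(pattern, token):
    """Hypothetical helper: named groups for a selector token, or None if no match."""
    match = pattern.match(token)
    return match.groupdict() if match else None


for token in ['[data-a]', 'div[data-a="12"]']:
    print(token)
    print('  old:    ', parse(OLD_RE, token))
    print('  patched:', parse(PATCHED_RE, token))
```

Note that the old pattern still matches [data-a], but splits it into an attribute named data with a stray value -a. If select() then looks for an attribute called data, that would be consistent with the empty list in the question; this is an inference from the regex, not something the answer states.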