Getting only part of text() with XPath

Time: 2016-02-03 23:53:11

Tags: xpath scrapy substring response

I am trying to get an array of the authors from this website:

jwt and flask-login

using this XPath:

@login_manager.user_loader
def load_user(token):
    try:
        id = jwt.decode(token, current_app.config.get(
            'KEY', None))['_id']
        return user_functions.get_user_by_id(id, token)
    except (ServiceUnavailable, ServiceResponseError) as e:
        current_app.logger.exception(e.message)
        return None



class Login(MethodView):

    methods = ['GET', 'POST']

    def post(self):
        form = LoginForm(request.form)
        registered_user = form.validate()
        if not registered_user:
            return render_template('pages/login.html', login_form=form), 403
        login_user(registered_user, remember=False)
        return redirect(request.args.get('next') or url_for('public.index'))

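But I only want the names, without the word "Editor". How can I do that?

1 answer:

Answer 0: (score: 0)

After selecting the text, call the selector's regular-expression method re() with a capture group. Only the text matched inside the group is returned, so the parts you don't need are dropped: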

authors = (
    response.xpath("//div[@id='sizer']/div[@id='content']/div[@class='grid']/div[@class='main-content']/div[@id='tc']/div/ul[@class='book-listing entity-listing']/li/dl/dd[@class='meta']/text()[count(preceding-sibling::br) = 0]")
    .re(r'Editor\s*(.*)')  # returns only what the capture group matched
)
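
To see what the capture group does without running a spider, here is a minimal sketch that applies the same pattern using only Python's standard `re` module. The sample strings are hypothetical stand-ins for the text nodes the XPath might select (the real page content is not shown in the question), and `extract_names` is an illustrative helper that only roughly mimics how `.re()` collects matches across nodes.

```python
import re

# Hypothetical text nodes standing in for what the XPath might select;
# the actual page content is not shown in the question.
text_nodes = [
    "Editor Jane Doe",
    "Editor  John Smith",
    "Some unrelated text",
]

# Same pattern as in the answer: skip "Editor" and any whitespace,
# then capture the rest of the string.
pattern = re.compile(r"Editor\s*(.*)")


def extract_names(nodes):
    """Collect the capture-group matches from each string, skipping non-matches."""
    names = []
    for node in nodes:
        # With a single group, findall returns the group contents only.
        names.extend(pattern.findall(node))
    return names


names = extract_names(text_nodes)
print(names)
```

Strings that do not contain "Editor" contribute nothing to the result, so the list holds only the captured names.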