Invalid regex entered by a user in a Django form causes a 500 server error

Date: 2014-05-29 18:11:15

Tags: python regex django forms

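I have set up a simple Django search form in my web app, where users can search for specific words in my Arabic corpus. Users can search in one of three ways: 'Exact' (exactly as typed), 'Stem' (which returns all inflected forms of the lemma), and 'RegEx' (for more complex searches using regular expressions).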

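The problem I'm having is that if a user submits an invalid regular expression, a 500 server error is triggered instead of a validation error or an empty result set, which I imagine is messy. Below is the traceback for one such error, caused by searching for a regex with unbalanced parentheses: ha((.*(?!al))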

Is there any way to catch this kind of error, or to make it more user-friendly? (I've also included the code for my form below.)

Thanks.

from django import forms
from django.forms import RadioSelect

class ConcordanceForm(forms.Form):
    searchterm = forms.CharField(max_length=100, required=True)
    search_type = forms.ChoiceField(widget=RadioSelect(),
             choices=[('string', 'Exact'), ('lemma', 'Stem'), ('regex', 'Regex')],
             required=True)


from django.shortcuts import render_to_response
from django.template import RequestContext

def concord_test(request):
    if request.method == 'POST':
        form = ConcordanceForm(request.POST)
        if form.is_valid():
            searchterm = form.cleaned_data['searchterm'].encode('utf-8')
            search_type = form.cleaned_data['search_type']
            # make_concordance compiles searchterm as a regex (see traceback);
            # an invalid pattern raises re.error here, which becomes a 500.
            context, texts_len, results_len = make_concordance(searchterm, search_type)
            return render_to_response('corpus/concord.html', locals())
    else:
        form = ConcordanceForm()
    return render_to_response('corpus/search_test.html',
                              {'form': form}, context_instance=RequestContext(request))



<p style=" font-weight:bold;">Search for any word in the corpus:</p>
<form action="/search_test/" method="post">{% csrf_token %}
{{ form.as_p }}
<input type="submit" value="Submit" />
</form>

Traceback (most recent call last):

  File "/home/larapsodia/webapps/django/lib/python2.6/django/core/handlers/base.py", line 100, in get_response
    response = callback(request, *callback_args, **callback_kwargs)

  File "/home/larapsodia/webapps/django/tunisiya2/corpus/views.py", line 154, in concord_test
    context, texts_len, results_len = make_concordance(searchterm, search_type)

  File "/home/larapsodia/webapps/django/tunisiya2/corpus/views.py", line 91, in make_concordance
    p = re.compile(r'\b' + searchterm + r'__') # initial position in word_pos_lemma string

  File "/usr/local/lib/python2.6/re.py", line 190, in compile
    return _compile(pattern, flags)

  File "/usr/local/lib/python2.6/re.py", line 245, in _compile
    raise error, v # invalid expression

error: unbalanced parenthesis


<WSGIRequest
GET:<QueryDict: {}>,
POST:<QueryDict: {u'searchterm': [u'ha((.*(?!al))'], u'search_type': [u'regex'], u'csrfmiddlewaretoken': [u'c9a6cad4a0761580f5e351e9e534e028']}>,
COOKIES:{'__utma': '58037167.1544119768.1401037185.1401381302.1401384825.14',
 '__utmb': '58037167.10.10.1401384825',
 '__utmc': '58037167',
 '__utmz': '58037167.1401037185.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',
 'csrftoken': 'c9a6cad4a0761580f5e351e9e534e028',
 'sessionid': '8d5b0b8730ccce0860b687b4c7ec1fdb'},
META:{'CONTENT_LENGTH': '109',
 'CONTENT_TYPE': 'application/x-www-form-urlencoded',
 'CSRF_COOKIE': 'c9a6cad4a0761580f5e351e9e534e028',
 'DOCUMENT_ROOT': '/usr/local/apache2/htdocs',
 'GATEWAY_INTERFACE': 'CGI/1.1',
 'HTTP_ACCEPT': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8',
 'HTTP_ACCEPT_ENCODING': 'gzip,deflate,sdch',
 'HTTP_ACCEPT_LANGUAGE': 'en-US,en;q=0.8,ar;q=0.6',
 'HTTP_CACHE_CONTROL': 'max-age=0',
 'HTTP_CONNECTION': 'close',
 'HTTP_COOKIE': 'sessionid=8d5b0b8730ccce0860b687b4c7ec1fdb; csrftoken=c9a6cad4a0761580f5e351e9e534e028; __utma=58037167.1544119768.1401037185.1401381302.1401384825.14; __utmb=58037167.10.10.1401384825; __utmc=58037167; __utmz=58037167.1401037185.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none)',
 'HTTP_FORWARDED_REQUEST_URI': '/search_test/',
 'HTTP_HOST': 'www.tunisiya.org',
 'HTTP_HTTPS': 'off',
 'HTTP_HTTP_X_FORWARDED_PROTO': 'http',
 'HTTP_ORIGIN': 'http://www.tunisiya.org',
 'HTTP_REFERER': 'http://www.tunisiya.org/search_test/',
 'HTTP_USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/35.0.1916.114 Safari/537.36',
 'HTTP_X_FORWARDED_FOR': '68.9.41.110',
 'HTTP_X_FORWARDED_HOST': 'www.tunisiya.org',
 'HTTP_X_FORWARDED_PROTO': 'http',
 'HTTP_X_FORWARDED_SERVER': 'www.tunisiya.org',
 'HTTP_X_FORWARDED_SSL': 'off',
 'PATH_INFO': u'/search_test/',
 'PATH_TRANSLATED': '/home/larapsodia/webapps/django/tunisiya2.wsgi/search_test/',
 'QUERY_STRING': '',
 'REMOTE_ADDR': '127.0.0.1',
 'REMOTE_PORT': '37086',
 'REQUEST_METHOD': 'POST',
 'REQUEST_URI': '/search_test/',
 'SCRIPT_FILENAME': '/home/larapsodia/webapps/django/tunisiya2.wsgi',
 'SCRIPT_NAME': u'',
 'SERVER_ADDR': '127.0.0.1',
 'SERVER_ADMIN': '[no address given]',
 'SERVER_NAME': 'www.tunisiya.org',
 'SERVER_PORT': '80',
 'SERVER_PROTOCOL': 'HTTP/1.0',
 'SERVER_SIGNATURE': '',
 'SERVER_SOFTWARE': 'Apache/2.2.15 (Unix) mod_wsgi/3.2 Python/2.6.8',
 'mod_wsgi.application_group': 'tunisiya2.com|',
 'mod_wsgi.callable_object': 'application',
 'mod_wsgi.handler_script': '',
 'mod_wsgi.input_chunked': '0',
 'mod_wsgi.listener_host': '',
 'mod_wsgi.listener_port': '39877',
 'mod_wsgi.process_group': '',
 'mod_wsgi.request_handler': 'wsgi-script',
 'mod_wsgi.script_reloading': '1',
 'mod_wsgi.version': (3, 2),
 'wsgi.errors': <mod_wsgi.Log object at 0xd69b570>,
 'wsgi.file_wrapper': <built-in method file_wrapper of mod_wsgi.Adapter object at 0xa7efda0>,
 'wsgi.input': <mod_wsgi.Input object at 0xd69b598>,
 'wsgi.multiprocess': False,
 'wsgi.multithread': True,
 'wsgi.run_once': False,
 'wsgi.url_scheme': 'http',
 'wsgi.version': (1, 1)}>
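A minimal reproduction outside Django, assuming make_concordance builds its pattern exactly as line 91 of the traceback shows (r'\b' + searchterm + r'__'). The submitted term has three opening parentheses but only two closing ones, so re.compile raises re.error; nothing in the view catches it, and Django turns the uncaught exception into a 500. build_pattern is a hypothetical name used only for this illustration.

```python
import re

# The term from the POST data above: three "(" but only two ")".
searchterm = 'ha((.*(?!al))'

def build_pattern(term):
    # Hypothetical helper mirroring the compile call in the traceback
    # (views.py, line 91, inside make_concordance).
    return re.compile(r'\b' + term + r'__')

try:
    build_pattern(searchterm)
except re.error as exc:
    # This is the exception that currently escapes the view.
    print('re.error: {0}'.format(exc))
```

The exact wording of the error message varies between Python versions, but the exception type is re.error in each case.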

3 Answers:

Answer 0 (score: 1)

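Wrap the call to make_concordance in a try/except; if an exception occurs, render the original form template again for the user, along with the error message.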

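import re

try:
    context, texts_len, results_len = make_concordance(searchterm, search_type)
except re.error as e:
    # Attach the regex error to the 'searchterm' field (the name used in
    # ConcordanceForm) and redisplay the form instead of crashing.
    form._errors['searchterm'] = form.error_class([str(e)])
    del form.cleaned_data['searchterm']

    return render_to_response('corpus/search_test.html',
         {'form': form}, context_instance=RequestContext(request))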

A better approach would be to write a custom cleaner, but that seems a bit more complicated, and I don't have Django installed to test it.
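The check a custom cleaner would perform can be written as a plain function, which keeps it testable without Django. This is a sketch: regex_error is a hypothetical helper name, and it assumes the view wraps the term with the same word-boundary prefix and '__' suffix seen in the traceback, so it validates the pattern exactly as make_concordance will compile it.

```python
import re

def regex_error(searchterm):
    """Return an error message if searchterm won't compile, else None.

    Hypothetical helper: it adds the same word-boundary prefix and '__'
    suffix that make_concordance uses (per the traceback), so a term that
    passes here should also compile inside the view.
    """
    try:
        re.compile(r'\b' + searchterm + r'__')
    except re.error as exc:
        return 'Invalid regular expression: {0}'.format(exc)
    return None
```

Because searchterm is declared before search_type in ConcordanceForm, a clean_searchterm() method cannot see search_type yet; a check that depends on both fields belongs in the form-wide clean(), which would raise forms.ValidationError with the returned message.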

Answer 1 (score: 0)

Building on @Sam's comment, here is how to catch the specific error raised when a regex fails to compile:

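import re

err_message = None
try:
    re.compile('(unbalanced')
except re.error as exc:
    # '{0}' rather than '{}': auto-numbered fields need Python 2.7+,
    # and the question's server runs Python 2.6.
    err_message = 'Uhoh: {0}'.format(exc)

print(err_message)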

Output:

Uhoh: unbalanced parenthesis
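The message above comes from Python 2. On Python 3.5 and later, re.error also carries msg, pattern and pos attributes, so the form could point the user at where the pattern breaks. A sketch, where describe_regex_error is a hypothetical helper name; it does not apply to the Python 2.6 server in the question.

```python
import re

def describe_regex_error(pattern):
    """Return a user-facing message for an invalid pattern, or None if it compiles.

    Relies on the msg/pos attributes that re.error has had since Python 3.5.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        if exc.pos is None:
            return 'Invalid regular expression: {0}'.format(exc.msg)
        # Echo the pattern with a caret under the offending position.
        return 'Invalid regular expression: {0}\n{1}\n{2}^'.format(
            exc.msg, pattern, ' ' * exc.pos)
    return None

print(describe_regex_error('(unbalanced'))
```

The wording of exc.msg differs between Python releases, so the form should display it as-is rather than match on it.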

Answer 2 (score: 0)

I ended up writing a custom cleaner, as Antti suggested. This is what finally worked:

def clean(self):
    # Method of ConcordanceForm; requires "import re" at the top of forms.py.
    cleaned_data = super(ConcordanceForm, self).clean()
    searchterm = cleaned_data.get('searchterm')
    search_type = cleaned_data.get('search_type')
    # searchterm is absent from cleaned_data if the field itself failed validation
    if search_type == 'regex' and searchterm:
        try:
            re.compile(searchterm)  # raises re.error if the pattern is invalid
        except re.error:
            raise forms.ValidationError("Invalid regular expression.")
    return cleaned_data