Find and replace using regular expressions rather than whole strings

Time: 2014-06-20 14:43:09

Tags: python regex json

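I have loaded a dictionary of "regex": "image" pairs parsed from JSON.

The regexes are meant to be matched against a message string and replaced with images, for display in a Flash plugin that renders HTML text.

For example, the input: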

Hello MVGame everyone.

returns:

Hello <img src='http://static-cdn.jtvnw.net/jtv_user_pictures/chansub-global-emoticon-1a1a8bb5cdf6efb9-24x32.png' height = '32' width = '24'> everyone.

However, if I type


Hello :) everyone.

the :) is not replaced, because that emoticon is stored as the regular expression "\\:-?\\)" rather than as a plain string to match.
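A minimal sketch of the mismatch, assuming a one-entry dictionary with a placeholder image tag (not the real Twitch data). It suggests two separate causes: `\b` cannot match next to punctuation-only text, and the matched text `:)` is not the same string as the dictionary key.

```python
import re

# Hypothetical one-entry mapping; the key is the pattern as decoded from JSON.
emoticon_dictionary = {r"\:-?\)": "<img src='smile.png'>"}
message = "Hello :) everyone."

# 1. \b needs a word character on one side of the position. Between " " and ":"
#    both sides are non-word characters, so the bounded pattern finds nothing.
bounded = re.compile(r"\b(" + "|".join(emoticon_dictionary) + r")\b")
found_with_boundaries = bounded.search(message)

# 2. Without \b the pattern does match, but the matched text is ":)",
#    which is not a key of the dictionary (the key is the regex itself).
unbounded = re.compile("|".join(emoticon_dictionary))
matched_text = unbounded.search(message).group()
key_exists = matched_text in emoticon_dictionary
```

Here `found_with_boundaries` is `None`, and `key_exists` is `False`, so a lookup by `x.group()` would raise `KeyError` even if the match succeeded.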

How can I get the regular expressions treated as patterns when matching?

Here is my test code:

# regular expression test
import json            # parses the emoticon JSON
import urllib.request  # fetches the JSON from a URL
import re              # pattern matching for emoticons

def loademotes():
    #Create emoteicon dictionary
    try:
        print ("Trying to load emoteicons from twitch")
        response = urllib.request.urlopen('https://api.twitch.tv/kraken/chat/emoticons').read()
        mydata = json.loads(response.decode('utf-8'))

        for idx,item in enumerate(mydata['emoticons']):
            regex = item['regex']
            url = "<img src='" + item['images'][0]['url'] + "'" + " height = '" + str(item['images'][0]['height']) + "'" + " width = '" + str(item['images'][0]['width']) +  "' >"
            emoticonDictionary[regex] = url
        print ("All emoteicons loaded")

    except IOError as e:
        print ("I/O error({0}) : {1}".format(e.errno, e.strerror))
        print ("Cannot load emoteicons.")

emoticonDictionary = {}  # maps each emoticon regex to its HTML <img> tag

loademotes()

while 1:
    myString = input ("Here you type something : ")

    pattern = re.compile(r'\b(' + '|'.join(emoticonDictionary.keys()) + r')\b')
    results = pattern.sub(lambda x: emoticonDictionary[x.group()], myString)
    print (results)

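1 answer:

Answer 0: (score: -1)

I think you could make sure every syntactic character in the regex is wrapped in a character class before feeding it to re. For example, write something that takes :) and turns it into [:][)].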
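As an alternative to the character-class idea, here is a hedged sketch of a different technique: keep the dictionary keys as regexes, find candidates with one combined pattern, then work out which individual regex matched in order to pick the replacement. The names `replace_emotes` and `lookup` and the placeholder image tags are hypothetical, and the whitespace lookarounds `(?<!\S)` / `(?!\S)` are an assumption about how emoticons should be delimited (they stand in for `\b`, which does not work around punctuation).

```python
import re

# Hypothetical sample data; the real mapping would come from the Twitch JSON.
emoticon_dictionary = {
    r"\:-?\)": "<img src='smile.png'>",
    r"MVGame": "<img src='mvgame.png'>",
}

# Each regex compiled on its own, paired with its replacement HTML.
compiled = [(re.compile(regex), html) for regex, html in emoticon_dictionary.items()]

# One combined pattern for finding candidates. Each alternative is wrapped in a
# non-capturing group; the lookarounds require whitespace or the string edge on
# both sides of the emoticon.
combined = re.compile(
    r"(?<!\S)(?:" + "|".join("(?:" + regex + ")" for regex in emoticon_dictionary) + r")(?!\S)"
)

def lookup(match):
    """Return the HTML for whichever emoticon regex fully matches the found text."""
    text = match.group()
    for pattern, html in compiled:
        if pattern.fullmatch(text):
            return html
    return text  # no regex matched the whole text; leave it unchanged

def replace_emotes(message):
    # A callable replacement avoids backslash processing in the HTML string.
    return combined.sub(lookup, message)
```

With this sample data, `replace_emotes("Hello :) everyone.")` gives `Hello <img src='smile.png'> everyone.`, and `:-)` is replaced the same way. Because of the whitespace lookarounds, an emoticon directly followed by punctuation, such as `:).`, is left alone; whether that is desirable depends on the chat format.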