How to match Python list items with regular expressions

Date: 2018-05-06 22:49:44

Tags: python regex

import re

def popular_words(text, words):
    """(str, list) -> dict
    Return a dictionary whose keys are the search words and whose
    values are the number of times each word occurs in the given text.
    """
    word_dictionary = {}

    for word in words:
        # Count every case-insensitive occurrence of the pattern
        matches = re.findall(word, text, re.IGNORECASE)
        word_dictionary[word] = len(matches)

    return word_dictionary

popular_words('''
When I was One
I had just begun
When I was Two
I was nearly new
''', ['i', 'was', 'three', 'near'])

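How can I ignore "nearly" in the text so that it is not counted as a match for "near"? I tried using \bword\b to define word boundaries, but I got this error:

"unexpected character after line continuation character"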

2 answers:

Answer 0: (score: 0)

You can match whole words by defining word boundaries with \b.

Regular expression:

\b[a-zA-Z]+\b
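
As a rough sketch of how this pattern could be applied to the question: extract every whole word once, then look up the count for each search word. The function name `popular_words_counter` and the use of `collections.Counter` are my own assumptions, not something the answer spelled out. Note that `[a-zA-Z]+` only covers ASCII letters, so a word like "don't" would be split into two tokens.

```python
import re
from collections import Counter

def popular_words_counter(text, words):
    """Count whole-word, case-insensitive occurrences of each search word."""
    # \b[a-zA-Z]+\b extracts each maximal run of ASCII letters as one token
    tokens = re.findall(r'\b[a-zA-Z]+\b', text.lower())
    counts = Counter(tokens)
    # A Counter returns 0 for keys it has never seen
    return {word: counts[word.lower()] for word in words}

text = '''
When I was One
I had just begun
When I was Two
I was nearly new
'''
print(popular_words_counter(text, ['i', 'was', 'three', 'near']))
```

Because "nearly" is extracted as a single token, it never counts toward "near".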

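Answer 1: (score: 0)

You can definitely use string formatting together with \b. The error you got is probably because you did not use a raw string, as shown below. (Whenever you use backslashes with re, always use raw strings; it makes life easier.)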

import re

def popular_words(text, words):
    """(str, list) -> dict
    Return a dictionary whose keys are the search words and whose
    values are the number of times each word occurs in the given text.
    """
    word_dictionary = {}

    for word in words:
        # \b on both sides restricts the match to whole words
        matches = re.findall(r'\b{0}\b'.format(word), text, re.IGNORECASE)
        word_dictionary[word] = len(matches)

    return word_dictionary

print(popular_words('''
When I was One
I had just begun
When I was Two
I was nearly new
''', ['i', 'was', 'three', 'near']))

Output:

{'i': 4, 'near': 0, 'was': 3, 'three': 0}
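
If the search words might contain regex metacharacters such as `.` or `+`, one option is to pass each word through `re.escape` before formatting it into the pattern. This goes beyond what the question asked, and the name `popular_words_escaped` is hypothetical; treat it as a sketch.

```python
import re

def popular_words_escaped(text, words):
    """Count whole-word matches, treating each search word as literal text."""
    word_counts = {}
    for word in words:
        # re.escape neutralises characters such as '.' that re would
        # otherwise interpret as metacharacters
        pattern = r'\b{0}\b'.format(re.escape(word))
        word_counts[word] = len(re.findall(pattern, text, re.IGNORECASE))
    return word_counts

sample = 'a.b axb a.b.'
print(popular_words_escaped(sample, ['a.b']))
```

Without the escaping, the `.` in `a.b` would match any character, so `axb` would be counted as well.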

Edit: For completeness, here is what you need to do if you do not use a raw string. You have to escape the backslashes by doubling them.

matches = re.findall('\\b{0}\\b'.format(word), text, re.IGNORECASE)
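
To illustrate why the raw string (or the doubled backslash) matters: in an ordinary string literal, Python turns `\b` into a backspace character before `re` ever sees it, so the pattern no longer contains a word boundary. A small sketch, with variable names chosen only for this example:

```python
import re

plain = '\b'      # a single backspace character (U+0008)
raw = r'\b'       # two characters: a backslash followed by 'b'
doubled = '\\b'   # the same two characters as the raw string

print(len(plain), len(raw), raw == doubled)

text = 'I was nearly new'
# Without a raw string, re looks for literal backspaces around "was"
print(re.findall('\bwas\b', text))
# With a raw string, \b reaches re as a word-boundary assertion
print(re.findall(r'\bwas\b', text))
```

The first `findall` returns an empty list and the second finds `was`.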