Python regex: match a number after text

Time: 2019-06-11 03:40:01

Tags: python regex

I'm matching movie titles that usually come in the following format:

[BLA VLA] The Matrix 1999 bla bla [bla bla]

The regex I'm using is:

match = re.match(r"\[?.*?\](.*?)([0-9]{4})(.*)\[?.*\]?", title)

This works fine in most cases, but it does not work for movies like these:

[bla bla] 1990 The Bronx Warriors 1982
[ bl bla] 2012 2009 [ bla bla ]

How can I fix this?

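4 answers:

Answer 0 (score: 2)

If we rely on the same upper- and lowercase pattern as the titles listed in the question, we could start with a simple expression such as: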

([A-Z][a-z]+\s)+

Test

# coding=utf8
# the above tag defines encoding for this document and is for Python 2.x compatibility

import re

regex = r"([A-Z][a-z]+\s)+"

test_str = ("[bla bla] 1990 The Bronx Warriors 1982\n"
    "[ bl bla] 2012 2009 [ bla bla ]\n"
    "[BLA VLA] The Matrix 1999 bla bla [bla bla]\n")

matches = re.finditer(regex, test_str, re.MULTILINE)

for matchNum, match in enumerate(matches, start=1):

    print ("Match {matchNum} was found at {start}-{end}: {match}".format(matchNum = matchNum, start = match.start(), end = match.end(), match = match.group()))

    for groupNum in range(0, len(match.groups())):
        groupNum = groupNum + 1

        print ("Group {groupNum} found at {start}-{end}: {group}".format(groupNum = groupNum, start = match.start(groupNum), end = match.end(groupNum), group = match.group(groupNum)))

# Note: for Python 2.7 compatibility, use ur"" to prefix the regex and u"" to prefix the test string and substitution.
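
To make it easier to see what this expression picks up, here is a condensed sketch over the same three test lines. It only collects the whole match (group 0) per line; the list name `found` is just for illustration. Note that group 1 of a repeated group would hold only the last repetition, which is why group 0 is used here.

```python
import re

# Same expression and test lines as in the test above.
regex = r"([A-Z][a-z]+\s)+"
lines = [
    "[bla bla] 1990 The Bronx Warriors 1982",
    "[ bl bla] 2012 2009 [ bla bla ]",
    "[BLA VLA] The Matrix 1999 bla bla [bla bla]",
]

# Whole match (group 0) per line, or None when the line has no capitalized word run.
found = []
for line in lines:
    m = re.search(regex, line)
    found.append(m.group(0).strip() if m else None)

print(found)  # ['The Bronx Warriors', None, 'The Matrix']
```

The second line yields None because the title 2012 contains no capitalized word, so this expression alone does not cover all-digit titles, and it does not capture the year either.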

RegEx

If this expression isn't what you want, or you'd like to modify it, visit regex101.com.

RegEx Circuit

jex.im visualizes regular expressions.

Answer 1 (score: 1)

For your example data, one option could be to use 2 capturing groups:

\[[^\]]+\] (.+?) (\d{4})

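Explanation

  • \[[^\]]+\] matches the leading part in square brackets
  • (.+?) sits between two literal spaces and captures in group 1 any character, one or more times, non-greedily
  • (\d{4}) captures 4 digits in group 2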
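
As a rough illustration of why this helps, the sketch below runs the regex from the question and the pattern above against the same input. In the question's regex the lazy group may be almost empty, so the first 4-digit number is taken as the year; here `(.+?)` needs at least one character followed by a space before the year, so a leading number ends up in the title. The variable names are only for illustration.

```python
import re

question_pattern = r"\[?.*?\](.*?)([0-9]{4})(.*)\[?.*\]?"  # regex from the question
answer_pattern = r"\[[^\]]+\] (.+?) (\d{4})"               # pattern from this answer

title = "[bla bla] 1990 The Bronx Warriors 1982"

q = re.match(question_pattern, title)
a = re.match(answer_pattern, title)

print(repr(q.group(1)), q.group(2))  # ' ' 1990
print(repr(a.group(1)), a.group(2))  # '1990 The Bronx Warriors' 1982

# The three sample lines that appear in this thread.
samples = [
    "[bla bla] 1990 The Bronx Warriors 1982",
    "[ bl bla] 2012 2009 [ bla bla ]",
    "[BLA VLA] The Matrix 1999 bla bla [bla bla]",
]
results = [re.match(answer_pattern, s).groups() for s in samples]
print(results)
```

For these samples `results` holds the pairs ('1990 The Bronx Warriors', '1982'), ('2012', '2009') and ('The Matrix', '1999').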

Answer 2 (score: 0)

Try this:

re.match( r"\[.*?\]\s([\w\s]+)", title).groups()[0].strip()

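Code

Going further, consider wrapping the code in a reusable function. This version is equivalent, except that it returns None instead of raising an error when there is no match:

import re


def get_title(s):
    """Return the text after the leading [bracketed] tag, or None if there is no match."""
    pattern = r"\[.*?\]\s([\w\s]+)"
    m = re.match(pattern, s)
    if m is None:
        # re.match returns None when the string does not start with the pattern
        return None
    return m.group(1).strip()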

Demo

get_title("[BLA VLA] The Matrix 1999 bla bla [bla bla]")
# 'The Matrix 1999 bla bla'

get_title("[bla bla] 1990 The Bronx Warriors 1982")
# '1990 The Bronx Warriors 1982'

get_title("[ bl bla] 2012 2009 [ bla bla ]")
# '2012 2009'

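Details

The pattern breaks down as follows:

  • \[.*?\]\s: skips the leading bracketed tag and the whitespace character after it
  • ([\w\s]+): captures one or more word characters (letters, digits, underscore) and whitespace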
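
One consequence of `[\w\s]+` worth keeping in mind: the capture ends at the first character that is neither a word character nor whitespace. That is what drops the trailing bracketed part, but it would also cut a title at punctuation. A minimal sketch; the helper name `captured` and the hyphenated input are made up for illustration and do not come from the question.

```python
import re

pattern = r"\[.*?\]\s([\w\s]+)"


def captured(s):
    """Return the stripped capture of the pattern above, or None if there is no match."""
    m = re.match(pattern, s)
    return m.group(1).strip() if m else None


# From the question: the capture ends just before the trailing "[".
print(captured("[BLA VLA] The Matrix 1999 bla bla [bla bla]"))  # The Matrix 1999 bla bla

# Hypothetical input: "-" is not matched by [\w\s], so the capture stops early.
print(captured("[tag] Spider-Man 2002"))  # Spider
```

Also note that the captured text still contains the year, so separating title and year needs a further step.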

Answer 3 (score: 0)

movies = '''[bla bla] 1990 The Bronx Warriors 1982
[ bl bla] 2012 2009 [ bla bla ]
[ bl bla] Normal movie title 2009 [ bla bla ]'''

import re

for movie, year in re.findall(r']\s+(.*)\s+(\d{4}).*?$', movies, flags=re.MULTILINE):
    print('Movie title: [{}] Movie year: [{}]'.format(movie, year))

Prints:

Movie title: [1990 The Bronx Warriors] Movie year: [1982]
Movie title: [2012] Movie year: [2009]
Movie title: [Normal movie title] Movie year: [2009]
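
The pattern above works because `(.*)` is greedy: it first grabs the rest of the line, then backtracks until whitespace plus four digits can still follow, so the year is the last such number on the line. If a per-line helper is more convenient than `re.findall`, a sketch could look like this; the names `MOVIE_RE` and `parse_movie` are hypothetical, and converting the year to `int` is an added assumption.

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
MOVIE_RE = re.compile(r"]\s+(.*)\s+(\d{4}).*?$")


def parse_movie(line):
    """Return (title, year) for one line, or None if the line does not match."""
    m = MOVIE_RE.search(line)
    if m is None:
        return None
    title, year = m.groups()
    return title, int(year)


print(parse_movie("[bla bla] 1990 The Bronx Warriors 1982"))
# ('1990 The Bronx Warriors', 1982)
print(parse_movie("[BLA VLA] The Matrix 1999 bla bla [bla bla]"))
# ('The Matrix', 1999)
```

Returning None for lines without a match leaves it to the caller to decide whether to skip or report them.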