Question

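I'm trying to split a string into a list. It almost works, but for some reason it produces an extra empty element at the beginning and at the end of the list.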

>>> import re
>>> line = "A12B1234123456\tMisc text"
>>> re.split(r'^([A-H])(\d{2})?([A-Z])?(\d{4})?(\d{6})?\t(.*)$', line)
['', 'A', '12', 'B', '1234', '123456', 'Misc text', '']

I was expecting ['A', '12', 'B', '1234', '123456', 'Misc text']. Why does this happen, and how can I prevent it?

Answer 1

Your regex is correct, but don't use re.split to extract your matches.

Use re.findall to get all the matches (the captured groups):

>>> print(re.findall(r'([A-H])(\d{2})?([A-Z])?(\d{4})?(\d{6})?\t(.*)$', line)[0])
('A', '12', 'B', '1234', '123456', 'Misc text')
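A related option, sketched here under the assumption that each input is a single line that the pattern should match in full: `re.fullmatch` plus `Match.groups()` returns exactly the captured groups, without the list wrapper that `findall` adds. The `parse_line` helper and the `PATTERN` name are hypothetical, chosen for illustration. One difference worth knowing: `groups()` reports an optional group that did not participate as `None`, while `findall` reports it as `''`; passing `default=''` makes the two agree.

```python
import re

# Same groups as the question's pattern; fullmatch already requires the
# whole string to match, so the ^ and $ anchors are dropped.
PATTERN = re.compile(r'([A-H])(\d{2})?([A-Z])?(\d{4})?(\d{6})?\t(.*)')

def parse_line(line):
    """Return the captured groups as a list, or None if the line does not match."""
    m = PATTERN.fullmatch(line)
    if m is None:
        return None
    # default='' turns optional groups that did not match into empty strings
    # (groups() would otherwise return None for them).
    return list(m.groups(default=''))

print(parse_line("A12B1234123456\tMisc text"))
# ['A', '12', 'B', '1234', '123456', 'Misc text']
```

Returning `None` for a non-matching line lets the caller decide whether to skip it or raise an error.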

Answer 2

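Because you're using split: it treats the regex as a delimiter and breaks the string apart wherever the regex matches. Your pattern is anchored with ^ and $, so its single match covers the entire line. The text before and after that match is empty, so split returns those two empty strings around the captured groups.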
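A small sketch of that behavior, using a made-up pattern and made-up strings rather than the question's data: `re.split` returns the text between separator matches plus any captured groups, so a separator that spans the whole input leaves an empty string on each side. The Python documentation for `re.split` describes this start/end case explicitly.

```python
import re

# The captured separator (\d+) matches in the middle: the text on both
# sides is kept, and the captured digits are inserted between the pieces.
partial = re.split(r'(\d+)', 'ab12cd')
print(partial)  # ['ab', '12', 'cd']

# The separator matches the entire string: there is no text before or
# after it, so split puts an empty string at each end, as in the question.
whole = re.split(r'(\d+)', '12')
print(whole)  # ['', '12', '']
```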

I think what you're looking for is to match the regex and collect the selected groups:

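import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Inside a method of a class that has a String field `template`:
String string = this.template;
Pattern pattern = Pattern.compile("<.*?>");
Matcher matcher = pattern.matcher(string);

// Collect every substring that matches the pattern.
List<String> listMatches = new ArrayList<String>();
while (matcher.find()) {
    listMatches.add(matcher.group());
}

// Print the matches, numbered from 1.
int indexNumber = 1;
for (String s : listMatches) {
    System.out.println(indexNumber + ". " + s);
    indexNumber++;
}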

I think this is the list you're expecting.
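For readers following the Python question, the same collect-every-match loop can be sketched in Python with `re.finditer`. The `template` string below is a made-up example, since the answer does not show what `this.template` contains. Because `<.*?>` has no capturing groups, `re.findall(r'<.*?>', template)` would return the same list.

```python
import re

# Hypothetical input standing in for the Java answer's `this.template`.
template = "Hello <name>, your order <id> has shipped to <city>."

# Collect every non-overlapping match, like the Java `while (matcher.find())` loop.
list_matches = [m.group() for m in re.finditer(r'<.*?>', template)]

# Print the matches numbered from 1, like the Java `indexNumber` counter.
for index_number, s in enumerate(list_matches, start=1):
    print(f"{index_number}. {s}")
```

The non-greedy `.*?` stops at the first `>`, so each `<...>` tag is captured separately instead of one match running from the first `<` to the last `>`.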

Python regex split dilemma
