Error when trying to remove parentheses from text in Python

Asked: 2015-07-17 12:58:12

Tags: python regex parentheses

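I have been working on some code that takes a bunch of histograms from other files and stacks them together. To make sure the legend displays correctly, I have been trying to take the titles of the original histograms and strip out some information that is no longer needed.

The part I don't need has the form (A mass = 200 GeV). I have no trouble removing what is inside the parentheses; unfortunately, everything I have tried for the parentheses themselves either has no effect, undoes the code that removes the text, or throws an error.

I have tried the suggestions from "Remove parenthesis and text in a file using Python" and "How can I remove text within parentheses with a regex?".

The error from my current attempt is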

'str' object cannot be interpreted as an integer

Here is the relevant part of the code:

histo_name = ''

# this is a list of things we do not want to show up in our legend keys
REMOVE_LIST = ["(A mass = 200 GeV)"]

# these two lines use the re module to remove things from a piece of text
# that are specified in the remove list
remove = '|'.join(REMOVE_LIST)
regex = re.compile(r'\b('+remove+r')\b')

# Creating the correct name for the stacked histogram
for histo in histos:

    if histo == histos[0]:

        # place_holder contains the edited string we want to set the
        # histogram title to
        place_holder = regex.sub('', str(histo.GetName()))
        histo_name += str(place_holder)
        histo.SetTitle(histo_name)

    else:

        place_holder = regex.sub(r'\(\w*\)', '', str(histo.GetName()))
        histo_name += ' + ' + str(place_holder)
        histo.SetTitle(histo_name)

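The if/else bit is only there because the first histogram I pass in is not being stacked, so I just want it to keep its own name, while the rest are stacked in order, hence the ' + ' etc. I thought I would include it anyway.

Apologies if I am doing something really obviously wrong, I'm still learning!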

2 Answers:

Answer 0: (score: 1)

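From the Python docs: to match the literals '(' or ')', use \( or \), or enclose them inside a character class: [(] [)].

So use one of the patterns above in your regex instead of plain parentheses, e.g. REMOVE_LIST = [r"\(A mass = 200 GeV\)"]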
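
As a minimal sketch of the point above, the three spellings below all match the parentheses literally. The title string is a hypothetical stand-in for a histogram name, and re.escape is shown as a third option that the quoted docs passage does not mention.

```python
import re

title = "ggH signal (A mass = 200 GeV)"  # hypothetical histogram title

# Three ways to spell the literal parentheses in the pattern
escaped_by_hand = r"\(A mass = 200 GeV\)"
character_class = r"[(]A mass = 200 GeV[)]"
escaped_by_re = re.escape("(A mass = 200 GeV)")

# rstrip() drops the space left behind in front of the removed text
cleaned = [
    re.sub(pattern, "", title).rstrip()
    for pattern in (escaped_by_hand, character_class, escaped_by_re)
]
print(cleaned)
```

re.escape is usually the least error-prone choice when the entries come from a list of plain strings, since nothing has to be escaped by hand.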

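EDIT: The problem seems to be the \b in your regex. According to the docs linked above, \b matches at the boundary between a word character and a non-word character, so it also interacts with the parentheses. My seemingly working example is: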

import re

# Test input
myTestString = "someMess (A mass = 200 GeV) and other mess (remove me if you can)"
replaceWith = "HEY THERE FRIEND"

# What to remove
removeList = [r"\(A mass = 200 GeV\)", r"\(remove me if you can\)"]

# Build the regex
remove = r'(' + '|'.join(removeList) + r')'
regex = re.compile(remove)

# Try it!
out = regex.sub(replaceWith, myTestString)

# See if it worked
print(out)
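
Separately from the pattern itself, the reported TypeError looks like it comes from the three-argument call in the else branch of the question. A compiled pattern's sub() takes (repl, string, count=0), so the histogram name ends up in the count slot. The sketch below reproduces this under that assumption; the name string is a hypothetical stand-in for histo.GetName().

```python
import re

regex = re.compile(r"\(A mass = 200 GeV\)")
name = "ttbar (A mass = 200 GeV)"  # hypothetical stand-in for histo.GetName()

# Pattern.sub is sub(repl, string, count=0): with three strings, `name`
# lands in the `count` slot, which has to be an integer.
try:
    regex.sub(r"\(\w*\)", "", name)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)

# Corrected call: the replacement first, then the string to search.
fixed = regex.sub("", name).strip()
print(fixed)
```

The three-argument form (pattern, repl, string) belongs to the module-level re.sub, not to the sub method of an already compiled pattern.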

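Answer 1: (score: 0)

You are facing two problems:

  1. You join the strings into the regex pattern without escaping them.
  2. You use word boundaries, but some entries start or end with a non-word character (so r"\)\b" will never match a ) that is followed by a space or by the end of the string).

The following solves the first problem but not the second (it only finds More+[fun]+text):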

    import re
    REMOVE_LIST = ["(A mass = 200 GeV)", "More+[fun]+text"]
    # re.escape makes every entry match literally
    remove = '|'.join([re.escape(x) for x in REMOVE_LIST])
    ptrn = r'\b(?:'+remove+r')\b'
    print(ptrn)
    regex = re.compile(ptrn)
    print(regex.findall("Now, (A mass = 200 GeV) and More+[fun]+text inside"))
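
To see why the word boundaries get in the way, here is a small sketch using the same test sentence. The "mass(A" string is a made-up input, added only to show a case where \b in front of a parenthesis does match.

```python
import re

text = "Now, (A mass = 200 GeV) and More+[fun]+text inside"
entry = re.escape("(A mass = 200 GeV)")

# \b needs a word character on exactly one side. Here "(" is preceded by a
# space and ")" is followed by a space, so neither \b can match.
with_boundaries = re.findall(r"\b(?:" + entry + r")\b", text)

# Without \b the escaped entry matches as expected.
without_boundaries = re.findall(entry, text)

# \b in front of "(" does match when a word character comes right before it.
glued = re.findall(r"\b\(A", "mass(A")

print(with_boundaries, without_boundaries, glued)
```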
    

You need a smarter way to build the pattern. Something like this:

    import re
    REMOVE_LIST = ["(A mass = 200 GeV)", "More+[fun]+text"]

    # Group the entries by whether they start and/or end with a word character,
    # because \b is only useful next to a word character
    remove_with_boundaries = '|'.join([re.escape(x) for x in REMOVE_LIST if re.match(r'\w', x) and re.search(r'\w$', x)])
    remove_with_no_boundaries = '|'.join([re.escape(x) for x in REMOVE_LIST if not re.match(r'\w', x) and not re.search(r'\w$', x)])
    remove_with_right_boundaries = '|'.join([re.escape(x) for x in REMOVE_LIST if not re.match(r'\w', x) and re.search(r'\w$', x)])
    remove_with_left_boundaries = '|'.join([re.escape(x) for x in REMOVE_LIST if re.match(r'\w', x) and not re.search(r'\w$', x)])

    # Collect the alternatives in a list so the pattern never starts with a
    # stray '|' (an empty alternative would match everywhere)
    parts = []
    if remove_with_boundaries:
        parts.append(r'\b(?:' + remove_with_boundaries + r')\b')
    if remove_with_left_boundaries:
        parts.append(r'\b(?:' + remove_with_left_boundaries + r')')
    if remove_with_right_boundaries:
        parts.append(r'(?:' + remove_with_right_boundaries + r')\b')
    if remove_with_no_boundaries:
        parts.append(r'(?:' + remove_with_no_boundaries + r')')
    ptrn = '|'.join(parts)

    print(ptrn)
    regex = re.compile(ptrn)
    print(regex.findall("Now, (A mass = 200 GeV) and More+[fun]+text inside"))
    

See the IDEONE demo.

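For the two entries ["(A mass = 200 GeV)", "More+[fun]+text"] as input, the generated regex is \b(?:More\+\[fun\]\+text)\b|(?:\(A\ mass\ \=\ 200\ GeV\)) and the output is ['(A mass = 200 GeV)', 'More+[fun]+text']. (Since Python 3.7, re.escape no longer escapes =, so the pattern prints with = instead of \=.)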
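
A shorter alternative, not the approach used above, is to replace \b with the lookarounds (?<!\w) and (?!\w), which only require that no word character touches the match. The sketch below assumes that behaviour is acceptable: unlike the pattern above, it will not remove an entry that is glued directly to a letter or digit. The clean_title helper and the sample titles are hypothetical.

```python
import re

REMOVE_LIST = ["(A mass = 200 GeV)", "More+[fun]+text"]

# (?<!\w) and (?!\w) only check that no word character touches the match,
# so one pattern covers entries that start with a letter or with "(".
alternatives = "|".join(re.escape(x) for x in REMOVE_LIST)
regex = re.compile(r"(?<!\w)(?:" + alternatives + r")(?!\w)")

def clean_title(name):
    """Remove every listed entry, then collapse the leftover whitespace."""
    return " ".join(regex.sub("", name).split())

found = regex.findall("Now, (A mass = 200 GeV) and More+[fun]+text inside")

# Hypothetical histogram names, joined the way the question builds its title
titles = ["ggH (A mass = 200 GeV)", "ttbar (A mass = 200 GeV)"]
stacked = " + ".join(clean_title(t) for t in titles)
print(found, stacked)
```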