Python search and replace with spaces, brackets and underscores

Time: 2018-01-15 18:16:44

Tags: python replace

I am trying to search and replace all occurrences of __field(unsigned int, a_b) with ctf_integer(unsigned int, a_b, a_b).

Here the text a_b can change with different values.

I tried using regex like this:

m = re.search(r"__field\(unsigned int(.*)\)", string)

But the result contains two groups and I don't understand why. I want to allow for spaces, so I used the wildcard .*. Is this the correct way to do a search and replace?

I also tried \w+, but it does not accommodate spaces.

The following also does not work if there are no spaces before the comma:

m = re.search(r"__field\(unsigned int(\s+),(.*)\)", string)

Questions:

  • Why can't the \s+ option match zero spaces? Are there any alternatives?
  • The search returns two groups instead of one: the first spans the entire string, while the second is the part after the comma. Why is that?
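
For reference, a minimal sketch that reproduces both observations. The sample strings are made up for illustration. It assumes nothing beyond the standard re module: \s+ requires at least one whitespace character while \s* also accepts none, and group(0) always denotes the whole match, separate from the parenthesised groups.

```python
import re

# Hypothetical sample inputs, with and without a space before the comma.
with_space = "__field(unsigned int , a_b)"
no_space = "__field(unsigned int, a_b)"

plus = r"__field\(unsigned int(\s+),(.*)\)"
star = r"__field\(unsigned int(\s*),(.*)\)"

# \s+ needs one or more whitespace characters, so it fails without a space.
plus_no_space = re.search(plus, no_space)      # None
plus_with_space = re.search(plus, with_space)  # a match object

# \s* accepts zero or more, so the input without a space matches too.
m = re.search(star, no_space)

# group(0) is always the entire match; numbered groups start at 1.
whole, spaces, rest = m.group(0), m.group(1), m.group(2)
print(plus_no_space, whole, repr(spaces), repr(rest))
```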

I can use re.sub as below:

re.sub(r"__field\(unsigned int(\s*),(.*)\)", r"ctf_integer(unsigned int, \2, \2)", string)

However, this breaks when there are multiple closing brackets. For example, it doesn't work if the input is A(__field(unsigned int, a_b, a_b)), like so:

string = "A(__field(unsigned int, a_b, a_b))"
re.sub(r"__field\(unsigned int(\s*),(.*)\)", r"ctf_integer(unsigned int, \2, \2)", string)

# Outputs 'A(ctf_integer(unsigned int,  a_b, a_b),  a_b, a_b))'

P.S.: It is being used to convert tracepoints from one format to another.

1 answer:

Answer 0 (score: 2)

Instead of .*, you should match everything except ) by using [^)]*. It looks like this:

import re
string = "A(__field(unsigned int, a_b, a_b))"
re.sub(r"__field\(unsigned int,\s*([^)]*)\)", r"ctf_integer(unsigned int, \1)", string)

Output:

A(ctf_integer(unsigned int, a_b, a_b))

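In the code above, I also stopped capturing the whitespace (changed (\s*) to \s*), which leaves a single group, numbered \1. In addition, I moved the whitespace match to after the comma, because that is probably where it belongs.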
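
As a rough sketch of how the same idea could cover the conversion stated in the question, from __field(unsigned int, a_b) to ctf_integer(unsigned int, a_b, a_b), the negated character class can be combined with two uses of \1. The helper name convert and the sample inputs are hypothetical, and the pattern assumes the field name contains neither a comma nor a closing bracket.

```python
import re

# [^),]+? cannot cross a ")" or ",", so the match cannot run past the
# closing bracket of __field(...). \s* tolerates optional spaces.
FIELD = re.compile(r"__field\(\s*unsigned int\s*,\s*([^),]+?)\s*\)")

def convert(text):
    # Hypothetical helper: \1 appears twice to repeat the field name.
    return FIELD.sub(r"ctf_integer(unsigned int, \1, \1)", text)

print(convert("A(__field(unsigned int, a_b))"))
# A(ctf_integer(unsigned int, a_b, a_b))
print(convert("__field(unsigned int ,x_y )"))
# ctf_integer(unsigned int, x_y, x_y)
```

Compiling the pattern once is only a convenience when many lines are converted; calling re.sub directly with the same pattern behaves the same way.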