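I'm trying to use a regular expression to get the position of the text inside the parentheses (the capture group).
For example, I want to get the position of "Home Depot":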
import re
sent = "Sales to two of the segment's customers, The Home Depot and Lowe's Home Improvement Warehouse, accounted for greater than 10% of the Corporation's consolidated sales for 2004, 2003, and 2002."
regex_ = re.compile("Sales to two of the segment's customers, The (Home Depot)")
But
regex_.search(sent).span()
returns (0, 55)
instead of (45, 55).
Since sent may contain "Home Depot" more than once, I can't use re.search('Home Depot', sent).span(): it may not return the position of the particular occurrence I'm looking for.
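To make the ambiguity concrete, here is a small sketch using a made-up sentence (the `text` value is hypothetical, not from the question) that mentions Home Depot twice. `re.search` only reports the first occurrence, and `re.finditer` lists every occurrence, but neither one knows which occurrence you meant.

```python
import re

# Hypothetical sentence containing "Home Depot" twice
text = "Home Depot opened a store; later The Home Depot expanded."

# re.search stops at the first occurrence
first = re.search("Home Depot", text).span()

# re.finditer yields one match object per occurrence
all_spans = [m.span() for m in re.finditer("Home Depot", text)]

print(first)      # (0, 10)
print(all_spans)  # [(0, 10), (37, 47)]
```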
Answer 0 (score: 2)
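If you want the position of the text inside the parentheses, you need to pass the number of the group you want (here, the first group) as the argument to span():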
import re
sent = "Sales to two of the segment's customers, The Home Depot and Lowe's Home Improvement Warehouse, accounted for greater than 10% of the Corporation's consolidated sales for 2004, 2003, and 2002."
regex_ = re.compile("Sales to two of the segment's customers, The (Home Depot)")
regex_.search(sent).span(1)
See the Python documentation on match objects and span.
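A short sketch of how the group argument changes the result: group 0 (the default) is the whole match, group 1 is the first parenthesised group, and `start(1)`/`end(1)` carry the same information as `span(1)`. The named-group variant (`company` is just an illustrative name) is an optional alternative that can be easier to read in longer patterns.

```python
import re

sent = ("Sales to two of the segment's customers, The Home Depot and Lowe's "
        "Home Improvement Warehouse, accounted for greater than 10% of the "
        "Corporation's consolidated sales for 2004, 2003, and 2002.")

m = re.search(r"Sales to two of the segment's customers, The (Home Depot)", sent)

whole = m.span()                  # group 0: the entire match
group1 = m.span(1)                # group 1: only the parenthesised part
start_end = (m.start(1), m.end(1))  # same positions as span(1)

# A named group can be passed to span() by name
m2 = re.search(r"customers, The (?P<company>Home Depot)", sent)
named = m2.span("company")

print(whole)   # (0, 55)
print(group1)  # (45, 55)
print(named)   # (45, 55)
```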
Answer 1 (score: 1)
Use a positive lookbehind:
import re
sent = "Sales to two of the segment's customers, The Home Depot and Lowe's Home Improvement Warehouse, accounted for greater than 10% of the Corporation's consolidated sales for 2004, 2003, and 2002."
regex_ = re.compile(r"(?<=Sales to two of the segment's customers, The )Home Depot")
print(regex_.search(sent).span())
Output:
(45, 55)
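One caveat with the lookbehind approach: Python's built-in `re` module only accepts fixed-width lookbehind patterns. The literal prefix used here is fixed-width, so it compiles, but a lookbehind containing something like `\s+` is rejected with `re.error`. In that case the capture-group approach from the other answers is the usual fallback. A minimal check:

```python
import re

# A fixed-width lookbehind compiles fine
ok = re.compile(r"(?<=The )Home Depot")

# A variable-width lookbehind (\s+ can match any number of spaces) is rejected
try:
    re.compile(r"(?<=The\s+)Home Depot")
    variable_width_ok = True
except re.error:
    variable_width_ok = False

print(variable_width_ok)  # False
```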
Answer 2 (score: 1)
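Your regex is correct. However, span() with no argument gives you the position of the entire match, not of the submatch (the group). To get the position of the first submatch, use span(1):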
>>> regex_ = re.compile("Sales to two of the segment's customers, The (Home Depot)")
>>> regex_.search(sent).span(1)
(45, 55)
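If the prefix and the target come from variables rather than being typed into the pattern by hand, it is safer to escape them, since characters such as `.` or `(` would otherwise be treated as regex syntax. The helper below, `span_after`, is hypothetical (not part of any library): a sketch that builds the same "prefix + (target)" pattern with `re.escape` and returns `span(1)`, or `None` if that exact context does not occur.

```python
import re

def span_after(prefix, target, text):
    """Hypothetical helper: return (start, end) of `target` where it directly
    follows `prefix` in `text`, or None if there is no such occurrence.
    Both strings are escaped so they are matched literally."""
    m = re.search(re.escape(prefix) + "(" + re.escape(target) + ")", text)
    return m.span(1) if m else None

sent = ("Sales to two of the segment's customers, The Home Depot and Lowe's "
        "Home Improvement Warehouse, accounted for greater than 10% of the "
        "Corporation's consolidated sales for 2004, 2003, and 2002.")

hit = span_after("Sales to two of the segment's customers, The ", "Home Depot", sent)
miss = span_after("Lowe's ", "Home Depot", sent)  # followed by "Home Improvement"

print(hit)   # (45, 55)
print(miss)  # None
```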