Regex to split tags from non-tag strings

Time: 2018-12-11 01:39:56

Tags: python regex xml

How can I split tags from non-tag strings?

Assume that:

  • Any substring enclosed in <...> is a tag
  • Any other string, not enclosed in <...>, is a non-tag

Given:

<div><div><div><div><div>acsc<div>abcd</div>
>acsc<div>abcd</div>
<div>abcd </div>
<div>abcd
abcd efg </div>
abcd efg</div>
<div> zxc>aa <asc>asca asca> acsa<>asca acasc>
as>aca>asc a<aca< <aca>asca>
<asvajvaolqwd> avaskmlv> avasv><avsva>asca

Expected output:

[['<div>', '<div>', '<div>', '<div>', '<div>', 'acsc', '<div>', 'abcd', '</div>'], 
['>acsc', '<div>', 'abcd', '</div>'], 
['<div>', 'abcd', '</div>'], 
['<div>', 'abcd'], 
['abcd', 'efg', '</div>'], 
['abcd', 'efg', '</div>'], 
['<div>', 'zxc>aa', '<asc>', 'asca', 'asca>', 'acsa', '<>', 'asca', 'acasc>'], 
['as>aca>asc', 'a<aca<', '<aca>', 'asca>'], 
['<asvajvaolqwd>', 'avaskmlv>', 'avasv>', '<avsva>', 'asca']]

Attempt 1

I tried this regex to capture the <...> ... </...> groups, but it captures only one tag instance per line:

(<.*(?<=>))(.*)((?=<\/)[^>]*>)

For example:

>>> import re
>>> x = """
... <div> <div> <div> <div> <div> acsc <div> abcd </div>
... >acsc <div> abcd </div>
... <div> abcd </div>
... <div> abcd 
... abcd efg </div>
... abcd efg </div>
... <div> zxc>aa <asc> asca asca> acsa <> asca acasc>
... as>aca>asc a<aca< <aca> asca>
... <asvajvaolqwd> avaskmlv> avasv> <avsva> asca"""
>>> [re.findall(r"(<.*(?<=>))(.*)((?=<\/)[^>]*>)", line) for line in x.split('\n')]
[[], [('<div> <div> <div> <div> <div> acsc <div>', ' abcd ', '</div>')], [('<div>', ' abcd ', '</div>')], [('<div>', ' abcd ', '</div>')], [], [], [], [], [], []]

Attempt 2

Then I tried ((?=<)[^>]*>), which finds all the possible tag positions:

>>> [re.findall(r"((?=<)[^>]*>)", line) for line in x.split('\n')]
[[], ['<div>', '<div>', '<div>', '<div>', '<div>', '<div>', '</div>'], ['<div>', '</div>'], ['<div>', '</div>'], ['<div>'], ['</div>'], ['</div>'], ['<div>', '<asc>', '<>'], ['<aca< <aca>'], ['<asvajvaolqwd>', '<avsva>']]

This is still not the expected output, and the pattern is greedy: on the line containing a<aca< <aca> it returns '<aca< <aca>' as a single match instead of only the tag <aca>. How can I make findall non-greedy?
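One possible approach, offered as a sketch rather than a definitive answer. Making the quantifier lazy would not change the '<aca< <aca>' match: `[^>]*` already stops at the first `>`, and the match is long only because it starts at the first `<` and the character class still allows further `<` characters. Excluding `<` from the class avoids that. The helper name `split_tags` is hypothetical, and the tag definition used here ("<", then zero or more characters that are not "<", ">" or whitespace, then ">") is an assumption inferred from the expected output, not something the question states.

```python
import re

# Assumed tag definition: "<", then zero or more characters that are not
# "<", ">" or whitespace, then ">". This also accepts the empty tag "<>".
TAG = re.compile(r"(<[^<>\s]*>)")


def split_tags(line):
    """Split one line into tag and non-tag tokens, preserving their order."""
    tokens = []
    # The pattern has one capturing group, so re.split keeps the matched
    # tags in the result; they sit at the odd indices.
    for i, piece in enumerate(TAG.split(line)):
        if i % 2:
            tokens.append(piece)          # a tag, kept whole
        else:
            tokens.extend(piece.split())  # non-tag text, split on whitespace
    return tokens


# Excluding "<" from the class is what stops a match from spanning "<aca< <aca>".
print(re.findall(r"<[^<>]*>", "as>aca>asc a<aca< <aca> asca>"))
# ['<aca>']

print(split_tags("as>aca>asc a<aca< <aca>asca>"))
# ['as>aca>asc', 'a<aca<', '<aca>', 'asca>']
```

Applied line by line, for example `[split_tags(line) for line in text.splitlines()]`, this reproduces the expected lists for the sample lines in the question. Inputs with attributes inside tags, such as `<a href="x">`, would need a looser tag definition.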

0 answers:

No answers yet