How to split tags from non-tag strings?
Assume:
Any substring between < and > is a tag.
Any other string, without <...>, is a non-tag.
Given:
<div> <div> <div> <div> <div> acsc <div> abcd </div>
>acsc <div> abcd </div>
<div> abcd </div>
<div> abcd
abcd efg </div>
abcd efg </div>
<div> zxc>aa <asc> asca asca> acsa <> asca acasc>
as>aca>asc a<aca< <aca> asca>
<asvajvaolqwd> avaskmlv> avasv> <avsva> asca
Expected output:
[['<div>', '<div>', '<div>', '<div>', '<div>', 'acsc', '<div>', 'abcd', '</div>'],
['>acsc', '<div>', 'abcd', '</div>'],
['<div>', 'abcd', '</div>'],
['<div>', 'abcd'],
['abcd', 'efg', '</div>'],
['abcd', 'efg', '</div>'],
['<div>', 'zxc>aa', '<asc>', 'asca', 'asca>', 'acsa', '<>', 'asca', 'acasc>'],
['as>aca>asc', 'a<aca<', '<aca>', 'asca>'],
['<asvajvaolqwd>', 'avaskmlv>', 'avasv>', '<avsva>', 'asca']]
I tried the following regex to capture 3 groups, e.g. <...> ... </...>
(<.*(?<=>))(.*)((?=<\/)[^>]*>)
but it only captures one instance of the tag per line:
>>> import re
>>> x = """
... <div> <div> <div> <div> <div> acsc <div> abcd </div>
... >acsc <div> abcd </div>
... <div> abcd </div>
... <div> abcd
... abcd efg </div>
... abcd efg </div>
... <div> zxc>aa <asc> asca asca> acsa <> asca acasc>
... as>aca>asc a<aca< <aca> asca>
... <asvajvaolqwd> avaskmlv> avasv> <avsva> asca"""
>>> [re.findall(r"(<.*(?<=>))(.*)((?=<\/)[^>]*>)", line) for line in x.split('\n')]
[[], [('<div> <div> <div> <div> <div> acsc <div>', ' abcd ', '</div>')], [('<div>', ' abcd ', '</div>')], [('<div>', ' abcd ', '</div>')], [], [], [], [], [], []]
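Then I tried
((?=<)[^>]*>)
and I could find all the possible tag positions:
>>> [re.findall(r"((?=<)[^>]*>)", line) for line in x.split('\n')]
[[], ['<div>', '<div>', '<div>', '<div>', '<div>', '<div>', '</div>'], ['<div>', '</div>'], ['<div>', '</div>'], ['<div>'], ['</div>'], ['</div>'], ['<div>', '<asc>', '<>'], ['<aca< <aca>'], ['<asvajvaolqwd>', '<avsva>']]
This still isn't the expected output. Also, for
<aca< <aca>
the pattern is greedy and takes the whole string as one tag, whereas the expected output treats only <aca> as a tag there. How can I make findall non-greedy?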
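One possible direction, offered as a sketch rather than a definitive answer: a lazy quantifier alone would probably not help with <aca< <aca>, because a match that starts at the first < still has to run on to the next >. Excluding < from the character class does avoid it. The code below assumes that a tag is <, then any run of characters other than < and >, then >, and that non-tag text is separated by whitespace. The name split_tags is hypothetical and not part of the original question.

```python
import re

# Assumption: a tag is "<", then characters other than "<" or ">", then ">".
# The second alternative treats runs of whitespace as plain separators.
TOKEN_BOUNDARY = re.compile(r"(<[^<>]*>)|\s+")


def split_tags(line):
    """Split one line into tags and whitespace-separated non-tag chunks."""
    # re.split keeps the captured tag text and discards the whitespace.
    # It also yields None (group did not match) and '' (adjacent matches),
    # so those falsy pieces are filtered out.
    return [piece for piece in TOKEN_BOUNDARY.split(line) if piece]


if __name__ == "__main__":
    # Excluding "<" inside the tag stops the match from starting too early.
    print(re.findall(r"<[^<>]*>", "as>aca>asc a<aca< <aca> asca>"))
    # ['<aca>']
    print(split_tags("as>aca>asc a<aca< <aca> asca>"))
    # ['as>aca>asc', 'a<aca<', '<aca>', 'asca>']
    print(split_tags("<div> zxc>aa <asc> asca asca> acsa <> asca acasc>"))
    # ['<div>', 'zxc>aa', '<asc>', 'asca', 'asca>', 'acsa', '<>', 'asca', 'acasc>']
```

Applied per line, e.g. [split_tags(line) for line in x.split('\n')], this gives one list per input line. It would not handle a tag whose attributes contain a literal < or >, which the stated assumptions rule out.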