Suppose I have a string:
look_up_check('US_POPULATION','('POPULATION' = 3844829) and ('CITY' = 'Los Angeles')')
Now I need to extract: 'US_POPULATION', 'POPULATION', 'CITY', 'Los Angeles'.
I tried a stack-based approach, but it didn't work well enough. Can I use a regular expression or some other method?
Answer 0 (score: 0)
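I'll offer a solution without regular expressions.
With the quotes as they are now, the text inside each pair of quotes in your string is: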
US_POPULATION
(
= 3844829) and (
=
)
That isn't what you want. To get the strings you need, the quotes have to be placed like this:
look_up_check('US_POPULATION',('POPULATION' = 3844829) and ('CITY' = 'Los Angeles'))
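One quick way to check the quote pairing described above (a sketch, not part of the original answer; the class QuotePairing and the method quotedPieces are hypothetical names) is to split on the quote character. Quotes pair up strictly left to right, so every odd-indexed piece lies between an opening and a closing quote:

```java
import java.util.ArrayList;
import java.util.List;

class QuotePairing {
    // Returns the text between each pair of single quotes, pairing quotes
    // strictly left to right (1st with 2nd, 3rd with 4th, ...).
    static List<String> quotedPieces(String s) {
        // Limit -1 keeps trailing empty pieces, so an empty '' at the end is not lost.
        String[] parts = s.split("'", -1);
        List<String> result = new ArrayList<>();
        // Pieces at odd indices sit between an opening and a closing quote.
        for (int i = 1; i < parts.length; i += 2) {
            result.add(parts[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        String original = "look_up_check('US_POPULATION','('POPULATION' = 3844829) and ('CITY' = 'Los Angeles')')";
        String requoted = "look_up_check('US_POPULATION',('POPULATION' = 3844829) and ('CITY' = 'Los Angeles'))";
        System.out.println(quotedPieces(original)); // [US_POPULATION, (,  = 3844829) and (,  = , )]
        System.out.println(quotedPieces(requoted)); // [US_POPULATION, POPULATION, CITY, Los Angeles]
    }
}
```

This sketch does not detect an unclosed final quote; the text after it is simply returned as the last piece.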
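Solution:
import java.util.ArrayList;
import java.util.List;

// Collects every substring enclosed in a pair of single quotes.
public static List<String> findStuffInQuotes(String s) {
    List<String> list = new ArrayList<>();
    StringBuilder sb = new StringBuilder();
    boolean insideQuotes = false;
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == '\'') {
            // Each quote toggles the state; a closing quote completes the current token.
            insideQuotes = !insideQuotes;
            if (!insideQuotes) {
                list.add(sb.toString());
                sb.setLength(0);
            }
        } else if (insideQuotes) {
            sb.append(c);
        }
    }
    // Only an unclosed final quote leaves text behind; without this check
    // an extra empty string would always be added.
    if (insideQuotes) {
        list.add(sb.toString());
    }
    return list;
}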
Answer 1 (score: 0)
You can use a regular expression:
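import java.util.regex.Matcher;
import java.util.regex.Pattern;

String str = "look_up_check('US_POPULATION','('POPULATION' = 3844829) and ('CITY' = 'Los Angeles')')";
// A quote, one or more word characters or whitespace, then a closing quote.
Pattern p = Pattern.compile("'[\\w\\s]+'");
Matcher m = p.matcher(str);
while (m.find()) {
    System.out.println(m.group());
}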
Output:
'US_POPULATION'
'POPULATION'
'CITY'
'Los Angeles'
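If you want the values without the surrounding quotes, a small variation (not from the original answer; the class QuotedWords and the method extract are hypothetical) moves the character class into a capturing group and reads group(1) instead of group():

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuotedWords {
    // Same character class as above; the quotes sit outside the capturing
    // group, so group(1) yields the bare value.
    private static final Pattern QUOTED = Pattern.compile("'([\\w\\s]+)'");

    static List<String> extract(String s) {
        List<String> values = new ArrayList<>();
        Matcher m = QUOTED.matcher(s);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String str = "look_up_check('US_POPULATION','('POPULATION' = 3844829) and ('CITY' = 'Los Angeles')')";
        System.out.println(extract(str)); // [US_POPULATION, POPULATION, CITY, Los Angeles]
    }
}
```

The restriction to [\w\s] is what stops the pattern from pairing the misplaced quotes around "(" and ")". The trade-off is that quoted values containing any other character, such as a period or a hyphen, are not matched at all.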
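Answer 2 (score: 0)
For something more robust than regular expressions or a hand-written lexer, you can use a Flex-like tool to generate the lexer for you. I built a simple lexer with JFlex + CUP (http://jflex.de/manual.html, http://www2.cs.tum.edu/projects/cup/install.php) that tokenizes the text you provided.
First, create a .flex file describing the rules that produce the tokens: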
import java_cup.runtime.*;
import java.util.*;
%%
%unicode
%class LexicalAnalyzer
%line
%column
%cup
/*numbers*/
number = ([1-9][0-9]*| 0)([.][0-9]+ )?([eE]([+]|[-])?[0-9]+)?
digit = [0-9]
underscore = [_]
identifier = {identifier5} ( [.] {identifier5} )*
identifier2 = {letter} ({letter}|{digit}|{underscore})*
identifier3 = {digit} ({letter}|{digit}|{underscore})+
identifier4 = {underscore} ({letter}|{digit}|{underscore})+
identifier5 = {identifier2} | {identifier3} | {identifier4}
letter = {lowercase} | {uppercase}
lowercase = [a-z]
uppercase = [A-Z]
inputchar = [^\r\n]
/*Comments*/
lineterminator = \r | \n | \r\n
simplecomment = "//" {inputchar}* {lineterminator}
blockcomment = "/*" ( [^*]* | "*"+ [^/*] )* "*"+ "/"
%{
    private void error() {
        System.err.print("Syntax error on line " + (yyline + 1));
        System.err.println(". Unrecognizable token: \"" + yytext() + "\"");
        //System.exit(1);
    }

    // Prints the token, then returns it with its text attached as the symbol value.
    private Symbol processToken(int type, Object value) {
        System.out.println("Type: " + type);
        System.out.println("Value: " + value);
        return new Symbol(type, value);
    }

    StringBuffer str = new StringBuffer();
%}
%state STRING
%state END
%%
<YYINITIAL> {
/* string literals */
['] {str.setLength(0);yybegin(STRING);}
"and" { return processToken( sym.AND , yytext()); }
/* number literals */
{number} { return processToken(sym.NUMBER , yytext()); }
/* identifiers */
{identifier} { return processToken( sym.IDENTIFIER , yytext()); }
"(" { return processToken( sym.OPENP , yytext()); }
")" { return processToken( sym.CLOSEP , yytext()); }
"=" { return processToken(sym.EQUALS , yytext()); }
"," { return processToken(sym.COMMA , yytext()); }
"+" { return processToken(sym.OP , yytext()); }
"-" { return processToken(sym.OP , yytext()); }
"*" { return processToken(sym.OP , yytext()); }
"/" { return processToken(sym.OP , yytext()); }
// whitespace and comments
{simplecomment} {/* Do nothing */}
{blockcomment} {}
" "|\t|\n| {lineterminator} {/* Do nothing */}
. {error();}
//. { /*error!*/ }
}
/* string literals */
<STRING> {
['] { yybegin(YYINITIAL); return processToken( sym.STRING , str.toString()); }
\\t { str.append('\t'); }
\\n { str.append('\n'); }
\\r { str.append('\r'); }
\\\" { str.append('\"'); }
\\\\ { str.append('\\'); }
\\['] { str.append('\''); }
\\[0-9][0-9][0-9]
{
String s = yytext().substring(1);
s = "" + ((char) Integer.parseInt(s));
str.append( s );
}
[^\n\r\'\\\t]+ { str.append( yytext() ); }
. { /* malformed string */}
}
<END>{
\n {}
. {}
}
Then compile this lexer specification with JFlex.jar:
java -jar JFlex.jar lexical.flex
This generates a source file named LexicalAnalyzer.java, which breaks a string into tokens according to your specification.
import java.io.StringReader;
import java_cup.runtime.Symbol;

public class Parser {
    public static void main(String[] args) throws Exception {
        String str = "look_up_check('US POPULATION', ( 'POPULATION' = 3844829 ) and ('CITY' = 'Los Angeles'))";
        // JFlex always generates a constructor that takes a java.io.Reader.
        LexicalAnalyzer l = new LexicalAnalyzer(new StringReader(str));
        // processToken() prints each token's type and value as it is scanned.
        Symbol s = l.next_token();
        while (s.sym != sym.EOF) {
            s = l.next_token();
        }
    }
}
This produces the following output:
Type: 5
Value: look_up_check
Type: 3
Value: (
Type: 1
Value: US POPULATION
Type: 9
Value: ,
Type: 3
Value: (
Type: 1
Value: POPULATION
Type: 8
Value: =
Type: 7
Value: 3844829
Type: 4
Value: )
Type: 2
Value: and
Type: 3
Value: (
Type: 1
Value: CITY
Type: 8
Value: =
Type: 1
Value: Los Angeles
Type: 4
Value: )
Type: 4
Value: )
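Edit: the sym.java class:
// Token type constants used by the lexer (CUP normally generates this class from a grammar).
public class sym {
    public static final int STRING = 1;
    public static final int AND = 2;
    public static final int OPENP = 3;
    public static final int CLOSEP = 4;
    public static final int IDENTIFIER = 5;
    public static final int OP = 6;
    public static final int NUMBER = 7;
    public static final int EQUALS = 8;
    public static final int COMMA = 9;
    public static final int EOF = 10;
}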
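The output shows that the values the question asks for are exactly the tokens of type 1 (STRING). For comparison only, here is a minimal hand-written scanner sketch that does the same filtering without JFlex or CUP. It is not the answer's code: the class MiniLexer, its Kind enum and the stringValues method are hypothetical, and it covers only a simplified subset of the .flex rules above (quoted strings without escapes, identifiers, integers, single-character punctuation; "and" comes out as an ordinary identifier). Like the JFlex version, it assumes the quotes are placed correctly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written scanner: quoted strings, identifiers, integers
// and single-character punctuation only (no escapes, comments or decimals).
class MiniLexer {
    enum Kind { STRING, IDENTIFIER, NUMBER, PUNCT }

    static final class Token {
        final Kind kind;
        final String text;

        Token(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    static List<Token> tokenize(String s) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++; // skip whitespace between tokens
            } else if (c == '\'') {
                // A string token runs up to the next single quote.
                int end = s.indexOf('\'', i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("Unterminated string at index " + i);
                }
                tokens.add(new Token(Kind.STRING, s.substring(i + 1, end)));
                i = end + 1;
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < s.length() && (Character.isLetterOrDigit(s.charAt(i)) || s.charAt(i) == '_')) {
                    i++;
                }
                tokens.add(new Token(Kind.IDENTIFIER, s.substring(start, i)));
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < s.length() && Character.isDigit(s.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Kind.NUMBER, s.substring(start, i)));
            } else {
                tokens.add(new Token(Kind.PUNCT, String.valueOf(c)));
                i++;
            }
        }
        return tokens;
    }

    // Keeps only the STRING tokens, i.e. the quoted values.
    static List<String> stringValues(String s) {
        List<String> values = new ArrayList<>();
        for (Token t : tokenize(s)) {
            if (t.kind == Kind.STRING) {
                values.add(t.text);
            }
        }
        return values;
    }

    public static void main(String[] args) {
        String str = "look_up_check('US POPULATION', ( 'POPULATION' = 3844829 ) and ('CITY' = 'Los Angeles'))";
        System.out.println(stringValues(str)); // [US POPULATION, POPULATION, CITY, Los Angeles]
    }
}
```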