Java: Finding string literals in a String

Date: 2017-05-18 06:26:18

Tags: java regex string stack

Suppose I have a string:

look_up_check('US_POPULATION','('POPULATION' = 3844829) and ('CITY' = 'Los Angeles')')

Now I need to find: 'US_POPULATION', 'POPULATION', 'CITY', 'Los Angeles'

I tried a stack-based approach, but it did not get me there. Can I use a regular expression or some other method?

3 Answers:

Answer 0: (score: 0)

I will offer a solution that does not use regular expressions.

If you pair up the quotes in your string from left to right, the text inside them is currently:

US_POPULATION
(
 = 3844829) and (
 = 
)

That is not what you want. To get the desired strings, the quotes must be arranged like this:

look_up_check('US_POPULATION',('POPULATION' = 3844829) and ('CITY' = 'Los Angeles'))

Solution:

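import java.util.ArrayList;
import java.util.List;

// Returns the text found between each pair of single quotes in s.
public static List<String> findStuffInQuotes(String s) {
    List<String> list = new ArrayList<>();
    StringBuilder sb = new StringBuilder();
    boolean insideQuotes = false;
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        if (c == '\'') {
            // Every quote toggles the state; a closing quote ends the current literal.
            insideQuotes = !insideQuotes;
            if (!insideQuotes) {
                list.add(sb.toString());
                sb.setLength(0);
            }
        } else if (insideQuotes) {
            sb.append(c);
        }
    }
    // Nothing is added after the loop: doing so would append a spurious
    // empty entry whenever the quotes are balanced.
    return list;
}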
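To see why the arrangement of the quotes matters, here is a minimal sketch that pairs quotes from left to right and prints what ends up between them for both versions of the string. The class name QuoteToggleDemo and the method betweenQuotes are made up for this illustration; they are not part of the answer above.

```java
import java.util.ArrayList;
import java.util.List;

public class QuoteToggleDemo {
    // Collects the text between each opening quote and the next closing quote.
    static List<String> betweenQuotes(String s) {
        List<String> out = new ArrayList<>();
        int open = -1; // index of the pending opening quote, or -1 when outside quotes
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) != '\'') {
                continue;
            }
            if (open < 0) {
                open = i;                          // this quote opens a literal
            } else {
                out.add(s.substring(open + 1, i)); // this quote closes it
                open = -1;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String original =
            "look_up_check('US_POPULATION','('POPULATION' = 3844829) and ('CITY' = 'Los Angeles')')";
        String rearranged =
            "look_up_check('US_POPULATION',('POPULATION' = 3844829) and ('CITY' = 'Los Angeles'))";
        System.out.println(betweenQuotes(original));
        System.out.println(betweenQuotes(rearranged));
        // second line prints [US_POPULATION, POPULATION, CITY, Los Angeles]
    }
}
```

With the original string the names POPULATION, CITY and Los Angeles fall outside the paired quotes, which is why a plain in/out toggle cannot recover them.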

Answer 1: (score: 0)

You can use a regular expression:

String str = "look_up_check('US_POPULATION','('POPULATION' = 3844829) and ('CITY' = 'Los Angeles')')";
Pattern p = Pattern.compile("'[\\w\\s]+'");
Matcher m = p.matcher(str);

while (m.find()) {
    System.out.println(m.group());
}

Output:

'US_POPULATION'
'POPULATION'
'CITY'
'Los Angeles'
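If you want the contents without the surrounding quotes, one option is to add a capturing group to the same pattern. This is only a sketch: the class name QuotedLiterals and the method extract are made up for this example, and the character class is the one from the answer, so it only matches literals made of word characters and whitespace.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuotedLiterals {
    // Same character class as the pattern above, with a capturing group around the contents.
    private static final Pattern QUOTED = Pattern.compile("'([\\w\\s]+)'");

    static List<String> extract(String input) {
        List<String> result = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            result.add(m.group(1)); // group(1) excludes the surrounding quotes
        }
        return result;
    }

    public static void main(String[] args) {
        String str =
            "look_up_check('US_POPULATION','('POPULATION' = 3844829) and ('CITY' = 'Los Angeles')')";
        System.out.println(extract(str));
        // prints [US_POPULATION, POPULATION, CITY, Los Angeles]
    }
}
```

This works on the original string because the stray quotes sit next to parentheses or enclose an '=', none of which [\w\s]+ can match. A literal containing punctuation, such as 'New-York', would be skipped by this pattern.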


Answer 2: (score: 0)

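If you need something more powerful than a regular expression or a hand-written lexer, you can generate one with a Flex-like tool. I created a simple parser with JFlex + CUP (http://jflex.de/manual.html, http://www2.cs.tum.edu/projects/cup/install.php) to parse the text you provided.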

First, you need to create the .flex file, which describes the rules for producing tokens:

import java_cup.runtime.*;
import java.util.*;

%%
%unicode
%class LexicalAnalyzer
%line
%column
%cup

/*numbers*/
number      = ([1-9][0-9]*| 0)([.][0-9]+ )?([eE]([+]|[-])?[0-9]+)?
digit       = [0-9]
underscore = [_]


identifier = {identifier5} ( [.] {identifier5} )*

identifier2 = {letter} ({letter}|{digit}|{underscore})*
identifier3 = {digit} ({letter}|{digit}|{underscore})+
identifier4 = {underscore} ({letter}|{digit}|{underscore})+
identifier5     = {identifier2} | {identifier3} | {identifier4}
letter      = {lowercase} | {uppercase}
lowercase   = [a-z]
uppercase   = [A-Z]
inputchar   = [^\r\n]

/*Comments*/
lineterminator  = \r | \n | \r\n
simplecomment   = "//" {inputchar}* {lineterminator}
blockcomment    = "/*" ( [^*]* | "*"+ [^/*] )* "*"+ "/" 


%{

private void error(){
    System.err.print("Syntax error on line " + (yyline+1));
    System.err.println(". Unrecognizable token: \"" + yytext() + "\"");
    //System.exit(1);
}

// Logs the token and wraps it in a CUP Symbol that also carries its value.
private Symbol processToken(int type, Object value) {
    System.out.println("Type: " + type);
    System.out.println("Value: " + value);
    return new Symbol(type, value);
}

StringBuffer str = new StringBuffer();

%}

%state STRING
%state END

%%

<YYINITIAL> {
    /* string literals */

    [']        {str.setLength(0);yybegin(STRING);}

        "and" { return processToken( sym.AND , yytext()); }



    /* number literals */
    {number} { return processToken(sym.NUMBER , yytext()); }

    /* identifiers */


    {identifier} { return processToken( sym.IDENTIFIER , yytext()); }


    "("    { return processToken( sym.OPENP , yytext()); }
    ")"    { return processToken( sym.CLOSEP , yytext()); }
    "="  { return processToken(sym.EQUALS , yytext());  }
    ","  { return processToken(sym.COMMA , yytext());  }
    "+"  { return processToken(sym.OP , yytext());  }
    "-"  { return processToken(sym.OP , yytext());  }
    "*"  { return processToken(sym.OP , yytext());  }
    "/"  { return processToken(sym.OP , yytext());    }


    // whitespace and comments
    {simplecomment} {/* Do nothing */}
    {blockcomment}  {} 
    " "|\t|\n|  {lineterminator}    {/* Do nothing */}
        .               {error();}
    //.             { /*error!*/ } 
}



/* string literals */
<STRING> {
    [']              { yybegin(YYINITIAL); return processToken( sym.STRING , str.toString()); } 
    \\t             { str.append('\t'); }
    \\n             { str.append('\n'); }
    \\r             { str.append('\r'); }
    \\\"            { str.append('\"'); }
    \\\\            { str.append('\\'); }
    \\[']           { str.append('\''); }
    \\[0-9][0-9][0-9] 
    {
        String s = yytext().substring(1);
        s = "" + ((char) Integer.parseInt(s));
        str.append( s );
    }
    [^\n\r\'\\\t]+    { str.append( yytext() ); }
    .               { /* malformed string */}
}

<END>{
    \n  {}
    .   {}
}

Then you need to compile this specification with JFlex.jar:

java -jar JFlex.jar lexical.flex

This creates a source file named "LexicalAnalyzer.java", which you can use to break a string into tokens according to your specification.

import java.io.ByteArrayInputStream;
import java_cup.runtime.Symbol;

public class Parser {

    public static void main(String[] args) throws Exception {
        String str = "look_up_check('US POPULATION', ( 'POPULATION' = 3844829 ) and ('CITY' = 'Los Angeles'))";

        ByteArrayInputStream buff = new ByteArrayInputStream(str.getBytes());
        LexicalAnalyzer l = new LexicalAnalyzer(buff);

        // processToken() prints each token as a side effect of next_token().
        Symbol s = l.next_token();
        while (s.sym != sym.EOF) {
            s = l.next_token();
        }
    }
}

This produces the output:

Type: 5
Value: look_up_check
Type: 3
Value: (
Type: 1
Value: US POPULATION
Type: 9
Value: ,
Type: 3
Value: (
Type: 1
Value: POPULATION
Type: 8
Value: =
Type: 7
Value: 3844829
Type: 4
Value: )
Type: 2
Value: and
Type: 3
Value: (
Type: 1
Value: CITY
Type: 8
Value: =
Type: 1
Value: Los Angeles
Type: 4
Value: )
Type: 4
Value: )
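For comparison, the two-state idea in the specification (ordinary input versus the inside of a string literal) can be approximated in plain Java without a generator. The sketch below is hypothetical and much simpler than the JFlex specification: the names MiniLexer, Kind and Token are made up, and it handles no escape sequences, comments, arithmetic operators, decimal numbers or dotted identifiers.

```java
import java.util.ArrayList;
import java.util.List;

public class MiniLexer {
    // Token kinds loosely mirroring the names used in the specification above.
    enum Kind { STRING, AND, OPENP, CLOSEP, IDENTIFIER, NUMBER, EQUALS, COMMA }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
        @Override public String toString() { return kind + ":" + text; }
    }

    static List<Token> tokenize(String s) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
            } else if (c == '\'') {
                // Counterpart of the STRING state: read up to the closing quote.
                int end = s.indexOf('\'', i + 1);
                if (end < 0) {
                    throw new IllegalArgumentException("unterminated string at index " + i);
                }
                tokens.add(new Token(Kind.STRING, s.substring(i + 1, end)));
                i = end + 1;
            } else if (c == '(') {
                tokens.add(new Token(Kind.OPENP, "("));
                i++;
            } else if (c == ')') {
                tokens.add(new Token(Kind.CLOSEP, ")"));
                i++;
            } else if (c == '=') {
                tokens.add(new Token(Kind.EQUALS, "="));
                i++;
            } else if (c == ',') {
                tokens.add(new Token(Kind.COMMA, ","));
                i++;
            } else if (Character.isDigit(c)) {
                int start = i;
                while (i < s.length() && Character.isDigit(s.charAt(i))) {
                    i++;
                }
                tokens.add(new Token(Kind.NUMBER, s.substring(start, i)));
            } else if (Character.isLetter(c) || c == '_') {
                int start = i;
                while (i < s.length()
                        && (Character.isLetterOrDigit(s.charAt(i)) || s.charAt(i) == '_')) {
                    i++;
                }
                String word = s.substring(start, i);
                // "and" is the only keyword; every other word is an identifier.
                tokens.add(new Token(word.equals("and") ? Kind.AND : Kind.IDENTIFIER, word));
            } else {
                throw new IllegalArgumentException("unexpected character '" + c + "' at index " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String str = "look_up_check('US POPULATION', ( 'POPULATION' = 3844829 ) and ('CITY' = 'Los Angeles'))";
        for (Token t : tokenize(str)) {
            System.out.println(t);
        }
    }
}
```

Filtering the result for Kind.STRING gives the quoted literals. A generated lexer becomes more attractive once escapes, comments and richer number formats have to be supported.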

Edit: the sym.java class

public class sym {
    public static int STRING = 1;
    public static int AND = 2;
    public static int OPENP = 3;
    public static int CLOSEP = 4;
    public static int IDENTIFIER = 5;
    public static int OP = 6;
    public static int NUMBER = 7;
    public static int EQUALS = 8;
    public static int COMMA = 9;
    public static int EOF = 10;
}
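The numeric types in the printed output are easier to read with a small lookup table. This is a hypothetical helper, not part of the answer: TokenNames and nameOf are made-up names, and the table simply repeats the constants from the sym class above.

```java
public class TokenNames {
    // Index i holds the name of the token whose numeric type is i,
    // following the sym class above; index 0 is unused.
    private static final String[] NAMES = {
        null, "STRING", "AND", "OPENP", "CLOSEP", "IDENTIFIER",
        "OP", "NUMBER", "EQUALS", "COMMA", "EOF"
    };

    static String nameOf(int type) {
        if (type <= 0 || type >= NAMES.length) {
            return "UNKNOWN(" + type + ")";
        }
        return NAMES[type];
    }

    public static void main(String[] args) {
        System.out.println(nameOf(5) + " " + nameOf(1));
        // prints IDENTIFIER STRING
    }
}
```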