How can I efficiently search backward with a regular expression?

Asked: 2010-03-01 11:01:50

Tags: java regex

I'm searching an array of strings with a regular expression, like this:

for (int j = line; j < lines.length; j++) {  
    if (lines[j] == null || lines[j].isEmpty()) {
        continue;
    }
    matcher = pattern.matcher(lines[j]);
    if (matcher.find(offset)) {
        offset = matcher.end();
        line = j;
        System.out.println("found \""+matcher.group()+"\" at line "+line+" ["+matcher.start()+","+offset+"]");
        return true;
    }
    offset = 0;
}
return false;

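Note that in the implementation above I save line and offset so that the next search continues from there.

Anyway, now I want to search backward from [line, offset].

My question: is there a way to search backward with a regular expression efficiently? If not, what are the alternatives?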

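Clarification: by backward I mean finding the previous match.
For example, say I'm searching for "dana" in

"dana nama? dana kama! lama dana kama?"

and I've reached the 2nd match. If I call matcher.find() again, I search forward and get the 3rd match. But I want to search backward and get to the 1st match. The code above should then output something like: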

found "dana" at line 0 [0,4] // fwd
found "dana" at line 0 [11,15] // fwd
found "dana" at line 0 [0,4] // bwd

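5 Answers:

Answer 0 (score: 8)

Java's regex engine cannot search backward. In fact, the only regex engine I know of that can do this is the one in .NET.

Instead of searching backward, iterate over all the matches in a loop (searching forward). If a match ends before the position you want, remember it. If a match ends at or after that position, exit the loop. In pseudocode (my Java is a bit rusty):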

storedmatch = ""
while matcher.find {
  if matcher.end < offset {
    storedmatch = matcher.group()
  } else {
    break
  }
}
// also reached when the loop runs out of matches
return storedmatch
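
The pseudocode above could be written in Java roughly as follows. This is only a sketch: the class and method names (LastMatchBefore, lastMatchBefore) are made up for illustration, and it assumes offset is the end of the current match, as saved in the question's loop. It returns a MatchResult snapshot rather than just the matched text, so the caller also gets the start and end positions.

```java
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastMatchBefore {

    /**
     * Returns the last match that ends strictly before offset,
     * or null if there is none. Hypothetical helper, not part of the answer.
     */
    static MatchResult lastMatchBefore(Pattern pattern, CharSequence text, int offset) {
        Matcher matcher = pattern.matcher(text);
        MatchResult stored = null;
        while (matcher.find()) {
            if (matcher.end() >= offset) {
                break; // this match reaches the offset, so stop scanning
            }
            // toMatchResult() takes a snapshot that later find() calls do not change
            stored = matcher.toMatchResult();
        }
        return stored;
    }

    public static void main(String[] args) {
        String text = "dana nama? dana kama! lama dana kama?";
        Pattern pattern = Pattern.compile("dana");
        // The second "dana" spans [11,15), so 15 is the saved offset.
        MatchResult previous = lastMatchBefore(pattern, text, 15);
        System.out.println("found \"" + previous.group() + "\" ["
                + previous.start() + "," + previous.end() + "]");
        // prints: found "dana" [0,4]
    }
}
```

Each backward step rescans the text from the beginning, so the cost grows with the distance from the start of the input. That is usually acceptable for single lines, as in the question, but less so for very long texts.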

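Answer 1 (score: 5)

The following class searches both backward and forward (of course).

I used this class in an application where users can search for strings in a long text (like the search feature in a web browser). So it has been tested and works well in practical use cases.

It uses an approach similar to the one Jan Goyvaerts describes. It selects a block of text before the start position and searches it forward, returning the last match if there is one. If there is no match, it selects a new block of text before that block and searches it the same way.

Use it like this: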

Search s = new Search("Big long text here to be searched [...]");
s.setPattern("some regexp");

// search backwards or forward as many times as you like,
// the class keeps track where the last match was
MatchResult where = s.searchBackward();
where = s.searchBackward(); // next match
where = s.searchBackward(); // next match

//or search forward
where = s.searchForward();
where = s.searchForward();

The class:

import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/*
 * Search regular expressions or simple text forward and backward in a CharSequence
 *
 * 
 * To simulate the backward search (that Java class doesn't have) the input data
 * is divided into chunks and each chunk is searched from last to first until a 
 * match is found (inter-chunk matches are returned from last to first too).
 *
 * The search can fail if the pattern/match you look for is longer than the chunk
 * size, but you can set the chunk size to a sensible size depending on the specific
 * application.
 *
 * Also, because the match could span between two adjacent chunks, the chunks are
 * partially overlapping. Again, this overlapping size should be set to a sensible
 * size.
 *
 * A typical application where the user searches for some words in a document will
 * work perfectly fine with default values. The matches are expected to be between
 * 10-15 chars, so any chunk size and overlapping size bigger than this expected 
 * length will be fine.
 *
 * */
public class Search {

    private int BACKWARD_BLOCK_SIZE = 200;
    private int BACKWARD_OVERLAPPING = 20;

    private Matcher myFwdMatcher;
    private Matcher myBkwMatcher;
    private String  mySearchPattern;
    private int myCurrOffset;
    private boolean myRegexp;
    private CharSequence mySearchData;

    public Search(CharSequence searchData) {
        mySearchData = searchData;
        mySearchPattern = "";
        myCurrOffset = 0;
        myRegexp = true;
        clear();
    }

    public void clear() {
        myFwdMatcher = null;
        myBkwMatcher = null;
    }

    public String getPattern() {
        return mySearchPattern;
    }

    public void setPattern(String toSearch) {
        if ( !mySearchPattern.equals(toSearch) ) {
            mySearchPattern = toSearch;
            clear();
        }
    }

    public CharSequence getText() {
        return mySearchData;
    }

    public void setText(CharSequence searchData) {
        mySearchData = searchData;
        clear();
    }

    public void setSearchOffset(int startOffset) {
        if (myCurrOffset != startOffset) {
            myCurrOffset = startOffset;
            clear();
        }
    }

    public boolean isRegexp() {
        return myRegexp;
    }

    public void setRegexp(boolean regexp) {
        if (myRegexp != regexp) {
            myRegexp = regexp;
            clear();
        }
    }

    public MatchResult searchForward() {

        if (mySearchData != null) {

            boolean found;

            if (myFwdMatcher == null)
            {
                // if it's a new search, start from beginning
                String searchPattern = myRegexp ? mySearchPattern : Pattern.quote(mySearchPattern);
                myFwdMatcher = Pattern.compile(searchPattern, Pattern.CASE_INSENSITIVE).matcher(mySearchData);
                try {
                    found = myFwdMatcher.find(myCurrOffset);
                } catch (IndexOutOfBoundsException e) {
                    found = false;
                }
            }
            else
            {
                // continue searching
                found = myFwdMatcher.hitEnd() ? false : myFwdMatcher.find();
            }

            if (found) {
                MatchResult result = myFwdMatcher.toMatchResult();
                return onMatchResult(result);
            }
        }
        return onMatchResult(null);
    }

    public MatchResult searchBackward() {

        if (mySearchData != null) {

            myFwdMatcher = null;

            if (myBkwMatcher == null)
            {
                // if it's a new search, create a new matcher
                String searchPattern = myRegexp ? mySearchPattern : Pattern.quote(mySearchPattern);
                myBkwMatcher = Pattern.compile(searchPattern, Pattern.CASE_INSENSITIVE).matcher(mySearchData);
            }

            MatchResult result = null;
            boolean startOfInput = false;
            int start = myCurrOffset;
            int end = start;

            while (result == null && !startOfInput)
            {
                start -= BACKWARD_BLOCK_SIZE;
                if (start < 0) {
                    start = 0;
                    startOfInput = true;
                }
                try {
                    myBkwMatcher.region(start, end);
                } catch (IndexOutOfBoundsException e) {
                    break;
                }
                while ( myBkwMatcher.find() ) {
                    result = myBkwMatcher.toMatchResult();
                }
                end = start + BACKWARD_OVERLAPPING; // depending on the size of the pattern this might not be enough,
                                                    // but how can you know the size of a regexp match beforehand?
            }

            return onMatchResult(result);
        }
        return onMatchResult(null);
    }

    private MatchResult onMatchResult(MatchResult result) {
        if (result != null) {
            myCurrOffset = result.start();    
        }
        return result;
    }
}

If you want to test this class, here is a use case:

import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import javax.swing.event.*;
import java.util.regex.MatchResult;
import javax.swing.text.DefaultHighlighter;
import javax.swing.text.BadLocationException;

public class SearchTest extends JPanel implements ActionListener {

    protected JScrollPane scrollPane;
    protected JTextArea textArea;
    protected boolean docChanged = true;
    protected Search searcher;

    public SearchTest() {
        super(new BorderLayout());

        searcher = new Search("");

        JButton backButton = new JButton("Search backward");
        JButton fwdButton  = new JButton("Search forward");

        JPanel buttonPanel = new JPanel(new BorderLayout());
        buttonPanel.add(fwdButton, BorderLayout.EAST);
        buttonPanel.add(backButton, BorderLayout.WEST); 

        textArea = new JTextArea("Big long text here to be searched...", 20, 40);
        textArea.setEditable(true);
        scrollPane = new JScrollPane(textArea);

        final JTextField textField = new JTextField(40);

        //Add Components to this panel.
        add(buttonPanel, BorderLayout.NORTH);
        add(scrollPane, BorderLayout.CENTER);
        add(textField, BorderLayout.SOUTH);

        //Add actions
        backButton.setActionCommand("back");
        fwdButton.setActionCommand("fwd");
        backButton.addActionListener(this);
        fwdButton.addActionListener(this);

        textField.addActionListener( new ActionListener() {
            public void actionPerformed(ActionEvent e) {
                final String pattern = textField.getText();
                searcher.setPattern(pattern);
            }
        } );

        textArea.getDocument().addDocumentListener( new DocumentListener() { 
            public void insertUpdate(DocumentEvent e) { docChanged = true; }
            public void removeUpdate(DocumentEvent e) { docChanged = true; }
            public void changedUpdate(DocumentEvent e) { docChanged = true; }
        });
    }

    public void actionPerformed(ActionEvent e)  {

        if ( docChanged ) {
            final String newDocument = textArea.getText();
            searcher.setText(newDocument);
            docChanged = false;
        }

        MatchResult where = null;

        if ("back".equals(e.getActionCommand())) {
            where = searcher.searchBackward();
        } else if ("fwd".equals(e.getActionCommand())) {
            where = searcher.searchForward();
        }

        textArea.getHighlighter().removeAllHighlights();

        if (where != null) {
            final int start = where.start();
            final int end   = where.end();
            // highlight result and scroll
            try {
            textArea.getHighlighter().addHighlight(start, end, new DefaultHighlighter.DefaultHighlightPainter(Color.yellow));
            } catch (BadLocationException excp) {}
            textArea.scrollRectToVisible(new Rectangle(0, 0, scrollPane.getViewport().getWidth(), scrollPane.getViewport().getHeight()));
            SwingUtilities.invokeLater(new Runnable() {
                    @Override
                    public void run() { textArea.setCaretPosition(start); }
            });
        } else {
            // no match, so let's wrap around
            if ("back".equals(e.getActionCommand())) {
                searcher.setSearchOffset( searcher.getText().length() -1 );
            } else if ("fwd".equals(e.getActionCommand())) {
                searcher.setSearchOffset(0);
            }
        }
    }

    private static void createAndShowGUI() {
        //Create and set up the window.
        JFrame frame = new JFrame("SearchTest");
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);

        //Add contents to the window.
        frame.add(new SearchTest());

        //Display the window.
        frame.pack();
        frame.setVisible(true);
    }

    public static void main(String[] args) {
        //Schedule a job for the event dispatch thread:
        //creating and showing this application's GUI.
        javax.swing.SwingUtilities.invokeLater(new Runnable() {
            public void run() {
                createAndShowGUI();
            }
        });
    }
}

Answer 2 (score: 1)

I use the following simple class to search backward in Java:

import java.util.Stack;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;

public class ReverseMatcher {
   private final Matcher _matcher;
   private final Stack<MatchResult> _results = new Stack<>();

   public ReverseMatcher(Matcher matcher){
       _matcher = matcher;
   }

   public boolean find(){
       return find(_matcher.regionEnd());
   }

   public boolean find(int start){
       if (_results.size() > 0){
           _results.pop();
           return _results.size() > 0;
       }
       boolean res = false;
       while (_matcher.find()){            
           if (_matcher.end() > start)
               break;
           res = true;
           _results.push(_matcher.toMatchResult());
       }
       return res;
   }

   public String group(int group){
       return _results.peek().group(group);               
   }

   public String group(){
       return _results.peek().group();               
   }

   public int start(){
       return _results.peek().start();
   }    

   public int end(){
       return _results.peek().end();
   }
}

Usage:

String srcString = "1 2 3 4 5 6 7 8 9";
String pattern = "\\b[0-9]+\\b"; // + rather than *, so empty matches at word boundaries are not reported
Pattern p = Pattern.compile(pattern);
Matcher m = p.matcher(srcString);
ReverseMatcher rm = new ReverseMatcher(m);
while (rm.find())
   System.out.print(rm.group() + " ");

Output: 9 8 7 6 5 4 3 2 1

while (rm.find(9))
   System.out.print(rm.group() + " ");

Output: 5 4 3 2 1

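Answer 3 (score: 0)

Is the search string strictly a regular expression (the full, rich syntax)? Because if it is not, use for (int j = line; j >= 0; j--), reverse the line, reverse the search string, and search forward ;)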
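
For a plain literal search string, the reversal idea could look something like the sketch below. The names (ReverseLiteralSearch, findBackward) are made up for illustration, and the approach assumes the needle is a literal: a general regular expression cannot simply be reversed character by character.

```java
public class ReverseLiteralSearch {

    /**
     * Finds the last occurrence of a literal needle that ends at or before
     * endExclusive, by reversing both strings and searching forward.
     * Returns the start index in the original text, or -1 if there is none.
     * Hypothetical helper for illustration only.
     */
    static int findBackward(String text, String needle, int endExclusive) {
        String reversedText = new StringBuilder(text).reverse().toString();
        String reversedNeedle = new StringBuilder(needle).reverse().toString();
        // Index endExclusive in the original corresponds to this index in the reversal.
        int from = text.length() - endExclusive;
        int hit = reversedText.indexOf(reversedNeedle, from);
        if (hit < 0) {
            return -1;
        }
        // Map the start of the reversed hit back to the start of the original occurrence.
        return text.length() - hit - needle.length();
    }

    public static void main(String[] args) {
        String text = "dana nama? dana kama! lama dana kama?";
        // Search backward from the start of the second "dana" (index 11).
        System.out.println(findBackward(text, "dana", 11)); // prints 0
        // Search backward from the end of the text.
        System.out.println(findBackward(text, "dana", text.length())); // prints 27
    }
}
```

For literals, String.lastIndexOf(needle, endExclusive - needle.length()) gives the same result without building reversed copies, so the reversal is mainly useful as an illustration of the idea.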

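Answer 4 (score: 0)

If the previous match is something you have already matched, why not build a list of match positions while searching forward and then just use it to jump back, instead of searching backward?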
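
A minimal sketch of that idea, assuming a single text rather than the array of lines from the question. The class name MatchHistory and its methods are hypothetical. Matches are recorded as they are found going forward, and going backward only moves a cursor through the recorded list.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.MatchResult;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatchHistory {
    private final Matcher matcher;
    private final List<MatchResult> seen = new ArrayList<>();
    private int cursor = -1;           // index of the current match in seen, -1 before the first
    private boolean exhausted = false; // set once the matcher has run out of matches

    MatchHistory(Pattern pattern, CharSequence text) {
        matcher = pattern.matcher(text);
    }

    /** Moves to the next match, reusing recorded matches before asking the matcher. */
    MatchResult next() {
        if (cursor + 1 < seen.size()) {
            return seen.get(++cursor);
        }
        if (!exhausted && matcher.find()) {
            seen.add(matcher.toMatchResult());
            return seen.get(++cursor);
        }
        exhausted = true;
        return null;
    }

    /** Steps back to the previously recorded match, or returns null at the first one. */
    MatchResult previous() {
        if (cursor <= 0) {
            return null;
        }
        return seen.get(--cursor);
    }

    static String describe(MatchResult r) {
        return r == null ? "none" : r.group() + " [" + r.start() + "," + r.end() + "]";
    }

    public static void main(String[] args) {
        MatchHistory history = new MatchHistory(Pattern.compile("dana"),
                "dana nama? dana kama! lama dana kama?");
        System.out.println(describe(history.next()));     // dana [0,4]
        System.out.println(describe(history.next()));     // dana [11,15]
        System.out.println(describe(history.previous())); // dana [0,4]
    }
}
```

This only helps when the backward search revisits matches that were already found. If the search can start at an arbitrary offset, or the text changes, the list has to be rebuilt or one of the other approaches used.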