Question

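I wrote a word-definition fetcher that parses web pages from a dictionary website. Not all pages have exactly the same HTML structure, so I have to implement several parsing methods to support most cases.

Below is what I have done so far, and it is very ugly code.

What do you think would be the cleanest way to code some kind of iterative fallback mechanism (there may be a more appropriate term), so that I can implement N ordered parsing methods? A parsing failure must trigger the next parsing method, while exceptions such as IOException should abort the process.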

public String[] getDefinition(String word) {
    String[] returnValue = { "", "" };
    returnValue[0] = word;
    Document doc = null;
    try {
        String finalUrl = String.format(_baseUrl, word);
        Connection con = Jsoup.connect(finalUrl).userAgent("Mozilla/5.0 (Linux; U; Android 2.1; en-us; Nexus One Build/ERD62) AppleWebKit/530.17 (KHTML, like Gecko) Version/4.0 Mobile Safari/530.17");
        doc = con.get();
        // *** Case 1 (parsing method that works for 80% of the words) ***
        String basicFormOfWord = doc.select("DIV.luna-Ent H2.me").first().text().replace("·", "");
        String firstPartOfSpeech = doc.select("DIV.luna-Ent SPAN.pg").first().text();
        String firstDef = doc.select("DIV.luna-Ent DIV.luna-Ent").first().text();

        returnValue[1] = "<b>" + firstPartOfSpeech + "</b><br/>" + firstDef;
        returnValue[0] = basicFormOfWord;
    } catch (NullPointerException e) {
        try {
            // *** Case 2 (Alternate parsing method - for poorer results) ***
            String basicFormOfWord = doc.select("DIV.results_content p").first().text().replace("·", "");
            String firstDef = doc.select("DIV.results_content").first().text().replace(basicFormOfWord, "");

            returnValue[1] = firstDef;
            returnValue[0] = basicFormOfWord;
        } catch (Exception e2) {
            e2.printStackTrace();
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
    return returnValue;
}

Answer 1

This sounds like a Chain-of-Responsibility-like pattern. I would have something like the following:

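import java.io.IOException;
import java.util.List;
import javax.annotation.Nullable;          // assumed: the JSR-305 annotation
import com.google.common.base.Optional;    // Guava's Optional

public interface UrlParser {
    // Returns Optional.absent() when this parser cannot handle the page.
    Optional<String[]> getDefinition(String word) throws IOException;
}

public class Chain {
    private final List<UrlParser> list;

    // The parsers are injected in the order in which they should be tried.
    public Chain(List<UrlParser> list) {
        this.list = list;
    }

    @Nullable
    public String[] getDefinition(String word) throws IOException {
        // The first parser that returns a result wins; an IOException aborts the loop.
        for (UrlParser parser : list) {
            Optional<String[]> result = parser.getDefinition(word);
            if (result.isPresent()) {
                return result.get();
            }
        }
        return null;
    }
}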

I'm using Guava's Optional here, but you could also return @Nullable from the interface. Then define a class for each URL parser you need and inject them into the Chain.
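To make "one class per parser, injected into the Chain" concrete, here is a minimal sketch. It makes a few assumptions that are not part of the answer: it uses `java.util.Optional` (Java 8+) instead of Guava's so that it runs without dependencies, and `PrimaryParser` and `SecondaryParser` are hypothetical stand-ins whose hard-coded rules replace real Jsoup-based parsing.

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

class ChainDemo {
    public static void main(String[] args) throws IOException {
        // Parsers are tried in the order in which they are listed here.
        Chain chain = new Chain(Arrays.asList(new PrimaryParser(), new SecondaryParser()));
        System.out.println(Arrays.toString(chain.getDefinition("apple"))); // [apple, primary definition]
        System.out.println(Arrays.toString(chain.getDefinition("zzz")));   // [zzz, fallback definition]
    }
}

interface UrlParser {
    // An empty Optional means "I could not parse this"; an IOException aborts the lookup.
    Optional<String[]> getDefinition(String word) throws IOException;
}

// Hypothetical parser: pretends it only understands words starting with 'a'.
class PrimaryParser implements UrlParser {
    @Override
    public Optional<String[]> getDefinition(String word) {
        if (word.startsWith("a")) {
            return Optional.of(new String[] { word, "primary definition" });
        }
        return Optional.empty();
    }
}

// Hypothetical parser: accepts any non-empty word.
class SecondaryParser implements UrlParser {
    @Override
    public Optional<String[]> getDefinition(String word) {
        if (word.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(new String[] { word, "fallback definition" });
    }
}

class Chain {
    private final List<UrlParser> parsers;

    Chain(List<UrlParser> parsers) {
        this.parsers = parsers;
    }

    // Returns null when no parser succeeds, mirroring the @Nullable contract.
    String[] getDefinition(String word) throws IOException {
        for (UrlParser parser : parsers) {
            Optional<String[]> result = parser.getDefinition(word);
            if (result.isPresent()) {
                return result.get();
            }
        }
        return null;
    }
}
```

Adding a third parsing strategy then only means writing one more class and adding it to the list passed to `Chain`.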

Answer 2

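As already mentioned, Chain of Responsibility is a good candidate. John's answer, OTOH, is not a Chain of Responsibility in the proper sense, because a UrlParser does not actively decide whether to hand the request over to the next parser. Here is my trivial shot at it: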

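public class ParserChain {
    private ArrayList<UrlParser> chain = new ArrayList<UrlParser>();
    // Position of the next parser to try. It is never reset, so one instance serves one request.
    private int index = 0;
    public void add(UrlParser parser) {
        chain.add(parser);
    }
    public String[] parse(Document doc) throws IOException {
        if (index == chain.size()) { // every parser has already been tried
            return null;
        }
        // Pass the chain along so the parser can delegate to the next one.
        return chain.get(index++).parse(doc, this);
    }
}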

public interface UrlParser {
    public String[] parse(Document doc, ParserChain chain) throws IOException;
}

public abstract class AbstractUrlParser implements UrlParser {
    @Override
    public String[] parse(Document doc, ParserChain chain) throws IOException {
        try {
            return this.doParse(doc);
        } catch (ParseException pe) {
            return chain.parse(doc);
        }
    }
    protected abstract String[] doParse(Document doc) throws ParseException, IOException;
}

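Things worth noting:

This code keeps one stack frame for ParserChain#parse and one for UrlParser#parse for every parser it goes through, until some parser stops the chain of responsibility. If you have a huge chain, you could run into a stack overflow (how appropriate).
A UrlParser that does not extend AbstractUrlParser can modify the argument it receives, decline to delegate to the next parser in the chain, or delegate to the next parser in the chain and then modify the result.
ParserChain is not thread-safe (but I think this is inherent to the Chain of Responsibility pattern).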
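The stack-depth and thread-safety notes above both come from the chain recursing and holding a mutable index. If those matter more than letting each parser decide whether to delegate, one alternative is to iterate instead. This is a different technique from the chain above, not a correction of it. The sketch below rests on assumptions: the input is a plain String instead of a Jsoup Document, and `ParseFailedException`, `DocParser`, `StrictParser`, `LenientParser` and `IterativeParserChain` are hypothetical names introduced only for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class IterativeChainDemo {
    public static void main(String[] args) throws IOException {
        IterativeParserChain chain = new IterativeParserChain();
        chain.add(new StrictParser());
        chain.add(new LenientParser());
        // StrictParser fails on this input, so LenientParser handles it.
        String[] result = chain.parse("word: meaning");
        System.out.println(result[0] + " -> " + result[1]); // word -> meaning
    }
}

// Hypothetical checked exception meaning "this parser does not understand the input".
class ParseFailedException extends Exception {
    ParseFailedException(String message) {
        super(message);
    }
}

interface DocParser {
    String[] parse(String doc) throws ParseFailedException, IOException;
}

// Shared helper for the two hypothetical parsers: split on a separator or fail.
abstract class SeparatorParser implements DocParser {
    private final char separator;

    SeparatorParser(char separator) {
        this.separator = separator;
    }

    @Override
    public String[] parse(String doc) throws ParseFailedException {
        int i = doc.indexOf(separator);
        if (i < 0) {
            throw new ParseFailedException("no '" + separator + "' separator");
        }
        return new String[] { doc.substring(0, i).trim(), doc.substring(i + 1).trim() };
    }
}

// Hypothetical: expects "word=definition".
class StrictParser extends SeparatorParser {
    StrictParser() { super('='); }
}

// Hypothetical: expects "word: definition".
class LenientParser extends SeparatorParser {
    LenientParser() { super(':'); }
}

class IterativeParserChain {
    private final List<DocParser> parsers = new ArrayList<>();

    void add(DocParser parser) {
        parsers.add(parser);
    }

    // No index field and no recursion: the position lives in the loop of each call.
    String[] parse(String doc) throws IOException {
        for (DocParser parser : parsers) {
            try {
                return parser.parse(doc);
            } catch (ParseFailedException e) {
                // A parse failure only means "try the next parser".
            }
        }
        return null; // no parser understood the input
    }
}
```

Because `parse` keeps no state between calls, one chain instance can be reused for many documents, and concurrent calls do not interfere as long as `add` is not called at the same time. An IOException thrown by any parser still propagates and aborts the lookup.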

Edit: code corrected as per Sebastien's comment.

Design pattern to implement an iterative fallback mechanism
