Question

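I'm currently writing a recipe-book-style Java program. I have everything built, but unfortunately I have no recipes.

I searched around and found http://allrecipes.com/. I looked at the page source and found the lines that contain the ingredients, the recipe, and the nutrition facts.

I remembered using grep in a terminal, and I quickly found that lynx is useful. Here is what I have so far (for sample pages).

Get the 100 lines after the first mention of "Ingredients": lynx -dump "http://allrecipes.com/Recipe/Potato-Crunchy-Tenders/" | grep -n -A 100 "Ingredients"

Get the line number of "Ingredients": lynx -dump "http://allrecipes.com/Recipe/Beef-Tips-and-Noodles/" | grep -n "Ingredients" | cut -f1 -d:

I tried a few examples and found that the ingredient list starts 6 lines after the "Ingredients" line, and that a new ingredient appears on every other line, as shown: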

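135: Ingredients [66]Edit and Save
136-
137- Original recipe makes 6 servings [67]Change Servings
138- Makes 6___________________ servings (*) US ( ) Metric [68]Adjust Recipe
139- ([69]Help)
140- * [ ]
141- 1/2 cup vegetable oil for frying
142- * [ ]
143- 1 1/2 cups milk
144- * [ ]
145- 1 egg
146- * [ ]
147- 1 (7.6 ounce) package garlic flavored instant mashed potatoes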
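For what it's worth, the "6 lines after the heading, then every other line" rule could be sketched in Java roughly as below. This is only a sketch under assumptions: the input is the raw lynx -dump text (without the line-number prefixes that grep -n adds), the class name LynxIngredients and its extract method are made up for illustration, and the stop rule (the line before each ingredient must be a "*" checkbox line) is my guess at where the list ends, since the sample does not show the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of any library.
final class LynxIngredients {
    // Distance from the "Ingredients" heading to the first ingredient,
    // taken from the sample output (line 135 -> line 141).
    private static final int FIRST_INGREDIENT_OFFSET = 6;

    static List<String> extract(List<String> dumpLines) {
        List<String> ingredients = new ArrayList<>();

        // Find the first line that mentions the heading.
        int heading = -1;
        for (int i = 0; i < dumpLines.size(); i++) {
            if (dumpLines.get(i).contains("Ingredients")) {
                heading = i;
                break;
            }
        }
        if (heading < 0) {
            return ingredients; // no heading, nothing to extract
        }

        // Ingredients sit on every other line. Assumed stop rule: the line
        // just before an ingredient must be a "* [ ]" checkbox line.
        for (int i = heading + FIRST_INGREDIENT_OFFSET; i < dumpLines.size(); i += 2) {
            if (!dumpLines.get(i - 1).trim().startsWith("*")) {
                break;
            }
            String text = dumpLines.get(i).trim();
            if (!text.isEmpty()) {
                ingredients.add(text);
            }
        }
        return ingredients;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "Ingredients [66]Edit and Save",
                "",
                "Original recipe makes 6 servings [67]Change Servings",
                "Makes 6___ servings (*) US ( ) Metric [68]Adjust Recipe",
                "([69]Help)",
                "* [ ]",
                "1/2 cup vegetable oil for frying",
                "* [ ]",
                "1 1/2 cups milk",
                "",
                "Directions");
        for (String ingredient : extract(sample)) {
            System.out.println(ingredient);
        }
    }
}
```

The fixed offset is brittle: if the site changes the header above the list, the constant has to change with it.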

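My goal is to somehow get the ingredients into a text file that I can parse with Java (which I'm comfortable with). I'd like to do the same with the recipe itself.

That way I can do this automatically for many recipes instead of doing it all by hand.

Is there an easier way to do this in Java?

Cheers.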

Answer 1

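Thanks to Hovercraft Full Of Eels, I looked into JSoup, and it works very well.

I worked out as much as I could tonight, and this is the code I came up with.

Getting the ListOfIngredients (extends ArrayList<Ingredient>):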

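import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

// Despite its name, html is the URL of the recipe page, not markup.
public static ListOfIngredients getListOfIngredients(final String html) {

    ListOfIngredients tmp = new ListOfIngredients();
    try {
        Element body = Jsoup.connect(html).get().body();

        // Each ingredient is an element with itemprop="ingredients".
        for (Element elem : body.getElementsByAttributeValue("itemprop", "ingredients")) {
            // The amount is optional; it stays null when the element is absent.
            Elements ingredientAmtElements = elem.getElementsByClass("ingredient-amount");
            String amount = null;
            if (!ingredientAmtElements.isEmpty()) {
                amount = ingredientAmtElements.first().text();
            }
            // first() returns null when nothing matches, so check for that
            // instead of catching NullPointerException.
            Element nameElement = elem.getElementsByClass("ingredient-name").first();
            if (nameElement == null) {
                continue;
            }
            String ingredient = nameElement.text();
            // Skip entries whose name is just a non-breaking space.
            if (!ingredient.equals("\u00a0")) {
                tmp.add(new Ingredient(amount, ingredient));
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }

    return tmp;
}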
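The Ingredient and ListOfIngredients classes are not shown here. A minimal sketch of what they might look like follows. The only things taken from the code above are the two-argument (amount, name) constructor and the fact that ListOfIngredients extends ArrayList<Ingredient>. The field names, getters, toString format, and the IngredientDemo class are assumptions for illustration.

```java
import java.util.ArrayList;

// Hypothetical value class; only the (amount, name) constructor is implied above.
final class Ingredient {
    private final String amount; // may be null when the page lists no amount
    private final String name;

    Ingredient(String amount, String name) {
        this.amount = amount;
        this.name = name;
    }

    String getAmount() {
        return amount;
    }

    String getName() {
        return name;
    }

    @Override
    public String toString() {
        // Leave the amount out when it is missing.
        return amount == null ? name : amount + " " + name;
    }
}

// A thin named wrapper around ArrayList<Ingredient>.
class ListOfIngredients extends ArrayList<Ingredient> {
    private static final long serialVersionUID = 1L;
}

final class IngredientDemo {
    public static void main(String[] args) {
        ListOfIngredients list = new ListOfIngredients();
        list.add(new Ingredient("1 1/2 cups", "milk"));
        list.add(new Ingredient(null, "egg")); // made-up entry with no amount
        for (Ingredient ingredient : list) {
            System.out.println(ingredient);
        }
    }
}
```

Because the list type is just an ArrayList subclass, all the usual collection methods work on it without extra code.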

Getting the Instructions (extends ArrayList<String>):

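// Despite its name, html is the URL of the recipe page, not markup.
public static Instructions getInstructions(final String html) {
    Instructions instr = new Instructions();
    try {
        Element body = Jsoup.connect(html).get().body();
        // The steps are the <li> items inside the element with itemprop="recipeInstructions".
        Element elem = body.getElementsByAttributeValue("itemprop", "recipeInstructions").first();
        // first() returns null when the page has no such element.
        if (elem != null) {
            for (Element e : elem.getElementsByTag("li")) {
                instr.add(e.text());
            }
        }
    } catch (IOException e) {
        e.printStackTrace();
    }
    return instr;
}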
