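I'm using this library: http://benreeves.co.uk/objective-c-hmtl-parser/ to parse HTML for a small iPhone app I'm making. The code works so far, but it fails when it encounters an accented character (so far I have only seen this with é). This is the code I'm using: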
    NSError *error = nil;
    HTMLParser *parser = [[HTMLParser alloc] initWithContentsOfURL:[NSURL URLWithString:@"http://intranet.westminster.org.uk/almanack/food.asp?nextweek=TRUE"] error:&error];
    if (error) {
        NSLog(@"Error: %@", error);
        return nil;
    }
    HTMLNode *bodyNode = [parser body]; // Find the body tag
    NSArray *individualMeals = [bodyNode findChildTags:@"font"];
    for (HTMLNode *node in individualMeals) {
        if ([[node getAttributeNamed:@"color"] isEqual:@"green"]) {
            NSLog(@"%@", [node rawContents]);
        }
    }
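But it doesn't parse all of the text. It seems to give up once it hits an accented character in the page at that URL. This is what it produces when run: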
2010-10-07 18:40:59.296 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.298 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.305 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.307 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.308 Westminster[1011:207] <font color="green">Sausage <br/>Bacon <br/>Hash Brown <br/>Baked Beans <br/>Breakfast special <br/>Three cheese omelets <br/><br/><br/>Plain Porridge <br/><br/><br/><br/>Croissants <br/><br/> Natural Yogurt <br/>Dried Fruits <br/>Granola <br/>Honey</font>
2010-10-07 18:40:59.309 Westminster[1011:207] <font color="green">Mulligatawny <br/>Black Olive <br/>RICE <br/>Roasted med veg in paella rice <br/>Hot and sticky wings on yellow rice <br/>Hoi Sin Pork Belly Steaks <br/>Vegetable Biriyani with a Mild Curry Sauce <br/>Babycorn Bamboo Shoots and Water Chestnuts <br/>Stir fried noodles with seaweed <br/>Lemon Sponge with Orange Sauce <br/>Vanilla Granola</font>
2010-10-07 18:40:59.310 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.312 Westminster[1011:207] <font color="green">Pea & Ham <br/><br/>Black Olive <br/>Roast Chicken with Bread Sauce and Roast Jus <br/>Warm Salad of Salmon and Crispy Bacon <br/><br/><br/>Vegetarian Chilli <br/>With Sour Cream and Braised Rice <br/>Green Beans <br/><br/>Bubble & Squeak <br/><br/>Tiramisu <br/>3 Cheeses & Biscuits</font>
2010-10-07 18:40:59.313 Westminster[1011:207] <font color="green">Sausage <br/>Bacon <br/>Grilled Tomato <br/>Grilled mushrooms <br/>Fried Egg <br/><br/><br/><br/>Plain Porridge <br/><br/><br/><br/>Bread <br/><br/>Natural Yogurt <br/>Dried Fruits <br/>Granola <br/>Honey</font>
2010-10-07 18:40:59.317 Westminster[1011:207] <font color="green">Root Vegetable <br/>Red Pesto <br/>WRAP <br/>Chimichanga <br/>Mexican fish tortillas <br/>Roast Leg of Lamb <br/>Gnocchi with Roasted Vegetables and Flaked Parmesan <br/>Broccoli <br/><br/><br/>Thyme Roasted Potatoes <br/> Sticky Toffee Pudding and Toffee Sauce <br/>Banana Bread</font>
2010-10-07 18:40:59.318 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.318 Westminster[1011:207] <font color="green">Tomato with Basil Oil <br/>Red Pesto <br/>Beef Olives <br/><br/>Lamb with Ginger, Spring onion and Noodles <br/><br/><br/>Field Mushroom Pies <br/>Ratatouille <br/><br/>Creamed Potatoes <br/><br/>Lemon Tart <br/>3 Cheeses & Biscuits</font>
2010-10-07 18:40:59.319 Westminster[1011:207] <font color="green">Sausage <br/>Bacon <br/>Baked Beans <br/>Grilled Tomato <br/>Breakfast special <br/>Avocado on toast <br/><br/>Plain Porridge <br/><br/><br/>Bread and banana bread <br/><br/>Natural Yogurt <br/>Dried Fruits <br/>Granola <br/>Honey</font>
2010-10-07 18:40:59.333 Westminster[1011:207] <font color="green">(GREEK) <br/><br/>FLAT BREADS <br/>SPINACH, ROCKET AND FETA AND TOASTED SOUR DOUGHS <br/>SEAFOOD STUFFED PEPPERS <br/>STIFADO (beef) <br/><br/>LAMB FRICASSEE <br/>zucchini pie from Macedonia <br/>RICE <br/><br/>GIGANTIS PLAKI <br/><br/>ORANGE AND LEMON CAKE TOPPED WITH GREEK YOGURT AND HONEY</font>
2010-10-07 18:40:59.333 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.334 Westminster[1011:207] <font color="green">Roasted Vegetable <br/>FLAT BREADS <br/>Pork Steak Served with a Tomato, Tarragon and Mushroom sauce <br/>Roast beef and homemade horseradish sauce <br/><br/><br/>Lancashire Cheese Sausages with Onion Gravy <br/>Courgettes <br/><br/>Roast Potatoes <br/><br/>Mississippi Mud Pie <br/>3 Cheeses & Biscuits</font>
2010-10-07 18:40:59.343 Westminster[1011:207] <font color="green">Sausage <br/>Bacon <br/>Hash Brown <br/>Grilled mushrooms <br/>Fried Egg <br/><br/><br/><br/>Plain Porridge <br/><br/><br/><br/>Bread <br/><br/> Natural Yogurt <br/>Dried Fruits <br/>Granola <br/>Honey</font>
2010-10-07 18:40:59.344 Westminster[1011:207] <font color="green">Leek, Blue Cheese and Potato <br/>Sunflower Seed <br/>COUS COUS <br/>Couscous with apricots, lemon and coriander <br/><br/>Couscous fried chicken with couscous and spiced tomato sauce <br/>Butchers Sausages <br/>Balsamic Roasted Vegetable Frittata <br/>Red Cabbage <br/><br/><br/>Mashed Potatoes <br/><br/>Jam Roly Poly <br/>Bakewell Slice</font>
2010-10-07 18:40:59.344 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.345 Westminster[1011:207] <font color="green">Curried Parsnip and Apple <br/>Sunflower Seed <br/>Spiced Sticky chicken pieces <br/>Mexican Beef Chilli Wraps with Natural Yogurt and Guacamole <br/><br/><br/>Roasted Teriyaki Tofu Steaks with Glazed Green Vegetables <br/>Spiced Aubergine <br/><br/>Rice and Peas <br/><br/>Mango Mousse <br/>3 Cheeses & Biscuits</font>
2010-10-07 18:40:59.351 Westminster[1011:207] <font color="green">Sausage <br/>Bacon <br/>Baked Beans <br/>Grilled Tomato <br/>Breakfast special <br/>Muffin bar <br/><br/>Plain Porridge <br/><br/><br/><br/>Croissants <br/><br/>Natural Yogurt <br/>Dried Fruits <br/>Granola <br/>Honey</font>
2010-10-07 18:40:59.352 Westminster[1011:207] <font color="green">Carrot and Chilli <br/>Rosemary <br/>NOODLES <br/><br/>Crispy tofu <br/>Lemon chicken <br/>Fish with Traditional Crispy Batter <br/>Japanese Vegetable Curry with Rice Noodles and Tofu <br/>Garden peas <br/><br/><br/>Chips <br/>Viennese Jam Tart and Custard <br/>Fresh Fruit Salad</font>
2010-10-07 18:40:59.361 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.361 Westminster[1011:207] <font color="green">Three onion, spring, red and white <br/>Rosemary <br/>Pepperoni Pizza Topped with Boccaccio <br/>Bolognaise pasta bake <br/><br/>Vegetarian Plait <br/>Green Cabbage <br/><br/>Oven Baked Cajun Wedges <br/><br/>Ice <br/>Cream Sundae <br/><br/>3 Cheeses & Biscuits</font>
2010-10-07 18:40:59.362 Westminster[1011:207] <font color="green">Sausage <br/>Bacon <br/>Hash Brown <br/>Grilled Mushrooms <br/>Poached Eggs <br/><br/><br/><br/>Plain Porridge <br/><br/><br/><br/><br/><br/>Natural Yogurt <br/>Dried Fruits <br/>Granola <br/>Honey</font>
2010-10-07 18:40:59.362 Westminster[1011:207] (null)
2010-10-07 18:40:59.363 Westminster[1011:207] <font color="green"/>
2010-10-07 18:40:59.363 Westminster[1011:207] <font color="green"/>
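It gives up at the sauté potatoes in that section, and it doesn't return any results from that section or from any later section.
I think this may be because the site doesn't encode its és. When I view the source, I see a literal é rather than &eacute;, which is what this site recommends: http://www.w3.org/MarkUp/html3/latin1.html
Thanks for your time. If you know a better way of getting the lunches from the site, I'd be glad to hear that too.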
Answer 0 (score: 0)
I had a similar problem. The problem is that libxml2 can only parse UTF-8 encoded files, so you need to convert the HTML page to UTF-8 first.
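The conversion step might look like the sketch below. It is only an illustration: the helper name `UTF8DataFromLatin1Data` is hypothetical, and the source encoding is assumed to be ISO-8859-1 (Latin-1), which the question does not confirm. Check the page's `Content-Type` header or `<meta>` tag for the real charset. The code is written for ARC; under manual reference counting, `decoded` would need to be released.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper, not part of the HTMLParser library.
// Decodes the raw page bytes as ISO-8859-1 (an assumption about the page)
// and re-encodes the resulting string as UTF-8.
static NSData *UTF8DataFromLatin1Data(NSData *rawData) {
    NSString *decoded = [[NSString alloc] initWithData:rawData
                                              encoding:NSISOLatin1StringEncoding];
    if (decoded == nil) {
        // Decoding failed; let the caller decide how to handle it.
        return nil;
    }
    return [decoded dataUsingEncoding:NSUTF8StringEncoding];
}
```

The returned data could then be handed to the parser's `initWithData:error:` initializer (the one used in the other answer) instead of letting the parser fetch the URL itself.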
Answer 1 (score: 0)
I just tried this. Instead of the initWithContentsOfURL method, I would use the initWithData method inside the connectionDidFinishLoading delegate method, e.g.
HTMLParser * parser = [[HTMLParser alloc] initWithData:receivedData error:&error];
It seems to work better with special characters and other encodings.
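For context, a minimal sketch of the delegate side of that approach follows. The class name `PageLoader` and its `completion` block are hypothetical and exist only for illustration; the three delegate methods are the standard NSURLConnection callbacks. It assumes ARC. Note that NSURLConnection has since been deprecated in favor of NSURLSession.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper class (name and completion block are illustrative).
// It buffers the chunks delivered by NSURLConnection and hands over the
// complete body once loading has finished.
@interface PageLoader : NSObject
@property (nonatomic, strong) NSMutableData *receivedData;
@property (nonatomic, copy) void (^completion)(NSData *data);
@end

@implementation PageLoader

- (void)connection:(NSURLConnection *)connection didReceiveResponse:(NSURLResponse *)response {
    // A new response (for example after a redirect) starts a fresh buffer.
    self.receivedData = [NSMutableData data];
}

- (void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data {
    if (self.receivedData == nil) {
        self.receivedData = [NSMutableData data];
    }
    // The body can arrive in several chunks; append each one.
    [self.receivedData appendData:data];
}

- (void)connectionDidFinishLoading:(NSURLConnection *)connection {
    // The whole page is available at this point, so this is where the
    // parser would be created with initWithData:error:.
    if (self.completion) {
        self.completion(self.receivedData);
    }
}

@end
```

A loader like this would be passed as the delegate to `[NSURLConnection connectionWithRequest:request delegate:loader]`, and the completion block would create the parser with `[[HTMLParser alloc] initWithData:data error:&error]` as shown in the answer.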