HTML data extraction with the Jsoup parser

Date: 2016-01-17 13:14:37

Tags: java html xml jsoup jaunt-api

What is the best way to extract data in the format below from the following HTML?

array_merge([''=>'---'], $customers->lists('email', 'id')->toArray())

Expected output:

ITEM_NAME:Teeka Salad

ITEM_DESCRIPTION:Kale, sunflower seeds, quinoa, avocado, grape tomatoes, alfalfa sprouts, carrots and cucumber, with your choice of dressing.

ITEM_PRICE:$ 9.95

ITEM_IMG:/yelp_images/s3-media4.fl.yelpcdn.com/bphoto/1P50jjYUA4ofx5hF85wm5Q/ms.jpg

I have tried various approaches with both Jsoup and Jaunt, but I still can't figure it out.

1 answer:

Answer 0 (score: 1)

Here is a program that retrieves the data. With Jsoup, I use CSS selector queries.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

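int main(int argc, const char *argv[]){
    char temp[128];
    //Dutch number words (one to twenty) to search for
    const char *words[] = {"een","twee","drie","vier","vijf","zes","zeven","acht",
        "negen","tien","elf","twaalf","dertien","veertien",
        "vijftien","zestien","zeventien","achttien",
        "negentien","twintig"};
    //Number of entries in words[] (sizeof on the FILE pointer only gives the pointer size)
    size_t count = sizeof(words) / sizeof(words[0]);

    //Open the file
    FILE *myFile = fopen("numbers.txt","r");
    if (myFile == NULL){
        printf("File not found\n");
        return EXIT_FAILURE;
    }

    //Read line by line; fgets returns NULL at end of file, so the last line is not processed twice
    while (fgets(temp, sizeof(temp), myFile) != NULL){
        //Print the line once if it contains any of the words
        for (size_t i = 0; i < count; ++i){
            if (strstr(temp, words[i]) != NULL) {
                printf("%s", temp); //fgets keeps the line's newline
                break;
            }
        }
    }

    fclose(myFile);
    return 0;
}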
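For reference, here is a minimal sketch of the Jsoup CSS-selector approach mentioned in the answer, assuming the jsoup library is on the classpath. The question's original HTML is not reproduced in this post, so the markup, the class names (menu-item, item-name, item-description, item-price, photo) and the MenuItemExtractor class are hypothetical placeholders; the selectors have to be adapted to the real page. It relies only on Jsoup.parse, Element.select, Elements.first, Element.text and Element.attr.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class MenuItemExtractor {

    // HYPOTHETICAL markup: the question's real HTML is not shown, so these
    // class names are placeholders to adapt to the actual page.
    static final String SAMPLE_HTML =
          "<div class=\"menu-item\">"
        + "<img class=\"photo\" src=\"/yelp_images/s3-media4.fl.yelpcdn.com/bphoto/1P50jjYUA4ofx5hF85wm5Q/ms.jpg\">"
        + "<h4 class=\"item-name\">Teeka Salad</h4>"
        + "<p class=\"item-description\">Kale, sunflower seeds, quinoa, avocado, grape tomatoes,"
        + " alfalfa sprouts, carrots and cucumber, with your choice of dressing.</p>"
        + "<span class=\"item-price\">$ 9.95</span>"
        + "</div>";

    // Returns one field map per element matching the (assumed) container selector.
    static List<Map<String, String>> extract(String html) {
        Document doc = Jsoup.parse(html);
        List<Map<String, String>> items = new ArrayList<>();
        for (Element item : doc.select("div.menu-item")) {
            Map<String, String> fields = new LinkedHashMap<>(); // keeps output order
            fields.put("ITEM_NAME", textOf(item, ".item-name"));
            fields.put("ITEM_DESCRIPTION", textOf(item, ".item-description"));
            fields.put("ITEM_PRICE", textOf(item, ".item-price"));
            Element img = item.select("img.photo").first();
            fields.put("ITEM_IMG", img == null ? "" : img.attr("src"));
            items.add(fields);
        }
        return items;
    }

    // Text of the first descendant matching cssQuery, or "" if nothing matches.
    private static String textOf(Element root, String cssQuery) {
        Element e = root.select(cssQuery).first();
        return e == null ? "" : e.text();
    }

    public static void main(String[] args) {
        for (Map<String, String> fields : extract(SAMPLE_HTML)) {
            fields.forEach((key, value) -> System.out.println(key + ":" + value));
        }
    }
}
```

To work on a live page instead of a string, Jsoup.connect(url).get() returns a Document on which the same select calls can be made.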