Web crawler does not follow links

Time: 2019-04-06 06:33:18

Tags: python scrapy web-crawler

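I want to crawl a news website using Scrapy. The code retrieves the relevant news from the current link, but it does not follow the next-page link. The news website has the following link attributes: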

enter image description here

The code I am using:

#include <stdio.h>

double integrate(double low, double hi, int trap) {
    ...
}

int flush_line(void) {
    // Consume the rest of the pending input line; return '\n' or EOF
    int c;
    while ((c = getchar()) != EOF && c != '\n')
        continue;
    return c;
}

int main() {
    // Main program loop
    for (;;) {
        int trap, test;
        double low, hi;
        char repeat;

        //Gather End Points
        for (;;) {
            printf("Enter endpoints of interval to be integrated (low hi): ");
            test = scanf("%lf %lf", &low, &hi);
            if (test == EOF)
                return 1;
            if (test != 2) {
                printf("Error: Improperly formatted input\n");
                if (flush_line() == EOF)
                    return 1;
                continue;  // ask again
            }
            if (low > hi) {
                printf("Error: low must be <= hi\n");
                continue;
            }
            break;  // input is valid
        }

        //Gather number of trapezoids
        for (;;) {
            printf("Enter number of trapezoids to be used: ");
            test = scanf("%d", &trap);
            if (test == EOF)
                return 1;
            if (test != 1) {
                printf("Error: Improperly formatted input\n");
                if (flush_line() == EOF)
                    return 1;
                continue;
            }
            if (trap < 1) {
                printf("Error: number of trapezoids must be >= 1\n");
                continue;
            }
            break;
        }

        //Output the integral
        printf("Using %d trapezoids, integral between %lf and %lf is %lf\n",
               trap, low, hi, integrate(low, hi, trap));

        //Ask whether to evaluate another interval
        for (;;) {
            printf("\nEvaluate another interval (Y/N)? ");
            if (scanf(" %c", &repeat) != 1)
                return 1;  // unexpected end of file

            switch (repeat) {
              case 'Y':
              case 'y':
                break;
              case 'N':
              case 'n':
                return 0;
              default:
                printf("Error: must enter Y or N\n");
                if (flush_line() == EOF)
                    return 1;
                continue;
            }
            break;
        }
    }             
}

Although it returns information from the current page, it also shows an error.

enter image description here

The input I entered was: NASA

1 answer:

Answer 0 (score: 1)

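The main error is that you call the css function with an XPath selector for next_page:

next_page = response.css("//a[@class='btn-next btn']/@href").get()

An XPath expression has to be passed to response.xpath instead.

The next problem is that you yield the request for the next page inside the for loop. This causes a large number of duplicate requests.

So I suggest these changes: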