How to store scrapy shell output/response in a variable instead of an HTML file

Posted: 2019-05-16 11:36:40

Tags: web-scraping scrapy

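I'm trying to store the HTML code in a variable named `response` using `cmdline.execute`, as in the code below, but nothing gets stored: the program stops in the scrapy shell and never reaches the rest of the code. Can anyone tell me how to store the raw HTML in a variable?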

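import scrapy
from scrapy import cmdline

linkedinnurl = "https://stackoverflow.com/users/5597065/adnan-s?tab=profile"
# cmdline.execute() runs the command (here an interactive shell) and then calls
# sys.exit(), so it never returns the page and print() below is never reached.
response = cmdline.execute("scrapy shell https://stackoverflow.com/users/5597065/adnan-s?tab=profile".split())

print(response)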
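Because `cmdline.execute` is a command-line entry point (it runs the command and then exits), it cannot hand a value back to the calling script. If you want to keep a command-line approach, one option is to swap `scrapy shell` for `scrapy fetch`, which downloads a URL with Scrapy's downloader and prints the page body to standard output, and capture that output with `subprocess`. This is a sketch under assumptions: the `scrapy` executable is on your `PATH`, the page is UTF-8, and the function names `build_fetch_command` and `fetch_html` are hypothetical.

```python
import subprocess
from typing import List


def build_fetch_command(url: str) -> List[str]:
    # `scrapy fetch` prints the downloaded body to stdout; --nolog suppresses log output.
    return ["scrapy", "fetch", "--nolog", url]


def fetch_html(url: str) -> str:
    # Hypothetical helper: run the command and keep its stdout (the raw HTML) in a variable.
    # Assumes the page is UTF-8; undecodable bytes are replaced instead of raising.
    result = subprocess.run(
        build_fetch_command(url),
        capture_output=True,
        encoding="utf-8",
        errors="replace",
        check=True,  # raise CalledProcessError if scrapy exits with a non-zero status
    )
    return result.stdout


if __name__ == "__main__":
    html = fetch_html("https://stackoverflow.com/users/5597065/adnan-s?tab=profile")
    print(html[:200])
```

This starts a separate process per URL, so it suits one-off fetches; for crawling many pages, running a spider inside the script is the more natural fit.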

1 answer:

Answer 0 (score: 1)

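You can store the raw HTML in a variable from inside a spider callback instead of going through `scrapy shell`: the response passed to `parse` already holds the downloaded page, as bytes in `response.body` and as decoded text in `response.text`.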

To save the HTML to a file whose name is derived from the URL, write `res.body` out in the callback:

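import scrapy

class MySpider(scrapy.Spider):
    name = "my_spider"
    def parse(self, res):
        # dynamic_file_name_function is your own helper; it is not part of Scrapy
        with open(dynamic_file_name_function(res.url), 'wb') as f:
            f.write(res.body)  # res.body is bytes, so open the file in binary mode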
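The answer's `dynamic_file_name_function` is not defined there and is not part of Scrapy; it stands for whatever naming scheme you choose. One hypothetical version, using only the standard library, turns the URL's host, path, and query into a filesystem-safe `.html` name:

```python
import re
from urllib.parse import urlparse


def dynamic_file_name_function(url: str) -> str:
    """Hypothetical helper: map a URL to a filesystem-safe .html file name."""
    parsed = urlparse(url)
    raw = parsed.netloc + parsed.path
    if parsed.query:
        raw += "_" + parsed.query
    # Replace every run of characters outside [A-Za-z0-9._-] with a single underscore
    safe = re.sub(r"[^A-Za-z0-9._-]+", "_", raw).strip("_")
    return (safe or "index") + ".html"
```

For example, `dynamic_file_name_function("https://stackoverflow.com/users/5597065/adnan-s?tab=profile")` returns `"stackoverflow.com_users_5597065_adnan-s_tab_profile.html"`. Distinct URLs can collapse to the same name (for instance `/a/b` and `/a_b`); if that matters, append a short hash of the full URL.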