I want to run Scrapy from a script rather than via scrapy crawl.
I found this page:
http://doc.scrapy.org/en/latest/topics/practices.html
but it doesn't actually say where to put the script.
Any help?
Answer 0 (score: 22)
Simple and to the point :)
Just check the official documentation. I would make one change there, so you can control the spider to run only when you execute python myscript.py
and not every time you merely import from it. Just add an if __name__ == "__main__" guard:
import scrapy
from scrapy.crawler import CrawlerProcess
class MySpider(scrapy.Spider):
    name = "myspider"  # a spider must have a name
    # Your spider definition

if __name__ == "__main__":
    process = CrawlerProcess({
        'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
    })
    process.crawl(MySpider)
    process.start()  # the script will block here until the crawling is finished
Now save the file as myscript.py and run python myscript.py.
Enjoy!
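If the script lives inside an existing Scrapy project and should pick up that project's settings.py, the same practices page documents get_project_settings; a minimal variant of the script above (the spider name "myspider" is a placeholder for whatever your spider is called):

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

if __name__ == "__main__":
    # Load the surrounding project's settings.py instead of ad-hoc settings
    process = CrawlerProcess(get_project_settings())
    # With project settings loaded, a spider can be referenced by its name
    process.crawl("myspider")
    process.start()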
Answer 1 (score: 5)
Fortunately, the Scrapy source is open, so you can follow how the crawl command works and do the same thing in your own code:
# Excerpted from Scrapy's own crawl command, where self.crawler_process,
# spname and opts are provided by the command-line machinery:
...
crawler = self.crawler_process.create_crawler()
spider = crawler.spiders.create(spname, **opts.spargs)
crawler.crawl(spider)
self.crawler_process.start()
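Note that create_crawler and crawler.spiders.create are internal APIs from an older Scrapy and may change between releases. The documented way to drive a crawl from your own code is CrawlerProcess (as in Answer 0) or, if you manage the Twisted reactor yourself, CrawlerRunner. A minimal CrawlerRunner sketch, assuming a reasonably recent Scrapy and a throwaway example spider:

from twisted.internet import reactor
import scrapy
from scrapy.crawler import CrawlerRunner

class MySpider(scrapy.Spider):
    name = "example"
    start_urls = ["http://stackoverflow.com"]

    def parse(self, response):
        # Example extraction only: yield the page title
        yield {"title": response.css("title::text").get()}

runner = CrawlerRunner()
d = runner.crawl(MySpider)
d.addBoth(lambda _: reactor.stop())  # stop the reactor when the crawl ends
reactor.run()  # blocks here until the crawl finishes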
Answer 2 (score: 2)
You can create a plain Python script and then use Scrapy's runspider command-line option, which lets you run a spider without having to create a project.
For example, you could create a single file stackoverflow_spider.py with the following content:
import scrapy

class QuestionItem(scrapy.item.Item):
    idx = scrapy.item.Field()
    title = scrapy.item.Field()

class StackoverflowSpider(scrapy.spider.Spider):
    name = 'SO'
    start_urls = ['http://stackoverflow.com']

    def parse(self, response):
        sel = scrapy.selector.Selector(response)
        questions = sel.css('#question-mini-list .question-summary')
        for i, elem in enumerate(questions):
            l = scrapy.contrib.loader.ItemLoader(QuestionItem(), elem)
            l.add_value('idx', i)
            l.add_xpath('title', ".//h3/a/text()")
            yield l.load_item()
Then, provided Scrapy is installed correctly, you can run it with:
scrapy runspider stackoverflow_spider.py -t json -o questions-items.json
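This answer predates Scrapy 1.0, so on current releases the same pieces live at scrapy.Spider, scrapy.Item, scrapy.Field and scrapy.loader.ItemLoader, and -t json can be dropped, since the output format is inferred from the -o file extension.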
Answer 3 (score: 0)
Why not just do this?
from scrapy import cmdline
cmdline.execute("scrapy crawl myspider".split())
Put that script in the same directory where you put scrapy.cfg.
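One caveat with this approach: scrapy.cmdline.execute hands control to Scrapy's command-line entry point, which exits the interpreter once the crawl finishes, so any code placed after that call will not run.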