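I need to scrape some news from this site: https://www.huxiu.com/channel/103.html. Here 103 is the news category id.
However, unless the AJAX "load more" is triggered, I can only get the first page.
Strangely, the request URL is the same for every news category.
The page information is passed through the Referer header, and the page number is sent as form data.
Here is my code snippet: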
# Runs inside a spider callback; assumes "from scrapy import FormRequest".
self.page += 1
url = "https://www.huxiu.com/channel/ajaxGetMore"
method = "POST"
headers = {
    "Host": "www.huxiu.com",
    "Origin": "https://www.huxiu.com",
    "Referer": "https://www.huxiu.com/channel/106.html",
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/"
        "537.36 (KHTML, like Gecko) Chrome/56.0.2924.87 Sa"
        "fari/537.36"
    ),
    "X-Requested-With": "XMLHttpRequest",
}
formdata = {
    "huxiu_hash_code": "9aee58d3507ecafed74df13e156ab01b",
    "page": str(self.page),
    "catId": "106",
}
yield FormRequest(
    url=url,
    method=method,
    headers=headers,
    formdata=formdata,
    callback=self.parse,
)
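This fails to load more of the news feed. How do I send the POST request so that I can scrape more news?
Answer 0 (score: 1)
In this case, GET and POST requests appear to be interchangeable. This is a very common AJAX pagination technique: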
If you open https://www.huxiu.com/channel/ajaxGetMore?catId=103&page=3 in your browser, you will see JSON data containing all the pagination data, along with metadata such as total_page. This information is easy to scrape, and it lets you crawl every page concurrently, because you know the number of pages from the first request.
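To illustrate the idea of reading the page count once and then enumerating every page URL, here is a minimal sketch using only the standard library. The helper names page_url and remaining_page_urls are hypothetical, and the location of total_page inside the JSON (under a top-level "data" key) is an assumption; check the real response and adjust the lookup if it differs.

```python
import json
from urllib.parse import urlencode

AJAX_URL = "https://www.huxiu.com/channel/ajaxGetMore"


def page_url(cat_id, page):
    """Build the GET form of the AJAX pagination URL for one page."""
    return f"{AJAX_URL}?{urlencode({'catId': cat_id, 'page': page})}"


def remaining_page_urls(cat_id, first_response_text):
    """Return the URLs of pages 2..total_page, given the page-1 JSON text."""
    payload = json.loads(first_response_text)
    # Assumption: total_page sits under the top-level "data" key.
    total_page = int(payload["data"]["total_page"])
    return [page_url(cat_id, page) for page in range(2, total_page + 1)]
```

Because the whole list of URLs is known after the first response, the remaining pages do not have to be requested one after another.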
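For example, a Python 3 spider can handle this kind of pagination by reading total_page from the first response and then requesting the remaining pages.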