
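If you open the Computer science category on Wikipedia (https://en.wikipedia.org/wiki/Category:Computer_science), it shows a total of 19 subcategories. For all 19 of these subcategories, I want to extract only the page names (page titles). For example, the category Computer science itself contains 45 pages, which appear as bullets just below the subcategory list.

Now consider the associated subcategories. Areas of computer science (https://en.wikipedia.org/wiki/Category:Areas_of_computer_science) is a subcategory with 3 pages, but it in turn has 17 subcategories (this is depth 1 in traversal terms, i.e. depth = 1 means we are one level deep). Of those, Algorithms and data structures (https://en.wikipedia.org/wiki/Category:Algorithms_and_data_structures) has 5 pages, and Artificial intelligence (https://en.wikipedia.org/wiki/Category:Artificial_intelligence) has 333 pages plus further categories and subcategories spread over several listing pages (see the Artificial intelligence category page): 37 categories and 333 pages in total. The list keeps getting deeper like this, and at this point we are at depth 2. I need to extract all pages (titles) reached by traversing depth 1 and depth 2. Is there an algorithm that can do this?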

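For example: the subcategory Areas of computer science has 17 subcategories of its own, and across all 17 of them the page count is 5 + 333 + 127 + 79 + 216 + 315 + 37 + 47 + 95 + 37 + 246 + 103 + 21 + 2 + 55 + 113 + 94 pages. This is depth 2, because I followed the subcategory list twice. Likewise, does the same need to be done for the remaining 18 subcategories (https://en.wikipedia.org/wiki/Category:Computer_science), so that the traversal reaches depth 2 from the root category Computer science?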

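Is there a way to achieve this? Displaying and extracting that many pages is hard because the result would be huge, so the output should definitely be capped at no more than 10,000 pages.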

Is there any way to do this? Any small help is greatly appreciated!
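To make the traversal being asked about concrete, here is a minimal sketch of a depth-limited breadth-first search over categories. It is not taken from any answer. It assumes the MediaWiki API's `list=categorymembers` query is used instead of scraping HTML, that the root category counts as depth 0, and that pages appearing in several categories should be listed once. The names `fetch_members`, `collect_titles`, `max_depth`, `max_pages`, and the User-Agent string are all hypothetical.

```python
import json
import urllib.parse
import urllib.request
from collections import deque

API_URL = "https://en.wikipedia.org/w/api.php"


def fetch_members(category):
    """Return (page_titles, subcategory_titles) for one category.

    Uses the MediaWiki API (list=categorymembers) and follows the
    'continue' token until the whole listing has been read.
    """
    pages, subcats = [], []
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,          # e.g. "Category:Computer science"
        "cmtype": "page|subcat",      # skip files
        "cmlimit": "500",
        "format": "json",
    }
    while True:
        url = API_URL + "?" + urllib.parse.urlencode(params)
        # Hypothetical User-Agent; replace with one identifying your tool.
        req = urllib.request.Request(
            url, headers={"User-Agent": "category-crawl-sketch/0.1"}
        )
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        for member in data["query"]["categorymembers"]:
            # Namespace 14 is "Category"; everything else is a page.
            (subcats if member["ns"] == 14 else pages).append(member["title"])
        if "continue" not in data:
            return pages, subcats
        params.update(data["continue"])


def collect_titles(root, max_depth=2, max_pages=10_000,
                   get_members=fetch_members):
    """Breadth-first walk from `root`, descending at most `max_depth`
    levels of subcategories and stopping after `max_pages` unique titles."""
    titles, seen_pages, seen_cats = [], set(), {root}
    queue = deque([(root, 0)])        # (category, depth); root is depth 0
    while queue:
        category, depth = queue.popleft()
        pages, subcats = get_members(category)
        for title in pages:
            if title not in seen_pages:
                seen_pages.add(title)
                titles.append(title)
                if len(titles) >= max_pages:
                    return titles
        if depth < max_depth:
            for sub in subcats:
                if sub not in seen_cats:   # categories can form cycles
                    seen_cats.add(sub)
                    queue.append((sub, depth + 1))
    return titles
```

Under these assumptions, `collect_titles("Category:Computer science", max_depth=2)` would visit the root, its 19 subcategories, and their subcategories, returning at most 10,000 unique page titles. The `get_members` parameter exists so the walk can be tried against a small in-memory mapping before making real requests.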

Scraping Wikipedia subcategories (pages) with multiple depths?

1 Answer: