List of all content pages, like Wikipedia's

Date: 2011-10-19 14:48:58

Tags: text split mediawiki categories

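Wikipedia uses an "HTML sitemap" to link to every content page. The huge number of pages has to be split into many groups so that each index page contains at most about 100 links.

This is how Wikipedia does it: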

Special: All pages

The whole list of articles is divided into several larger groups, each defined by its first and last word:

  • "AAA rating" to "early adopter"
  • "earth" to "lamentation"
  • "low" to "priest"
  • ...

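When you click one of these categories, that range (e.g. "earth" to "lamentation") is subdivided in the same way. The process repeats until the current range contains only about 100 articles, so that they can be displayed directly.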
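
The drill-down described above can be sketched on an in-memory array. This is only a minimal illustration, assuming the titles are already sorted and fit in memory; the function name buildIndexPage, the $maxLinks parameter and the shape of the returned array are hypothetical and are not taken from MediaWiki.

```php
<?php
/**
 * Build one index page for an alphabetically sorted list of titles.
 * Hypothetical helper: returns either the titles themselves (when they
 * fit on one page) or a list of "from ... to ..." ranges to click on.
 */
function buildIndexPage(array $sortedTitles, int $maxLinks = 100): array
{
    $count = count($sortedTitles);

    // Few enough titles: list them directly.
    if ($count <= $maxLinks) {
        return ['type' => 'pages', 'items' => $sortedTitles];
    }

    // Smallest group size that keeps the number of ranges at or below $maxLinks.
    $groupSize = (int) ceil($count / $maxLinks);

    $ranges = [];
    foreach (array_chunk($sortedTitles, $groupSize) as $index => $group) {
        $ranges[] = [
            'from'   => $group[0],                  // first title in the range
            'to'     => $group[count($group) - 1],  // last title in the range
            'start'  => $index * $groupSize,        // offset into $sortedTitles
            'length' => count($group),
        ];
    }

    return ['type' => 'ranges', 'items' => $ranges];
}
```

Clicking a range would then amount to calling buildIndexPage(array_slice($titles, $range['start'], $range['length'])) again, until a result of type 'pages' comes back.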

I really like this approach to link lists, because it minimizes the number of clicks needed to reach any article.

How can such an article list be created automatically?

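So my question is how to automatically create such an index page, one that lets you click through to ever smaller categories until the number of articles they contain is small enough to display them.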

Imagine you are given an array of all article names. How would you start coding an index with automatic category splitting?

Array('AAA rating', 'abdicate', ..., 'zero', 'zoo')

It would be great if you could help me. Of course, I don't need a perfect solution, just a useful approach. Thank you very much in advance!

Edit: I have now found the relevant part in Wikipedia's software (MediaWiki):

<?php
/**
 * Implements Special:Allpages
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License along
 * with this program; if not, write to the Free Software Foundation, Inc.,
 * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
 * http://www.gnu.org/copyleft/gpl.html
 *
 * @file
 * @ingroup SpecialPage
 */

/**
 * Implements Special:Allpages
 *
 * @ingroup SpecialPage
 */
class SpecialAllpages extends IncludableSpecialPage {

    /**
     * Maximum number of pages to show on single subpage.
     */
    protected $maxPerPage = 345;

    /**
     * Maximum number of pages to show on single index subpage.
     */
    protected $maxLineCount = 100;

    /**
     * Maximum number of chars to show for an entry.
     */
    protected $maxPageLength = 70;

    /**
     * Determines, which message describes the input field 'nsfrom'.
     */
    protected $nsfromMsg = 'allpagesfrom';

    function __construct( $name = 'Allpages' ){
        parent::__construct( $name );
    }

    /**
     * Entry point : initialise variables and call subfunctions.
     *
     * @param $par String: becomes "FOO" when called like Special:Allpages/FOO (default NULL)
     */
    function execute( $par ) {
        global $wgRequest, $wgOut, $wgContLang;

        $this->setHeaders();
        $this->outputHeader();
        $wgOut->allowClickjacking();

        # GET values
        $from = $wgRequest->getVal( 'from', null );
        $to = $wgRequest->getVal( 'to', null );
        $namespace = $wgRequest->getInt( 'namespace' );

        $namespaces = $wgContLang->getNamespaces();

        $wgOut->setPagetitle( 
            ( $namespace > 0 && in_array( $namespace, array_keys( $namespaces) ) ) ?
            wfMsg( 'allinnamespace', str_replace( '_', ' ', $namespaces[$namespace] ) ) :
            wfMsg( 'allarticles' )
        );

        if( isset($par) ) {
            $this->showChunk( $namespace, $par, $to );
        } elseif( isset($from) && !isset($to) ) {
            $this->showChunk( $namespace, $from, $to );
        } else {
            $this->showToplevel( $namespace, $from, $to );
        }
    }

    /**
     * HTML for the top form
     *
     * @param $namespace Integer: a namespace constant (default NS_MAIN).
     * @param $from String: dbKey we are starting listing at.
     * @param $to String: dbKey we are ending listing at.
     */
    function namespaceForm( $namespace = NS_MAIN, $from = '', $to = '' ) {
        global $wgScript;
        $t = $this->getTitle();

        $out  = Xml::openElement( 'div', array( 'class' => 'namespaceoptions' ) );
        $out .= Xml::openElement( 'form', array( 'method' => 'get', 'action' => $wgScript ) );
        $out .= Html::hidden( 'title', $t->getPrefixedText() );
        $out .= Xml::openElement( 'fieldset' );
        $out .= Xml::element( 'legend', null, wfMsg( 'allpages' ) );
        $out .= Xml::openElement( 'table', array( 'id' => 'nsselect', 'class' => 'allpages' ) );
        $out .= "<tr>
    <td class='mw-label'>" .
            Xml::label( wfMsg( 'allpagesfrom' ), 'nsfrom' ) .
            "   </td>
    <td class='mw-input'>" .
            Xml::input( 'from', 30, str_replace('_',' ',$from), array( 'id' => 'nsfrom' ) ) .
            "   </td>
</tr>
<tr>
    <td class='mw-label'>" .
            Xml::label( wfMsg( 'allpagesto' ), 'nsto' ) .
            "   </td>
            <td class='mw-input'>" .
            Xml::input( 'to', 30, str_replace('_',' ',$to), array( 'id' => 'nsto' ) ) .
            "       </td>
</tr>
<tr>
    <td class='mw-label'>" .
            Xml::label( wfMsg( 'namespace' ), 'namespace' ) .
            "   </td>
            <td class='mw-input'>" .
            Xml::namespaceSelector( $namespace, null ) . ' ' .
            Xml::submitButton( wfMsg( 'allpagessubmit' ) ) .
            "   </td>
</tr>";
        $out .= Xml::closeElement( 'table' );
        $out .= Xml::closeElement( 'fieldset' );
        $out .= Xml::closeElement( 'form' );
        $out .= Xml::closeElement( 'div' );
        return $out;
    }

    /**
     * @param $namespace Integer (default NS_MAIN)
     * @param $from String: list all pages from this name
     * @param $to String: list all pages to this name
     */
    function showToplevel( $namespace = NS_MAIN, $from = '', $to = '' ) {
        global $wgOut;

        # TODO: Either make this *much* faster or cache the title index points
        # in the querycache table.

        $dbr = wfGetDB( DB_SLAVE );
        $out = "";
        $where = array( 'page_namespace' => $namespace );

        $from = Title::makeTitleSafe( $namespace, $from );
        $to = Title::makeTitleSafe( $namespace, $to );
        $from = ( $from && $from->isLocal() ) ? $from->getDBkey() : null;
        $to = ( $to && $to->isLocal() ) ? $to->getDBkey() : null;

        if( isset($from) )
            $where[] = 'page_title >= '.$dbr->addQuotes( $from );
        if( isset($to) )
            $where[] = 'page_title <= '.$dbr->addQuotes( $to );

        global $wgMemc;
        $key = wfMemcKey( 'allpages', 'ns', $namespace, $from, $to );
        $lines = $wgMemc->get( $key );

        $count = $dbr->estimateRowCount( 'page', '*', $where, __METHOD__ );
        $maxPerSubpage = intval($count/$this->maxLineCount);
        $maxPerSubpage = max($maxPerSubpage,$this->maxPerPage);

        if( !is_array( $lines ) ) {
            $options = array( 'LIMIT' => 1 );
            $options['ORDER BY'] = 'page_title ASC';
            $firstTitle = $dbr->selectField( 'page', 'page_title', $where, __METHOD__, $options );
            $lastTitle = $firstTitle;
            # This array is going to hold the page_titles in order.
            $lines = array( $firstTitle );
            # If we are going to show n rows, we need n+1 queries to find the relevant titles.
            $done = false;
            while( !$done ) {
                // Fetch the last title of this chunk and the first of the next
                $chunk = ( $lastTitle === false )
                    ? array()
                    : array( 'page_title >= ' . $dbr->addQuotes( $lastTitle ) );
                $res = $dbr->select( 'page', /* FROM */
                    'page_title', /* WHAT */
                    array_merge($where,$chunk),
                    __METHOD__,
                    array ('LIMIT' => 2, 'OFFSET' => $maxPerSubpage - 1, 'ORDER BY' => 'page_title ASC')
                );

                $s = $dbr->fetchObject( $res );
                if( $s ) {
                    array_push( $lines, $s->page_title );
                } else {
                    // Final chunk, but ended prematurely. Go back and find the end.
                    $endTitle = $dbr->selectField( 'page', 'MAX(page_title)',
                        array_merge($where,$chunk),
                        __METHOD__ );
                    array_push( $lines, $endTitle );
                    $done = true;
                }
                $s = $res->fetchObject();
                if( $s ) {
                    array_push( $lines, $s->page_title );
                    $lastTitle = $s->page_title;
                } else {
                    // This was a final chunk and ended exactly at the limit.
                    // Rare but convenient!
                    $done = true;
                }
                $res->free();
            }
            $wgMemc->add( $key, $lines, 3600 );
        }

        // If there are only two or less sections, don't even display them.
        // Instead, display the first section directly.
        if( count( $lines ) <= 2 ) {
            if( !empty($lines) ) {
                $this->showChunk( $namespace, $from, $to );
            } else {
                $wgOut->addHTML( $this->namespaceForm( $namespace, $from, $to ) );
            }
            return;
        }

        # At this point, $lines should contain an even number of elements.
        $out .= Xml::openElement( 'table', array( 'class' => 'allpageslist' ) );
        while( count ( $lines ) > 0 ) {
            $inpoint = array_shift( $lines );
            $outpoint = array_shift( $lines );
            $out .= $this->showline( $inpoint, $outpoint, $namespace );
        }
        $out .= Xml::closeElement( 'table' );
        $nsForm = $this->namespaceForm( $namespace, $from, $to );

        # Is there more?
        if( $this->including() ) {
            $out2 = '';
        } else {
            if( isset($from) || isset($to) ) {
                global $wgUser;
                $out2 = Xml::openElement( 'table', array( 'class' => 'mw-allpages-table-form' ) ).
                        '<tr>
                            <td>' .
                                $nsForm .
                            '</td>
                            <td class="mw-allpages-nav">' .
                                $wgUser->getSkin()->link( $this->getTitle(), wfMsgHtml ( 'allpages' ),
                                    array(), array(), 'known' ) .
                            "</td>
                        </tr>" .
                    Xml::closeElement( 'table' );
            } else {
                $out2 = $nsForm;
            }
        }
        $wgOut->addHTML( $out2 . $out );
    }

    /**
     * Show a line of "ABC to DEF" ranges of articles
     *
     * @param $inpoint String: lower limit of pagenames
     * @param $outpoint String: upper limit of pagenames
     * @param $namespace Integer (Default NS_MAIN)
     */
    function showline( $inpoint, $outpoint, $namespace = NS_MAIN ) {
        global $wgContLang;
        $inpointf = htmlspecialchars( str_replace( '_', ' ', $inpoint ) );
        $outpointf = htmlspecialchars( str_replace( '_', ' ', $outpoint ) );
        // Don't let the length runaway
        $inpointf = $wgContLang->truncate( $inpointf, $this->maxPageLength );
        $outpointf = $wgContLang->truncate( $outpointf, $this->maxPageLength );

        $queryparams = $namespace ? "namespace=$namespace&" : '';
        $special = $this->getTitle();
        $link = $special->escapeLocalUrl( $queryparams . 'from=' . urlencode($inpoint) . '&to=' . urlencode($outpoint) );

        $out = wfMsgHtml( 'alphaindexline',
            "<a href=\"$link\">$inpointf</a></td><td>",
            "</td><td><a href=\"$link\">$outpointf</a>"
        );
        return '<tr><td class="mw-allpages-alphaindexline">' . $out . '</td></tr>';
    }

    /**
     * @param $namespace Integer (Default NS_MAIN)
     * @param $from String: list all pages from this name (default FALSE)
     * @param $to String: list all pages to this name (default FALSE)
     */
    function showChunk( $namespace = NS_MAIN, $from = false, $to = false ) {
        global $wgOut, $wgUser, $wgContLang, $wgLang;

        $sk = $wgUser->getSkin();

        $fromList = $this->getNamespaceKeyAndText($namespace, $from);
        $toList = $this->getNamespaceKeyAndText( $namespace, $to );
        $namespaces = $wgContLang->getNamespaces();
        $n = 0;

        if ( !$fromList || !$toList ) {
            $out = wfMsgWikiHtml( 'allpagesbadtitle' );
        } elseif ( !in_array( $namespace, array_keys( $namespaces ) ) ) {
            // Show errormessage and reset to NS_MAIN
            $out = wfMsgExt( 'allpages-bad-ns', array( 'parseinline' ), $namespace );
            $namespace = NS_MAIN;
        } else {
            list( $namespace, $fromKey, $from ) = $fromList;
            list( , $toKey, $to ) = $toList;

            $dbr = wfGetDB( DB_SLAVE );
            $conds = array(
                'page_namespace' => $namespace,
                'page_title >= ' . $dbr->addQuotes( $fromKey )
            );
            if( $toKey !== "" ) {
                $conds[] = 'page_title <= ' . $dbr->addQuotes( $toKey );
            }

            $res = $dbr->select( 'page',
                array( 'page_namespace', 'page_title', 'page_is_redirect' ),
                $conds,
                __METHOD__,
                array(
                    'ORDER BY'  => 'page_title',
                    'LIMIT'     => $this->maxPerPage + 1,
                    'USE INDEX' => 'name_title',
                )
            );

            if( $res->numRows() > 0 ) {
                $out = Xml::openElement( 'table', array( 'class' => 'mw-allpages-table-chunk' ) );
                while( ( $n < $this->maxPerPage ) && ( $s = $res->fetchObject() ) ) {
                    $t = Title::makeTitle( $s->page_namespace, $s->page_title );
                    if( $t ) {
                        $link = ( $s->page_is_redirect ? '<div class="allpagesredirect">' : '' ) .
                            $sk->linkKnown( $t, htmlspecialchars( $t->getText() ) ) .
                            ($s->page_is_redirect ? '</div>' : '' );
                    } else {
                        $link = '[[' . htmlspecialchars( $s->page_title ) . ']]';
                    }
                    if( $n % 3 == 0 ) {
                        $out .= '<tr>';
                    }
                    $out .= "<td style=\"width:33%\">$link</td>";
                    $n++;
                    if( $n % 3 == 0 ) {
                        $out .= "</tr>\n";
                    }
                }
                if( ($n % 3) != 0 ) {
                    $out .= "</tr>\n";
                }
                $out .= Xml::closeElement( 'table' );
            } else {
                $out = '';
            }
        }

        if ( $this->including() ) {
            $out2 = '';
        } else {
            if( $from == '' ) {
                // First chunk; no previous link.
                $prevTitle = null;
            } else {
                # Get the last title from previous chunk
                $dbr = wfGetDB( DB_SLAVE );
                $res_prev = $dbr->select(
                    'page',
                    'page_title',
                    array( 'page_namespace' => $namespace, 'page_title < '.$dbr->addQuotes($from) ),
                    __METHOD__,
                    array( 'ORDER BY' => 'page_title DESC', 
                        'LIMIT' => $this->maxPerPage, 'OFFSET' => ($this->maxPerPage - 1 )
                    )
                );

                # Get first title of previous complete chunk
                if( $dbr->numrows( $res_prev ) >= $this->maxPerPage ) {
                    $pt = $dbr->fetchObject( $res_prev );
                    $prevTitle = Title::makeTitle( $namespace, $pt->page_title );
                } else {
                    # The previous chunk is not complete, need to link to the very first title
                    # available in the database
                    $options = array( 'LIMIT' => 1 );
                    if ( ! $dbr->implicitOrderby() ) {
                        $options['ORDER BY'] = 'page_title';
                    }
                    $reallyFirstPage_title = $dbr->selectField( 'page', 'page_title',
                        array( 'page_namespace' => $namespace ), __METHOD__, $options );
                    # Show the previous link if it s not the current requested chunk
                    if( $from != $reallyFirstPage_title ) {
                        $prevTitle =  Title::makeTitle( $namespace, $reallyFirstPage_title );
                    } else {
                        $prevTitle = null;
                    }
                }
            }

            $self = $this->getTitle();

            $nsForm = $this->namespaceForm( $namespace, $from, $to );
            $out2 = Xml::openElement( 'table', array( 'class' => 'mw-allpages-table-form' ) ).
                        '<tr>
                            <td>' .
                                $nsForm .
                            '</td>
                            <td class="mw-allpages-nav">' .
                                $sk->link( $self, wfMsgHtml ( 'allpages' ), array(), array(), 'known' );

            # Do we put a previous link ?
            if( isset( $prevTitle ) &&  $pt = $prevTitle->getText() ) {
                $query = array( 'from' => $prevTitle->getText() );

                if( $namespace )
                    $query['namespace'] = $namespace;

                $prevLink = $sk->linkKnown(
                    $self,
                    htmlspecialchars( wfMsg( 'prevpage', $pt ) ),
                    array(),
                    $query
                );
                $out2 = $wgLang->pipeList( array( $out2, $prevLink ) );
            }

            if( $n == $this->maxPerPage && $s = $res->fetchObject() ) {
                # $s is the first link of the next chunk
                $t = Title::MakeTitle($namespace, $s->page_title);
                $query = array( 'from' => $t->getText() );

                if( $namespace )
                    $query['namespace'] = $namespace;

                $nextLink = $sk->linkKnown(
                    $self,
                    htmlspecialchars( wfMsg( 'nextpage', $t->getText() ) ),
                    array(),
                    $query
                );
                $out2 = $wgLang->pipeList( array( $out2, $nextLink ) );
            }
            $out2 .= "</td></tr></table>";
        }

        $wgOut->addHTML( $out2 . $out );
        if( isset($prevLink) or isset($nextLink) ) {
            $wgOut->addHTML( '<hr /><p class="mw-allpages-nav">' );
            if( isset( $prevLink ) ) {
                $wgOut->addHTML( $prevLink );
            }
            if( isset( $prevLink ) && isset( $nextLink ) ) {
                $wgOut->addHTML( wfMsgExt( 'pipe-separator' , 'escapenoentities' ) );
            }
            if( isset( $nextLink ) ) {
                $wgOut->addHTML( $nextLink );
            }
            $wgOut->addHTML( '</p>' );

        }

    }

    /**
     * @param $ns Integer: the namespace of the article
     * @param $text String: the name of the article
     * @return array( int namespace, string dbkey, string pagename ) or NULL on error
     * @static (sort of)
     * @access private
     */
    function getNamespaceKeyAndText($ns, $text) {
        if ( $text == '' )
            return array( $ns, '', '' ); # shortcut for common case

        $t = Title::makeTitleSafe($ns, $text);
        if ( $t && $t->isLocal() ) {
            return array( $t->getNamespace(), $t->getDBkey(), $t->getText() );
        } else if ( $t ) {
            return null;
        }

        # try again, in case the problem was an empty pagename
        $text = preg_replace('/(#|$)/', 'X$1', $text);
        $t = Title::makeTitleSafe($ns, $text);
        if ( $t && $t->isLocal() ) {
            return array( $t->getNamespace(), '', '' );
        } else {
            return null;
        }
    }
}
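
As far as I can tell from reading showToplevel() above, it builds a flat $lines array of alternating "first title / last title" boundaries, with a chunk size of max(count / maxLineCount, maxPerPage). The sketch below is a simplified in-memory analogue of that idea, not MediaWiki code: the function name boundaryLines is hypothetical, and a plain sorted array stands in for the database queries (MediaWiki also works from an estimated row count rather than an exact one).

```php
<?php
/**
 * In-memory analogue of the boundary list built in showToplevel().
 * Returns [first1, last1, first2, last2, ...] for consecutive chunks.
 */
function boundaryLines(array $sortedTitles, int $maxLineCount = 100, int $maxPerPage = 345): array
{
    $count = count($sortedTitles);
    if ($count === 0) {
        return [];
    }

    // Same rule as the original: aim for about $maxLineCount lines,
    // but never make a chunk smaller than one full listing page.
    $perSubpage = max(intdiv($count, $maxLineCount), $maxPerPage);

    $lines = [];
    foreach (array_chunk($sortedTitles, $perSubpage) as $chunk) {
        $lines[] = $chunk[0];                  // "in point" of the range
        $lines[] = $chunk[count($chunk) - 1];  // "out point" of the range
    }

    return $lines;
}
```

With 1,000 titles and the default settings, the chunk size is 345, so the result holds three pairs of boundaries; the final pair covers the shorter remainder.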

2 answers:

Answer 0 (score: 2)

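This is not a good approach, because there is no way to stop when you reach the end of the list. You only want to split the items if their number exceeds the maximum (although you may want to add some flexibility there, since you could reach a stage where a page has only two items on it).

I assume the data set actually comes from a database, but I use an $items array for ease of illustration.

At its simplest, assuming the request comes from a web page that sends the start and end index numbers, and that you have checked that those numbers are valid and sanitized: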

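$itemsPerPage = 50; // constant
$itemCount = $end - $start; // $end is treated as exclusive, matching the loops below
// Round up so the step is a whole number and at most $itemsPerPage ranges are shown
$itemStep = (int) ceil($itemCount / $itemsPerPage);

if ($itemCount <= $itemsPerPage)
{
    for ($i = $start; $i < $end; $i++)
    {
        // display these as individual items
        display_link($items[$i]);
    }
}
else
{
    for ($i = $start; $i < $end; $i += $itemStep)
    {
        $to = $i + ($itemStep - 1); // find the end part
        if ($to > $end - 1)
            $to = $end - 1; // clamp to the last valid index
        display_to_from($items[$i], $items[$to]);
    }
}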

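The display functions output the links as required. One problem with this, however, is that you may want to adjust the number of items per page, because you risk having a set of (say) 51 items and ending up with one link from 1 to 49 and another from 50 to 51.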
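
One possible way to avoid such a lopsided split is to work out the number of pages first and then spread the items evenly across them. The following is a rough sketch under that assumption; balancedChunkSizes is a hypothetical helper name, not part of the code above.

```php
<?php
/**
 * Spread $itemCount items over as few pages as possible, none larger
 * than $maxPerPage, with page sizes differing by at most one.
 */
function balancedChunkSizes(int $itemCount, int $maxPerPage): array
{
    if ($itemCount <= 0) {
        return [];
    }

    $pages = (int) ceil($itemCount / $maxPerPage); // minimum number of pages
    $base  = intdiv($itemCount, $pages);           // every page gets at least this many
    $extra = $itemCount % $pages;                  // the first $extra pages get one more

    $sizes = [];
    for ($p = 0; $p < $pages; $p++) {
        $sizes[] = $base + ($p < $extra ? 1 : 0);
    }

    return $sizes;
}
```

For 51 items and a maximum of 50 per page this yields two pages of 26 and 25 items, rather than one nearly full page followed by a page with only a couple of entries.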

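I don't understand why you arrange the items into groups in your pseudocode. Since you narrow things down further from page to page, you only need the start and end of each section until you reach a page where all the links fit.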

- Edit

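The original was wrong. Now you divide the number of items you have to go through by the maximum number of items you want to display. If there are 1,000 items, this lists every 20th item; if there are 100,000, every 2,000th. If there are fewer items than the number you display, you can show them all individually.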
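
The arithmetic above can be checked with a couple of small helpers. This is only a sketch: the names rangeStep and levelsBelowTop are made up for illustration, and 50 items per page is the constant assumed in the code above.

```php
<?php
/**
 * Step between range boundaries, or null when the items fit on one
 * page and can simply be listed individually.
 */
function rangeStep(int $itemCount, int $itemsPerPage = 50): ?int
{
    if ($itemsPerPage < 2) {
        throw new InvalidArgumentException('itemsPerPage must be at least 2');
    }
    if ($itemCount <= $itemsPerPage) {
        return null;
    }
    return (int) ceil($itemCount / $itemsPerPage);
}

/**
 * Number of range pages a visitor clicks through before reaching a
 * page that lists individual items.
 */
function levelsBelowTop(int $itemCount, int $itemsPerPage = 50): int
{
    $levels = 0;
    while (($step = rangeStep($itemCount, $itemsPerPage)) !== null) {
        $itemCount = $step; // each range on this level holds about $step items
        $levels++;
    }
    return $levels;
}
```

With 50 links per page, 1,000 items give a step of 20 and one intermediate level, while 100,000 items give a step of 2,000 and two intermediate levels.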

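- Edit again - adding more about the database

No, you are right: you don't want to load 2,000,000 data records, and you don't need to. You have two options. You can make a prepared statement such as "select * from article where article = ?" and loop through, fetching the results one at a time. Or, if you want to do it in one block (assuming a MySQL database and the code above):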

$numberArray = "";
for ($i = $start; $i < $end; $i += $itemStep)
{
    $to = $i + ($itemStep - 1); // find the end part
    if ($to > $end)
        $to = $end;
    // display_to_from($items[$i], $items[$to]);
    if ($i != $start)
        $numberArray .= ", "; // .= concatenates; += would attempt numeric addition
    $numberArray .= $i . ", " . $to;
}
$sqlQuery = "SELECT * FROM articles WHERE article_id IN (" . $numberArray . ")";
// ... run the MySQL select and go through the results, using alternate rows as the start and end

This gives you a query such as 'SELECT * FROM articles WHERE article_id IN (1, 49, 50, 99, 100, 149 ... etc)'

Then process the result as a normal set.

Answer 1 (score: 0)

My approach in pseudocode:

$items = array('air', 'automatic', 'ball', /* ... */ 'yield', 'zero', 'zoo');
$itemCount = count($items);
$itemsPerPage = 50; // constant
$groups = array();
$counter = 0;
foreach ($items as $item) {
    $groupNumber = (int) floor($counter / $itemsPerPage);
    // assign $item to group $groupNumber
    $groups[$groupNumber][] = $item;
    $counter++;
}
// repeat this procedure recursively for each of the new groups

Do you think this is a good approach? Can you improve or refine it?