Question

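Our client wants all the content of a wiki site they have been running for some time. They gave us a complete database from the "mediawiki" software. We are trying to extract the articles from the 'text' table with PHP, without using the MediaWiki engine.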

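MediaWiki appears to compress the content before storing it in the database as a BLOB. We cannot find a way to extract it without the engine. I looked through the source code but could not reproduce how it extracts the BLOBs.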

Any suggestions on how to solve this?

Answer 1

old_flags

A comma-separated list of flags. It can contain the following values:

┌──────────┬──────────────────────────────────────────────────────────────────┐
│ gzip     │ Text is compressed with PHP's gzdeflate() function.              │
│          │ Note: If the $wgCompressRevisions option is on, new rows         │
│          │ (=current revisions) will be gzipped transparently at save time. │
│          │ Previous revisions can also be compressed by using the script    │
│          │ compressOld.php                                                  │
├──────────┼──────────────────────────────────────────────────────────────────┤
│ utf-8    │ Text was stored as UTF-8.                                        │
│          │ Note: If the $wgLegacyEncoding option is on, rows *without* this │
│          │ flag will be converted to UTF-8 transparently at load time.      │
├──────────┼──────────────────────────────────────────────────────────────────┤
│ object   │ Text field contained a serialized PHP object.                    │
│          │ Note: The object either contains multiple versions compressed    │
│          │ together to achieve a better compression ratio, or it refers to  │
│          │ another row where the text can be found.                         │
├──────────┼──────────────────────────────────────────────────────────────────┤
│ external │ Text was stored in an external location specified by old_text    │
└──────────┴──────────────────────────────────────────────────────────────────┘
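The flag table above can be turned into a small decoding routine. The following is a minimal sketch, not MediaWiki's own code: the function name `decodeTextRow` is hypothetical, and it only handles the `gzip` flag. Rows flagged `object` or `external` are rejected rather than decoded, since the table says their text lives in a serialized object or in another location.

```php
<?php
// Hypothetical helper (not part of MediaWiki): decode one row of the
// `text` table from its old_text and old_flags columns.
function decodeTextRow(string $oldText, string $oldFlags): string
{
    // old_flags is a comma-separated list, e.g. "utf-8,gzip".
    $flags = array_filter(array_map('trim', explode(',', $oldFlags)), 'strlen');

    if (in_array('external', $flags, true)) {
        // old_text only says where the text is stored.
        throw new RuntimeException('external: text is stored elsewhere');
    }
    if (in_array('gzip', $flags, true)) {
        // gzdeflate() output is a raw deflate stream, so gzinflate() reverses it.
        $inflated = gzinflate($oldText);
        if ($inflated === false) {
            throw new RuntimeException('gzinflate() failed on this row');
        }
        $oldText = $inflated;
    }
    if (in_array('object', $flags, true)) {
        // Serialized PHP object; deliberately not handled in this sketch.
        throw new RuntimeException('object: serialized PHP object');
    }
    return $oldText;
}
```

Rows without the `gzip` flag pass through unchanged, so the same call can be applied to every row fetched from the table.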

Answer 2

https://www.mediawiki.org/wiki/Compression

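"The old_text of old entries marked with old_flags="gzip" is compressed with zlib's deflate algorithm, with no header bytes. PHP's gzinflate() will accept this text directly; in Perl etc., set the window size to -MAX_WSIZE to disable the header bytes."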

According to the documentation, it should be as simple as feeding the blob data to PHP's gzinflate().
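To illustrate what "no header bytes" means in practice, here is a small sketch comparing PHP's three zlib output formats. The sample string is made up, and `gzdeflate()` stands in for a blob read from the database; only the raw deflate variant pairs with `gzinflate()`.

```php
<?php
$sample = "'''Hello''' from a wiki page.";   // made-up sample text

$raw  = gzdeflate($sample);   // raw deflate stream: no header bytes
$zlib = gzcompress($sample);  // zlib wrapper: first byte is 0x78
$gzip = gzencode($sample);    // gzip wrapper: first bytes are 0x1f 0x8b

// gzinflate() is the counterpart of gzdeflate().
$restored = gzinflate($raw);

// Inspect the leading bytes of the wrapped formats.
$zlibHeader = bin2hex($zlib[0]);
$gzipHeader = bin2hex(substr($gzip, 0, 2));
```

If `gzinflate()` returns false on a real blob, the row may not carry the gzip flag at all, so checking old_flags first is worthwhile.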

Answer 3

Just a guess, but try this:

SELECT UNCOMPRESS(blobname)

By the way, I have no experience with MediaWiki, but I hope this points you in the right direction.

See this page for more information on MySQL's compression functions.
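Whether UNCOMPRESS() can read a given blob depends on the blob matching the layout that MySQL's COMPRESS() produces. As far as the MySQL documentation describes it, that layout is a 4-byte length of the original string (low byte first) followed by a zlib-compressed string. The sketch below emulates that layout in PHP so a blob can be checked against it; both function names are hypothetical, and this is an illustration of the format rather than MySQL's implementation.

```php
<?php
// Hypothetical emulation of MySQL's COMPRESS() layout:
// 4-byte little-endian length of the original, then a zlib stream.
function mysqlStyleCompress(string $text): string
{
    return $text === '' ? '' : pack('V', strlen($text)) . gzcompress($text);
}

// Hypothetical emulation of UNCOMPRESS(): returns null when the blob
// does not follow the layout above.
function mysqlStyleUncompress(string $blob): ?string
{
    if ($blob === '') {
        return '';
    }
    if (strlen($blob) <= 4) {
        return null;
    }
    $expectedLength = unpack('V', substr($blob, 0, 4))[1];
    // Suppress the warning gzuncompress() raises on non-zlib input.
    $data = @gzuncompress(substr($blob, 4));
    if ($data === false || strlen($data) !== $expectedLength) {
        return null;
    }
    return $data;
}
```

A blob written by gzdeflate() has neither the length prefix nor the zlib header, so it would be worth testing UNCOMPRESS() on a single known row before relying on it.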
