Related: How to export text from all pages of a MediaWiki?, but I would like the output to be individual text files named after the page titles.
SELECT page_title, page_touched, old_text
FROM revision,page,text
WHERE revision.rev_id=page.page_latest
AND text.old_id=revision.rev_text_id;
I can dump this to stdout and get all pages in one go.
How can I split them up and write each page to its own file?
Solution
First, dump everything into a single file:
SELECT page_title, page_touched, old_text
FROM revision,page,text
WHERE revision.rev_id=page.page_latest
AND text.old_id=revision.rev_text_id
AND page_namespace!='6' AND page_namespace!='8' AND page_namespace!='12'
INTO OUTFILE '/tmp/wikipages.csv'
FIELDS TERMINATED BY '\n'
ESCAPED BY ''
LINES TERMINATED BY '\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n';
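Optionally, a quick sanity check that the dump landed where expected; this sketch just counts the records by splitting on the same run of @ characters chosen as the line terminator above (the path is the one used in the query):
# Quick sanity check on the dump written by INTO OUTFILE above.
SEP = '\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n'
with open('/tmp/wikipages.csv', 'r', errors='replace') as f:
    records = [r for r in f.read().split(SEP) if r.strip()]
print('pages dumped:', len(records))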
Then split the dump into individual files with Python, splitting on that same separator:
import os

os.makedirs('/tmp/wikipages', exist_ok=True)  # make sure the output directory exists

with open('wikipages.csv', 'r') as f:
    alltxt = f.read().split('\n@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@\n')

for row in alltxt:
    one = row.split('\n')
    name = one[0].replace('/', '-')  # page title, made filesystem-safe
    try:
        del one[0]  # drop the page_title field
        del one[0]  # drop the page_touched field
    except IndexError:
        continue
    txt = '\n'.join(one)
    with open('/tmp/wikipages/' + name + '.txt', 'w') as of:
        of.write(txt)
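If you would rather skip the intermediate dump file altogether, the same query can be run from Python and each page written out directly. This is only a sketch: it assumes the pymysql driver (pip install pymysql) plus placeholder host, credentials and database name that are not part of the original setup.
import os
import pymysql

# Placeholder connection details -- adjust to your MediaWiki database.
conn = pymysql.connect(host='localhost', user='wikiuser',
                       password='secret', database='wikidb')

os.makedirs('/tmp/wikipages', exist_ok=True)

query = """
SELECT page_title, page_touched, old_text
FROM revision, page, text
WHERE revision.rev_id = page.page_latest
  AND text.old_id = revision.rev_text_id
  AND page_namespace != 6 AND page_namespace != 8 AND page_namespace != 12
"""

with conn.cursor() as cur:
    cur.execute(query)
    for page_title, page_touched, old_text in cur.fetchall():
        # Title and text columns are binary in the MediaWiki schema, so decode them.
        name = page_title.decode('utf-8', errors='replace').replace('/', '-')
        with open('/tmp/wikipages/' + name + '.txt', 'w') as out:
            out.write(old_text.decode('utf-8', errors='replace'))

conn.close()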
Answer 0 (score: 1)
If you have some Python knowledge, you can use the mwclient library to achieve this:
sudo apt-get install python2.7
(see https://askubuntu.com/questions/101591/how-do-i-install-python-2-7-2-on-ubuntu if you run into trouble)
pip install mwclient
Run the Python script below:
import mwclient

# Connect over plain HTTP; adjust the scheme, domain and path to match your wiki.
wiki = mwclient.Site(('http', 'you-wiki-domain.com'), '/')

# Iterate the pages in the main namespace and write each one to a file named after its title.
for page in wiki.pages:
    with open(page.page_title, 'w') as f:
        f.write(page.text())
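If the wiki is not publicly readable, authenticate first; mwclient provides Site.login for this. A short variation of the script above with placeholder credentials (not part of the original answer):
import mwclient

wiki = mwclient.Site(('http', 'you-wiki-domain.com'), '/')
wiki.login('bot-username', 'bot-password')  # placeholder credentials

for page in wiki.pages:
    with open(page.page_title, 'w') as f:
        f.write(page.text())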
See the mwclient page at https://github.com/mwclient/mwclient for reference.