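I'm not very familiar with regular expressions, and I need a way to parse bulleted lists from Wikipedia. I fetched the content through Wikipedia's api.php, and the data I'm left with looks like this: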
==Formal fallacies==
A [[formal fallacy]] is an error in logic that...
* [[Appeal to probability]] – takes something for granted because...
* [[Argument from fallacy]] – assumes that if an argument ...
* [[Base rate fallacy]] – making a probability judgement...
* [[Conjunction fallacy]] – assumption that an outcome simultaneously...
* [[Masked man fallacy]] – ...
===Propositional fallacies===
* [[Affirming a disjunct]] – concluded that ...
* [[Affirming the consequent]] – the [[antecedent...
* [[Denying the antecedent]] – the [[consequent]] in...
So I need a way to extract the data so that:
Answer 0 (score: 1)
This does it:
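preg_match_all('~^\h*+\*\h*\[\[(?<name>[a-z ]++)]]\h*+[-–]\h*+(?<description>.++)$~imu', $text, $results, PREG_SET_ORDER);
// Keep only the named groups ("name", "description"); drop the numeric duplicates.
foreach ($results as &$result) {
    foreach ($result as $key => $value) {
        if (is_numeric($key)) {
            unset($result[$key]);
        }
    }
}
unset($result); // break the reference left over from the by-reference foreach
echo '<pre>' . print_r($results, true) . '</pre>';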
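For illustration, here is a self-contained variant of the same approach, run against a few lines copied from the question. It is only a sketch: the `$items` array and the wider name class `[^\]|]+` (anything except `]` or `|`, so titles containing hyphens, digits or apostrophes would also match) are assumptions of this example, not part of the answer above. It reads the named groups directly rather than unsetting the numeric keys afterwards.

```php
<?php
// Sample input copied from the question (the descriptions are truncated there too).
$text = <<<'WIKI'
==Formal fallacies==
A [[formal fallacy]] is an error in logic that...
* [[Appeal to probability]] – takes something for granted because...
* [[Base rate fallacy]] – making a probability judgement...
===Propositional fallacies===
* [[Affirming a disjunct]] – concluded that ...
WIKI;

// One list item per line: "* [[name]] – description".
// Assumption: the name is any run of characters other than "]" or "|".
$pattern = '~^\h*\*\h*\[\[(?<name>[^\]|]+)\]\]\h*[-–]\h*(?<description>.+)$~mu';

$items = [];
if (preg_match_all($pattern, $text, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        // Pick the named groups directly; the numeric keys are simply ignored.
        $items[] = [
            'name'        => $m['name'],
            'description' => trim($m['description']),
        ];
    }
}

print_r($items);
```

Headings and the introductory sentence do not start with `*`, so they are skipped and only the three list items end up in `$items`.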
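Answer 1 (score: 0)
First, replace
^((?!\*\s\[\[).)*$
with an empty string. This removes every line that does not contain * [[
Then remove the leftover line breaks by replacing
^\n|\r$
with an empty string.
Here is the regular expression to get the title and description: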
^\s*\*\s\[\[([^\]]*)\]\]\s–(.*)
Title: "$1", Description: "$2"
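The three steps above could be chained in PHP roughly as follows. This is a minimal sketch rather than the answerer's own code: the variable names are made up for the example, the `m` (multiline) and `u` (UTF-8) flags are assumed, the leading whitespace is made optional because the sample lines start directly with `*`, and a `\s*` is added after the dash so the description does not begin with a space.

```php
<?php
// A few lines copied from the question.
$text = <<<'WIKI'
==Formal fallacies==
A [[formal fallacy]] is an error in logic that...
* [[Appeal to probability]] – takes something for granted because...
===Propositional fallacies===
* [[Affirming a disjunct]] – concluded that ...
WIKI;

// Step 1: empty every line that does not contain "* [[".
$onlyItems = preg_replace('~^((?!\*\s\[\[).)*$~mu', '', $text);

// Step 2: drop the blank lines (and stray carriage returns) left behind.
$noBlanks = preg_replace('~^\n|\r$~m', '', $onlyItems);

// Step 3: rewrite each remaining line as Title / Description.
$result = preg_replace(
    '~^\s*\*\s\[\[([^\]]*)\]\]\s–\s*(.*)~mu',
    'Title: "$1", Description: "$2"',
    $noBlanks
);

echo $result, "\n";
```

If the entries are needed as data rather than as reformatted text, a single preg_match_all call over the original text avoids the two clean-up passes.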