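I'm trying to build a project that helps students study various subjects. I have raw text containing quiz questions and answers, and I want to parse it into question titles and answer options that will be inserted into a database. However, the text is not consistently formatted, and because of the sheer volume of questions and answers (roughly 20k of each), I don't have time to insert them by hand or to clean up the text myself.
The raw text looks like this: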
1. A car averages 27 miles per gallon. If gas costs $4.04 per gallon, which of the following is closest to how much the gas would cost for this car to travel 2,727 typical miles?
a) $44.44 b) $109.08 c) $118.80
d) $408.04 e)
$444.40
2. When x = 3 and y = 5, by how much does the value of 3x2 – 2y exceed the value of 2x2 – 3y ?
a) 4
b) 14
c) 16
d) 20 e) 50
I tried writing my own PHP function to parse the text correctly, but I can't get past the random line breaks, spaces, and so on.
What I want to get:
array(1) {
[0]=>
array(3) {
["questionNumber"]=>
string(1) "1"
["questionText"]=>
string(175) "A car averages 27 miles per gallon. If gas costs $4.04 per gallon, which of the following is closest to how much the gas would cost for this car to travel 2,727 typical miles?"
["options"]=>
array(5) {
["a"]=>
string(6) "$44.44"
["b"]=>
string(7) "$109.08"
["c"]=>
string(7) "$118.80"
["d"]=>
string(7) "$408.04"
["e"]=>
string(7) "$444.40"
}
}
}
My code so far:
$rawText = '1. A car averages 27 miles per gallon. If gas costs $4.04 per gallon, which of the following is closest to how much the gas would cost for this car to travel 2,727 typical miles?
a) $44.44 b) $109.08 c) $118.80
d) $408.04 e)
$444.40
2. When x = 3 and y = 5, by how much does the value of 3x2 – 2y exceed the value of 2x2 – 3y ?
a) 4
b) 14
c) 16
d) 20 e) 50
';
$rawTextLines = explode("\n", $rawText);
foreach ($rawTextLines as $lineNumber => $lineContents) {
    $lContents = trim($lineContents);
    if (empty ($lContents)) {
        unset ($rawTextLines[$lineNumber]);
    } else {
        $rawTextLines[$lineNumber] = $lContents;
    }
}
$processedQuestions = array ();
$currentQuestionHeader = 0;
foreach ($rawTextLines as $lineNumber => $lineContents) {
    if (ctype_digit(substr($lineContents, 0, 1))) { // Question header
        $questionHeaderInformation = explode('.', $lineContents);
        $currentQuestionHeader = $questionHeaderInformation[0];
        $processedQuestions[$currentQuestionHeader]['questionNumber'] = $currentQuestionHeader;
        $processedQuestions[$currentQuestionHeader]['questionText'] = $questionHeaderInformation[1];
    } else { // Question option
        $options = explode(')', $lineContents);
        if (count ($options) % 2 === 0) {
            $processedQuestions[$currentQuestionHeader]['options'][trim($options[0])] = ucfirst(trim($options[1]));
        } else {
        }
    }
}
It produces this:
array(2) {
[1]=>
array(3) {
["questionNumber"]=>
string(1) "1"
["questionText"]=>
string(35) " A car averages 27 miles per gallon"
["options"]=>
array(1) {
["a"]=>
string(8) "$44.44 b"
}
}
[2]=>
array(3) {
["questionNumber"]=>
string(1) "2"
["questionText"]=>
string(96) " When x = 3 and y = 5, by how much does the value of 3x2 – 2y exceed the value of 2x2 – 3y ?"
["options"]=>
array(3) {
["a"]=>
string(1) "4"
["b"]=>
string(2) "14"
["c"]=>
string(2) "16"
}
}
}
As you can see, the current output is nowhere near what I want to get.
Thanks in advance.
Answer 0 (score: 0)
Hello, you could try a regular expression like this one:
^[0-9]+\. (.*)[\r\n]+a\)[\s]+(.*)[\s]+b\)[\s]+(.*)[\s]+c\)[\s]+(.*)[\s]+d\)[\s]+(.*)[\s]+e\)[\s]+(.*)[\s]*

$re = '/^[0-9]+\. (.*)[\r\n]+a\)[\s]+(.*)[\s]+b\)[\s]+(.*)[\s]+c\)[\s]+(.*)[\s]+d\)[\s]+(.*)[\s]+e\)[\s]+(.*)[\s]*/m';
$str = '1. A car averages 27 miles per gallon. If gas costs $4.04 per gallon, which of the following is closest to how much the gas would cost for this car to travel 2,727 typical miles?
a) $44.44 b) $109.08 c) $118.80
d) $408.04 e)
$444.40
2. When x = 3 and y = 5, by how much does the value of 3x2 – 2y exceed the value of 2x2 – 3y ?
a) 4
b) 14
c) 16
d) 20 e) 50';
preg_match_all($re, $str, $matches, PREG_SET_ORDER, 0);
// Print the entire match result
var_dump($matches);
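Building on the pattern above, here is a minimal sketch of how the matches could be turned into the array layout asked for in the question. The function name parseQuestions is hypothetical, and the pattern is a variation on the one above rather than the same expression: it also captures the question number and uses lazy groups with the s modifier so that question text or option values may wrap across lines. It assumes every question has exactly the five options a) to e), and that no question text contains " a)" before the options start.

```php
<?php
// Hypothetical helper (not part of the original answer): parses the raw quiz
// text into a list of questions with their number, text, and options.
function parseQuestions(string $rawText): array
{
    // Groups: 1 = question number, 2 = question text, 3..7 = options a..e.
    // Assumption: each question has exactly five options labelled a) to e).
    // The lookahead ends option e) at the next "<number>. " line or at the
    // end of the input.
    $re = '/^(\d+)\.\s+(.*?)\s+a\)\s+(.*?)\s+b\)\s+(.*?)\s+c\)\s+(.*?)'
        . '\s+d\)\s+(.*?)\s+e\)\s+(.*?)\s*(?=^\d+\.\s|\z)/ms';

    // Collapse stray line breaks and repeated spaces inside a captured value.
    $clean = function (string $value): string {
        return trim(preg_replace('/\s+/', ' ', $value));
    };

    preg_match_all($re, $rawText, $matches, PREG_SET_ORDER);

    $questions = [];
    foreach ($matches as $m) {
        $questions[] = [
            'questionNumber' => $m[1],
            'questionText'   => $clean($m[2]),
            'options'        => array_combine(
                ['a', 'b', 'c', 'd', 'e'],
                array_map($clean, array_slice($m, 3, 5))
            ),
        ];
    }

    return $questions;
}

// Usage with the sample text from the question.
$rawText = '1. A car averages 27 miles per gallon. If gas costs $4.04 per gallon, which of the following is closest to how much the gas would cost for this car to travel 2,727 typical miles?
a) $44.44 b) $109.08 c) $118.80
d) $408.04 e)
$444.40
2. When x = 3 and y = 5, by how much does the value of 3x2 – 2y exceed the value of 2x2 – 3y ?
a) 4
b) 14
c) 16
d) 20 e) 50
';

$questions = parseQuestions($rawText);
var_dump($questions);
```

If some questions have fewer than five options, the lazy groups can run on into the following question and produce wrong results without any error, so with around 20k entries it is probably worth comparing the number of parsed questions against the number of lines that start with a question number.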