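I've run into a problem.
I'm trying to format a block of text with regular expressions. I'll explain what I have so far and then describe my problem.
I have a text file containing some narrative text.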
VOLUME I
CHAPTER I
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
Lorem Ipsum has been the industry's standard dummy text ever since the 1500s,
when an unknown printer took a galley of type and scrambled it to make a type
It was popularised in the 1960s with the release of Letraset sheets containing
Lorem Ipsum passages, and more recently with desktop publishing software like
Aldus PageMaker including versions of Lorem Ipsum.
VOLUME II
CHAPTER II
Lorem Ipsum is simply dummy text of the printing and typesetting industry.
It has survived not only five centuries, but also the leap into electronic
typesetting, remaining essentially unchanged.
It was popularised in the 1960s with the release of Letraset sheets
containing Lorem Ipsum passages, and more recently with desktop
publishing software like Aldus PageMaker including versions of Lorem Ipsum.
...
...
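It contains multiple VOLUMES and CHAPTERS and needs to be formatted with PHP so that it looks the way it does in the text file, with proper spacing.
First, I call this formatting function to handle some whitespace and cleanup.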
<?php
function formatting($AStr)
{
return preg_split('/[\r\n]{2,}/', trim($AStr));
}
?>
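To make concrete what `formatting()` hands back, here is a minimal sketch run on a made-up string (the sample text and the name `split_paragraphs` are assumptions, not the contents of emma.txt). One thing it illustrates: `[\r\n]{2,}` counts characters, so a single Windows line ending `\r\n` already satisfies it, whereas `\R{2,}` counts whole line breaks.

```php
<?php
// Sketch: split text into paragraphs on blank lines.
// \R matches any single line break (\n, \r\n or \r) as one unit.
function split_paragraphs(string $text): array
{
    return preg_split('/\R{2,}/', trim($text));
}

// Hypothetical sample using Windows line endings.
$sample = "VOLUME I\r\n\r\nCHAPTER I\r\n\r\nFirst line\r\nsecond line";

// Three blocks: the wrapped paragraph stays in one piece.
$blocks = split_paragraphs($sample);

// The character-class version also splits at the single "\r\n"
// inside the last paragraph, giving four pieces.
$naive = preg_split('/[\r\n]{2,}/', trim($sample));
```

If the file only ever uses `\n` line endings, both patterns behave the same.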
Then I include that file and go on to attempt the formatting.
<!DOCTYPE html>
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="style.css" />
</head>
<body>
<h1>Jane Austen</h1>
<h2>Emma</h2>
<?php
require_once 'format.inc.php';
$p = file_get_contents('emma.txt');
$p = formatting($p);
/*
foreach ($p as $l) {
$l = trim($l);
preg_replace('/(VOLUME +[IVX]+)/', "jjj", $l);
$volumePattern = '/(VOLUME +[IVX]+)/';
$chaperPattern = '/(CHAPTER +[IVX]+)/';
$l = str_replace("\r\n", ' ', $l);
if (preg_match('/(VOLUME +[IVX]+)/', $l, $m)) {
echo '<h3>' . $m[1] . '</h3>';
}
if (preg_match('/(CHAPTER +[IVX]+)/', $l, $m)) {
echo '<h3>' . $m[1] . '</h3>';
}
preg_replace('/(VOLUME +[IVX]+)/', "jjj", $l);
echo $l . "\n";
}*/
foreach ($p as $l) {
//$l = trim($l);
//$l = str_replace("[\r\n]", '\n', $l);
if (preg_match('/[\.\w]/', $l, $m)) {
echo "\n";
}
if (preg_match('/(VOLUME +[IVX]+)/', $l, $m)) {
echo '<h3>' . $m[1] . '</h3>';
}
$l = preg_replace('/(VOLUME +[IVX]+)/', '', $l);
if (preg_match('/(CHAPTER +[IVX]+)/', $l, $m)) {
echo '<h3>' . $m[1] . '</h3>';
}
$l = preg_replace('/(CHAPTER +[IVX]+)/', '', $l);
echo $l . "\n";
}
?>
</body>
</html>
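The problem is that I can't get the blank space (a line break) between paragraphs. I've tried, but I can't get it to work. I tried using these lines: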
if (preg_match('/[\.\w]/', $l, $m)) {
echo "\n";
}
Answer 0 (score: 3)
This may be a gross oversimplification, but couldn't you just do this?
<!DOCTYPE html>
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="style.css" />
</head>
<body>
<h1>AUTHOR NAME</h1>
<h2>TITLE</h2>
<?php
$p = file_get_contents('emma.txt');
echo preg_replace('/^\s*((?:VOLUME|CHAPTER)\s+[IVX]+)\s*$/im', '<h3>$1</h3>', $p);
?>
</body>
</html>
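The single `preg_replace` call relies on the `m` modifier, which lets `^` and `$` match at every line boundary. A small sketch on a hypothetical sample string (not the real emma.txt) shows the effect. As an assumption of mine, it uses `\h` (horizontal whitespace) instead of `\s` so that the match cannot swallow neighbouring blank lines, and `\r?` to tolerate Windows line endings.

```php
<?php
// Hypothetical sample text standing in for emma.txt.
$text = "VOLUME I\nCHAPTER I\nEmma Woodhouse, handsome, clever, and rich.\n";

// m: ^ and $ match at line boundaries.
// \h matches spaces and tabs only, so the match stays on one line.
$html = preg_replace(
    '/^\h*((?:VOLUME|CHAPTER)\h+[IVX]+)\h*\r?$/m',
    '<h3>$1</h3>',
    $text
);

echo $html;
```

Only lines consisting entirely of a heading are wrapped; a sentence that merely mentions "VOLUME II" is left alone because of the anchors.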
Edit
To also wrap the body paragraphs in <p></p> (assuming there are no newlines within a paragraph), try the following:
<!DOCTYPE html>
<html>
<head>
<title></title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<link rel="stylesheet" type="text/css" href="style.css" />
</head>
<body>
<h1>AUTHOR NAME</h1>
<h2>TITLE</h2>
<?php
$p = file_get_contents('emma.txt');
echo preg_replace_callback('/^\s*(?:(?P<header>(?:VOLUME|CHAPTER)\s+[IVX]+)|(?P<body>.+))\s*$/im', function($matches) {
if (!empty($matches['body'])) {
return '<p>'.htmlspecialchars($matches['body']).'</p>';
} else {
return '<h3>'.htmlspecialchars($matches['header']).'</h3>';
}
}, $p);
?>
</body>
</html>
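The callback version treats every line as its own paragraph. If the paragraphs in the file are hard-wrapped over several lines and separated by blank lines (an assumption about emma.txt, in line with the question's `formatting()` splitting on blank lines), one alternative is to split first and classify each block afterwards. This is only a sketch, and `render_blocks` is a hypothetical helper name.

```php
<?php
// Sketch: turn blank-line-separated blocks into <h3> headings or <p> paragraphs.
// render_blocks() is a hypothetical helper, not part of the original code.
function render_blocks(string $text): string
{
    $out = [];
    foreach (preg_split('/\R{2,}/', trim($text)) as $block) {
        // Join the hard-wrapped lines of one block into a single line.
        $line = trim(preg_replace('/\s*\R\s*/', ' ', $block));
        if ($line === '') {
            continue;
        }
        if (preg_match('/^(?:VOLUME|CHAPTER)\s+[IVX]+$/', $line)) {
            $out[] = '<h3>' . htmlspecialchars($line) . '</h3>';
        } else {
            $out[] = '<p>' . htmlspecialchars($line) . '</p>';
        }
    }
    return implode("\n", $out);
}

// Hypothetical sample with one paragraph wrapped over two lines.
$sample = "VOLUME I\n\nCHAPTER I\n\nEmma Woodhouse, handsome, clever,\nand rich.\n\nShe was the youngest.";
echo render_blocks($sample), "\n";
```

Because each paragraph ends up in its own `<p>` element, the spacing between paragraphs can then be controlled from style.css rather than with echoed newlines, which browsers collapse.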
Answer 1 (score: 1)
You have several errors. First, in the formatting function, the regular expression must be:
function formatting($AStr)
{
return preg_split('/[\r\n]{2,}/', trim($AStr));
}
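Next, you need to know that preg_replace does not take its subject by reference. You have to assign the function's return value back to your variable: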
foreach ($p as $l) {
$l = trim($l);
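// Note: the result of this call is discarded, so it has no effect on $l.
// Assign the return value (as done further down) for a replacement to stick.
// [A-Z] replaces [A-z], which would also match the characters [ \ ] ^ _ and `.
preg_replace('#VOLUME\s+[A-Z]+#i', "jjj", $l);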
$l = str_replace("\r\n", ' ', $l);
if (preg_match('/(VOLUME +[IVX]+)/', $l, $m)) {
echo '<h3>' . $m[1] . '</h3>';
}
$l = preg_replace('/(VOLUME +[IVX]+)/', '', $l);
if (preg_match('/(CHAPTER +[IVX]+)/', $l, $m)) {
echo '<h3>' . $m[1] . '</h3>';
}
$l = preg_replace('/(CHAPTER +[IVX]+)/', '', $l);
echo $l . "\n";
}
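The point about the return value can be checked in isolation. The following is a minimal sketch with a made-up input string and hypothetical variable names; it also shows why a character class written as `[A-z]` is wider than it looks.

```php
<?php
$line = 'VOLUME II of the novel';

// The return value is discarded, so $line is unchanged.
preg_replace('/VOLUME +[IVX]+/', 'jjj', $line);
$unchanged = $line;

// Assigning the return value is what actually updates the variable.
$line = preg_replace('/VOLUME +[IVX]+/', 'jjj', $line);
$changed = $line;

// [A-z] spans ASCII 65 to 122, which also includes [ \ ] ^ _ and `.
$matchesUnderscore = preg_match('/^[A-z]$/', '_');

// [A-Z] with the i modifier covers letters only.
$strictClass = preg_match('/^[A-Z]$/i', '_');
```

Here `$unchanged` still holds the original text, while `$changed` has the heading replaced.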