Regex find-and-replace to wrap and format text with HTML tags

Time: 2018-04-01 15:30:03

Tags: html css regex html5 regex-lookarounds

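Not sure why this is being downvoted. I'm legitimately asking whether I'm going about this the right way and looking for advice. I'm currently trying to format a text document into an HTML document (with divs and containers), and I'd like to know whether regex find-and-replace is actually the way to go. I only have a partial solution: I can't figure out how to wrap each main item in a container.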

Data:

Name: Person1
Address: Add1.
Office hours: 8:30 AM - 6:00 PM Mon - Sat <br> 9:30 AM - 3:00 PM Sun

Name: Person2
Address: Add2
Office hours: Not Available

Name: Person3
Address: Add3
Office hours: 8:30 AM - 6:00 PM Mon - Sun

I was able to wrap each line, but with the pattern below I can't get each group wrapped in a container.

Using Regexr and regex101

Regex find: ([A-Za-z]+.?[A-Za-z]+)(?:\:)(.+)

Replace with: <div class="datarow"><label>\1</label><div class="value">\:\2</div></div>

<div class="datarow">
    <label>Name</label>
    <div class="value"> Person1</div>
</div>
<div class="datarow">
    <label>Address</label>
    <div class="value"> Add1.</div>
</div>
<div class="datarow">
    <label>Office hours</label>
    <div class="value"> 8:30 AM - 6:00 PM Mon - Sat <br> 9:30 AM - 3:00 PM Sun </div>
</div>



<div class="datarow">
    <label>Name</label>
    <div class="value"> Person2</div>
</div>
<div class="datarow">
    <label>Address</label>
    <div class="value"> Add2</div>
</div>
<div class="datarow">
    <label>Office hours</label>
    <div class="value"> Not Available</div>
</div>

...
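For reference, the same row-level find-and-replace can be reproduced in PHP with `preg_replace`. This is only a minimal sketch: the sample string is a shortened copy of the data above, the variable names are illustrative, and the replacement uses `$2` rather than `\:\2` so that the value matches the output shown.

```php
<?php
// Sample input copied from the data above (shortened to two rows).
$text = "Name: Person1\nAddress: Add1.\n";

// The find pattern from the question, unchanged, wrapped in PHP delimiters.
$pattern = '/([A-Za-z]+.?[A-Za-z]+)(?:\:)(.+)/';

// preg_replace accepts $1/$2 (or \1/\2) as backreferences in the replacement.
$replacement = '<div class="datarow"><label>$1</label><div class="value">$2</div></div>';

// Each "Label: value" line becomes one datarow; line breaks are left untouched.
$rows = preg_replace($pattern, $replacement, $text);
echo $rows;
```

This wraps every line but knows nothing about where one person ends and the next begins, which is why no container appears.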

Desired output:

<div class="container">
    <div class="datarow">
        <label>Name</label>
        <div class="value"> Person1</div>
    </div>
    <div class="datarow">
        <label>Address</label>
        <div class="value"> Add1.</div>
    </div>
    <div class="datarow">
        <label>Office hours</label>
        <div class="value"> 8:30 AM - 6:00 PM Mon - Sat <br> 9:30 AM - 3:00 PM Sun </div>
    </div>
</div>


<div class="container">
    <div class="datarow">
        <label>Name</label>
        <div class="value"> Person2</div>
    </div>
    <div class="datarow">
        <label>Address</label>
        <div class="value"> Add2</div>
    </div>
    <div class="datarow">
        <label>Office hours</label>
        <div class="value"> Not Available</div>
    </div>
</div>

...

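I'm not sure whether regex is really the way to go, but I'd really like to use it, since I need to format thousands of people, addresses, and office hours into HTML. Is it possible to achieve the output I want? Or is there a better approach? Suggestions, and especially explanations, would be much appreciated, because I'm really in a pickle. :(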

Edit: I'm using the PCRE (PHP) flavor, which is regex101's default.
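One possible regex-only route, sketched under the assumption that every record is exactly three consecutive lines in the order Name, Address, Office hours: match a whole record in a single pattern and emit the container in the replacement. The sample string and variable names are illustrative; the class names come from the desired output above.

```php
<?php
// Sample input: two records in the format shown above.
$text = "Name: Person1\nAddress: Add1.\nOffice hours: Not Available\n\n"
      . "Name: Person2\nAddress: Add2\nOffice hours: Not Available\n";

// One match spans a whole record: three consecutive lines, one capture per field.
// \R matches any line break; the m flag lets ^ and $ anchor at line boundaries.
$pattern = '/^Name:(.*)\RAddress:(.*)\ROffice hours:(.*)$/m';

// Nowdoc: PHP does no interpolation here, so $1..$3 reach preg_replace intact.
$replacement = <<<'HTML'
<div class="container">
    <div class="datarow"><label>Name</label><div class="value">$1</div></div>
    <div class="datarow"><label>Address</label><div class="value">$2</div></div>
    <div class="datarow"><label>Office hours</label><div class="value">$3</div></div>
</div>
HTML;

$html = preg_replace($pattern, $replacement, $text);
echo $html;
```

A record with a missing or reordered field would simply not match and would be left as plain text, so this only fits data as regular as the sample.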

1 answer:

Answer 0 (score: 0)

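So you need to go through these steps to accomplish your task:

- extract the data and store it in arrays
- loop through the data and output the arrays you got

data.txt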

Name: Person1
Address: Add1.
Office hours: 8:30 AM - 6:00 PM Mon - Sat <br> 9:30 AM - 3:00 PM Sun

Name: Person2
Address: Add2
Office hours: Not Available

Name: Person3
Address: Add3
Office hours: 8:30 AM - 6:00 PM Mon - Sun

index.php

<?php
    $content = file_get_contents('data.txt');

    // Extract names: capture everything after "Name:" up to the end of the line
    $re = '/^Name:[ \t]*(.*?)\s*$/m';
    $names = extractFromContent($re, $content);

    // Extract addresses (values may contain spaces and punctuation, e.g. "Add1.")
    $re = '/^Address:[ \t]*(.*?)\s*$/m';
    $addresses = extractFromContent($re, $content);

    // Extract office hours (values may contain inline HTML such as <br>)
    $re = '/^Office hours:[ \t]*(.*?)\s*$/m';
    $office_hours = extractFromContent($re, $content);

    // var_dump($names);
    // var_dump($addresses);
    // var_dump($office_hours);

    $max = max( count($names),count($addresses), count($office_hours) );


    // Run the pattern over the content and return capture group 1 of every match.
    function extractFromContent($regex, $content)
    {
        preg_match_all($regex, $content, $matches, PREG_SET_ORDER);
        $extracted = [];
        foreach ($matches as $match) {
            $extracted[] = $match[1];
        }

        return $extracted;
    }


?>
<!DOCTYPE html>
<html>
<head>
    <title>Title of page</title>
</head>
<body>

    <div class="content">

    </div>
    <?php for( $i = 0 ; $i < $max ; $i++ ): ?>
        <div class="datarow">
            <label>Name</label>
            <div><?= $names[$i] ?></div>
        </div>  
        <div class="datarow">
            <label>Address</label>
            <div><?= $addresses[$i] ?></div>
        </div>
        <div class="datarow">
            <label>Office hours</label>
            <div><?= $office_hours[$i] ?></div>
        </div>
        <hr>
    <?php endfor; ?>

</body>
</html>
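The page above separates records with `<hr>` rather than the `container` div asked for in the question. A minimal sketch of one way to get the container, assuming records are separated by blank lines: split the text into blocks, turn each `Label: value` line into a datarow, and wrap each block. The function names `renderRecord` and `renderAll` are hypothetical, and values are emitted unescaped because the sample data contains `<br>`.

```php
<?php
// Render one record (a block of "Label: value" lines) as a container div.
function renderRecord(string $block): string
{
    $rows = preg_replace_callback(
        '/^([A-Za-z ]+):[ \t]*(.*)$/m',
        function (array $m): string {
            // $m[1] is the label, $m[2] the value with leading spaces stripped.
            return '    <div class="datarow">' . "\n"
                 . '        <label>' . htmlspecialchars($m[1]) . '</label>' . "\n"
                 . '        <div class="value">' . $m[2] . '</div>' . "\n"
                 . '    </div>';
        },
        trim($block)
    );

    return "<div class=\"container\">\n" . $rows . "\n</div>";
}

// Split the whole text into records on blank lines and render each one.
function renderAll(string $text): string
{
    $blocks = preg_split('/\R\s*\R/', trim($text));

    return implode("\n\n", array_map('renderRecord', $blocks));
}
```

With the file from this answer, usage would be `echo renderAll(file_get_contents('data.txt'));`. Unlike the three parallel arrays, a record with a missing field still stays together in its own container.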