Find a pattern in HTML and replace it with PHP code

Date: 2016-08-11 04:04:50

Tags: php python bash sed

I am looking for this pattern:

<!-- Footer part at bottom of page-->
<div id="footer">
   <div class="row col-md-2 col-md-offset-5">

    <p class="text-muted">&copy; 2014. Core Team</p>
  </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>

and want to replace it, in multiple .html files, with this:

<!-- Footer part at bottom of page-->
<div id="footer">
    <div class="row col-md-2 col-md-offset-5">
       <?php
            $year = date("Y");
            echo "<p class='text-muted'>© $year. Core Team</p>";
        ?>
    </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>

Note that the only difference is that this line:

<p class="text-muted">&copy; 2014. Core Team</p>

is replaced with:

       <?php
            $year = date("Y");
            echo "<p class='text-muted'>© $year. Core Team</p>";
        ?>

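I was thinking of using sed as a first attempt. My difficulty is knowing which characters I may or may not need to escape, and how to handle the tabs and newlines in the PHP code, which I want to keep exactly as they appear here.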

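There are a lot of files to process, so I would like to automate this, although doing it manually (copy and paste) might turn out to be faster. Perhaps sed is the wrong tool in this case. Can someone point me in the right direction? At this stage I am open to solutions in other languages (e.g. php, python, bash).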

I then plan to rename each .html file to .php with the following:

for i in *.html; do mv "$i" "${i%.*}.php"; done;
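If you would rather do the rename from Python as well, a minimal equivalent of the loop above might look like this. It is only a sketch: it assumes all the .html files sit directly in one directory, and `rename_html_to_php` is a hypothetical name chosen for illustration.

```python
from pathlib import Path


def rename_html_to_php(directory="."):
    """Rename every *.html entry directly inside `directory` to *.php.

    Hypothetical helper mirroring: for i in *.html; do mv "$i" "${i%.*}.php"; done
    Returns the list of new paths.
    """
    renamed = []
    # sorted() builds the full list first, so renaming does not disturb the iteration
    for html_path in sorted(Path(directory).glob("*.html")):
        php_path = html_path.with_suffix(".php")  # a.html -> a.php, like ${i%.*}.php
        html_path.rename(php_path)
        renamed.append(php_path)
    return renamed
```

Calling `rename_html_to_php(".")` renames the files in the current directory. Like `mv`, `Path.rename` silently replaces an existing .php file of the same name on POSIX systems; on Windows it raises `FileExistsError` instead.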

EDIT1

Following the awk answer below, I can get it working with this version:

$ awk -Wversion 2>/dev/null || awk --version
GNU Awk 4.1.1, API: 1.1 (GNU MPFR 3.1.2, GNU MP 6.0.0)
Copyright (C) 1989, 1991-2014 Free Software Foundation.

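However, with this version I get different output: it seems to print all three files (old, new and file) one after another, without making the replacement. Is there an easy fix for this version?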

root@4461f768e343:/github/find_pattern# awk -Wversion 2>/dev/null || awk --version
mawk 1.3.3 Nov 1996, Copyright (C) Michael D. Brennan

root@4461f768e343:/github/find_pattern# awk -v RS='^$' -v ORS= 'ARGIND==1{old=$0;next} ARGIND==2{new=$0;next} s=index($0,old){ $0 = substr($0,1,s-1) new substr($0,s+length(old))} 1' old new file
<!-- Footer part at bottom of page-->
<div id="footer">
   <div class="row col-md-2 col-md-offset-5">

    <p class="text-muted">&copy; 2014. Core Team</p>
  </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div><!-- Footer part at bottom of page-->
<div id="footer">
    <div class="row col-md-2 col-md-offset-5">
       <?php
            $year = date("Y");
            echo "<p class='text-muted'>© $year. Core Team</p>";
        ?>
    </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>some pile of text
or other
<!-- Footer part at bottom of page-->
<div id="footer">
   <div class="row col-md-2 col-md-offset-5">

    <p class="text-muted">&copy; 2014. Core Team</p>
  </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>
and more maybe.root@4461f768e343:/github/find_pattern#

2 Answers

Answer 0 (score: 2)

You can use Python's str.replace:

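import os

# Python 3; assumes the .html files are UTF-8 encoded
html_files = ['a.html', ...]  # list your files here
copyright = '<p class="text-muted">&copy; 2014. Core Team</p>'
new_copyright = """       <?php
        $year = date("Y");
        echo "<p class='text-muted'>© $year. Core Team</p>";
    ?>"""
for html_file_path in html_files:
    # explicit encoding, because new_copyright contains the non-ASCII character ©
    with open(html_file_path, encoding='utf-8') as html_file:
        html = html_file.read()

    if copyright in html:
        # write a .php file next to the original, e.g. a.html -> a.php
        php_file_path = os.path.splitext(html_file_path)[0] + '.php'
        with open(php_file_path, 'w', encoding='utf-8') as php_file:
            php_file.write(html.replace(copyright, new_copyright))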

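Note that this does not overwrite your .html files, which is useful if something goes wrong with the script. Also note that a .php file is only written for .html files that actually contain the old copyright line.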

Answer 1 (score: 2)

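sed is for simple substitutions on individual lines, so your task is definitely not a job for sed. If your files are as regular as your sample, you can use awk: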

$ cat old
<!-- Footer part at bottom of page-->
<div id="footer">
   <div class="row col-md-2 col-md-offset-5">

    <p class="text-muted">&copy; 2014. Core Team</p>
  </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>

$ cat new
<!-- Footer part at bottom of page-->
<div id="footer">
    <div class="row col-md-2 col-md-offset-5">
       <?php
            $year = date("Y");
            echo "<p class='text-muted'>© $year. Core Team</p>";
        ?>
    </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>

$ cat file
some pile of text
or other
<!-- Footer part at bottom of page-->
<div id="footer">
   <div class="row col-md-2 col-md-offset-5">

    <p class="text-muted">&copy; 2014. Core Team</p>
  </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>
and more maybe.

$ awk -v RS='^$' -v ORS= 'ARGIND==1{old=$0;next} ARGIND==2{new=$0;next} s=index($0,old){ $0 = substr($0,1,s-1) new substr($0,s+length(old))} 1' old new file
some pile of text
or other
<!-- Footer part at bottom of page-->
<div id="footer">
    <div class="row col-md-2 col-md-offset-5">
       <?php
            $year = date("Y");
            echo "<p class='text-muted'>© $year. Core Team</p>";
        ?>
    </div>

    <div id="downloadlinks">
    <!-- downloadlinks go here-->
    </div>
</div>
and more maybe.
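The core of the one-liner is `s=index($0,old){ $0 = substr($0,1,s-1) new substr($0,s+length(old)) }`: with the whole file read as one record, it finds the first position of the old block and splices the new block in its place. The same splice can be sketched in Python as follows; `splice_first` is a hypothetical name, and since Python strings are 0-based, `str.find` stands in for awk's 1-based `index()` and returns -1 rather than 0 when there is no match.

```python
def splice_first(text, old, new):
    """Replace the first exact (non-regex) occurrence of `old` in `text` with `new`,
    mirroring awk's index()/substr() splice. Returns `text` unchanged if `old` is absent."""
    if not old:
        return text  # nothing to look for
    s = text.find(old)  # 0-based position, or -1 if not found
    if s == -1:
        return text
    # awk equivalent: substr($0,1,s-1) new substr($0,s+length(old))
    return text[:s] + new + text[s + len(old):]
```

Because the old block is treated as a literal string rather than a regular expression, none of the characters in the HTML or PHP (`&`, `/`, `$`, quotes) need escaping, which is exactly what makes sed awkward here.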

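The above uses GNU awk for the multi-character RS and for ARGIND. If you want to do this for many files, you could use the command below, which additionally relies on GNU awk's `-i inplace` (gawk 4.1 or later). Because `-i inplace` rewrites every file named on the command line, `old` and `new` are printed back unchanged so that they are not emptied: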

find . -type f -name '*.php' -exec awk -i inplace -v RS='^$' -v ORS= 'ARGIND==1{old=$0;print;next} ARGIND==2{new=$0;print;next} s=index($0,old){ $0 = substr($0,1,s-1) new substr($0,s+length(old))} 1' old new {} \;

or similar.
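ARGIND is a GNU awk extension, which would explain the mawk output in EDIT1: under mawk `ARGIND` is just an unset variable, so `ARGIND==1` and `ARGIND==2` are never true, and every file ends up printed by the final `1` without the intended replacement. If installing gawk is not an option, a version-independent alternative is to do the same old/new/file replacement in Python. This is only a sketch under stated assumptions, not part of the answer above: `read_exact` and `replace_in_tree` are hypothetical names, the files are assumed to be UTF-8, and matching files are edited in place, so keep a backup.

```python
from pathlib import Path


def read_exact(path):
    # newline="" keeps the file's line endings exactly as they are on disk
    with open(path, encoding="utf-8", newline="") as f:
        return f.read()


def replace_in_tree(old_path, new_path, root=".", pattern="*.php"):
    """Replace the first exact occurrence of old's text with new's text in every
    file under `root` matching `pattern`, in place (like find ... -exec awk -i inplace).
    Returns the list of files that were changed."""
    old = read_exact(old_path)
    new = read_exact(new_path)
    if not old:
        return []  # nothing to search for
    # never rewrite the old/new template files themselves
    skip = {Path(old_path).resolve(), Path(new_path).resolve()}
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        if not path.is_file() or path.resolve() in skip:
            continue
        text = read_exact(path)
        updated = text.replace(old, new, 1)  # first occurrence only, like index()/substr()
        if updated != text:
            with open(path, "w", encoding="utf-8", newline="") as f:
                f.write(updated)
            changed.append(path)
    return changed
```

For example, `replace_in_tree("old", "new", ".", "*.php")` corresponds to the find command above; pass `"*.html"` instead if you run it before renaming the files.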