This is my file:
...
</script>
<!--START: Google Analytics --->
<script type="text/javascript"
src="../src/goog/ga_body.js"></script>
<!--END: Google Analytics --->
</body>
</html>
...
How can I delete everything between and including <!--START: Google Analytics --->
and <!--END: Google Analytics --->
? In effect, these lines:
<!--START: Google Analytics --->
<script type="text/javascript"
src="../src/goog/ga_body.js"></script>
<!--END: Google Analytics --->
should disappear. That would leave the following, i.e. the 4 lines are replaced with nothing:
</script>
<nothing here 4 lines deleted>
</body>
</html>
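I am doing this in bash, so sed or awk is probably my best option, although Python might be better.
This is what I wrote earlier; the coding may be poor, and I will clean it up. find2PatternsAndDeleteTextInBetween.sh: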
#Here I want to find 2 patterns and delete what's in between
#this example works
#these are the 2 patterns I want to find, Start and End
#have to use some escape characters here for this to show properly
# have to use \n for it to appear in this format
#<!-- Start of StatCounter Code for DoYourOwnSite -->
# text would go here
#<!-- End of StatCounter Code for DoYourOwnSite -->>
#b="<!-- Start of StatCounter Code for DoYourOwnSite -->"
#b2="<!-- End of StatCounter Code for DoYourOwnSite -->"
#p1="PATTERN-1"
#p2="PATTERN-2"
p1="<!-- Start of StatCounter Code for DoYourOwnSite -->"
p2="<!-- End of StatCounter Code for DoYourOwnSite -->"
fname="*.html"
num_of_files_pattern1=$(grep -l -- "$p1" $fname | wc -l) # count the files that contain pattern 1
echo "fname(s) to apply the sed to:"
echo $fname
echo "num_of_files_pattern1 is:"
echo $num_of_files_pattern1
echo "Pattern1 is equal to:"
echo $p1
echo "Pattern2 is equal to:"
echo $p2
#this is current dir where the script is
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"
echo "DIR is equal to:"
echo $DIR
#cd to the dir where I want to copy the files to:
cd "$DIR"
# this will find the pattern <\head> in all the .html files and place "This should appear before the closing head tag" this before it
# it will also make a backup with .bak extension
#sed -i.bak '/<\\head>/i\This should appear before the closing head tag' *.html
echo "sed on the file"
# this does the head part
#sed '/PATTERN-1/,/PATTERN-2/d' *.txt # this works
#sed "/$p1/,/$p2/d" *.txt # this works
#sed "/$p1/,/$p2/d" $fname # this works
sed -i.bak "/$p1/,/$p2/d" $fname # this works
This is what I ended up with, but there is a more robust answer below:
# ------------------------------------------------------------------
# [author] find2PatternsAndDeleteTextInBetween.sh
# Description
# Here I want to find 2 patterns and delete what's in between
# this example works
#
# EXAMPLE:
# this is the 2 patterns I want to find Start and End
# <!-- Start of StatCounter Code for DoYourOwnSite -->
# text would go here
# <!-- End of StatCounter Code for DoYourOwnSite -->>
#
# ------------------------------------------------------------------
p1="<!--START: Google Analytics --->"
p2="<!--END: Google Analytics --->"
fname=".html"
echo "fname(s) to apply the sed to:"
echo *"$fname"
echo -e "\n"
echo "Pattern1 is equal to:"
echo -e "$p1\n"
echo "Pattern2 is equal to:"
echo -e "$p2\n"
echo -e "PWD is: $PWD\n"
echo "sed on the file"
#sed '/PATTERN-1/,/PATTERN-2/d' *.txt # this works
#sed "/$p1/,/$p2/d" *.txt # this works
#sed "/$p1/,/$p2/d" $fname # this works
sed -i.bak "/$p1/,/$p2/d" *"$fname" # this works
Answer 0 (score: 2)
sed is the tool for this task:
$ sed -i'.bak' '/<!--START/,/<!--END/d' file
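If other lines in the file carry similar tags, include more of the marker text in the patterns so that only the intended block matches.
For multiple files, e.g. file1, ..., file4: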
$ for f in file{1..4}; do sed -i'.bak' '/<!--START/,/<!--END/d' "$f"; done
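As a hedged illustration of "including more of the marker text": the sketch below uses the full marker lines as range endpoints, so an unrelated block whose comment also begins with <!--START is left alone. The function name strip_ga_block and the sample file contents are made up for this demonstration; the function prints to stdout instead of editing in place.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not from the question): delete the Google Analytics
# block, marker lines included, and print the result to stdout.
strip_ga_block() {
  sed '/<!--START: Google Analytics --->/,/<!--END: Google Analytics --->/d' "$1"
}

# Build a small sample file with a second, unrelated START/END block.
sample=$(mktemp)
cat > "$sample" <<'EOF'
<!--START: Menu --->
<ul></ul>
<!--END: Menu --->
<!--START: Google Analytics --->
<script type="text/javascript"
src="../src/goog/ga_body.js"></script>
<!--END: Google Analytics --->
</body>
EOF

strip_ga_block "$sample"   # the Menu block and the closing body tag are kept
rm -f -- "$sample"
```

With the shorter patterns /<!--START/ and /<!--END/, the Menu block in this sample would be deleted as well.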
Answer 1 (score: 2)
Something to consider:
$ awk '/<!--(START|END): Google Analytics --->/{f=!f;next} !f' file
...
</script>
</body>
</html>
...
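Note that the awk command writes to stdout and leaves the file itself unchanged. A minimal sketch of applying it to several files in place, assuming a .bak backup is wanted as with sed -i.bak in the question; the function name awk_strip_ga is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run the awk filter on each file given as an argument,
# keep a .bak copy, then overwrite the original with the filtered output.
awk_strip_ga() {
  local f tmp
  for f in "$@"; do
    tmp=$(mktemp) || return
    # f toggles on every marker line; 'next' drops the marker line itself,
    # and '!f' prints only the lines seen while f is 0 (outside the block).
    if awk '/<!--(START|END): Google Analytics --->/{f=!f;next} !f' "$f" > "$tmp"; then
      # Writing back with cat keeps the original file's permissions.
      cp -p -- "$f" "$f.bak" && cat -- "$tmp" > "$f"
    fi
    rm -f -- "$tmp"
  done
}
```

It could be called as awk_strip_ga ./*.html. Because the flag simply toggles, a file with an unmatched START marker loses everything from that marker to the end of the file, which is the same behavior as the sed range.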
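Answer 2 (score: 1)
Judging by the script in your question, you already know how to use sed to remove the range of interest from a single file (sed -i.bak "/$p1/,/$p2/d" $fname), but are looking for a robust way to process multiple files in a script (assuming bash):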
#!/usr/bin/env bash
# cd to the dir. in which this script is located.
# CAVEAT: Assumes that the script wasn't invoked through a *symlink*
# located in a different dir.
cd -- "$(dirname -- "$BASH_SOURCE")" || exit
fpattern='*.html' # specify source-file globbing pattern
shopt -s nullglob # make sure that globbing expands to nothing if nothing matches
fnames=( $fpattern ) # expand to matching files and store in array
num_of_files_matching_pattern=${#fnames[@]} # count matching files
(( num_of_files_matching_pattern > 0 )) || exit # abort, if no files match
printf '%s\n%s\n' "Running from:" "$PWD"
printf '%s\n%s\n' "Pattern matching the files to process:" "$fpattern"
printf '%s\n%s\n' "# of matching files:" "$num_of_files_matching_pattern"
# Determine the range-endpoint-identifier-line regular expressions.
# CAVEAT: Make sure you escape any regular-expression metacharacters you want
# to be treated as *literals*.
p1='^<!--START: Google Analytics --->$'
p2='^<!--END: Google Analytics --->$'
# Remove the range identified by its endpoints from all matching input files
# and save the original files with extension '.bak'
sed -i'.bak' "/$p1/,/$p2/d" "${fnames[@]}" || exit
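The caveat about regular-expression metacharacters can be handled mechanically. As a sketch, assuming the markers are single-line strings that should be matched literally, a small helper can backslash-escape the characters that are special in a sed basic regular expression, plus the / address delimiter. The names escape_bre and delete_between_literals are made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical helper: escape a literal string for use inside a sed BRE
# address that is delimited by '/'.
escape_bre() {
  # Escapes ] [ \ . * ^ $ and / ; '|' is used as the s-command delimiter.
  printf '%s\n' "$1" | sed 's|[][\.*^$/]|\\&|g'
}

# Hypothetical wrapper: delete the range between two literal marker strings
# (markers included) from the given files and print the result to stdout.
delete_between_literals() {
  local p1 p2
  p1=$(escape_bre "$1")
  p2=$(escape_bre "$2")
  shift 2
  sed "/$p1/,/$p2/d" "$@"
}
```

For example, delete_between_literals '<!-- [start] v1.0 -->' '<!-- [end] v1.0 -->' page.html treats the brackets and the dot as literal characters. Unlike the ^...$ anchored patterns in the script, this matches the marker anywhere on a line.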
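As an aside: I suggest not using the suffix .sh in script filenames:
The shebang line in the file is enough to tell the system which shell or interpreter to pass the script to.
With no suffix in the name, you remain free to change the implementation later (for example, to Python) without breaking existing programs that rely on the script.
In the case at hand, assuming that using bash is actually acceptable, .sh would be misleading, because it suggests a script that uses only sh features.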
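To determine the true directory of the running script, even when the script is invoked through a symlink located in a different directory:
If you can assume a Linux platform (or at least GNU readlink), use: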
dirname -- "$(readlink -e -- "$BASH_SOURCE")"
Otherwise, a more elaborate solution that uses a helper function is needed; see this answer of mine.