A few weeks ago I tried to modify this phone number script to help a friend. This is the script I used as a starting point.
# import regular expressions
import re
# import argv
from sys import argv
# arguments to provide at the command line
script, filename = argv
# load and read the file, closing it afterwards
with open(filename) as data:
    read_file = data.read()
# create a regular expression to find phone numbers
phone_finder = re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}")
# r marks a raw string
# \( matches "("
# \d{3} matches 3 digits
# \) matches ")"
# \s* matches zero or more whitespace characters
# \d{3} matches 3 digits
# - matches "-"
# \d{4} matches 4 digits
# print the results
print(phone_finder.findall(read_file))
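To see what the pattern picks up without needing a file, here is a minimal check on an inline string. The sample text is made up for illustration and is not from the original data.

```python
import re

# Same pattern as in the script above.
phone_finder = re.compile(r"\(\d{3}\)\s*\d{3}-\d{4}")

# Made-up sample text, for illustration only.
sample = "Call (555) 123-4567 or (555)987-6543, not 555-123-4567."

# findall returns every non-overlapping match as a string.
matches = phone_finder.findall(sample)
print(matches)
```

The last number is skipped because the pattern requires parentheses around the area code.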
He wants a way to search an XML file and find either
<excerpt:encoded><![CDATA[]]></excerpt:encoded>
or
<excerpt:encoded><![CDATA[We love having a frother to make a latte or cappuccino, and think you'll enjoy some hot milk on these cold winter nights to put you to sleep as well.]]></excerpt:encoded>
and replace every instance with
<excerpt:encoded><![CDATA[]]></excerpt:encoded>
But I'm not sure how to go about this, because in the second example the text is different in every instance.
I'm new to Python, so any help would be appreciated. Thank you for your time.
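One possible way to handle text that differs in every instance, staying close to the regex script, is a non-greedy pattern that matches whatever sits between the CDATA markers. This is only a sketch: the names `excerpt_pattern`, `EMPTY_EXCERPT`, and `blank_excerpts` are made up for illustration, and it assumes the tags appear exactly as written in the question (no attributes, no extra whitespace inside the tags). Regular expressions are fragile on XML in general.

```python
import re

# Non-greedy ".*?" stops at the first "]]>" that closes the CDATA section;
# re.DOTALL lets "." match newlines in multi-line excerpts.
excerpt_pattern = re.compile(
    r"<excerpt:encoded><!\[CDATA\[.*?\]\]></excerpt:encoded>",
    re.DOTALL,
)

# The empty form every match should be replaced with.
EMPTY_EXCERPT = "<excerpt:encoded><![CDATA[]]></excerpt:encoded>"


def blank_excerpts(xml_text):
    """Return xml_text with the content of every excerpt emptied."""
    return excerpt_pattern.sub(EMPTY_EXCERPT, xml_text)


# Made-up sample, for illustration only.
sample = (
    "<item>"
    "<excerpt:encoded><![CDATA[We love having a frother.]]></excerpt:encoded>"
    "<excerpt:encoded><![CDATA[]]></excerpt:encoded>"
    "</item>"
)

result = blank_excerpts(sample)
print(result)
```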
Answer 0 (score: 0)
To remove all content from the <excerpt:encoded> elements:
import xml.etree.ElementTree as etree
etree.register_namespace('excerpt', 'your namespace') # to preserve prefix
# read xml (filename is the path to the XML file, as in the question's script)
doc = etree.parse(filename)
# clear elements
for element in doc.iter(tag='{your namespace}encoded'):
    element.clear()
# write xml
doc.write(filename + '.cleared')
Replace 'your namespace' with the actual namespace URI that the excerpt prefix refers to.
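As a rough illustration of the same approach on an in-memory string, the sketch below assumes Python 3 and uses a placeholder namespace URI (`http://example.com/excerpt/`); the real URI is whatever the `xmlns:excerpt` attribute declares in the actual file. Note that ElementTree does not keep the CDATA wrapper: the cleared element is written out as an empty element rather than as `<![CDATA[]]>`.

```python
import xml.etree.ElementTree as etree

# Placeholder URI, assumed for illustration; copy the real one from
# the xmlns:excerpt="..." declaration in the actual XML file.
EXCERPT_NS = "http://example.com/excerpt/"

# Made-up sample document.
sample = (
    '<rss xmlns:excerpt="http://example.com/excerpt/">'
    "<item><title>Frother</title>"
    "<excerpt:encoded><![CDATA[We love having a frother.]]></excerpt:encoded>"
    "</item></rss>"
)

# Keep the "excerpt" prefix on output instead of an auto-generated ns0.
etree.register_namespace("excerpt", EXCERPT_NS)
root = etree.fromstring(sample)

# ElementTree names namespaced tags as "{uri}localname".
for element in root.iter("{%s}encoded" % EXCERPT_NS):
    element.clear()  # removes text, tail, attributes and children

cleared = etree.tostring(root, encoding="unicode")
print(cleared)
```

If the literal empty CDATA section has to survive in the output, a plain text substitution may be a better fit than an XML parser.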