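I'm trying to write a script that finds everything between the symbols { and } in a text document. It should take the part of the .txt document inside the {}, sort it alphabetically, and write it back into the text document in place. Example text document: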
bla bla bla
bla ba bl bla ba bl {apple:banana, this: something else, airplane:hobby}
bla bla bla
bla bla bla
Desired output (sorted alphabetically):
bla bla bla
bla ba bl bla ba bl {airplane:hobby, apple:banana, this: something else}
bla bla bla
bla bla bla
What it currently produces:
bla bla bla
bla ba bl bla ba bl {apple:banana, this: something else, airplane:hobby}
bla bla bla
bla bla bla
My code:
def openFind():
    f = open(inFile, 'r')
    lines = f.read()
    match = re.findall(r'{(.*?)}', lines)
    before = str(match)
    n=1
    for i in xrange(0, len(match), n):
        mydict = match[i:i+n]
        for x in sorted(mydict):
            c = x.split(',')
            newmatch = sorted(c)
            final = str(newmatch)
            print final
            # NOT WORKING BELOW!!! Stuck in loop?
            with open(outFile,'w') as new_file:
                with open(inFile) as old_file:
                    for line in old_file:
                        new_file.write(line.replace(before, after))
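It prints the sorted list as [airplane:hobby, apple:banana, this:something else], but how do I get it to replace the original text in the document? Ideally this would happen in place, but writing a new file is acceptable.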
Answer 0 (score: 2)
This should work:
import re

def openFind():
    with open("test.txt", "r") as in_file:
        data = in_file.read()

    def sub(m):
        # m.group(1) is the text between the braces
        l = [s.strip() for s in m.group(1).split(",")]
        l.sort()
        return "{%s}" % (", ".join(l),)

    replacement = re.sub(r'{(.*?)}', sub, data)
    with open("out.txt", "w") as out_file:
        out_file.write(replacement)
I used re.sub() with a function as the replacement, so each match is replaced by its sorted version.
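To see the callback form of re.sub() in isolation, here is a minimal sketch that works on an in-memory string instead of a file. The helper name sort_braces is made up for this illustration and is not part of the answer's code.

```python
import re

def sort_braces(text):
    """Return text with the comma-separated items inside each {...} sorted."""
    def sort_match(m):
        # m.group(1) is everything between the braces
        items = [s.strip() for s in m.group(1).split(",")]
        return "{" + ", ".join(sorted(items)) + "}"
    # re.sub() calls sort_match once per match and substitutes its return value
    return re.sub(r"\{(.*?)\}", sort_match, text)

sample = "bla ba bl {apple:banana, this: something else, airplane:hobby}"
print(sort_braces(sample))
# prints: bla ba bl {airplane:hobby, apple:banana, this: something else}
```

Text outside the braces is never touched, because re.sub() only rewrites the matched spans.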
Answer 1 (score: 1)
The following code sorts the items between { and } and writes the result back to the same file:
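import re

with open('test.txt', 'r+') as f:
    s = f.read()
    r = list(s)
    for mo in re.finditer('{(.*?)}', s):
        d = sorted(mo.group(1).split(', '))
        # splitting and joining on ', ' keeps the length unchanged,
        # so the match offsets stay valid for later matches
        r[mo.start(1):mo.end(1)] = list(', '.join(d))
    f.seek(0)
    f.write(''.join(r))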
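Answer 2 (score: 1)
I would tackle this in pieces. First, you want to be able to read from one file and write to a new one. There are several ways to do that. If your file is small, you can use readlines(), truncate the original file, and then write the lines back.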
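For the small-file case, a minimal sketch might look like the following. The function name rewrite_small_file and its transform parameter are hypothetical, chosen only for this illustration; the whole file is held in memory, so this assumes it comfortably fits in RAM.

```python
def rewrite_small_file(path, transform):
    """Read every line, apply transform to each, and write the result back in place."""
    with open(path, "r+") as f:
        lines = f.readlines()   # whole file in memory
        f.seek(0)
        f.truncate()            # empty the original file before rewriting it
        f.writelines(transform(line) for line in lines)
```

Here transform is any function that takes a line and returns the line to write, for example one that sorts the items between the braces.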
But I'm going to assume the file could be huge (i.e. larger than comfortably fits in RAM/swap, which currently means several GB).
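import os
import tempfile

# filename is assumed to hold the path of the file being rewritten
# mode='w' is needed because NamedTemporaryFile opens in binary mode by default
with tempfile.NamedTemporaryFile(mode='w', delete=False) as temp:
    with open(filename) as infile:
        for line in infile:
            temp.write(line)
# both files are closed here; swap the temp file in for the original
# (os.rename can fail across filesystems; pass dir=... above if that applies)
os.unlink(filename)
os.rename(temp.name, filename)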
Now we're reading each line and writing it to the destination. All that's left is to intercept each line and change it where necessary:
for line in infile:
    # needs "import re" at the top of the script
    match = re.search(r'\{.*?\}', line)
    if match:
        # Assumes you only have one "dictionary" per line
        first_part, rest = line.split('{', maxsplit=1)
        # allows for trailing data
        data, last_part = rest.split('}', maxsplit=1)
        # strip each item so leading spaces don't affect the sort order
        data = [_.strip().split(':') for _ in data.split(',')]
        data.sort()
        line = '{}{{{}}}{}'.format(first_part, ', '.join(':'.join(_) for _ in data), last_part)
    temp.write(line)
You may need to tweak the exact algorithm, but this is the approach I would take for a similar problem.
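Putting the two pieces together, a line-by-line version could be sketched as below. This is only one possible arrangement: it swaps the manual split on '{' and '}' for re.sub() with a callback, so several brace groups on one line are handled too. It also creates the temporary file next to the original and uses os.replace() for the final swap. The names BRACES, sort_items and sort_braces_in_file are assumptions made for this example.

```python
import os
import re
import tempfile

BRACES = re.compile(r"\{(.*?)\}")

def sort_items(match):
    # Sort the comma-separated items found between one pair of braces
    items = [item.strip() for item in match.group(1).split(",")]
    return "{" + ", ".join(sorted(items)) + "}"

def sort_braces_in_file(filename):
    # Create the temp file in the same directory so the final swap
    # stays on one filesystem
    directory = os.path.dirname(os.path.abspath(filename))
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as temp:
        with open(filename) as infile:
            for line in infile:          # only one line in memory at a time
                temp.write(BRACES.sub(sort_items, line))
    # Replace the original file with the rewritten copy
    os.replace(temp.name, filename)
```

Calling sort_braces_in_file("test.txt") would rewrite that file without ever loading it whole.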
Answer 3 (score: 1)
The whole program can be written concisely as follows:
import re

with open("file.txt") as fr:
    content = fr.read()

# text between each pair of braces
matches = (match.group(1) for match in re.finditer(r"{(.*?)}", content))
for match in matches:
    repl = ", ".join(sorted(match.split(", ")))
    content = content.replace(match, repl)

with open("file.txt", "w") as fw:
    fw.write(content)