I'm trying to make some changes to the contents of an input file. My input file looks like this:
18800000 20400000 pau
20400000 21300000 aa
21300000 22500000 p
22500000 23200000 l
23200000 24000000 ay
24000000 25000000 k
25000000 26500000 pau
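This file is a transcription of an audio file. The first number is the start time, the second number is the end time, and the letters after them identify the sound.
The change I need to make is this: some sounds are composed of two different sounds, i.e. some of them are diphthongs, and each diphthong must be split into its two sounds. In the example above the diphthong is 'ay', which is composed of 'ao' and 'ih'. The duration of 'ay', 24000000 - 23200000 = 800000, is divided equally between the two sounds, so each gets 400000. As a result, the line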
23200000 24000000 ay
becomes
23200000 23600000 ao
23600000 24000000 ih
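The split arithmetic itself can be sketched as a small helper. The name split_interval is hypothetical, and integer division is an assumption so that the times stay whole numbers (with an odd duration the second half gets the extra unit).

```python
def split_interval(start, end, first, second):
    """Split the interval [start, end) into two halves labelled first and second."""
    # Integer midpoint: 23200000 + (24000000 - 23200000) // 2 == 23600000
    mid = start + (end - start) // 2
    return [(start, mid, first), (mid, end, second)]


print(split_interval(23200000, 24000000, "ao", "ih"))
# [(23200000, 23600000, 'ao'), (23600000, 24000000, 'ih')]
```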
I tried to write some pseudocode, but it looks like garbage:
def test(transcriptionFile) :
    with open("transcriptions.txt", "r+") as tFile :
        for line in tFile :
            if 3rd_item = ay
                duration = (2nd_item[1] - 1st_item[2]) / 2
                delete the line
                tFile.write(1st_item, 1st_item + d, ao)
                tfile.write(1st_item + d, 1st_item, ih) # next line
if __name__ == "__main__" :
    test("transcriptions.txt")
Thanks.
Following the suggestions I was given, I changed my code to the following. It is still not correct:
def test(transcriptionFile) :
    with open("transcriptions.txt", "r") as tFile :
        inp = tFile.readlines()
    outp = []
    for ln in inp :
        start, end, sound = ln.strip()
        if sound == ay :
            duration = (end - start) / 2
            ln.delete
            start = start
            end = start + duration
            sound = ao
            outp.append(ln)
            start = start + duration # next line
            end = start
            sound = ih
            outp.append(ln)
    with open("transcriptions.txt", "w") as tFile:
        tFile.writelines(outp)
if __name__ == "__main__":
    test("transcriptions.txt")
Answer 0 (score: 2)
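Editing a text file in place is hard, because replacing one line with two means shifting everything that follows it. Your best options are:
1. Write the program as a Unix filter, i.e. produce the new file on sys.stdout and use an external tool to move it into place.
2. Read in the whole file, build the new file in memory, and write it out.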
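The first option could look roughly like this minimal sketch. The function name split_diphthongs and the script name split_ay.py are hypothetical, and every line that is not an 'ay' line is passed through unchanged.

```python
import sys


def split_diphthongs(lines):
    """Yield output lines, replacing each 'ay' line with an 'ao' line and an 'ih' line."""
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and fields[2] == "ay":
            start, end = int(fields[0]), int(fields[1])
            mid = start + (end - start) // 2
            yield "%d %d ao\n" % (start, mid)
            yield "%d %d ih\n" % (mid, end)
        else:
            yield line  # pass every other line through unchanged


if __name__ == "__main__":
    # Read the transcription from stdin and write the result to stdout.
    sys.stdout.writelines(split_diphthongs(sys.stdin))
```

The shell then does the "put it in place" step, for example: python split_ay.py < transcriptions.txt > new.txt && mv new.txt transcriptions.txt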
A program following the second approach looks like this:
# read transcriptions.txt into a list of lines
with open("transcriptions.txt", "r") as tFile:
    inp = tFile.readlines()

# do processing and build a new list of lines
# (to_be_deleted() and transform() are placeholders you supply)
outp = []
for ln in inp:
    if not to_be_deleted(ln):
        outp.append(transform(ln))

# now overwrite transcriptions.txt
with open("transcriptions.txt", "w") as tFile:
    tFile.writelines(outp)
It is even better if you write the processing step as a list comprehension:
outp = [transform(ln) for ln in inp
        if not to_be_deleted(ln)]
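One way to fill in the two placeholders for this task is sketched below. Because an 'ay' line turns into two lines, this version assumes transform returns a list of lines and flattens the results with a nested comprehension. Treating blank lines as "to be deleted" is a hypothetical rule, and transform assumes every remaining line has exactly three fields.

```python
def to_be_deleted(ln):
    # Hypothetical rule: drop blank lines.
    return not ln.strip()


def transform(ln):
    """Return a list of output lines for one input line (assumes three fields)."""
    start, end, sound = ln.split()
    if sound != "ay":
        return [ln]
    start, end = int(start), int(end)
    mid = start + (end - start) // 2
    return ["%d %d ao\n" % (start, mid), "%d %d ih\n" % (mid, end)]


inp = ["23200000 24000000 ay\n", "\n", "24000000 25000000 k\n"]
# The second "for" flattens the per-line lists into one list of lines.
outp = [out for ln in inp if not to_be_deleted(ln) for out in transform(ln)]
print(outp)
# ['23200000 23600000 ao\n', '23600000 24000000 ih\n', '24000000 25000000 k\n']
```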
Answer 1 (score: 1)
The following script should do what you want:
import sys

def main(src, dest):
    with open(dest, 'w') as output:
        with open(src) as source:
            for line in source:
                try:
                    start, end, sound = line.split()
                except ValueError:
                    continue
                if sound == 'ay':
                    start = int(start)
                    end = int(end)
                    offset = (end - start) // 2
                    output.write('%s %s ao\n' % (start, start + offset))
                    output.write('%s %s ih\n' % (start + offset, end))
                else:
                    output.write(line)

if __name__ == "__main__":
    main(*sys.argv[1:])
Output:
18800000 20400000 pau
20400000 21300000 aa
21300000 22500000 p
22500000 23200000 l
23200000 23600000 ao
23600000 24000000 ih
24000000 25000000 k
25000000 26500000 pau
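If more diphthongs need splitting, or the result should overwrite the input file as the question originally tried, the script could be extended along these lines. This is a sketch under assumptions: the DIPHTHONGS table only contains the 'ay' split given in the question (other entries would have to be added), and unlike the script above it passes lines it cannot parse through unchanged instead of dropping them. Writing to a temporary file in the same directory and then calling os.replace means a failure halfway through never leaves a half-written transcription.

```python
import os
import tempfile

# Hypothetical table: only 'ay' -> 'ao' + 'ih' comes from the question.
DIPHTHONGS = {"ay": ("ao", "ih")}


def rewrite_line(line):
    """Return the output lines for one input line."""
    fields = line.split()
    if len(fields) == 3 and fields[2] in DIPHTHONGS:
        start, end = int(fields[0]), int(fields[1])
        mid = start + (end - start) // 2
        first, second = DIPHTHONGS[fields[2]]
        return ["%d %d %s\n" % (start, mid, first), "%d %d %s\n" % (mid, end, second)]
    return [line]  # non-diphthong or unparsable lines are kept as they are


def rewrite_in_place(path):
    # Create the temporary file next to the original so os.replace stays on one filesystem.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as out, open(path) as src:
            for line in src:
                out.writelines(rewrite_line(line))
        os.replace(tmp_path, path)  # atomically swap the new file into place
    except BaseException:
        os.remove(tmp_path)  # clean up the partial temporary file
        raise
```

Usage would be rewrite_in_place("transcriptions.txt"). One side effect to keep in mind: tempfile.mkstemp creates the file readable and writable only by its owner, so the replaced file may end up with different permissions than the original.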