I have a TextGrid file generated by Prosodylab-Aligner, which I can open in Praat. Is it possible to get from it a text file that looks like this:
Word in text | Pronunciation started at
Hello 0:0:0.000
my 0:0:1.125
friends 0:0:2.750
Edit:
The attached TextGrid file:
File type = "ooTextFile"
Object class = "TextGrid"
xmin = 0.0
xmax = 2.53
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "phones"
        xmin = 0.0
        xmax = 2.53
        intervals: size = 13
        intervals [1]:
            xmin = 0.0
            xmax = 0.62
            text = "sil"
        intervals [2]:
            xmin = 0.62
            xmax = 0.78
            text = "K"
        intervals [3]:
            xmin = 0.78
            xmax = 0.81
            text = "L"
        intervals [4]:
            xmin = 0.81
            xmax = 0.92
            text = "IH1"
        intervals [5]:
            xmin = 0.92
            xmax = 1.02
            text = "K"
        intervals [6]:
            xmin = 1.02
            xmax = 1.07
            text = ""
        intervals [7]:
            xmin = 1.07
            xmax = 1.22
            text = "T"
        intervals [8]:
            xmin = 1.22
            xmax = 1.31
            text = "UW1"
        intervals [9]:
            xmin = 1.31
            xmax = 1.51
            text = "S"
        intervals [10]:
            xmin = 1.51
            xmax = 1.67
            text = "T"
        intervals [11]:
            xmin = 1.67
            xmax = 1.85
            text = "AA1"
        intervals [12]:
            xmin = 1.85
            xmax = 1.88
            text = "P"
        intervals [13]:
            xmin = 1.88
            xmax = 2.53
            text = "sil"
    item [2]:
        class = "IntervalTier"
        name = "words"
        xmin = 0.0
        xmax = 2.53
        intervals: size = 6
        intervals [1]:
            xmin = 0.0
            xmax = 0.62
            text = "sil"
        intervals [2]:
            xmin = 0.62
            xmax = 1.02
            text = "CLICK"
        intervals [3]:
            xmin = 1.02
            xmax = 1.07
            text = "sp"
        intervals [4]:
            xmin = 1.07
            xmax = 1.31
            text = "TO"
        intervals [5]:
            xmin = 1.31
            xmax = 1.88
            text = "STOP"
        intervals [6]:
            xmin = 1.88
            xmax = 2.53
            text = "sil"
Answer 0 (score: 1)
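The syntax of TextGrid files is a bit odd. For your limited purpose, a list of words and their start points, your parser can be quite simple:
1. Find the line that consists of 8 spaces followed by the string 'name = "words"'.
2. Examine all following lines, and stop at the next line that consists of 8 spaces followed by the string 'name = "' (or at the end of the file, if "words" is the last tier).
2a. Save the float that comes immediately after 12 spaces and the string 'xmin = '.
2b. Save the string that comes immediately after 12 spaces and the string 'text = '.
The result of this procedure looks like this: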
0.0 0.62 1.02 1.07 1.31 1.88
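"sil" "CLICK" "sp" "TO" "STOP" "sil"
Now just pair up these two arrays element by element and you have your table (the numbers are start points in seconds).
Keep in mind that "sil" is short for the meta-label "silence" and "sp" is short for "speech pause". Silence is expected at the beginning and end of an utterance, but the speech pause here may be an error: the plosive /t/ of the word "TO" begins with an articulatory closure, which looks very much like a speech pause but actually belongs to the plosive.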
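The steps above can be sketched in Python roughly as follows. This is only a sketch under a few assumptions: the file is in Praat's long text format as shown in the question, labels sit on a single line, and the function names `words_with_start_times` and `format_timestamp` are made up for this illustration. It also deviates from the steps in one respect: instead of counting 8 or 12 leading spaces, it strips each line and uses the `intervals [n]:` headers to tell interval-level `xmin` lines from the tier-level one, so it still works if the indentation has been lost.

```python
import re


def words_with_start_times(textgrid_text, tier_name="words"):
    """Return (label, start_in_seconds) pairs for one interval tier."""
    rows = []
    in_tier = False      # are we inside the requested tier?
    in_interval = False  # have we passed an 'intervals [n]:' header?
    start = None
    for raw_line in textgrid_text.splitlines():
        line = raw_line.strip()
        name_match = re.fullmatch(r'name = "(.*)"', line)
        if name_match:
            # Every tier starts with a name line; this opens or closes our tier.
            in_tier = name_match.group(1) == tier_name
            in_interval = False
            continue
        if not in_tier:
            continue
        if re.fullmatch(r"intervals \[\d+\]:", line):
            in_interval = True
            start = None
        elif in_interval and line.startswith("xmin = "):
            start = float(line[len("xmin = "):])
        elif in_interval and line.startswith("text = "):
            text_match = re.fullmatch(r'text = "(.*)"', line)
            if text_match and start is not None:
                rows.append((text_match.group(1), start))
    return rows


def format_timestamp(seconds):
    """Format seconds as h:m:s.mmm, the layout asked for in the question."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours}:{minutes}:{secs}.{millis:03d}"
```

Read the file into a string, pass it to `words_with_start_times`, and print each pair with `format_timestamp` to get the requested table. Filter out "sil" and "sp" yourself if you only want real words.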
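Answer 1 (score: 0)
Since this is a Praat file, and you say you can open it in Praat, I think a better solution is to use Praat itself to process it. A script like the following involves far fewer leaps of faith: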
# Ask for the TextGrid path and the tier to list (tier 2 is "words" in this file)
form Parse TextGrid...
    sentence File /path/to/your.TextGrid
    integer Tier 2
endform

Read from file: file$
intervals = Get number of intervals: tier
writeInfoLine: "Word in text", tab$, "Pronunciation started at"
for i to intervals
    label$ = Get label of interval: tier, i
    # Skip unlabelled intervals
    if label$ != ""
        start = Get start point: tier, i
        appendInfoLine: label$, tab$, string$(start)
    endif
endfor
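If you save that to a script file somewhere, you can call Praat from the command line, like praat /path/to/your/script.praat "/path/to/your.TextGrid" 2, and get the desired output from stdout.
You can also run it by hand, or modify it to write to a file.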
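If you want to drive the script from another program, a thin wrapper along these lines might do. This is a sketch, not a tested recipe: the helper names (`build_praat_command`, `parse_praat_output`, `run_praat_script`) are invented here, it assumes a `praat` executable on your PATH, and it assumes the script prints one header line followed by tab-separated label/start lines. It also passes `--run`, which Praat 6 and later expect before the script path; drop that flag if your version uses the invocation shown above. Depending on platform and version, Praat's console output may not be UTF-8, so the decoding may need adjusting.

```python
import subprocess


def build_praat_command(script_path, textgrid_path, tier=2, praat="praat"):
    """Build the argument list for a command-line Praat call.

    Assumption: "--run" is accepted by the installed Praat (version 6+).
    """
    return [praat, "--run", script_path, textgrid_path, str(tier)]


def parse_praat_output(stdout_text):
    """Turn the script's tab-separated output into (label, start) pairs."""
    rows = []
    # The first line is the header written by writeInfoLine; skip it.
    for line in stdout_text.splitlines()[1:]:
        if not line.strip():
            continue
        label, start = line.rsplit("\t", 1)
        rows.append((label, float(start)))
    return rows


def run_praat_script(script_path, textgrid_path, tier=2):
    """Run the script and return the parsed rows (requires Praat installed)."""
    command = build_praat_command(script_path, textgrid_path, tier)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return parse_praat_output(result.stdout)
```

From there, writing the rows to a file is an ordinary `open(...)` and `write(...)` on the Python side.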