Question

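My goal is to generate a phonetic transcription for any word according to a set of rules.

First, I want to split words into their syllables. For example, I want an algorithm that finds 'ch' in a word and separates it out like this: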

Input: 'aachbutcher'
Output: 'a' 'a' 'ch' 'b' 'u' 't' 'ch' 'e' 'r'

This is what I have so far:

check = regexp('aachbutcher','ch');    % start indices of every 'ch' (numeric vector)

if ~isempty(check)                     % true when 'ch' was found

   % Request all four outputs explicitly so their order is unambiguous.
   [match, split, startIndex, endIndex] = regexp('aachbutcher','ch','match','split','start','end')

   % Now I split the 'aa', 'but' and 'er' into single characters:
   for i = 1:length(split)
       SingleLetters{i} = regexp(split{1,i},'.','match');
   end

end

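My problem is: how do I put the cells together so that they are formatted like the desired output? I only have the start indices of the matched parts ('ch'), but not the start indices of the split parts ('aa', 'but', 'er').

Any ideas?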

Answer 1

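You don't need to work with the indices or lengths. The logic is simple: 'split' always has one more element than 'match', so process the first element from split, then the first element from match, then the second element from split, and so on.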

[match, split] = regexp('aachbutcher','ch','match','split');

% Split the first piece ('aa') into single characters:
SingleLetters = regexp(split{1,1},'.','match');

% Then alternate: one 'ch' from match, followed by the letters of the next piece.
for i = 2:length(split)
   SingleLetters = [SingleLetters, match(i-1), regexp(split{1,i},'.','match')];
end
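The interleaving above also covers words that start or end with 'ch', because regexp then returns an empty piece at that position in 'split'. A minimal sketch to check this, assuming a made-up test word 'chaach' and the illustrative variable names word, matches, parts and units (none of them come from the question):

```matlab
% Sketch: interleave the 'split' and 'match' outputs of regexp.
% The test word and all variable names are illustrative assumptions.
word = 'chaach';
[matches, parts] = regexp(word, 'ch', 'match', 'split');
% parts always has exactly one more element than matches;
% a piece is empty when 'ch' sits at the start or end of the word.

units = regexp(parts{1}, '.', 'match');        % letters before the first 'ch' (may be empty)
for k = 2:numel(parts)
    letters = regexp(parts{k}, '.', 'match');  % single letters of the k-th piece
    units = [units, matches(k-1), letters];    % the 'ch' first, then the letters after it
end

disp(units)
```

Empty pieces simply contribute nothing to the concatenation, so no special case is needed for them.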

Answer 2

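So, you know that 'ch' has length 2, and you know where the regular expression found it, because those positions are stored in startIndex. I am assuming (please correct me if I'm wrong) that you want to split all the other letters of the word into single-letter cells, as in your output above. In that case you can use the startIndex data to construct the output with a conditional, like this: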

word = 'aachbutcher';
startIndex = regexp(word, 'ch');     % start positions of every 'ch', here [3 8]

output = {};                         % result cell array, filled one unit at a time
i = 1;
while i <= length(word)              % a while loop, because changing i inside a for loop has no effect
    if ismember(i, startIndex)
        output{end+1} = 'ch';        % keep 'ch' together as one unit
        i = i + 2;                   % skip both characters of 'ch'
    else
        output{end+1} = word(i);     % every other letter becomes its own cell
        i = i + 1;
    end
end

I don't have MATLAB at hand right now, so I can't test it. I hope this works for you! If not, let me know and I'll take another shot at it.
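As a further option that neither answer uses, the whole split can probably be done with a single regexp call by putting 'ch' in front of a single-character wildcard in an alternation. The regex engine tries the alternatives from left to right, so 'ch' is preferred over a lone 'c'. A minimal sketch; the digraph list and the variable names are assumptions made for illustration only:

```matlab
% Sketch: match either the digraph 'ch' or any single character.
word  = 'aachbutcher';
units = regexp(word, 'ch|.', 'match');
disp(units)

% Hypothetical extension: several multi-letter units, longest listed first
% so that 'sch' is tried before 'ch'. The list itself is an assumption.
digraphs = {'sch', 'ch', 'sh'};
pattern  = [strjoin(digraphs, '|'), '|.'];     % 'sch|ch|sh|.'
units2   = regexp('schach', pattern, 'match');
disp(units2)
```

With this approach no indices are needed at all, but the order of the alternatives matters: a shorter unit listed before a longer one that contains it would always win.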

Splitting a word with regexp in MATLAB; startIndex of the 'split' parts?
