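I want to build a function whose input is an ordinary sentence and whose output is that sentence translated into "baby talk". Baby talk means saying only the first syllable of each word, but saying it 3 times. So "hello world" would become "hehehe wowowo".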
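My idea is to first split the sentence into a list of words. Then, for each word, we keep a counter that starts at 0: for a consonant the counter adds 0, for a vowel it adds 1. When the counter reaches 1 we stop, return the consonants and the vowel, and move on to the next word. But I'm having trouble "accessing" each word as I go through the list. How can I put my idea into practice?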
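The counter idea works as described. Here is a minimal sketch of it, using the hypothetical name baby_talk_counter and assuming that only a, e, i, o, u count as vowels (no special handling of y). The outer for loop over sentence.split() is how you "access" each word in turn; the inner loop walks that word's letters.

```python
def baby_talk_counter(sentence, vowels="aeiou"):
    """Hypothetical helper: repeat the first syllable of each word 3 times."""
    result = []
    for word in sentence.split():      # access each word of the list in turn
        counter = 0
        syllable = ""
        for letter in word:
            syllable += letter         # collect consonants and the first vowel
            if letter.lower() in vowels:
                counter += 1           # a vowel adds 1; a consonant adds 0
            if counter == 1:
                break                  # stop at the first vowel
        result.append(syllable * 3)    # a word with no vowel is kept whole
    return " ".join(result)

print(baby_talk_counter("hello world"))  # prints: hehehe wowowo
```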
Answer 0 (score: 1)
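Don't use a 0-1 counter; that's called a Boolean flag. When you find the vowel, just move on to the next word.
vowel_list = 'aeiou'
sentence = "hello world"
# split the sentence into a list of words.
word_list = sentence.split()
for word in word_list:
    # Find the minimal pronounceable prefix and print it 3 times
    # Find the first vowel
    for i in range(len(word)):
        if word[i] in vowel_list:
            # Grab the consonants and vowel, and stop
            syllable = word[:i+1]
            break
    # Report the syllable in triplicate
    print(syllable * 3)
The output is:
hehehe
wowowo
This should help you with your immediate problem. You can still assemble it into a function the way you described, and then join the individual baby words into a little sentence. I'll also leave it to you to handle the problem cases.
If this doesn't solve anything, please edit the question with a clearer description.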
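Wrapping that loop into a function that returns a sentence could look like the sketch below. The name baby_sentence is hypothetical, and keeping the whole word when it has no vowel is an assumption, since the answer leaves that case open. It uses Python's for/else: the else branch runs only when the loop finishes without hitting break, so no flag is needed at all.

```python
def baby_sentence(sentence, vowel_list="aeiou"):
    baby_words = []
    for word in sentence.split():
        for i in range(len(word)):
            if word[i] in vowel_list:
                # Grab the consonants and the vowel, and stop
                syllable = word[:i + 1]
                break
        else:
            # Reached only if no `break` happened, i.e. no vowel was found.
            # Assumption: repeat the whole word in that case.
            syllable = word
        baby_words.append(syllable * 3)
    # Join the individual baby words into a small sentence
    return " ".join(baby_words)

print(baby_sentence("hello world"))  # prints: hehehe wowowo
```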
Answer 1 (score: 1)
Here is baby talk generated from word pronunciations and written in ARPAbet:
#!/usr/bin/env python3
from nltk.corpus import cmudict  # $ pip install nltk
# >>> nltk.download('cmudict')
def baby_talk(word, repeat=3, phone_sep=u'\N{NO-BREAK SPACE}',
              pronunciations=cmudict.dict()):
    for phones in pronunciations.get(word.casefold(), []):
        for i, ph in enumerate(phones):
            if ph[-1] in '012':  # found vowel sound
                return phone_sep.join((phones[:i] + [ph[:-1]]) * repeat)
    return naive_baby_talk(word, repeat, phone_sep)  # no pronunciations
def naive_baby_talk(word, repeat, phone_sep, vowels="aeiouAEIOU"):
    i = None
    for i, char in enumerate(word, start=1):
        if char in vowels:
            break  # found vowel
    return phone_sep.join([word[:i]] * repeat)
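The vowel test in baby_talk works because in cmudict's ARPAbet transcriptions every vowel phone ends in a stress digit (0, 1 or 2), while consonant phones carry none. The sketch below isolates that step on a hand-written phone list so it runs without NLTK; the helper name first_syllable_phones and the transcription of "hello" are assumptions for illustration, not looked up in the dictionary.

```python
def first_syllable_phones(phones, repeat=3, sep=" "):
    """Hypothetical helper: baby_talk's logic without the dictionary lookup."""
    for i, ph in enumerate(phones):
        if ph[-1] in "012":  # ARPAbet vowels end in a stress digit
            # consonants before the vowel, plus the vowel with its digit stripped
            return sep.join((phones[:i] + [ph[:-1]]) * repeat)
    return None  # no vowel phone found

# Assumed cmudict-style transcription of "hello"
print(first_syllable_phones(["HH", "AH0", "L", "OW1"]))  # prints: HH AH HH AH HH AH
```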
Example:
import re
sentences = ["hello world",
             "Quiet European rhythms.",
             "My nth happy hour.",
             "Herb unit -- a dynasty heir."]
for sentence in sentences:
    sesese = " ".join(["".join(
        [w if i & 1 or not w else baby_talk(w)  # keep non-words as is
         for i, w in enumerate(re.split(r"(\W+)", non_whitespace))])
        for non_whitespace in sentence.split()])
    print(u'"{}" → "{}"'.format(sentence, sesese))
"hello world" → "HH AH HH AH HH AH W ER W ER W ER"
"Quiet European rhythms." → "K W AY K W AY K W AY Y UH Y UH Y UH R IH R IH R IH."
"My nth happy hour." → "M AY M AY M AY EH EH EH HH AE HH AE HH AE AW AW AW."
"Herb unit -- a dynasty heir." → "ER ER ER Y UW Y UW Y UW -- AH AH AH D AY D AY D AY EH EH EH."
Note: nth, hour, herb and heir begin with a vowel sound, European and unit begin with a consonant sound, and the y in rhythms and dynasty is a vowel. See:
Answer 2 (score: 0)
def end_at_vowel(string):
    vowels = ["a", "e", "i", "o", "u"]  # A list of vowels
    letters = []
    for l in string:
        letters.append(l)
        if l in vowels:
            break
    return "".join(letters)
def bbt(string):
    string = string.split()  # Split the string into a list
    return " ".join([end_at_vowel(w) * 3 for w in string])
This should mostly handle what you described. Look at the comments and the two functions and see if you can work out what is going on.
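end_at_vowel only recognizes lowercase vowels, so a capitalized word that starts with a vowel, such as "Apple", is returned whole. A variant sketch (not the answerer's code; the names end_at_vowel_ci and bbt_ci are hypothetical) finds the first vowel's index with next() over a generator and compares case-insensitively via str.lower(). Like the original, a word with no vowel is returned whole.

```python
def end_at_vowel_ci(word, vowels="aeiou"):
    # Index of the first vowel; fall back to the last index if there is none
    i = next((j for j, c in enumerate(word) if c.lower() in vowels), len(word) - 1)
    return word[:i + 1]

def bbt_ci(sentence):
    return " ".join(end_at_vowel_ci(w) * 3 for w in sentence.split())

print(bbt_ci("Hello World"))  # prints: HeHeHe WoWoWo
```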
Answer 3 (score: 0)
This is one of the few times I'd recommend a regular expression:
import re
FIRST_SYLLABLE = re.compile(r'.*?[aeiou]', re.IGNORECASE)
def baby_talk(sentence):
    words = []
    for word in sentence.split():
        match = FIRST_SYLLABLE.match(word)
        if match:
            words.append(match.group(0) * 3)
    return ' '.join(words)
print(baby_talk('hello world'))
Line by line:
import re
FIRST_SYLLABLE = re.compile(r'.*?[aeiou]', re.IGNORECASE)
This compiles a pattern that matches anything up to and including the first vowel.
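The ? after .* is what makes the pattern stop at the first vowel: .*? is non-greedy and matches as few characters as possible, whereas a greedy .* would run to the last vowel. A quick comparison (a sketch, not part of the original answer):

```python
import re

lazy = re.compile(r'.*?[aeiou]', re.IGNORECASE)   # non-greedy: shortest match
greedy = re.compile(r'.*[aeiou]', re.IGNORECASE)  # greedy: longest match

print(lazy.match("hello").group(0))    # prints: he
print(greedy.match("hello").group(0))  # prints: hello
print(lazy.match("nth"))               # prints: None (no vowel, so no match)
```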
def baby_talk(sentence):
    words = []
    for word in sentence.split():
        match = FIRST_SYLLABLE.match(word)
This tries to match the word against the pattern we compiled.
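        if match:
            words.append(match.group(0) * 3)
If it matches, match.group(0) contains the matched part. Given 'hello', match.group(0) will be 'he'. Make three copies and add them to the list of output words.
    return ' '.join(words)
Return the list of output words joined together with spaces.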
print(baby_talk('hello world'))
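With this version, a word containing none of a, e, i, o, u (for example "my" or "nth") produces no match and is silently dropped from the output. If you would rather keep such words, one possible variant, using the hypothetical name baby_talk_keep and assuming the whole word should be repeated, is:

```python
import re

FIRST_SYLLABLE = re.compile(r'.*?[aeiou]', re.IGNORECASE)

def baby_talk_keep(sentence):
    words = []
    for word in sentence.split():
        match = FIRST_SYLLABLE.match(word)
        # Assumption: repeat a vowel-less word whole instead of dropping it
        syllable = match.group(0) if match else word
        words.append(syllable * 3)
    return ' '.join(words)

print(baby_talk_keep('my nth hour'))  # prints: mymymy nthnthnth hohoho
```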