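I want to assign part of a DataFrame I created, the part containing the values I need, to a new object. However, when I run my Python file I get an error saying that a string object has no attribute with the name of the column I created to hold those values. I'm not sure what is wrong.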
AttributeError: 'str' object has no attribute 'vowel_map'
The training.txt file is:
COED:K OW1 EH2 D
PURVIEW:P ER1 V Y UW2
HEHIR:HH EH1 HH IH0 R
MUSCLING:M AH1 S AH0 L IH0 NG
NONPOISONOUS:N AA0 N P OY1 Z AH0 N AH0 S
LAVECCHIA:L AA0 V EH1 K IY0 AH0
BUCKLED:B AH1 K AH0 L D
EATEN:IY1 T AH0 N
SCIMED:S AY1 M EH2 D
MORTIS:M AO1 R T IH0 S
CONSERVATOR:K AH0 N S ER1 V AH0 T ER0
The Python file I am running is:
import pandas as pd
import string

vowels = ('AA','AE','AH','AO','AW','AY','EH','ER','EY','IH','IY','OW','OY','UH','UW')

def remove_stress(string):
    if type(string) in [list, tuple]:
        string = ' '.join(string)
    return ''.join([i for i in string if not i.isdigit()]).split()

def phoneme_map(phon_list, phoneme_list):
    return [1 if phoneme in phoneme_list else 0 for phoneme in phon_list]

def get_words(file_path):
    words = pd.read_csv(file_path, sep=':', names = ['word', 'string_of_phon'])
    words['phon_list'] = words.string_of_phon.apply(str.split)
    words['stressless_phon_list'] = words.string_of_phon.apply(remove_stress)
    words['vowel_map'] = words.stressless_phon_list.apply(phoneme_map, args = (vowels,))
    return words

if __name__ == '__main__':
    data_loc = 'training.txt'
    words = get_words(data_loc)
    # The next line is the one that raises the AttributeError shown above
    word_vowels = [word.vowel_map for word in words]
Answer 0 (score: 2)
If you want the vowels one-hot encoded (one column per vowel, holding that vowel's count in each word):
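import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
vowels = ['AA','AE','AH','AO','AW','AY','EH','ER','EY','IH','IY','OW','OY','UH','UW']
file_path = 'training.txt'
df = pd.read_csv(file_path, sep=':', names = ['word', 'string_of_phon'])
# A fixed vocabulary restricts the count columns to the vowels
vect = CountVectorizer(vocabulary=[v.lower() for v in vowels])
# Strip the stress digits before counting; recent pandas needs regex=True here
X = vect.fit_transform(df['string_of_phon'].str.replace(r'\d+', '', regex=True))
# Recent scikit-learn uses get_feature_names_out() instead of get_feature_names()
r = pd.DataFrame(X.toarray(), columns=vect.get_feature_names_out(), index=df.index)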
Result:
In [138]: r
Out[138]:
    ao  er  uw  eh  oy  ey  ow  ih  uh  ah  ay  iy  ae  aw  aa
0    0   0   0   1   0   0   1   0   0   0   0   0   0   0   0
1    0   1   1   0   0   0   0   0   0   0   0   0   0   0   0
2    0   0   0   1   0   0   0   1   0   0   0   0   0   0   0
3    0   0   0   0   0   0   0   1   0   2   0   0   0   0   0
4    0   0   0   0   1   0   0   0   0   2   0   0   0   0   1
5    0   0   0   1   0   0   0   0   0   1   0   1   0   0   1
6    0   0   0   0   0   0   0   0   0   2   0   0   0   0   0
7    0   0   0   0   0   0   0   0   0   1   0   1   0   0   0
8    0   0   0   1   0   0   0   0   0   0   1   0   0   0   0
9    1   0   0   0   0   0   0   1   0   0   0   0   0   0   0
10   0   2   0   0   0   0   0   0   0   2   0   0   0   0   0
You can join it to the original DataFrame:
In [139]: df.join(r)
Out[139]:
            word               string_of_phon  ao  er  uw  eh  oy  ey  ow  ih  uh  ah  ay  iy  ae  aw  aa
0           COED                  K OW1 EH2 D   0   0   0   1   0   0   1   0   0   0   0   0   0   0   0
1        PURVIEW                P ER1 V Y UW2   0   1   1   0   0   0   0   0   0   0   0   0   0   0   0
2          HEHIR              HH EH1 HH IH0 R   0   0   0   1   0   0   0   1   0   0   0   0   0   0   0
3       MUSCLING         M AH1 S AH0 L IH0 NG   0   0   0   0   0   0   0   1   0   2   0   0   0   0   0
4   NONPOISONOUS  N AA0 N P OY1 Z AH0 N AH0 S   0   0   0   0   1   0   0   0   0   2   0   0   0   0   1
5      LAVECCHIA        L AA0 V EH1 K IY0 AH0   0   0   0   1   0   0   0   0   0   1   0   1   0   0   1
6        BUCKLED              B AH1 K AH0 L D   0   0   0   0   0   0   0   0   0   2   0   0   0   0   0
7          EATEN                  IY1 T AH0 N   0   0   0   0   0   0   0   0   0   1   0   1   0   0   0
8         SCIMED                S AY1 M EH2 D   0   0   0   1   0   0   0   0   0   0   1   0   0   0   0
9         MORTIS              M AO1 R T IH0 S   1   0   0   0   0   0   0   1   0   0   0   0   0   0   0
10   CONSERVATOR    K AH0 N S ER1 V AH0 T ER0   0   2   0   0   0   0   0   0   0   2   0   0   0   0   0
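The table above holds counts (for example, `ah` is 2 for MUSCLING). If strict 0/1 indicators are wanted instead, one possible pandas-only variant is sketched below. It swaps `CountVectorizer` for `Series.str.get_dummies`, which is not what this answer uses, and the names `sample`, `stressless`, `indicators` and `one_hot` are made up for illustration. Three rows of training.txt are inlined so the sketch runs on its own.

```python
import io
import pandas as pd

vowels = ['AA', 'AE', 'AH', 'AO', 'AW', 'AY', 'EH', 'ER',
          'EY', 'IH', 'IY', 'OW', 'OY', 'UH', 'UW']

# Three rows from training.txt, inlined so the sketch is self-contained.
sample = io.StringIO(
    "COED:K OW1 EH2 D\n"
    "MUSCLING:M AH1 S AH0 L IH0 NG\n"
    "EATEN:IY1 T AH0 N\n"
)
df = pd.read_csv(sample, sep=':', names=['word', 'string_of_phon'])

# Remove the stress digits, then build one 0/1 indicator column
# per distinct phoneme found in the data.
stressless = df['string_of_phon'].str.replace(r'\d+', '', regex=True)
indicators = stressless.str.get_dummies(sep=' ')

# Keep only the vowel columns, in a fixed order; vowels that never
# occur in the data become all-zero columns.
one_hot = indicators.reindex(columns=vowels, fill_value=0)

print(df.join(one_hot))
```

Here a vowel that occurs twice in a word still maps to 1, so the result records presence rather than frequency.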
Answer 1 (score: 1)
IIUC:
In [85]: vowels = set(vowels)
In [86]: words['vowel_map'] = \
    ...:     words['string_of_phon'].str.replace(r'\d+', '', regex=True).str.split() \
    ...:     .apply(lambda x: [int(i in vowels) for i in x])
In [87]: words
Out[87]:
            word               string_of_phon                       vowel_map
0           COED                  K OW1 EH2 D                    [0, 1, 1, 0]
1        PURVIEW                P ER1 V Y UW2                 [0, 1, 0, 0, 1]
2          HEHIR              HH EH1 HH IH0 R                 [0, 1, 0, 1, 0]
3       MUSCLING         M AH1 S AH0 L IH0 NG           [0, 1, 0, 1, 0, 1, 0]
4   NONPOISONOUS  N AA0 N P OY1 Z AH0 N AH0 S  [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]
5      LAVECCHIA        L AA0 V EH1 K IY0 AH0           [0, 1, 0, 1, 0, 1, 1]
6        BUCKLED              B AH1 K AH0 L D              [0, 1, 0, 1, 0, 0]
7          EATEN                  IY1 T AH0 N                    [1, 0, 1, 0]
8         SCIMED                S AY1 M EH2 D                 [0, 1, 0, 1, 0]
9         MORTIS              M AO1 R T IH0 S              [0, 1, 0, 0, 1, 0]
10   CONSERVATOR    K AH0 N S ER1 V AH0 T ER0     [0, 1, 0, 0, 1, 0, 1, 0, 1]
Now you can assign the computed column to another object:
In [88]: word_vowels = words.vowel_map
In [89]: word_vowels
Out[89]:
0                       [0, 1, 1, 0]
1                    [0, 1, 0, 0, 1]
2                    [0, 1, 0, 1, 0]
3              [0, 1, 0, 1, 0, 1, 0]
4     [0, 1, 0, 0, 1, 0, 1, 0, 1, 0]
5              [0, 1, 0, 1, 0, 1, 1]
6                 [0, 1, 0, 1, 0, 0]
7                       [1, 0, 1, 0]
8                    [0, 1, 0, 1, 0]
9                 [0, 1, 0, 0, 1, 0]
10       [0, 1, 0, 0, 1, 0, 1, 0, 1]
Name: vowel_map, dtype: object
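For context on the original error: iterating over a DataFrame yields its column labels, not its rows, so each `word` in the question's list comprehension was a plain string such as `'word'`. The sketch below illustrates this on a hypothetical two-row stand-in for `words` (the frame and the names `labels` and `per_row` are made up for the example) and shows two ways to get the values out.

```python
import pandas as pd

# Hypothetical two-row stand-in for the `words` frame built above.
words = pd.DataFrame({
    'word': ['COED', 'EATEN'],
    'vowel_map': [[0, 1, 1, 0], [1, 0, 1, 0]],
})

# Iterating over a DataFrame yields its column labels (plain strings),
# which is why `word.vowel_map` failed with "'str' object has no attribute".
labels = [col for col in words]
print(labels)

# Select the column directly, then convert it if a plain list is needed.
word_vowels = words['vowel_map'].tolist()
print(word_vowels)

# To loop row by row, itertuples() yields namedtuples that do support
# attribute access by column name.
per_row = [row.vowel_map for row in words.itertuples(index=False)]
print(per_row)
```

Selecting the column directly is usually the simpler choice; the `itertuples()` loop is only worth it when several columns are needed per row.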