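I'm using the spaCy library and need to convert my dataset into the expected output below. However, the output I actually get contains only zeros. The format is: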
[('word', {'cats': {'label_1': 0, 'label_2': 1, ... }})]
Expected output
[
    ('hug',
     {'cats': {'anger': 0,
               'anticipation': 0,
               'disgust': 0,
               'fear': 0,
               'joy': 1,
               'negative': 0,
               'positive': 0,
               'sadness': 0}}),
    ('cry',
     {'cats': {'anger': 0,
               'anticipation': 0,
               'disgust': 0,
               'fear': 0,
               'joy': 0,
               'negative': 0,
               'positive': 0,
               'sadness': 1}}),
    ...
]
This function iterates over the label list, setting the label at index n to 1 and every other label to 0:
def cat_dict_funct(cat_dict, lst, n):
    for i in range(8):
        if i == n:
            cat_dict[lst[i]] = 1
        else:
            cat_dict[lst[i]] = 0
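One likely contributor to the zeros is visible in `cat_dict_funct` itself: `range(8)` only covers indices 0 to 7, while `train_labels` holds ten labels, so a row whose category is mapped to n = 8 or n = 9 never gets a 1. Below is a minimal sketch of the helper that loops over every label and returns a new dict instead of filling one passed in. The name `one_hot_cats` is hypothetical.

```python
def one_hot_cats(labels, n):
    """Return a new dict mapping labels[n] to 1 and every other label to 0."""
    # enumerate() walks all labels, unlike a hard-coded range(8)
    return {label: int(i == n) for i, label in enumerate(labels)}


labels = ['anger', 'fear', 'disgust', 'positive', 'sadness',
          'anticipation', 'joy', 'negative', 'surprise', 'trust']
cats = one_hot_cats(labels, 9)
print(cats['trust'], sum(cats.values()))  # 1 1
```

Because the dict is built fresh on each call, there is no state left over from a previous row.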
Initialize the data and labels:
train_data = df
train_labels = list(set(df.category))
which gives train_labels as:
['anger',
'fear',
'disgust',
'positive',
'sadness',
'anticipation',
'joy',
'negative',
'surprise',
'trust']
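The label order is a second problem. `list(set(...))` has no guaranteed order, and for strings the order can change between interpreter runs because of hash randomization. As a result, the hard-coded indices 0 to 9 in the `if`/`elif` chain may not point at the category being tested. Sorting the set makes the order deterministic. The sample categories below are a hypothetical stand-in for `df.category`. With pandas, `sorted(df.category.unique())` would give the same kind of fixed ordering.

```python
categories = ['joy', 'sadness', 'joy', 'anger', 'trust']  # hypothetical sample

# sorted() returns the labels in the same alphabetical order on every run
train_labels = sorted(set(categories))
print(train_labels)  # ['anger', 'joy', 'sadness', 'trust']

# Look up a category's position instead of hard-coding it
print(train_labels.index('sadness'))  # 2
```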
Loop over each category and append the items in the correct order:
train_texts = train_data['word'].tolist()
train_cats = train_data['category'].tolist()
final_train_cats, cat_dict = [], {}
for cat in train_cats:
    if cat == 'anger':
        cat_dict_funct(cat_dict, train_labels, 0)
    elif cat == 'fear':
        cat_dict_funct(cat_dict, train_labels, 1)
    elif cat == 'disgust':
        cat_dict_funct(cat_dict, train_labels, 2)
    elif cat == 'positive':
        cat_dict_funct(cat_dict, train_labels, 3)
    elif cat == 'sadness':
        cat_dict_funct(cat_dict, train_labels, 4)
    elif cat == 'anticipation':
        cat_dict_funct(cat_dict, train_labels, 5)
    elif cat == 'joy':
        cat_dict_funct(cat_dict, train_labels, 6)
    elif cat == 'negative':
        cat_dict_funct(cat_dict, train_labels, 7)
    elif cat == 'surprise':
        cat_dict_funct(cat_dict, train_labels, 8)
    elif cat == 'trust':
        cat_dict_funct(cat_dict, train_labels, 9)
    final_train_cats.append(cat_dict)
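The loop has one more problem. `cat_dict` is created once, before the loop, so `final_train_cats.append(cat_dict)` appends the same dict object on every iteration. Every entry therefore shows whatever the last row wrote. If that last row was 'surprise' or 'trust', which the chain maps to n = 8 and n = 9 (beyond `range(8)`), every entry comes out all zeros. The small sketch below uses hypothetical sample rows to contrast the shared dict with a fresh dict per row.

```python
labels = ['anger', 'fear', 'disgust', 'positive', 'sadness',
          'anticipation', 'joy', 'negative', 'surprise', 'trust']
train_cats = ['joy', 'sadness', 'trust']  # hypothetical sample rows

# Buggy pattern: one dict, created outside the loop, shared by every entry
shared, buggy = {}, []
for cat in train_cats:
    for label in labels:
        shared[label] = int(label == cat)
    buggy.append(shared)  # appends the same object each time

# Fixed pattern: build a new dict for every row
fixed = [{label: int(label == cat) for label in labels} for cat in train_cats]

print(buggy[0] is buggy[1])  # True: all entries are the same dict
print(buggy[0]['joy'])       # 0: overwritten by the last row ('trust')
print(fixed[0]['joy'])       # 1
```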
Zip the collected items together into a list:
TRAIN_DATA = list(zip(train_texts, [{"cats": cats} for cats in final_train_cats]))
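Putting the pieces together, the sketch below builds the `(word, {'cats': {...}})` tuples directly from parallel lists of words and categories. `build_train_data` is a hypothetical helper. With the DataFrame from the question, you would pass `df['word'].tolist()` and `df['category'].tolist()`. The label list here is the eight labels from the expected output, passed explicitly so the key order matches it.

```python
def build_train_data(words, categories, labels=None):
    """Return a list of (word, {"cats": {label: 0 or 1}}) tuples."""
    if labels is None:
        # Derive a deterministic label order when none is supplied
        labels = sorted(set(categories))
    train_data = []
    for word, cat in zip(words, categories):
        # A new dict per row, so entries never share state
        cats = {label: int(label == cat) for label in labels}
        train_data.append((word, {"cats": cats}))
    return train_data


LABELS = ['anger', 'anticipation', 'disgust', 'fear',
          'joy', 'negative', 'positive', 'sadness']
TRAIN_DATA = build_train_data(['hug', 'cry'], ['joy', 'sadness'], LABELS)
print(TRAIN_DATA[0])
# ('hug', {'cats': {'anger': 0, 'anticipation': 0, 'disgust': 0, 'fear': 0, 'joy': 1, 'negative': 0, 'positive': 0, 'sadness': 0}})
```

A category missing from `labels` (for example 'surprise' when only these eight labels are passed) would still produce an all-zero row. It may be worth checking that every category in the data appears in the label list.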