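I am trying to import a LUIS schema model into RASA and train it with the spacy + scikit pipeline. I am using RASA NLU v0.10.4.
However, when I load the LUIS model schema, the ner_crf component throws a misaligned entity annotation warning,
even though I have labeled the entities correctly in the LUIS model schema.
Here is my config file: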
{
"project": "SynonymsExample",
"path": "C:\\Users\\xyz\\Desktop\\RASA\\models",
"response_log": "C:\\Users\\xyz\\Desktop\\RASA\\logs",
"pipeline": "spacy_sklearn",
"data": "C:\\Users\\xyz\\Desktop\\RASA\\data\\examples\\RasaFormat.json",
"cors_origins": ["*"],
"aws_endpoint_url": null,
"token": null,
"num_threads": 2,
"port": 5000
}
Here is my LUIS model:
{
"luis_schema_version": "2.1.0",
"versionId": "0.1",
"name": "phraseListDemo",
"desc": "",
"culture": "en-us",
"intents": [
{
"name": "None"
},
{
"name": "PersonalInfo"
}
],
"entities": [
{
"name": "city"
},
{
"name": "Contact"
},
{
"name": "Email"
},
{
"name": "FirstName"
},
{
"name": "LastName"
}
],
"composites": [],
"closedLists": [],
"bing_entities": [
"datetimeV2"
],
"actions": [],
"model_features": [
{
"name": "city",
"mode": true,
"words": "jaipur,bangalore,florida,japan,delhi,pune,bombay,mumbai,chennai,hyderabad,kolkata,chandigarh,ahmedabad,china,lucknow,germany,noida,indore,nagpur,coimbatore,bhopal,banglore,india,patna,maharashtra,surat,kanpur,guwahati,ludhiana,gwalior,aurangabad,amritsar,rajkot,gujarat,madurai,pradesh,dehradun,raipur,ranchi,varanasi,jabalpur,jodhpur,srinagar,mangalore,udaipur,jamshedpur,vadodara",
"activated": true
},
{
"name": "contact",
"mode": true,
"words": "8947847422,8967564556,8967907890,1235712345,8989898989,1231231231",
"activated": true
},
{
"name": "Email",
"mode": true,
"words": "xyz@email.com, abc@gmail.com",
"activated": true
},
{
"name": "emailid",
"mode": true,
"words": "xyz@email.com, abc@gmail.com",
"activated": true
},
{
"name": "FirstName",
"mode": true,
"words": "amit,ankur,ankit,ram,shyam,kunal,saikat,sundar,krishna,vikram,mohan,vijay,karthik,sunil,vivek,gopal,John,Chris,satish,surya,ajay,raju,suresh,sanjay,rajesh,ravi,ramesh,arun,rakesh,manoj,anil,kiran,sachin,dinesh,pradeep,raj,ashok,priya,prakash,david,mukesh,praveen,mahesh,naresh,anand,kumar,nikhil,michael,paul,naveen,nitin,srinivas,prasad,vinod,kishore,james,vinay,thomas",
"activated": true
},
{
"name": "LastName",
"mode": true,
"words": "Gupta,Sharma,Jain,kumar,singh,mishra,Mukherjee,goswami,verma,yadav,patel,ghosh,das",
"activated": true
},
{
"name": "MID",
"mode": true,
"words": "M1039205,M1039222,M1036767,M1048967,M1056789,M1028967,M1088967",
"activated": true
}
],
"regex_features": [],
"utterances": [
{
"text": "my name is ankur",
"intent": "PersonalInfo",
"entities": [
{
"entity": "FirstName",
"startPos": 11,
"endPos": 15
}
]
},
{
"text": "my contact number is 1231234123",
"intent": "PersonalInfo",
"entities": [
{
"entity": "Contact",
"startPos": 21,
"endPos": 30
}
]
},
{
"text": "my firstname is amit and lastname is gupta",
"intent": "PersonalInfo",
"entities": [
{
"entity": "FirstName",
"startPos": 16,
"endPos": 19
},
{
"entity": "LastName",
"startPos": 37,
"endPos": 41
}
]
},
{
"text": "my email is a@gmail.com",
"intent": "PersonalInfo",
"entities": [
{
"entity": "Email",
"startPos": 12,
"endPos": 22
}
]
},
{
"text": "kunal is one person",
"intent": "PersonalInfo",
"entities": [
{
"entity": "FirstName",
"startPos": 0,
"endPos": 4
}
]
},
{
"text": "myself singh and my dob comes on 24 may",
"intent": "PersonalInfo",
"entities": [
{
"entity": "LastName",
"startPos": 7,
"endPos": 11
}
]
},
{
"text": "my name is gupta and my dob is in month april",
"intent": "PersonalInfo",
"entities": [
{
"entity": "LastName",
"startPos": 11,
"endPos": 15
}
]
},
{
"text": "my name is amit and my date of birth is in month of march",
"intent": "PersonalInfo",
"entities": [
{
"entity": "FirstName",
"startPos": 11,
"endPos": 14
}
]
}
]
}
Can anyone point out where I am going wrong?
Update: here is my training data in RASA format:
{
"rasa_nlu_data": {
"entity_synonyms": [
{
"value": "city",
"synonyms": [
"jaipur",
"bangalore",
"florida",
"japan",
"delhi",
"pune",
"bombay",
"mumbai",
"chennai",
"hyderabad",
"kolkata",
"chandigarh",
"ahmedabad",
"china",
"lucknow",
"germany",
"noida",
"indore",
"nagpur",
"coimbatore",
"bhopal",
"banglore",
"india",
"patna",
"maharashtra",
"surat",
"kanpur",
"guwahati",
"ludhiana",
"gwalior",
"aurangabad",
"amritsar",
"rajkot",
"gujarat",
"madurai",
"pradesh",
"dehradun",
"raipur",
"ranchi",
"varanasi",
"jabalpur",
"jodhpur",
"srinagar",
"mangalore",
"udaipur",
"jamshedpur",
"vadodara"
]
},
{
"value": "contact",
"synonyms": [
"8947847422",
"8967564556",
"8967907890",
"1235712345",
"8989898989",
"1231231231"
]
},
{
"value": "Email",
"synonyms": [
"xyz@email.com",
" abc@gmail.com"
]
},
{
"value": "emailid",
"synonyms": [
"xyz@email.com",
" abc@gmail.com"
]
},
{
"value": "FirstName",
"synonyms": [
"amit",
"ankur",
"ankit",
"ram",
"shyam",
"kunal",
"saikat",
"sundar",
"krishna",
"vikram",
"mohan",
"vijay",
"karthik",
"sunil",
"vivek",
"gopal",
"John",
"Chris",
"satish",
"surya",
"ajay",
"raju",
"suresh",
"sanjay",
"rajesh",
"ravi",
"ramesh",
"arun",
"rakesh",
"manoj",
"anil",
"kiran",
"sachin",
"dinesh",
"pradeep",
"raj",
"ashok",
"priya",
"prakash",
"david",
"mukesh",
"praveen",
"mahesh",
"naresh",
"anand",
"kumar",
"nikhil",
"michael",
"paul",
"naveen",
"nitin",
"srinivas",
"prasad",
"vinod",
"kishore",
"james",
"vinay",
"thomas"
]
},
{
"value": "LastName",
"synonyms": [
"Gupta",
"Sharma",
"Jain",
"kumar",
"singh",
"mishra",
"Mukherjee",
"goswami",
"verma",
"yadav",
"patel",
"ghosh",
"das"
]
},
{
"value": "MID",
"synonyms": [
"M1039205",
"M1039222",
"M1036767",
"M1048967",
"M1056789",
"M1028967",
"M1088967"
]
}
],
"regex_features": [],
"common_examples": [
{
"text": "my name is ankur",
"intent": "PersonalInfo",
"entities": [
{
"entity": "FirstName",
"value": "ankur",
"start": 11,
"end": 15
}
]
},
{
"text": "my contact number is 1231234123",
"intent": "PersonalInfo",
"entities": [
{
"entity": "Contact",
"value": "1231234123",
"start": 21,
"end": 30
}
]
},
{
"text": "my firstname is amit and lastname is gupta",
"intent": "PersonalInfo",
"entities": [
{
"entity": "FirstName",
"value": "amit",
"start": 16,
"end": 19
},
{
"entity": "LastName",
"value": "gupta",
"start": 37,
"end": 41
}
]
},
{
"text": "my email is a@gmail.com",
"intent": "PersonalInfo",
"entities": [
{
"entity": "Email",
"value": "a@gmail.com",
"start": 12,
"end": 22
}
]
},
{
"text": "kunal is one person",
"intent": "PersonalInfo",
"entities": [
{
"entity": "FirstName",
"value": "kunal",
"start": 0,
"end": 4
}
]
},
{
"text": "myself singh and my dob comes on 24 may",
"intent": "PersonalInfo",
"entities": [
{
"entity": "LastName",
"value": "singh",
"start": 7,
"end": 11
}
]
},
{
"text": "my name is gupta and my dob is in month april",
"intent": "PersonalInfo",
"entities": [
{
"entity": "LastName",
"value": "gupta",
"start": 11,
"end": 15
}
]
},
{
"text": "my name is amit and my date of birth is in month of march",
"intent": "PersonalInfo",
"entities": [
{
"entity": "FirstName",
"value": "amit",
"start": 11,
"end": 14
}
]
}
]
}
}
Answer 0 (score: 2)
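As the warning message indicates, start and end are probably set incorrectly, so that the entity span includes some whitespace at a token boundary (at the start or the end).
For example, a sentence like this one (from your LUIS model)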
{
"text": "kunal is one person",
"intent": "PersonalInfo",
"entities": [
{
"entity": "FirstName",
"startPos": 0,
"endPos": 4
}
]
},
could (incorrectly) end up in the training data with start as 1 and end as 5.
Maybe try the Rasa NLU Trainer to inspect the training data and see whether that is the case?
This happened to me too. Correcting the start and end numbers fixed it.
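If you would rather check the offsets in a script than by eye, something like the following could work. It is only a sketch: it assumes the Rasa JSON layout shown in the question (each example has "text" and "entities" with "value", "start" and "end") and that text[start:end] is expected to equal value. The function names `find_misaligned` and `realign` are hypothetical helpers, not part of Rasa NLU.

```python
def find_misaligned(examples):
    """Return (text, entity, sliced, value) for every entity whose slice differs from its value."""
    problems = []
    for example in examples:
        text = example["text"]
        for entity in example.get("entities", []):
            sliced = text[entity["start"]:entity["end"]]
            if sliced != entity["value"]:
                problems.append((text, entity["entity"], sliced, entity["value"]))
    return problems


def realign(example):
    """Return a copy of the example with offsets recomputed from each entity value.

    Uses the first occurrence of the value in the text, so it is only safe
    when the value appears once; entities that are not found are left unchanged.
    """
    fixed = []
    for entity in example.get("entities", []):
        start = example["text"].find(entity["value"])
        if start >= 0:
            entity = dict(entity, start=start, end=start + len(entity["value"]))
        fixed.append(entity)
    return dict(example, entities=fixed)


# Two examples copied from the RASA-format data in the question.
examples = [
    {"text": "my name is ankur", "intent": "PersonalInfo",
     "entities": [{"entity": "FirstName", "value": "ankur", "start": 11, "end": 15}]},
    {"text": "kunal is one person", "intent": "PersonalInfo",
     "entities": [{"entity": "FirstName", "value": "kunal", "start": 0, "end": 4}]},
]

for problem in find_misaligned(examples):
    print(problem)

corrected = [realign(example) for example in examples]
```

With the data as posted, both examples are reported, because the slice stops one character before the end of the value ("anku" instead of "ankur"). The posted LUIS entries look like they use an inclusive endPos, so copying that number straight into end would explain the mismatch.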