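I am planning a project in Python on keyword extraction for Stack Exchange questions. I have input data from kaggle.com containing id, title, body, and tags for training. I am considering machine learning algorithms such as SVMs and neural networks to train a classifier. The problem is that these algorithms need features as input. I don't know how to extract features from this data, because I have never extracted features from passages of text before. Any help would be appreciated.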
Answer 0 (score: 0)
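Feature selection is crucial: it tells you how relevant each feature is to your problem. A good theoretical explanation is given in the book Pattern Recognition by Sergios Theodoridis and Konstantinos Koutroumbas. I found this simple code example: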
# Feature importance with an Extra Trees classifier
from sklearn import datasets
from sklearn.ensemble import ExtraTreesClassifier
# load the iris dataset
dataset = datasets.load_iris()
# fit an Extra Trees model to the data
model = ExtraTreesClassifier()
model.fit(dataset.data, dataset.target)
# display the relative importance of each attribute
print(model.feature_importances_)
You can read more, with examples, at http://machinelearningmastery.com/feature-selection-in-python-with-scikit-learn/
Answer 1 (score: 0)
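Many keyword extraction algorithms are based on classical statistical techniques (including graphical models). The popular features are mostly frequency-based. There are also algorithms that rank words. For further study, consider this paper: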
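Independently of the paper reference, the frequency-based idea mentioned in this answer can be illustrated with a rough sketch. The code below scores each word in one document by term frequency times inverse document frequency and returns the top-scoring words as candidate keywords. The function names `tokenize` and `top_keywords`, the simple regex tokenizer, and the sample documents are hypothetical; this is one common frequency-based scheme, not the specific method of any particular paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def top_keywords(docs, index, k=3):
    """Rank the words of docs[index] by TF-IDF and return the top k."""
    tokenized = [tokenize(d) for d in docs]

    # Document frequency: in how many documents each word occurs.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    n = len(docs)
    tf = Counter(tokenized[index])

    # Words that occur in every document get a score of zero.
    scores = {w: count * math.log(n / df[w]) for w, count in tf.items()}

    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Made-up example documents.
docs = [
    "python json parsing error when reading json file",
    "python loop over a list",
    "error in python import",
]
print(top_keywords(docs, 0, k=2))
```

Words that appear in every document, such as "python" here, score zero and drop to the bottom of the ranking, which is the behaviour that makes frequency-based scores useful for picking out document-specific keywords.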