Extracting features for a tag prediction project

Time: 2016-02-18 12:54:46

Tags: python algorithm machine-learning svm text-mining

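I am planning a Python project on keyword extraction for Stack Exchange questions. I have input data from kaggle.com that contains id, title, body, and the tags used for training. I am thinking of implementing some machine learning algorithms, such as SVM or neural networks, to train a classifier. The problem is that these algorithms need features as input. I don't know how to extract features for these algorithms from this input, because I have never extracted features from paragraphs of text before. Any help would be greatly appreciated.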

2 answers:

Answer 0: (score: 0)

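Feature selection is crucial: it tells you how relevant each feature is to your problem. A good theoretical explanation is given in the book Pattern Recognition by Sergios Theodoridis and Konstantinos Koutroumbas. I found this simple code example: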

# Feature importance with an Extra Trees classifier
from sklearn import datasets
from sklearn.ensemble import ExtraTreesClassifier

# load the iris dataset
dataset = datasets.load_iris()
# fit an Extra Trees model to the data (fixed seed so the output is repeatable)
model = ExtraTreesClassifier(random_state=0)
model.fit(dataset.data, dataset.target)
# display the relative importance of each attribute
print(model.feature_importances_)

You can read more, with examples, at http://machinelearningmastery.com/feature-selection-in-python-with-scikit-learn/.
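For the Stack Exchange data in the question, the text first has to be turned into numeric features before any importance score or classifier can be applied. The following is only a minimal sketch of one common approach, assuming scikit-learn is available: TF-IDF weights over words and word pairs as features, and one linear SVM per tag. The toy `texts` and `tags` lists are made up for illustration and stand in for the title, body, and tags columns of the Kaggle file.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC

# Hypothetical toy data: each string stands for a question's title + body.
texts = [
    "How to train an SVM classifier in Python with scikit-learn",
    "Python list comprehension with nested loops",
    "SVM kernel choice for text classification",
    "Java NullPointerException when iterating a list",
    "Java generics and wildcard types",
    "Train a neural network in Python",
]
tags = [["python", "svm"], ["python"], ["svm"], ["java"], ["java"], ["python"]]

# Turn the tag lists into a binary indicator matrix (one column per tag).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(tags)

# Turn the raw text into sparse TF-IDF features over unigrams and bigrams.
vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
X = vectorizer.fit_transform(texts)

# Train one linear SVM per tag, since a question can carry several tags.
clf = OneVsRestClassifier(LinearSVC())
clf.fit(X, Y)

# Predict tags for a new question; unseen text must go through the same vectorizer.
pred = clf.predict(vectorizer.transform(["Python SVM question"]))
predicted_tags = mlb.inverse_transform(pred)
print(predicted_tags)
```

On real data the vocabulary becomes very large, which is where a feature selection step like the one described above would fit between the vectorizer and the classifier.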

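Answer 1: (score: 0)

Many keyword extraction algorithms are based on classical statistical techniques (including graphical models). The popular features are mostly frequency-based. There are also algorithms that rank words. For further study, consider this paper: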

  

http://www.hlt.utdallas.edu/~saidul/acl14.pdf
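As a rough illustration of what a frequency-based feature and a word ranking can look like, here is a small sketch that scores each word by plain TF-IDF (term frequency times log inverse document frequency) and keeps the top few per document. It is not the method of the linked paper; the `STOPWORDS` set, the `tokenize` and `rank_keywords` helpers, and the sample documents are all assumptions made up for this example.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list; a real project would use a fuller one.
STOPWORDS = {"a", "an", "the", "for", "in", "with", "of", "to"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def rank_keywords(docs, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    ranked = []
    for tokens in tokenized:
        term_counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked.append(sorted(scores, key=lambda t: (-scores[t], t))[:k])
    return ranked


docs = [
    "python svm classifier for text in python",
    "java list iteration error in java",
    "text mining with python",
]
ranked = rank_keywords(docs)
for doc, keywords in zip(docs, ranked):
    print(doc, "->", keywords)
```

Words that occur often in one question but rarely across the collection float to the top, which is the intuition behind most frequency-based keyword features.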