Need help understanding the Caffe code for SigmoidCrossEntropyLossLayer for multi-label loss

Date: 2017-01-23 06:44:31

Tags: deep-learning caffe logistic-regression cross-entropy

I need help understanding the Caffe function SigmoidCrossEntropyLossLayer, which computes the cross-entropy error with a logistic activation.

Basically, the cross-entropy error for a single example with N independent targets is written as:

    - sum-over-N( t[i] * log(x[i]) + (1 - t[i]) * log(1 - x[i]) )

where t[i] is the target, 0 or 1, and x[i] is the output, indexed by i. x, of course, goes through a logistic (sigmoid) activation.

An algebraic trick for a quicker cross-entropy calculation reduces the computation to:

    -t[i] * x[i] + log(1 + exp(x[i]))

You can verify this from Section 3 here.
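For completeness, the verification amounts to substituting the sigmoid output into the sum above. A sketch of the algebra, where z denotes the raw (pre-sigmoid) input that the trick formula calls x[i]:

    With sig(z) = 1/(1 + exp(-z)):
      log(sig(z))     = z - log(1 + exp(z))
      log(1 - sig(z)) =   - log(1 + exp(z))
    so each per-target term becomes
      -( t*log(sig(z)) + (1 - t)*log(1 - sig(z)) )
        = -( t*z - t*log(1 + exp(z)) - (1 - t)*log(1 + exp(z)) )
        = -t*z + log(1 + exp(z))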

The question is, how is the above translated into the loss-computing code below:

    loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
        log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));

Thank you.

For convenience, the function is reproduced below.

   loss -= input_data[i] * (target[i] - (input_data[i] >= 0)) -
        log(1 + exp(input_data[i] - 2 * input_data[i] * (input_data[i] >= 0)));

1 answer:

Answer 0 (score: 2):

In the expression log(1 + exp(x[i])), you may run into numerical instability when x[i] is very large. To overcome this instability, the sigmoid function can be rescaled like this:

    sig(x) = exp(x)/(1 + exp(x))
           = [exp(x) * exp(-x*(x>=0))] / [(1 + exp(x)) * exp(-x*(x>=0))]

Now, if you plug this numerically stable expression for sig(x) into the loss, you will end up with the same expression Caffe is using.
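Concretely, splitting on the sign of x shows the equivalence (a sketch of the algebra the answer refers to, in the same notation):

    For x >= 0 the indicator (x >= 0) is 1, so the code computes
      -( x*(t - 1) - log(1 + exp(-x)) ) = (1 - t)*x + log(1 + exp(-x))
    and since log(1 + exp(x)) = x + log(1 + exp(-x)), this equals
      -t*x + log(1 + exp(x))
    For x < 0 the indicator is 0, and the expression reduces directly to
      -( x*t - log(1 + exp(x)) ) = -t*x + log(1 + exp(x))
    In both branches the argument of exp() is non-positive, so it never overflows.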

Enjoy!