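I have just started learning Python. I am using an API to build an IDF model, but I am running into an error in a lambda function that I cannot resolve. This is the class that generates the IDF: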
class Idfs(DocumentFrequencies, Model):
    def build(self, corpus):
        log.info('Counting documents in corpus...')
        N = float(corpus.count())
        dfs = super(Idfs, self).build(corpus)

        log.info('Building idf model: N=%i', N)
        return dfs\
            .map(lambda (term, (df,rank)): (term, df))\
            .mapValues(lambda df: math.log(N/df))

    @staticmethod
    def format_item((term, idf)):
        return {
            '_id': term,
            'idf': idf,
        }
This is the class that computes the DF:
class DocumentFrequencies(ModelBuilder):
    def __init__(self, lowercase=False, max_ngram=1, min_df=2):
        self.lowercase = lowercase
        self.max_ngram = max_ngram
        self.min_df = min_df

    def build(self, docs):
        m = docs.map(lambda d: d['text'])
        if self.lowercase:
            m = m.map(lambda text: text.lower())

        return m\
            .flatMap(lambda text: set(ngrams(text, self.max_ngram)))\
            .map(lambda t: (t, 1))\
            .reduceByKey(add)\
            .filter(lambda (k,v): v > self.min_df)
The error occurs on this line:
    .map(lambda (term, (df, rank)): (term, df))\
and this is the error message:
    TypeError: 'int' object is not iterable
This is what I get when I call DocumentFrequencies.collect():
Out[5]:
[(u'fawn', 3),
(u'1,800', 31),
(u'clotted', 3),
(u'comically', 11),
(u'Adjusting', 3),
(u'O(log', 6),
(u'unnecessarily', 15),
(u'evangelical', 53),
(u'naturopathic', 3),
(u'grenadiers', 4),
(u'stipulate', 4),
(u'Vikrant', 3),
(u'fractal', 18),
I cannot tell which argument is causing the error. I am using Python 2.7 on a machine with 8 GB of 1600 MHz DDR memory and 2 cores. These are the pyspark settings:
conf = pyspark.SparkConf().setAll([
    ('spark.executor.memory', '8g'),
    ('spark.driver.memory', '8g'),
    ('spark.network.timeout', '100000000s'),
    ('spark.executor.heartbeatInterval', '10000000s'),
    ('spark.driver.maxResultSize', '8g'),
    ('spark.driver.cores', '2'),
])
Thanks in advance,
Answer 0 (score: 2)
Based on the DocumentFrequencies.collect() output, map(lambda (term, (df,rank)): (term, df)) should not be there. It tries to unpack each tuple into two parts: for (u'fawn', 3), u'fawn' is bound to term and 3 is bound to (df,rank). Because the integer 3 cannot be unpacked as a tuple (it is not iterable), you get the error message TypeError: 'int' object is not iterable.
Removing this line will not change anything in dfs.
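The unpacking failure can be reproduced without Spark. The following is a minimal sketch in plain Python, not the original pipeline: the helper name unpack_nested, the list pairs, and the corpus size N = 100.0 are illustrative assumptions, and explicit unpacking replaces the Python 2 tuple-parameter lambda so that the snippet also runs on Python 3 (where the wording of the TypeError differs).

```python
import math

# A few (term, df) pairs copied from the collect() output in the question.
pairs = [(u'fawn', 3), (u'1,800', 31), (u'clotted', 3)]


def unpack_nested(pair):
    # Mirrors `lambda (term, (df, rank)): (term, df)`: it expects the second
    # element to be a 2-tuple, but here it is a plain int.
    term, (df, rank) = pair
    return (term, df)


try:
    unpack_nested(pairs[0])
    error = None
except TypeError as exc:
    # Unpacking the int 3 into (df, rank) raises TypeError.
    error = str(exc)

# The fix: the pairs are already (term, df), so skip the extra map step
# and compute the IDF directly from df.
N = 100.0  # hypothetical document count, for illustration only
idfs = [(term, math.log(N / df)) for term, df in pairs]
```

In the Spark code this would correspond to dropping the .map(...) step in Idfs.build and keeping only .mapValues(lambda df: math.log(N/df)), assuming the parent build() keeps returning (term, df) pairs as the collect() output suggests.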