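I'm working on an assignment for Coursera's Machine Learning: Regression course. I'm using the kc_house_data.gl/ dataset and GraphLab Create. I'm adding new variables to train_data and test_data that are combinations of existing variables, and then taking the mean of each new variable. These are the variables I'm adding: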
bedrooms_squared = bedrooms * bedrooms
bed_bath_rooms = bedrooms*bathrooms
log_sqft_living = log(sqft_living)
lat_plus_long = lat + long
Here is my code:
train_data['bedrooms_squared'] = train_data['bedrooms'].apply(lambda x: x**2)
test_data['bedrooms_squared'] = test_data['bedrooms'].apply(lambda x: x**2)
# create the remaining 3 features in both TEST and TRAIN data
train_data['bed_bath_rooms'] = train_data.apply(lambda row: row['bedrooms'] * row['bathrooms'])
test_data['bed_bath_rooms'] = test_data.apply(lambda row: row['bedrooms'] * row['bathrooms'])
train_data['log_sqft_living'] = train_data['sqft_living'].apply(lambda x: log(x))
test_data['log_sqft_living'] = test_data['bedrooms'].apply(lambda x: log(x))
train_data['lat_plus_long'] = train_data.apply(lambda row: row['lat'] + row['long'])
train_data['lat_plus_long'] = train_data.apply(lambda row: row['lat'] + row['long'])
test_data['bedrooms_squared'].mean()
test_data['bed_bath_rooms'].mean()
test_data['log_sqft_living'].mean()
test_data['lat_plus_long'].mean()
This is the error I get:
RuntimeError: Runtime Exception. Exception in python callback function evaluation:
ValueError('math domain error',):
Traceback (most recent call last):
File "graphlab\cython\cy_pylambda_workers.pyx", line 426, in graphlab.cython.cy_pylambda_workers._eval_lambda
File "graphlab\cython\cy_pylambda_workers.pyx", line 169, in graphlab.cython.cy_pylambda_workers.lambda_evaluator.eval_simple
File "<ipython-input-13-1cdbcd5f5d9b>", line 5, in <lambda>
ValueError: math domain error
I don't know what this means. What causes it, and how do I fix it? Thanks.
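Answer 0 (score: 0)
Your problem is that log received a number that is not positive (zero or negative). log is only defined for numbers greater than zero, so math.log raises "ValueError: math domain error" for anything else. You need to check the values in the column you pass to it.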
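To make that check concrete, here is a minimal sketch in plain Python (no GraphLab needed). The `rows` list, its values, and the helper names `non_positive` and `raises_domain_error` are made up for illustration; they are not taken from kc_house_data or from GraphLab Create.

```python
from math import log

# Hypothetical sample rows; the values are invented for illustration.
rows = [
    {"bedrooms": 3, "sqft_living": 1180},
    {"bedrooms": 0, "sqft_living": 290},
    {"bedrooms": 2, "sqft_living": 770},
]

def non_positive(rows, column):
    """Return the values in `column` that log() cannot handle (<= 0)."""
    return [row[column] for row in rows if row[column] <= 0]

def raises_domain_error(x):
    """Return True if log(x) raises ValueError, False otherwise."""
    try:
        log(x)
    except ValueError:
        return True
    return False

bad_bedrooms = non_positive(rows, "bedrooms")
bad_sqft = non_positive(rows, "sqft_living")
print(bad_bedrooms, bad_sqft)
```

Note that the question's code builds test_data['log_sqft_living'] from the 'bedrooms' column rather than 'sqft_living'. If any listing in test_data has 0 bedrooms, that alone would trigger the error, so that column is probably the first thing to check.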
Answer 1 (score: 0)
Please add (and learn about) exception handling to make your code more robust:
from math import log

try:
    train_data['log_sqft_living'] = train_data['sqft_living'].apply(lambda x: log(x))
    # Use 'sqft_living' here, not 'bedrooms', to match the feature definition
    test_data['log_sqft_living'] = test_data['sqft_living'].apply(lambda x: log(x))
    train_data['lat_plus_long'] = train_data.apply(lambda row: row['lat'] + row['long'])
    # The second assignment must target test_data, otherwise the mean below fails
    test_data['lat_plus_long'] = test_data.apply(lambda row: row['lat'] + row['long'])
    test_data['bedrooms_squared'].mean()
    test_data['bed_bath_rooms'].mean()
    test_data['log_sqft_living'].mean()
    test_data['lat_plus_long'].mean()
except Exception as e:
    # Report the failure instead of letting the traceback end the session
    print("ERROR in function: %s" % e)
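A blanket try/except reports that something failed but not which value caused it. As a rough sketch of a narrower approach, the example below wraps only the log call and catches only ValueError, and builds all four features through one function so the train and test sets cannot drift apart. It uses plain dicts instead of an SFrame, and `safe_log`, `add_features`, `mean`, and the sample rows are hypothetical names and values introduced for illustration.

```python
from math import log

def safe_log(x):
    """Return log(x), or None when log is undefined for x (x <= 0)."""
    try:
        return log(x)
    except ValueError:
        return None

def add_features(row):
    """Return a copy of `row` with the four derived features added."""
    out = dict(row)
    out["bedrooms_squared"] = row["bedrooms"] ** 2
    out["bed_bath_rooms"] = row["bedrooms"] * row["bathrooms"]
    out["log_sqft_living"] = safe_log(row["sqft_living"])
    out["lat_plus_long"] = row["lat"] + row["long"]
    return out

def mean(values):
    """Average the values, skipping None entries."""
    kept = [v for v in values if v is not None]
    return sum(kept) / float(len(kept))

# Hypothetical rows; the values are invented for illustration.
test_rows = [
    {"bedrooms": 3, "bathrooms": 1.0, "sqft_living": 1180, "lat": 47.5, "long": -122.25},
    {"bedrooms": 0, "bathrooms": 0.75, "sqft_living": 290, "lat": 47.75, "long": -122.5},
]

featured = [add_features(r) for r in test_rows]
for name in ["bedrooms_squared", "bed_bath_rooms", "log_sqft_living", "lat_plus_long"]:
    print(name, mean([r[name] for r in featured]))
```

Returning None for undefined inputs is only one possible design choice; whether to skip, flag, or drop such rows depends on what the assignment expects. Note also that `mean` as written divides by zero if every value is None.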