My data is structured like the sample shown below. Some individuals in it were measured repeatedly, so the same ID appears at several time points.
I want to restructure the data so that each feature-time combination becomes its own column (for example, one column for Feat1 at Time 0 and another for Feat1 at Time 2). The current layout looks like this:
Group, ID, Time, Feat1, Feat2, Feat3
A, 1, 0, 1.52, 2.94, 3.1
A, 1, 2, 1.67, 2.99, 3.3
A, 1, 4, 1.9, 3.34, 5.6
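Is there a simple way to do this without a for loop? I have already got what I need with a for-loop approach, but it is clunky and inelegant, and on my real data, which has many more columns than this sample, it also takes a while.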
Answer 0 (score: 1)
import pandas as pd

# Long-format sample: one row per (Group, ID, Time) measurement.
df = pd.DataFrame({'Group': ['A', 'A', 'A', 'A', 'A', 'A'],
                   'ID': [1, 1, 1, 2, 2, 2],
                   'Time': [0, 2, 4, 0, 2, 4],
                   'Feat1': [1.52, 1.67, 1.9, 1.52, 1.67, 1.9],
                   'Feat2': [2.94, 2.99, 3.34, 2.94, 2.99, 3.34],
                   'Feat3': [3.1, 3.3, 5.6, 3.1, 3.3, 5.6]})

# Move Time out of the row index and into the columns,
# giving one column per (feature, time) pair.
df1 = df.set_index(['Group', 'ID', 'Time']).unstack()
df1

# Flatten the two-level (feature, time) column labels into names like 'Feat1_0'.
df1.columns = ['{}_{}'.format(feat, time) for feat, time in df1.columns]
df1.reset_index()
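
If this reshaping is needed in more than one place, the same steps could be wrapped in a small helper. The sketch below is only an illustration and not part of the original answer: the name `widen` and its parameters are hypothetical, and it assumes each (Group, ID, Time) combination occurs at most once, because `unstack` raises an error when the index contains duplicate entries.

```python
import pandas as pd


def widen(df, id_cols, time_col, sep="_"):
    """Hypothetical helper: return one column per (feature, time) pair."""
    # Index by the identifier columns plus time, then pivot time into the columns.
    wide = df.set_index(list(id_cols) + [time_col]).unstack(time_col)
    # Flatten the two-level (feature, time) labels into single strings.
    wide.columns = ["{}{}{}".format(feat, sep, t) for feat, t in wide.columns]
    return wide.reset_index()


# Sample data mirroring the question: two IDs, three time points each.
sample = pd.DataFrame({
    "Group": ["A"] * 6,
    "ID": [1, 1, 1, 2, 2, 2],
    "Time": [0, 2, 4, 0, 2, 4],
    "Feat1": [1.52, 1.67, 1.9, 1.52, 1.67, 1.9],
    "Feat2": [2.94, 2.99, 3.34, 2.94, 2.99, 3.34],
    "Feat3": [3.1, 3.3, 5.6, 3.1, 3.3, 5.6],
})

wide = widen(sample, id_cols=["Group", "ID"], time_col="Time")
print(wide.columns.tolist())
```

The result has one row per (Group, ID) pair. Because no Python-level loop runs over the rows, this should scale better than a row-by-row for loop, though that has not been measured here.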