Question

I have downloaded and labeled the data from http://archive.ics.uci.edu/ml/datasets/pamap2+physical+activity+monitoring

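My task is to gain insight into the data. I have 34 attributes in a DataFrame (all cleaned, with no NaN values) and want to train a model for one target attribute, 'heart_rate', given the rest of the attributes (all of them are numbers recorded from a participant performing various activities).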

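I want to use a linear regression model, but for some reason I cannot get it to work with my DataFrame. If you think I am going about this the wrong way, I don't mind starting from scratch.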

My DataFrame columns:

> Index(['timestamp', 'activity_ID', 'heart_rate', 'IMU_hand_temp',
>        'hand_acceleration_16_1', 'hand_acceleration_16_2',
>        'hand_acceleration_16_3', 'hand_gyroscope_rad_7',
>        'hand_gyroscope_rad_8', 'hand_gyroscope_rad_9',
>        'hand_magnetometer_μT_10', 'hand_magnetometer_μT_11',
>        'hand_magnetometer_μT_12', 'IMU_chest_temp', 'chest_acceleration_16_1',
>        'chest_acceleration_16_2', 'chest_acceleration_16_3',
>        'chest_gyroscope_rad_7', 'chest_gyroscope_rad_8',
>        'chest_gyroscope_rad_9', 'chest_magnetometer_μT_10',
>        'chest_magnetometer_μT_11', 'chest_magnetometer_μT_12',
>        'IMU_ankle_temp', 'ankle_acceleration_16_1', 'ankle_acceleration_16_2',
>        'ankle_acceleration_16_3', 'ankle_gyroscope_rad_7',
>        'ankle_gyroscope_rad_8', 'ankle_gyroscope_rad_9',
>        'ankle_magnetometer_μT_10', 'ankle_magnetometer_μT_11',
>        'ankle_magnetometer_μT_12', 'Intensity'],
>       dtype='object')

First 5 rows:

timestamp   activity_ID heart_rate  IMU_hand_temp   hand_acceleration_16_1  hand_acceleration_16_2  hand_acceleration_16_3  hand_gyroscope_rad_7    hand_gyroscope_rad_8    hand_gyroscope_rad_9    ... ankle_acceleration_16_1 ankle_acceleration_16_2 ankle_acceleration_16_3 ankle_gyroscope_rad_7   ankle_gyroscope_rad_8   ankle_gyroscope_rad_9   ankle_magnetometer_μT_10    ankle_magnetometer_μT_11    ankle_magnetometer_μT_12    Intensity
2928    37.66   lying   100.0   30.375  2.21530 8.27915 5.58753 -0.004750   0.037579    -0.011145   ... 9.73855 -1.84761    0.095156    0.002908    -0.027714   0.001752    -61.1081    -36.8636    -58.3696    low
2929    37.67   lying   100.0   30.375  2.29196 7.67288 5.74467 -0.171710   0.025479    -0.009538   ... 9.69762 -1.88438    -0.020804   0.020882    0.000945    0.006007    -60.8916    -36.3197    -58.3656    low
2930    37.68   lying   100.0   30.375  2.29090 7.14240 5.82342 -0.238241   0.011214    0.000831    ... 9.69633 -1.92203    -0.059173   -0.035392   -0.052422   -0.004882   -60.3407    -35.7842    -58.6119    low
2931    37.69   lying   100.0   30.375  2.21800 7.14365 5.89930 -0.192912   0.019053    0.013374    ... 9.66370 -1.84714    0.094385    -0.032514   -0.018844   0.026950    -60.7646    -37.1028    -57.8799    low
2932    37.70   lying   100.0   30.375  2.30106 7.25857 6.09259 -0.069961   -0.018328   0.004582    ... 9.77578 -1.88582    0.095775    0.001351    -0.048878   -0.006328   -60.2040    -37.1225    -57.8847    low

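If you check the timestamp attribute, you will see that consecutive samples are only 0.01 s (10 ms) apart, so it might be better to take data from this DataFrame only every 2-5 seconds and train the model on that.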
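
One possible way to thin the data out as described is to average the readings inside fixed time windows. The sketch below is only an illustration under stated assumptions: it builds a small synthetic frame rather than loading the PAMAP2 files, `window_s` is a hypothetical parameter, and averaging is just one choice (taking the first row of each window would be another).

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real frame: 10 s of samples spaced 0.01 s apart.
rng = np.random.default_rng(0)
timestamp = np.round(np.arange(1000) * 0.01, 2)
df = pd.DataFrame({
    "timestamp": timestamp,
    "heart_rate": 100 + rng.normal(0, 1, timestamp.size),
    "hand_acceleration_16_1": rng.normal(2.2, 0.1, timestamp.size),
})

window_s = 2.0  # assumed window length; the question suggests 2-5 s

# Label each row with the index of the window it falls into,
# then average every numeric column within each window.
window_id = (df["timestamp"] // window_s).astype(int).rename("window")
downsampled = df.groupby(window_id).mean(numeric_only=True)

print(downsampled)
```

Text columns such as activity_ID cannot be averaged, so with the real data they would need separate handling, for example by grouping on activity_ID as well.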

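As for the model, I would like to use one of the following for this task: linear, polynomial, or multiple linear regression, agglomerative clustering, or k-means clustering.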
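
Of the options listed, the regression variants are supervised and can predict a numeric target such as heart_rate, whereas agglomerative and k-means clustering are unsupervised and only group rows. As a rough sketch of how the linear and polynomial options could be compared, the example below uses scikit-learn pipelines with cross-validation. The data is synthetic and the degree of 2 is an arbitrary assumption, so the numbers say nothing about PAMAP2 itself.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Synthetic features and a target with one linear and one quadratic effect.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 3))
y = 80 + 5 * X[:, 0] + 3 * X[:, 1] ** 2 + rng.normal(0, 0.5, 300)

# Multiple linear regression on standardized features.
linear = make_pipeline(StandardScaler(), LinearRegression())
# Polynomial regression: the same model on degree-2 expanded features.
poly = make_pipeline(StandardScaler(), PolynomialFeatures(degree=2), LinearRegression())

# Mean R^2 over 5 cross-validation folds for each candidate.
linear_r2 = cross_val_score(linear, X, y, cv=5, scoring="r2").mean()
poly_r2 = cross_val_score(poly, X, y, cv=5, scoring="r2").mean()
print(f"linear R^2: {linear_r2:.3f}, polynomial R^2: {poly_r2:.3f}")
```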

My code:

target = subject1.DataFrame(data.target, columns=["heart_rate"])
X = df
y = target["heart_rate"]
lm = linear_model.LinearRegression()
model = lm.fit(X,y)
predictions = lm.predict(X)
print(predictions)[0:5]

Error:

AttributeError                            Traceback (most recent call last)
<ipython-input-93-b0c3faad3a98> in <module>()
      3 #heart_rate
      4 # Put the target (housing value -- MEDV) in another DataFrame
----> 5 target = subject1.DataFrame(data.target, columns=["heart_rate"])

c:\python36\lib\site-packages\pandas\core\generic.py in __getattr__(self, name)
   5177             if self._info_axis._can_hold_identifiers_and_holds_name(name):
   5178                 return self[name]
-> 5179             return object.__getattribute__(self, name)
   5180 
   5181     def __setattr__(self, name, value):

AttributeError: 'DataFrame' object has no attribute 'DataFrame'

To fix the error I used:

subject1.columns = subject1.columns.str.strip()

but it still doesn't work
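
For context, the AttributeError in the traceback arises because `DataFrame` is a class in the pandas module, not an attribute of an existing DataFrame, so stripping whitespace from the column names cannot affect it. The sketch below reproduces the message on a tiny hypothetical stand-in for `subject1` and shows selecting the target by column name instead.

```python
import pandas as pd

# Tiny hypothetical stand-in for subject1 (values are illustrative only).
subject1 = pd.DataFrame({
    "heart_rate": [100.0, 100.0, 101.0],
    "IMU_hand_temp": [30.375, 30.375, 30.4],
})

# Looking up DataFrame on an instance fails: pandas treats it as a
# column/attribute lookup, and no column is called "DataFrame".
try:
    subject1.DataFrame
except AttributeError as exc:
    message = str(exc)
print(message)

# The target and the features can be taken straight from the frame.
y = subject1["heart_rate"]
X = subject1.drop(columns=["heart_rate"])
print(list(X.columns))
```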

Thanks, and sorry if I wasn't precise enough.

Answer 1

Try this:

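from scipy.stats import zscore
from sklearn import linear_model
from sklearn.model_selection import train_test_split

# Features: all numeric columns except the target (activity_ID and Intensity hold text)
X = df.drop("heart_rate", axis=1).select_dtypes("number")
y = df[["heart_rate"]]
X = X.apply(zscore)  # standardize each feature column
test_size = 0.30
seed = 7
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=test_size, random_state=seed)
lm = linear_model.LinearRegression()
model = lm.fit(X_train, y_train)  # fit on the training split only
predictions = lm.predict(X_test)  # predict on the held-out split
print(predictions[0:5])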
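
To judge whether such a model is any good, it helps to score it on the held-out rows. The following is a minimal sketch under assumptions: the frame is synthetic (two of the question's column names with made-up values), and the scaling is done with `StandardScaler` inside a pipeline, a swapped-in alternative to `zscore`, so that the scaling statistics come from the training split only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; the relationship below is invented for illustration.
rng = np.random.default_rng(7)
n = 500
df = pd.DataFrame({
    "IMU_hand_temp": rng.normal(30, 1, n),
    "hand_acceleration_16_1": rng.normal(2, 1, n),
})
df["heart_rate"] = 100 + 4 * df["hand_acceleration_16_1"] + rng.normal(0, 1, n)

X = df.drop(columns=["heart_rate"])
y = df["heart_rate"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=7
)

# The scaler is fitted on the training rows only, then reused for the test rows.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X_train, y_train)

pred = model.predict(X_test)
test_r2 = r2_score(y_test, pred)
test_mae = mean_absolute_error(y_test, pred)
print(f"test R^2: {test_r2:.3f}, test MAE: {test_mae:.3f}")
```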
