Question

I have the following df:

Date       Event_Counts   Category_A  Category_B
20170401      982457          0           1
20170402      982754          1           0
20170402      875786          0           1

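I am preparing the data for a regression analysis and would like to standardize the Event_Counts column so that it is on a scale similar to the category columns.

I am using the following code: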

from sklearn import preprocessing
df['scaled_event_counts'] = preprocessing.scale(df['Event_Counts'])

However, I get this warning:

DataConversionWarning: Data with input dtype int64 was converted to float64 by the scale function.
  warnings.warn(msg, _DataConversionWarning)

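It seems to have worked; there is a new column. However, it contains negative numbers, such as -1.3.

I thought the scale function subtracts the mean from each number and divides by the standard deviation for each row, then adds the min of the results to each row.

Does it not work this way with pandas? Or should I be using the normalize() function or the StandardScaler() function? I want the standardized column to be on a scale of 0 to 1.

Thanks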

Answer 1

I think you are looking for sklearn.preprocessing.MinMaxScaler. It lets you scale to a given range.

So in your case, it would be:

scaler = preprocessing.MinMaxScaler(feature_range=(0, 1))
# fit_transform expects 2D input, so select the column as a one-column DataFrame
df['scaled_event_counts'] = scaler.fit_transform(df[['Event_Counts']]).ravel()

To scale the entire df:

scaled_df = scaler.fit_transform(df)
print(scaled_df)
[[ 0.          0.99722347  0.          1.        ]
 [ 1.          1.          1.          0.        ]
 [ 1.          0.          0.          1.        ]]
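
To see where the 0.99722347 above comes from, here is a minimal sketch that applies the min-max formula, (x - min) / (max - min), by hand to the sample rows from the question. It uses plain pandas rather than MinMaxScaler, so treat it as an illustration of the arithmetic and not as a replacement for the scaler.

```python
import pandas as pd

# Same sample data as in the question.
df = pd.DataFrame({
    "Date": [20170401, 20170402, 20170402],
    "Event_Counts": [982457, 982754, 875786],
    "Category_A": [0, 1, 0],
    "Category_B": [1, 0, 1],
})

# Min-max scaling by hand: (x - min) / (max - min), computed for one column.
col = df["Event_Counts"].astype("float64")
df["scaled_event_counts"] = (col - col.min()) / (col.max() - col.min())

print(df[["Event_Counts", "scaled_event_counts"]])
```

The largest count maps to 1, the smallest to 0, and everything else lands in between, so no negative values can appear.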

Answer 2

Scaling is done by subtracting the mean and dividing by the standard deviation of each feature (column). So,

scaled_event_counts = (Event_Counts - mean(Event_Counts)) / std(Event_Counts)

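The int64 to float64 warning comes from having to subtract the mean, which is a floating-point number rather than an integer.

You will get negative numbers in the scaled column because the mean is normalized to zero, so every value below the mean comes out negative.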
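
As a rough check of the points above, the sketch below standardizes the three sample counts from the question by hand in pandas. It assumes the population standard deviation (ddof=0), which is what preprocessing.scale uses as far as I know. Casting to float64 up front should also avoid the dtype conversion warning, since the input is then no longer int64.

```python
import pandas as pd

# The three Event_Counts values from the question.
counts = pd.Series([982457, 982754, 875786], name="Event_Counts")

# Cast to float first; subtracting a float mean yields floats anyway.
x = counts.astype("float64")

# Z-score: subtract the column mean, divide by the column standard deviation.
# ddof=0 (population std) is an assumption made to mirror preprocessing.scale.
z = (x - x.mean()) / x.std(ddof=0)

print(z)               # values below the mean come out negative
print(z.mean())        # approximately 0
print(z.std(ddof=0))   # approximately 1
```

The result is centered on zero with unit spread, which is why it is not confined to the 0 to 1 range.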
