Question

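I'm following the Google Developers machine learning material, and I'm trying to use this kind of ML algorithm to solve a problem I run into at work. When we run ETL we often receive many different date formats, and we'd like to be able to identify certain rows as dates.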

The solution that currently works is regex, but I'd like to use ML to train the computer to recognize dates.

The Google Dev code I'm referring to is this (object recognition):

from sklearn import tree

# Features: second value of each row, 0 = "bumpy", 1 = "smooth"
# Labels:   0 = apple, 1 = orange
features = [[140, 1], [130, 1], [150, 0], [170, 0]]
labels = [0, 0, 1, 1]

# We will be using a decision tree in this instance
clf = tree.DecisionTreeClassifier()

# fit is the training step: it finds patterns in which attributes
# are associated with apples, oranges, etc.
clf = clf.fit(features, labels)

print(clf.predict([[160, 0]]))  # This outputs [1], so it believes it is an orange.

I'd like to load a whole column of different date types (12/12/12, 12-Dec-12, etc.) and different strings (12 12 12, User/Documents/Python, etc.).

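The next column would be the type of the string (converted in the code to 0 and 1 as the variable "labels"): 1 if the string is a date, 0 if the string is just a string.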

I hope I'm on the right track.

Answer 1

No, it does not accept strings. You have to do feature engineering first, converting them to a numeric format.

For example:

1) For strings in categorical columns/features: apply one-hot encoding

2) For dates: convert them to the number of days from current_date
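
The two conversions above might look like the following minimal sketch. The column values, the labels, the fixed reference date, and the helper name days_from are all made up for illustration; OneHotEncoder is scikit-learn's standard encoder, and the classifier is the same DecisionTreeClassifier used in the question.

```python
from datetime import date

from sklearn.preprocessing import OneHotEncoder
from sklearn.tree import DecisionTreeClassifier

# Hypothetical categorical column (illustrative values only).
colors = [["red"], ["green"], ["red"], ["blue"]]

# 1) One-hot encode the strings: each category becomes its own 0/1 column.
#    Categories are sorted, so the columns here are blue, green, red.
encoder = OneHotEncoder(handle_unknown="ignore")
color_features = encoder.fit_transform(colors).toarray()


# 2) Turn each date into a number: days elapsed up to a reference date.
def days_from(d, reference):
    """Return the number of days between d and reference (hypothetical helper)."""
    return (reference - d).days


# Assumption: a fixed date stands in for current_date so the result is reproducible.
reference = date(2020, 1, 1)
dates = [date(2019, 12, 31), date(2019, 12, 1), date(2019, 1, 1), date(2018, 1, 1)]
day_features = [[days_from(d, reference)] for d in dates]

# Combine both numeric blocks into one feature matrix and train as before.
features = [list(c) + d for c, d in zip(color_features, day_features)]
labels = [0, 0, 1, 1]  # made-up labels

clf = DecisionTreeClassifier(random_state=0).fit(features, labels)
print(clf.predict([features[0]]))
```

Every row of features is now purely numeric, which is what fit and predict expect. For the date-detection problem in the question, the same idea would apply: each raw string would first have to be described by numbers before the tree can use it.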

Can I feed strings into the "features" of scikit-learn's DecisionTreeClassifier?
