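I am trying to build a decision tree regression model to predict the MSRP (manufacturer's suggested retail price) of cars. However, I am having trouble converting the categorical values to numeric values.
My question: I have 8 categorical feature columns; some of them have as many as 40 distinct values, and there are 20,000 instances. Which method should I use to convert the categorical data for decision tree regression? And is there a way to pick up the unique values automatically instead of entering them by hand?
I tried converting the categorical values with LabelEncoder, but for some reason, even after the conversion, the first column of the df.values array (BMW, Acura, ...) did not change.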
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
df = pd.read_excel(r'C:\Users\user\Desktop\data.xlsx')
from sklearn.preprocessing import LabelEncoder
labelencoder = LabelEncoder()
df.values[:, 0] = labelencoder.fit_transform(df.values[:, 0])
This is the result I get:
array([['BMW', '1 Series M', 2011, ..., 19, 3916, 46135],
['BMW', '1 Series', 2011, ..., 19, 3916, 40650],
['BMW', '1 Series', 2011, ..., 20, 3916, 36350],
...,
['Acura', 'ZDX', 2012, ..., 16, 204, 50620],
['Acura', 'ZDX', 2013, ..., 16, 204, 50920],
['Lincoln', 'Zephyr', 2006, ..., 17, 61, 28995]], dtype=object)
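I want the first column to hold numeric values for the DT regression. Can anyone help? I am doing this for my FYP, and it is my first exposure to machine learning.
Answer 0: (score: 2)
There are several ways to convert categorical data to numbers with pandas and sklearn:
1. pandas.get_dummies() (one-hot encoding)
Example: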
import numpy as np
import pandas as pd
df = pd.DataFrame([['BMW', '1 Series M', 2011, 19, 3916, 46135],
['BMW', '1 Series', 2011,19, 3916, 40650],
['BMW', '1 Series', 2011,20, 3916, 36350],
['Acura', 'ZDX', 2012, 16, 204, 50620],
['Acura', 'ZDX', 2013, 16, 204, 50920],
['Lincoln', 'Zephyr', 2006, 17, 61, 28995]]) #Sample dataframe
pd.get_dummies(df, columns = [0,1,2]) #Dummies of 1st,2nd and 3rd column
2. LabelEncoder
Example:
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder
df = pd.DataFrame([['BMW', '1 Series M', 2011, 19, 3916, 46135],
['BMW', '1 Series', 2011,19, 3916, 40650],
['BMW', '1 Series', 2011,20, 3916, 36350],
['Acura', 'ZDX', 2012, 16, 204, 50620],
['Acura', 'ZDX', 2013, 16, 204, 50920],
['Lincoln', 'Zephyr', 2006, 17, 61, 28995]]) #Sample dataframe
encoded = df[[0, 1, 2]].apply(LabelEncoder().fit_transform)  # encode each column separately
df[[0, 1, 2]] = encoded  # put the encoded columns back into the dataframe
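To tie this back to the question, here is a minimal sketch of how the non-numeric columns could be found automatically, label-encoded, and passed to a decision tree regressor. The column names are hypothetical (the question does not show them), and treating the last column as the MSRP target is an assumption based on the sample rows.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeRegressor

# Sample rows from above; the column names are hypothetical placeholders.
df = pd.DataFrame(
    [['BMW', '1 Series M', 2011, 19, 3916, 46135],
     ['BMW', '1 Series', 2011, 19, 3916, 40650],
     ['BMW', '1 Series', 2011, 20, 3916, 36350],
     ['Acura', 'ZDX', 2012, 16, 204, 50620],
     ['Acura', 'ZDX', 2013, 16, 204, 50920],
     ['Lincoln', 'Zephyr', 2006, 17, 61, 28995]],
    columns=['make', 'model', 'year', 'feat_a', 'feat_b', 'msrp'])

# Pick every non-numeric column automatically instead of listing values by hand.
cat_cols = [c for c in df.columns if not pd.api.types.is_numeric_dtype(df[c])]

# Fit one encoder per column and keep it, so codes can be mapped back to labels.
encoders = {}
for col in cat_cols:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

# Assumption: the last column is the regression target (MSRP).
X = df.drop(columns='msrp')
y = df['msrp']
model = DecisionTreeRegressor(random_state=0).fit(X, y)
```

Keeping the fitted encoders around means `encoders['make'].inverse_transform(...)` can recover the original labels later. Note that label encoding imposes an arbitrary order on the categories; whether that or one-hot encoding works better for a given dataset is something to check empirically.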
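Answer 1: (score: 0)
Actually, you are assigning the data the wrong way. For a DataFrame with mixed column types, df.values builds a new array, so writing to df.values[:, 0] changes only that temporary copy and leaves the DataFrame untouched. Assign to the column itself instead, for example through df.iloc[:, 0].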
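The difference can be seen in a small sketch, assuming a DataFrame with mixed column types like the one in the question (the three sample rows here are a shortened stand-in for the real data):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame([['BMW', '1 Series M', 2011],
                   ['Acura', 'ZDX', 2012],
                   ['Lincoln', 'Zephyr', 2006]])

# With mixed dtypes, df.values is a freshly built object array,
# so writing into it does not touch the DataFrame.
values = df.values
values[:, 0] = LabelEncoder().fit_transform(values[:, 0])
print(df[0].tolist())  # ['BMW', 'Acura', 'Lincoln']

# Assigning to the column itself updates the DataFrame.
df[0] = LabelEncoder().fit_transform(df[0])
print(df[0].tolist())  # [1, 0, 2]
```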