Question

Hi, I'm having trouble creating a DataFrame from the Boston dataset (available here: https://archive.ics.uci.edu/ml/datasets/Housing).

Here is my code:

data1 = DataFrame(data= np.c_[boston['data'], boston['target']],
                     columns= boston['feature_names']+ ['Price'])

Similar code works for other datasets (e.g. the Iris dataset), but here it raises a TypeError:

TypeError: ufunc 'add' did not contain a loop with signature matching types dtype('<U7') dtype('<U7') dtype('<U7')

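What is wrong here, and how can I fix it? Thanks!

Edit: I figured out what was wrong. feature_names is a NumPy array, not a list, so I had to convert it to a list, and now it runs fine. Here is the code for anyone interested: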

data1 = DataFrame(data= np.c_[boston['data'], boston['target']],
                     columns= (boston['feature_names']).tolist()+ ['Price'])
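
The reason the conversion helps: on a NumPy array, + is element-wise arithmetic rather than list concatenation, so it cannot append a label to the array of names. Below is a minimal sketch of the fix. It assumes a small hypothetical stand-in for the dataset (a plain dict with toy values and only two features) so that it runs without downloading anything; the real object would come from the dataset loader.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the loaded dataset: 3 rows, 2 features, toy values.
boston = {
    "data": np.array([[0.00632, 18.0], [0.02731, 0.0], [0.02729, 0.0]]),
    "target": np.array([24.0, 21.6, 34.7]),
    "feature_names": np.array(["CRIM", "ZN"]),
}

# Convert the array of names to a plain list so that + concatenates.
columns = boston["feature_names"].tolist() + ["Price"]

# np.c_ stacks the 1-D target as an extra column next to the feature matrix.
data1 = pd.DataFrame(np.c_[boston["data"], boston["target"]], columns=columns)
print(data1)
```

np.append(boston["feature_names"], "Price") would be another way to build the column labels without leaving NumPy.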

Answer 1

I think you need read_fwf:

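import pandas as pd

# The raw file has no header row, so supply 14 placeholder column names.
cols = ['col1','col2','col3','col4','col5','col6','col7',
        'col8','col9','col10','col11','col12','col13','col14']
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data'
# read_fwf parses fixed-width text; column boundaries are inferred from the data.
df = pd.read_fwf(url, header=None, names=cols)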

print (df.head())
      col1  col2  col3  col4   col5   col6  col7    col8  col9  col10  col11  \
0  0.00632  18.0  2.31     0  0.538  6.575  65.2  4.0900     1  296.0   15.3   
1  0.02731   0.0  7.07     0  0.469  6.421  78.9  4.9671     2  242.0   17.8   
2  0.02729   0.0  7.07     0  0.469  7.185  61.1  4.9671     2  242.0   17.8   
3  0.03237   0.0  2.18     0  0.458  6.998  45.8  6.0622     3  222.0   18.7   
4  0.06905   0.0  2.18     0  0.458  7.147  54.2  6.0622     3  222.0   18.7   

    col12  col13  col14  
0  396.90   4.98   24.0  
1  396.90   9.14   21.6  
2  392.83   4.03   34.7  
3  394.63   2.94   33.4  
4  396.90   5.33   36.2
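
To try read_fwf without network access, the same call can be pointed at an in-memory buffer. This is only a sketch under stated assumptions: the sample text is a hypothetical three-column excerpt built from values in the output above (col1, col2, col14), and the names CRIM, ZN and MEDV are the labels the dataset description uses for those columns.

```python
import io

import pandas as pd

# Hypothetical fixed-width excerpt: three aligned columns, no header row.
sample = (
    "0.00632  18.0  24.0\n"
    "0.02731   0.0  21.6\n"
    "0.02729   0.0  34.7\n"
)

# Column boundaries are inferred from the whitespace gaps shared by all rows.
df_sample = pd.read_fwf(io.StringIO(sample), header=None,
                        names=["CRIM", "ZN", "MEDV"])
print(df_sample)
```

Inference relies on the columns being aligned across rows; if they are not, explicit colspecs or widths can be passed instead.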

If you need to filter columns, add the usecols parameter:

df = pd.read_fwf(url, header=None, names=cols, usecols=['col10','col13'])

print (df.head())
   col10  col13
0  296.0   4.98
1  242.0   9.14
2  242.0   4.03
3  222.0   2.94
4  222.0   5.33

Your code is not exactly what is needed; perhaps you want to add two columns together and then filter the columns by subset:

boston['new'] = boston['feature_names'] + boston['Price']
df = boston[['data', 'target', 'new']].copy()
