Importing a CSV in GraphLab with specific data types

Time: 2016-04-23 11:41:09

Tags: csv graphlab

In GraphLab, I ran into the following problem:

 feat1 = gl.SFrame.read_csv(dir_path + '/data/' + 'file_1.csv')

'feat1' contains a column named 'movieId' of type 'int'.

feat1.dtype 
0   float
1   float
2   float
3   float
4   float
5   float
6   float
7   float
8   float
9   float
10  float
11  float
12  float
13  float
14  float
15  float
16  float
17  float
18  float
19  float
20  float
21  float
22  float
23  float
24  float
25  float
26  float
27  float
28  float
29  float
30  float
31  float
32  float
33  float
34  float
35  float
36  float
37  float
38  float
39  float
40  float
41  float
42  float
43  float
44  float
45  float
46  float
47  float
48  float
49  float
50  float
51  float
52  float
53  float
54  float
55  float
56  float
57  float
58  float
59  float
60  float
61  float
62  float
63  float
64  float
65  float
66  float
67  float
68  float
69  float
70  float
71  float
72  float
73  float
74  float
75  float
76  float
77  float
78  float
79  float
80  float
81  float
82  float
83  float
84  float
85  float
86  float
87  float
88  float
89  float
90  float
91  float
92  float
93  float
94  float
95  float
96  float
97  float
98  float
99  float
100 float
101 float
102 float
103 float
104 float
105 float
106 float
107 float
108 float
109 float
110 float
111 float
112 float
113 float
114 float
115 float
116 float
117 float
118 float
119 float
120 float
121 float
122 float
123 float
124 float
125 float
126 float
127 float
128 float
129 float
130 float
131 float
132 float
133 float
134 float
135 float
136 float
137 float
138 float
139 float
140 float
141 float
142 float
143 float
144 float
145 float
146 float
147 float
148 float
149 float
150 float
151 float
152 float
153 float
154 float
155 float
156 float
157 float
158 float
159 float
160 float
161 float
162 float
163 float
164 float
165 float
166 float
167 float
168 float
169 float
170 float
171 float
172 float
173 float
174 float
175 float
176 float
177 float
178 float
179 float
180 float
181 float
182 float
183 float
184 float
185 float
186 float
187 float
188 float
189 float
190 float
191 float
192 float
193 float
194 float
195 float
196 float
197 float
198 float
199 float
movieId int

On the other hand, there is an SFrame with a column named 'movieId' of type 'str':

movieIds.dtype

  <bound method SFrame.dtype of Columns:
movieId str

Rows: 13140

Data:
+---------+
| movieId |
+---------+
|    1    |
|    2    |
|    3    |
|    4    |
|    5    |
|    6    |
|    7    |
|    8    |
|    9    |
|    10   |
+---------+
[13140 rows x 1 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.>

When I try to join them, I get this error:

feat1 = movieIds.join(feat1, on='movieId', how='inner')

RuntimeError: Runtime Exception. Columns movieId and movieId do not have the same type in both SFrames.

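How can I make a particular column of the imported 'csv' come in with a specific data type? In your opinion, what is the best way to overcome this problem? Thank you very much for your comments.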

1 answer:

Answer 0: (score: 1)

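You can first change the dtype of the sf["movieId"] SArray from string to a numeric type, and then try the join again. The target type has to match the key column of the other SFrame: 'movieId' in feat1 is int, so for your data use astype(int) rather than float. Follow this example, which casts to float (here sf has a column (SArray) named x instead of movieId):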

>>> import graphlab as gl
>>> sf = gl.SFrame({"x":["1", "2", "3"]})
>>> sf
Columns:
    x   str

Rows: 3

Data:
+---+
| x |
+---+
| 1 |
| 2 |
| 3 |
+---+
[3 rows x 1 columns]

>>> sf["x"] = sf["x"].astype(float)
>>> sf
Columns:
    x   float

Rows: 3

Data:
+-----+
|  x  |
+-----+
| 1.0 |
| 2.0 |
| 3.0 |
+-----+
[3 rows x 1 columns]
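
On the import side of the question: as far as I know, SFrame.read_csv accepts a column_type_hints argument (for example a dict mapping a column name to a type), so something like column_type_hints={'movieId': str} should read the key column as str directly. Please check that against the documentation for your GraphLab version. The sketch below does not use GraphLab at all. It is a plain-Python illustration of why the key types must agree and how a per-column type hint at read time avoids a cast afterwards. The helpers read_csv_with_types and inner_join and the sample data are hypothetical, made up for this example.

```python
import csv
import io


def read_csv_with_types(text, column_types, default=float):
    """Parse CSV text into a list of dict rows, casting each column.

    `column_types` maps a column name to a callable such as int or str;
    columns that are not listed are cast with `default`.
    (Hypothetical helper, only mimicking the idea of per-column type hints.)
    """
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for raw in reader:
        rows.append({name: column_types.get(name, default)(value)
                     for name, value in raw.items()})
    return rows


def inner_join(left, right, key):
    """Inner-join two lists of dict rows on `key`.

    Raises TypeError when the key column has different types on each side,
    similar in spirit to the RuntimeError in the question.
    """
    left_types = {type(row[key]) for row in left}
    right_types = {type(row[key]) for row in right}
    if left_types != right_types:
        raise TypeError("column %r does not have the same type on both sides" % key)
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            merged = dict(row)
            merged.update(match)
            joined.append(merged)
    return joined


# Made-up sample data: two feature columns plus the key column.
FEATURES_CSV = "0,1,movieId\n0.5,1.5,1\n0.25,0.75,3\n"
movie_ids = [{"movieId": "1"}, {"movieId": "2"}, {"movieId": "3"}]

# Reading movieId as str makes it match the key type used in movie_ids.
features = read_csv_with_types(FEATURES_CSV, {"movieId": str})
joined = inner_join(movie_ids, features, "movieId")
print(joined)
```

Casting after the fact, as in the astype example, works just as well; what matters is only that both key columns end up with the same type before the join.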