In graphlab, I ran into the following problem:
feat1 = gl.SFrame.read_csv(dir_path + '/data/' + 'file_1.csv')
'feat1' contains a column named 'movieId' of type 'int'.
feat1.dtype
0 float
1 float
2 float
3 float
4 float
5 float
6 float
7 float
8 float
9 float
10 float
11 float
12 float
13 float
14 float
15 float
16 float
17 float
18 float
19 float
20 float
21 float
22 float
23 float
24 float
25 float
26 float
27 float
28 float
29 float
30 float
31 float
32 float
33 float
34 float
35 float
36 float
37 float
38 float
39 float
40 float
41 float
42 float
43 float
44 float
45 float
46 float
47 float
48 float
49 float
50 float
51 float
52 float
53 float
54 float
55 float
56 float
57 float
58 float
59 float
60 float
61 float
62 float
63 float
64 float
65 float
66 float
67 float
68 float
69 float
70 float
71 float
72 float
73 float
74 float
75 float
76 float
77 float
78 float
79 float
80 float
81 float
82 float
83 float
84 float
85 float
86 float
87 float
88 float
89 float
90 float
91 float
92 float
93 float
94 float
95 float
96 float
97 float
98 float
99 float
100 float
101 float
102 float
103 float
104 float
105 float
106 float
107 float
108 float
109 float
110 float
111 float
112 float
113 float
114 float
115 float
116 float
117 float
118 float
119 float
120 float
121 float
122 float
123 float
124 float
125 float
126 float
127 float
128 float
129 float
130 float
131 float
132 float
133 float
134 float
135 float
136 float
137 float
138 float
139 float
140 float
141 float
142 float
143 float
144 float
145 float
146 float
147 float
148 float
149 float
150 float
151 float
152 float
153 float
154 float
155 float
156 float
157 float
158 float
159 float
160 float
161 float
162 float
163 float
164 float
165 float
166 float
167 float
168 float
169 float
170 float
171 float
172 float
173 float
174 float
175 float
176 float
177 float
178 float
179 float
180 float
181 float
182 float
183 float
184 float
185 float
186 float
187 float
188 float
189 float
190 float
191 float
192 float
193 float
194 float
195 float
196 float
197 float
198 float
199 float
movieId int
On the other hand, there is an SFrame with a single column named 'movieId' of type 'str':
movieIds.dtype
<bound method SFrame.dtype of Columns:
movieId str
Rows: 13140
Data:
+---------+
| movieId |
+---------+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
| 6 |
| 7 |
| 8 |
| 9 |
| 10 |
+---------+
[13140 rows x 1 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.>
When I try to join the two, I get this error:
feat1 = movieIds.join(feat1, on='movieId', how='inner')
RuntimeError: Runtime Exception. Columns movieId and movieId do not have the same type in both SFrames.
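How can I control the CSV import so that a given column is read with a specific data type? In your opinion, what is the best way to overcome this problem? Thank you very much for your comments.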
Answer 0 (score: 1)
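You can first change the dtype of the sf["movieId"] SArray from str to a numeric type, and then try the join again. Follow this example, where sf has a column (SArray) named x instead of movieId. The example casts to float; because the movieId column in feat1 is int, cast with astype(int) in your case so that both key columns end up with the same type.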
>>> import graphlab as gl
>>> sf = gl.SFrame({"x":["1", "2", "3"]})
>>> sf
Columns:
x str
Rows: 3
Data:
+---+
| x |
+---+
| 1 |
| 2 |
| 3 |
+---+
[3 rows x 1 columns]
>>> sf["x"] = sf["x"].astype(float)
>>> sf
Columns:
x float
Rows: 3
Data:
+-----+
| x |
+-----+
| 1.0 |
| 2.0 |
| 3.0 |
+-----+
[3 rows x 1 columns]
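
The question also asks how to fix the type at import time. The sketch below shows two options: passing a type hint for the key column to SFrame.read_csv via its column_type_hints argument, and casting the key column on both sides before joining. The helper names read_csv_with_key_type and join_on_cast_key are hypothetical and are not part of graphlab. The SFrame class is passed in as an argument so the helpers carry no import of their own. How columns missing from the hints dict are typed may depend on the graphlab version, so treat this as a starting point rather than a definitive recipe.

```python
def read_csv_with_key_type(sframe_cls, csv_path, key="movieId", key_type=str):
    """Read a CSV, asking the parser to load `key` as `key_type`.

    `sframe_cls` is expected to be graphlab.SFrame (assumption: its
    read_csv accepts a dict for `column_type_hints`, mapping a column
    name to a Python type such as int, float or str).
    """
    return sframe_cls.read_csv(csv_path, column_type_hints={key: key_type})


def join_on_cast_key(left, right, key="movieId", key_type=int):
    """Cast `key` to `key_type` in both frames, then inner-join on it.

    Note: this assigns the cast column back, so `left` and `right`
    are modified in place.
    """
    left[key] = left[key].astype(key_type)
    right[key] = right[key].astype(key_type)
    return left.join(right, on=key, how="inner")
```

With graphlab imported as gl, the calls would look like read_csv_with_key_type(gl.SFrame, dir_path + '/data/' + 'file_1.csv') or join_on_cast_key(movieIds, feat1). Casting to int rather than float keeps the IDs exact and matches the type that feat1 already reports for movieId.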