ValueError: array must not contain infs or NaNs

Asked: 2016-03-07 12:15:20

Tags: python numpy matplotlib scipy k-means

I have a CSV file whose data is formatted like the sample below (my actual dataset is much larger):

Image Id,URL,Latitude,Longitude,Address
10758202333,https://farm8.staticflickr.com/7408/10758202333_b6c29d93b1_q.jpg,51.482826,-0.167112,Cadogan Pier Chelsea Embankment Chelsea Royal Borough of Kensington and Chelsea London 
23204019400,https://farm6.staticflickr.com/5688/23204019400_fb6879abe3_q.jpg,51.483106,-3.171207,Greggs Station Terrace Plasnewydd Cardiff Wales CF United Kingdom
11243511074,https://farm3.staticflickr.com/2818/11243511074_e1e2f1b99c_q.jpg,51.483297,-0.166534,Albert Bridge Chelsea Embankment Chelsea Royal Borough of Kensington and Chelsea London Greater London England SW3 5SY United Kingdom
22186903335,https://farm6.staticflickr.com/5697/22186903335_de53168305_q.jpg,51.483394,-3.176926,Greyfriars House Greyfriars Road Plasnewydd Cardiff Wales CF United Kingdom
22197179851,https://farm6.staticflickr.com/5786/22197179851_a818b17fae_q.jpg,51.483394,-3.176926,Greyfriars House Greyfriars Road Plasnewydd Cardiff Wales CF United Kingdom
22174235522,https://farm1.staticflickr.com/589/22174235522_3ffd1de2bb_q.jpg,51.483394,-3.176926,Greyfriars House Greyfriars Road Plasnewydd Cardiff Wales CF United Kingdom
22160755536,https://farm1.staticflickr.com/761/22160755536_8e23e9ed32_q.jpg,51.483394,-3.176926,Greyfriars House Greyfriars Road Plasnewydd Cardiff Wales CF United Kingdom
7667114130,https://farm8.staticflickr.com/7269/7667114130_117849250a_q.jpg,51.484563,-3.178181,Oybike Gorsedd Gardens Road Cathays Cardiff Wales CF United Kingdom
17136775881,https://farm9.staticflickr.com/8780/17136775881_363c2379ef_q.jpg,51.484608,-3.178845,Oybike Gorsedd Gardens Road Cathays Cardiff Wales CF United Kingdom
7110881411,https://farm9.staticflickr.com/8162/7110881411_f0fe3d7214_q.jpg,51.484644,-3.178099,Oybike Gorsedd Gardens Road Cathays Cardiff Wales CF United Kingdom
11718453936,https://farm4.staticflickr.com/3700/11718453936_148af12df6_q.jpg,51.484661,-3.179117,King Edward VII Avenue Cathays Cardiff Wales CF United Kingdom
20218915752,https://farm1.staticflickr.com/352/20218915752_4282c1f9b8_q.jpg,51.484683,-3.179147,King Edward VII Avenue Cathays Cardiff Wales CF United Kingdom

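My code is below. I know it isn't much, but for now I just want to see a plot of the clusters with their centroids. However, I get the error "ValueError: array must not contain infs or NaNs".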

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.vq import kmeans, kmeans2, whiten

df = pd.read_csv('dataset_import.csv')
df.head()

coordinates = df.as_matrix(columns=['latitude', 'longitude'])
N = len(coordinates)
k = 100
i = 50
w = whiten(coordinates)

cluster_centroids, closest_centroids = kmeans2(w, k, iter=i, minit='points')
plt.figure(figsize=(10, 6), dpi=100)
plt.scatter(cluster_centroids[:,0], cluster_centroids[:,1], c='r', alpha=.7, s=150)
plt.scatter(w[:,0], w[:,1], c='k', alpha=.3, s=10)
plt.show()

Can anyone explain why this happens, or point out any mistakes in my code? Thanks!

1 answer:

Answer 0 (score: 0)

I ran into the same problem and solved it by removing the NaNs and infs.

def clean(serie):
    # Keep only finite values: drops NaN as well as +inf and -inf
    output = serie[np.isfinite(serie)]
    return output
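
To illustrate what this kind of filter does, here is a minimal sketch on a small made-up Series. The sample values are purely illustrative, and the function is written with `np.isfinite`, which I believe rejects NaN and both infinities in a single test.

```python
import numpy as np
import pandas as pd

def clean(serie):
    # Keep only the finite entries (drops NaN, +inf and -inf).
    return serie[np.isfinite(serie)]

# Made-up sample data, only for demonstration.
s = pd.Series([1.0, np.nan, 2.5, np.inf, -np.inf, 4.0])
cleaned = clean(s)
print(cleaned.tolist())  # [1.0, 2.5, 4.0]
```

Note that the surviving entries keep their original index labels, so the result is shorter than the input but not renumbered.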

When plotting, I use this function to clean the data on the fly, and it now works.

fig = plt.figure()
clean(data[col]).plot(kind='kde')
plt.show()

Or like this:

sns.kdeplot(clean(data[col]), bw=0.1, shade=True, legend=False)
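
The same idea can be applied to the k-means code in the question. The sketch below is only an illustration under some assumptions: the DataFrame is a made-up stand-in for `dataset_import.csv` (with one longitude deliberately missing), and `k` is reduced to 2 so that it fits the tiny sample. It also selects the columns as `Latitude` and `Longitude`, matching the capitalisation in the CSV header shown in the question. As far as I know, older pandas versions filled unknown column names passed to `as_matrix` with NaN instead of raising an error, so the lowercase names in the question may well be what introduced the NaNs there, but I have not verified that against the asker's setup.

```python
import numpy as np
import pandas as pd
from scipy.cluster.vq import kmeans2, whiten

# Hypothetical stand-in for the question's CSV; one longitude is missing.
df = pd.DataFrame({
    "Latitude": [51.482826, 51.483106, 51.483297, 51.483394, 51.484563, 51.484608],
    "Longitude": [-0.167112, -3.171207, -0.166534, np.nan, -3.178181, -3.178845],
})

# Select the columns by their exact (case-sensitive) header names.
coordinates = df[["Latitude", "Longitude"]].to_numpy(dtype=float)

# Diagnose: count rows that contain at least one NaN or inf.
finite_rows = np.isfinite(coordinates).all(axis=1)
n_bad = (~finite_rows).sum()
print("rows with NaN/inf:", n_bad)

# Keep only the fully finite rows before whitening and clustering.
coordinates = coordinates[finite_rows]
w = whiten(coordinates)

k = 2  # assumption for this tiny sample; k should not exceed the row count
cluster_centroids, closest_centroids = kmeans2(w, k, iter=50, minit="points")
```

Dropping whole rows (rather than cleaning each column separately) keeps each latitude paired with its own longitude, which matters for clustering coordinates.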