Reading string values from CSV

Time: 2018-04-27 16:02:41

Tags: python pandas

+-----------+-------+-----------------------+
|V1         |   n   | ip                    |
+-----------+-------+-----------------------+
|02-08-2017 |2      |00.121.187.120:3447    |
|03-08-2017 |5      |01.110.186.182:23      |
|30-07-2017 |13     |08.167.141.192:25      |
|26-07-2017 |19     |1.175.4.214:33274      |
|01-08-2017 |72     |10.174.218.134:59259   |
+-----------+-------+-----------------------+

This is my CSV file. I am trying to apply a clustering technique, but my column "V1" is stored as strings, so I cannot read it.

import pandas
import pylab as pl

from sklearn.cluster import KMeans

from sklearn.decomposition import PCA
import ast


variables = pandas.read_csv('D:\\Date\\date-dfki.csv',dtype=str)

Y =  variables[['V1']]

X = variables[['n']]
Nc = range(1, 20)

kmeans = [KMeans(n_clusters=i) for i in Nc]

kmeans

score = [kmeans[i].fit(Y).score(Y) for i in range(len(kmeans))]

score

pl.plot(Nc,score)

pl.xlabel('Number of Clusters')

pl.ylabel('Score')

pl.title('Elbow Curve')

Could someone please tell me how to read it? I cannot convert the strings to float/int, so I cannot continue. This is the error I get:

array = np.array(array, dtype=dtype, order=order, copy=copy)
ValueError: could not convert string to float: '27-07-2017'

1 answer:

Answer 0 (score: -1)

Here is an example of what can be done.

import pandas as pd
import numpy as np

# create a sample dataset
d = {'V1': ["02-08-2017", "03-08-2017"], 'n': ["2", "5"], 'ip': ["104.44.194.237:25", "106.42.34.86:49324 "]}
df = pd.DataFrame(data=d)  # no numeric dtype here: the values are still strings
df.to_csv('date-dfki.csv', sep=',', index=False)  # index=False avoids writing an extra unnamed column


# reading your file starts here:
parse_dates = ['V1']  # columns to parse as datetime; by default pandas reads dates as strings
# dtype sets the type of the other columns; dayfirst=True because the dates are DD-MM-YYYY
variables = pd.read_csv('date-dfki.csv', dtype={'ip': str, 'n': np.int32}, parse_dates=parse_dates, dayfirst=True)

The variables dataset:

example dataset
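
As a quick check, the sketch below reads the same kind of data from an in-memory string (an assumption made only so the example runs without a file on disk) and converts `V1` with an explicit `format`, which removes any ambiguity between day-first and month-first dates. The variable `csv_text` is hypothetical; the rows are copied from the question's table.

```python
import io

import pandas as pd

# In-memory stand-in for the CSV file (sample rows from the question's table).
csv_text = (
    "V1,n,ip\n"
    "02-08-2017,2,00.121.187.120:3447\n"
    "03-08-2017,5,01.110.186.182:23\n"
)

# Read everything as strings first, then convert column by column.
variables = pd.read_csv(io.StringIO(csv_text), dtype=str)
variables['V1'] = pd.to_datetime(variables['V1'], format='%d-%m-%Y')  # DD-MM-YYYY
variables['n'] = variables['n'].astype('int64')

print(variables.dtypes)
```

Printing `dtypes` is a cheap way to confirm that each column has the type you expect before handing it to a model.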

  

Next, you need to clean up the IP addresses so they can be converted to int, float, or whatever other format you want (I used int):

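variables['ip'] = variables['ip'].str.strip().str.replace('.', '', regex=False) #drops stray whitespace, removes '.' literally
variables['ip'] = variables['ip'].str.replace(':', '', regex=False) #removes ':'
variables['ip'] = variables['ip'].astype('int64') #convert to a 64-bit int; the joined digits overflow 32 bits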

The result looks like this:

result conversion
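
Joining all the digits into one integer loses the boundary between the address and the port. If that matters, one alternative (not part of the original answer) is to split the column instead. A minimal sketch, assuming every value has the form `a.b.c.d:port`; the names `parts`, `octets`, and `port` are illustrative only:

```python
import pandas as pd

# Sample values from the answer's dataset, including the trailing space.
ip = pd.Series(['104.44.194.237:25', '106.42.34.86:49324 '])

# Separate "address:port" into two columns after trimming whitespace.
parts = ip.str.strip().str.split(':', expand=True)
port = parts[1].astype('int64')

# Split the address into its four octets, one integer column each.
octets = parts[0].str.split('.', expand=True).astype('int64')

print(octets)
print(port)
```

Each octet and the port then become separate numeric columns, so no information is merged away.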

So, if you have several columns, you can repeat the same process for each one and convert it to whatever format you want.
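
Repeating the conversion for several columns can be written as a loop. A minimal sketch, assuming a frame in which every column was read as a string; the `port` column and the `numeric_columns` list are hypothetical and only serve the illustration:

```python
import pandas as pd

# Hypothetical frame where every column was read as a string.
df = pd.DataFrame({'n': ['2', '5', '13'], 'port': ['3447', '23', 'unknown']})

numeric_columns = ['n', 'port']  # columns assumed to hold numbers
for column in numeric_columns:
    # errors='coerce' turns values that cannot be parsed into NaN instead of raising
    df[column] = pd.to_numeric(df[column], errors='coerce')

print(df.dtypes)
```

With `errors='coerce'`, bad values show up as `NaN`, so they can be inspected or dropped rather than stopping the script.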

  

This is the float conversion:

variables['ip'] = variables['ip'].astype(float) #or float conversion

float conversion
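
The error in the question comes from passing the date strings in `V1` to `KMeans`, so that column needs a numeric form as well. A minimal sketch, assuming that "days since the earliest date" is an acceptable feature (this choice is an assumption, not something stated in the question):

```python
import pandas as pd

# Dates taken from the question's table, still as DD-MM-YYYY strings.
dates = pd.Series(['02-08-2017', '03-08-2017', '30-07-2017', '26-07-2017', '01-08-2017'])

parsed = pd.to_datetime(dates, format='%d-%m-%Y')

# Days elapsed since the earliest date: a plain numeric feature.
days_since_start = (parsed - parsed.min()).dt.days

# Two-dimensional array of shape (n_samples, 1), the layout scikit-learn estimators expect.
features = days_since_start.to_numpy().reshape(-1, 1)
print(features.ravel())
```

An array like `features` can then take the place of `Y` in the question's `fit(Y)` call, since it holds numbers rather than strings.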