`pandas.read_csv`

Question

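I have a data file, and when I run `df.dtypes` on it, it reports the type as object, but in order to plot it, it needs to be an integer, float, double, etc. The data actually consists of three columns of numbers, like this: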

123 12345 0.9484 
123 12345 0.8746 
123 12345 0.4838 
123 12345 0.4837

How can I convert the data so that I can plot it with pandas?
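The question does not show how the file was loaded, so the following is only a guess at one plausible cause: if whitespace-separated data is read with the default comma separator, each line ends up as a single string field. A minimal sketch, where `raw` is a hypothetical stand-in for the real file:

```python
import io

import pandas as pd

# Hypothetical stand-in for the data file described in the question
raw = """123 12345 0.9484
123 12345 0.8746
123 12345 0.4838
123 12345 0.4837"""

# The default separator is a comma, so each line is read as one string field
df_default = pd.read_csv(io.StringIO(raw), header=None)
print(df_default.dtypes)  # a single non-numeric column

# Splitting on runs of whitespace yields three numeric columns instead
df_split = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)
print(df_split.dtypes)
```

If the real file shows the same symptom, passing a whitespace separator when reading it is likely simpler than converting the columns afterwards.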

Answer 1

I think you can use the `read_csv` parameters `sep='\s+'` (any whitespace) and `engine='python'` (because of a warning):

import io

import pandas as pd

# Sample data; the first line supplies the column names
temp = """Day Time Usage
123 12345 0.9484
123 12345 0.8746
123 12345 0.4838
123 12345 0.4837"""
# After testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), sep=r'\s+', engine='python')

print(df)
   Day   Time   Usage
0  123  12345  0.9484
1  123  12345  0.8746
2  123  12345  0.4838
3  123  12345  0.4837

print(df.dtypes)
Day        int64
Time       int64
Usage    float64
dtype: object

print(df.index)
RangeIndex(start=0, stop=4, step=1)
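If the data is already in memory with object columns and re-reading the file is not convenient, the columns can also be converted in place. This is a sketch rather than part of the answer above: the frame below is a hypothetical example built with `dtype=object`, and `pd.to_numeric` with `errors="coerce"` is one option among several (`astype` is another when every value is known to be clean).

```python
import pandas as pd

# Hypothetical frame whose numeric values were loaded as strings
df = pd.DataFrame(
    {
        "Day": ["123", "123"],
        "Time": ["12345", "12345"],
        "Usage": ["0.9484", "0.8746"],
    },
    dtype=object,
)

# Convert column by column; errors="coerce" turns unparseable values into NaN
converted = df.apply(pd.to_numeric, errors="coerce")
print(converted.dtypes)
```

Note that `errors="coerce"` hides bad values by replacing them with NaN, so it is worth checking for unexpected NaN entries before plotting.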

Answer 2

This answer assumes you have the matplotlib library installed, you are working in IPython, and your file is csv-ish.

%matplotlib inline
import pandas as pd
from io import StringIO

fake_csv = '''123 12345 0.9484
123 12345 0.8746
123 12345 0.4838
123 12345 0.4837'''

# Create data frame, fake csv for example only
# normally you would use pd.read_csv('/path/to/file.csv', ...)
df = pd.read_csv(StringIO(fake_csv), sep=r'\s+', header=None)
print('DataFrame:\n{}\n\nData Types:\n{}'.format(df, df.dtypes))
df.plot()

DataFrame:
     0      1       2
0  123  12345  0.9484
1  123  12345  0.8746
2  123  12345  0.4838
3  123  12345  0.4837

Data Types:
0      int64
1      int64
2    float64
dtype: object

Note: the final `dtype: object` line is the dtype of the Series that `df.dtypes` returns, not the dtype of any column in `df`.
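The point about the trailing `dtype: object` line can be checked directly. A small sketch, reusing the sample data from the question (the variable names are illustrative only):

```python
import io

import pandas as pd

raw = """123 12345 0.9484
123 12345 0.8746
123 12345 0.4838
123 12345 0.4837"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", header=None)

dtypes = df.dtypes
print(type(dtypes).__name__)  # df.dtypes is itself a pandas Series
print(dtypes.dtype)           # dtype of that Series, shown on the last printed line
print(dtypes.tolist())        # the per-column dtypes it holds
```

The Series holds one dtype object per column, which is why its own dtype is object even though every column here is numeric.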

How do I convert dtype from object to int in Python pandas?

2 answers: