Changing the data type of specific DataFrame columns - Pandas

Time: 2017-12-12 04:41:51

Tags: python python-2.7 pandas csv

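After reading my CSV file with read_csv() in Pandas, I want to convert some of the DataFrame columns to float64 for further processing, because they are currently represented as object dtype. When I tried the dtype argument of read_csv, I got an error. Here is what I ran: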

import pandas as pd
file_ = pd.read_csv("/home/rahul/yearly_data_no_ecb.csv", dtype = {"DAX":"float64"})

Here is the full traceback:

ValueError                                Traceback (most recent call last)
<ipython-input-14-554c18573267> in <module>()
----> 1 file_ = pd.read_csv("/home/rahul/yearly_data_no_ecb.csv", dtype = {"DAX":"float64"})
  2 #file1 = pd.to_numeric(file_)
  3 file_.values
  4 file_.dtypes

/home/rahul/anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
703                     skip_blank_lines=skip_blank_lines)
704 
--> 705         return _read(filepath_or_buffer, kwds)
706 
707     parser_f.__name__ = name

/home/rahul/anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
449 
450     try:
--> 451         data = parser.read(nrows)
452     finally:
453         parser.close()

/home/rahul/anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc in read(self, nrows)
1063                 raise ValueError('skipfooter not supported for iteration')
1064 
-> 1065         ret = self._engine.read(nrows)
1066 
1067         if self.options.get('as_recarray'):

/home/rahul/anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc in read(self, nrows)
1826     def read(self, nrows=None):
1827         try:
-> 1828             data = self._reader.read(nrows)
1829         except StopIteration:
1830             if self._first_chunk:

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()

ValueError: invalid literal for float(): 11,535,309,570.00

How can I convert the dtype of the columns that hold numeric data to float64?

If I just read the CSV and check the dtypes of the columns,

file_ = pd.read_csv("/home/rahul/yearly_data_no_ecb.csv")
file_.dtypes

I get:

Year                                                     int64
City                                                    object
Return office city center                              float64
Average return logistics                               float64
Inverse return houses                                  float64
DAX                                                     object
MFI Interest Rate Germany                              float64
Inflation Rate                                         float64
GDP (EUR)                                               object
Size of City (km square)                                object
Total Population (Number)                               object
Population under 15 (Number)                            object
Population 15 to under 65 (Number)                      object
Population above 65 (Number)                            object
Total private households (Number)                       object
1 Person households (Number)                            object
2 Person households (Number)                            object
3 Person households (Number)                            object
4 Person households (Number)                            object
5 and more person households (Number)                   object
Total unemployment rate (Rate)                         float64
Total employment (Number)                               object
Available income per inhabitant (Eur)                   object
Total residential building (Number)                     object
Total Apartments (Number)                               object
Total new residential building approvals (Number)       object
Total new residential building completions (Number)     object
Total Migration                                         object
Returns                                                float64
Class                                                  float64
dtype: object

Basically, I want to convert to float64 the dtype of the columns DAX, GDP (EUR) through 5 and more person households (Number), and Total employment (Number) through Total Migration.

Thanks.

2 answers:

Answer 0: (score: 2)

To change the data type of columns c1 and c3 of a DataFrame df to float64, do the following:

import pandas as pd
import numpy as np
df = pd.DataFrame([["1.2","dan","3"],["1.9","joe","5"]], columns = ["c1","c2","c3"])
print(df)
#    c1   c2 c3
# 0  1.2  dan  3
# 1  1.9  joe  5

print(df.dtypes)

# c1    object
# c2    object
# c3    object
# dtype: object

df[['c1','c3']] = df[['c1','c3']].astype(np.float64)
print(df)
#    c1   c2   c3
# 0  1.2  dan  3.0
# 1  1.9  joe  5.0

print(df.dtypes)

# c1    float64
# c2     object
# c3    float64
# dtype: object
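
Note that astype only works when every string is already a plain number. The value quoted in the question's error, 11,535,309,570.00, contains comma separators, so those would have to be removed first. The following is a minimal sketch of that extra step, not a definitive fix: the small DataFrame and its second row are made up for illustration, and the column list is an assumption based on the dtypes listing in the question.

```python
import pandas as pd

# Hypothetical miniature of the question's data: numbers stored as text
# with commas as thousands separators (values invented for illustration).
df = pd.DataFrame({
    "City": ["Berlin", "Munich"],
    "DAX": ["11,535,309,570.00", "9,805,550,000.50"],
    "GDP (EUR)": ["1,000", "2,500"],
})

# Columns to convert; in the real file this would be the object columns
# that actually hold numbers.
cols = ["DAX", "GDP (EUR)"]

for col in cols:
    # Drop the comma separators so each string becomes a plain number.
    cleaned = df[col].str.replace(",", "")
    # errors="coerce" turns anything still unparseable into NaN instead of raising.
    df[col] = pd.to_numeric(cleaned, errors="coerce").astype("float64")

print(df.dtypes)
```

With errors="coerce", cells that cannot be parsed become NaN, so it is worth checking df[cols].isna().sum() afterwards to see whether anything was silently dropped.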

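Answer 1: (score: 1)

I believe what you missed is that the data type does not need to be wrapped in quotes when you specify it while reading the data from the CSV, for example: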

import pandas as pd
import numpy as np
file_ = pd.read_csv("/home/rahul/yearly_data_no_ecb.csv", dtype = {"DAX":np.float64})
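
The ValueError in the question quotes the literal 11,535,309,570.00, which suggests that the comma separators, rather than how the dtype is written, are what the parser rejects. read_csv has a thousands parameter (it appears in the signature shown in the traceback) that may help here. The sketch below uses an in-memory CSV as a hypothetical stand-in for the real file, with invented rows, and leaves dtype unset so that pandas infers the numeric type after stripping the separators.

```python
import io
import pandas as pd

# Hypothetical stand-in for yearly_data_no_ecb.csv: the DAX field is quoted
# because it contains commas (rows invented for illustration).
csv_text = (
    "Year,City,DAX\n"
    '2015,Berlin,"11,535,309,570.00"\n'
    '2016,Berlin,"9,805,550,000.50"\n'
)

# thousands="," tells the parser to ignore commas inside numeric fields.
file_ = pd.read_csv(io.StringIO(csv_text), thousands=",")

print(file_.dtypes)
print(file_["DAX"])
```

If some columns still come back as object after this, they probably contain other non-numeric text and would need separate cleaning.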