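Using read_csv() in Pandas
After reading my CSV file, I want to convert some columns of the DataFrame to float64 for further processing, because they are currently represented as object dtype. When I tried the dtype argument of read_csv, I got an error message. Here is what I ran: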
import pandas as pd
file_ = pd.read_csv("/home/rahul/yearly_data_no_ecb.csv", dtype = {"DAX":"float64"})
Here is the full error:
ValueError Traceback (most recent call last)
<ipython-input-14-554c18573267> in <module>()
----> 1 file_ = pd.read_csv("/home/rahul/yearly_data_no_ecb.csv", dtype = {"DAX":"float64"})
2 #file1 = pd.to_numeric(file_)
3 file_.values
4 file_.dtypes
/home/rahul/anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
703 skip_blank_lines=skip_blank_lines)
704
--> 705 return _read(filepath_or_buffer, kwds)
706
707 parser_f.__name__ = name
/home/rahul/anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
449
450 try:
--> 451 data = parser.read(nrows)
452 finally:
453 parser.close()
/home/rahul/anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc in read(self, nrows)
1063 raise ValueError('skipfooter not supported for iteration')
1064
-> 1065 ret = self._engine.read(nrows)
1066
1067 if self.options.get('as_recarray'):
/home/rahul/anaconda/lib/python2.7/site-packages/pandas/io/parsers.pyc in read(self, nrows)
1826 def read(self, nrows=None):
1827 try:
-> 1828 data = self._reader.read(nrows)
1829 except StopIteration:
1830 if self._first_chunk:
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.read()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_low_memory()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._read_rows()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_column_data()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
ValueError: invalid literal for float(): 11,535,309,570.00
How can I convert the dtype of the columns that hold numeric data to float64?
If I just read the CSV and check the column dtypes,
file_ = pd.read_csv("/home/rahul/yearly_data_no_ecb.csv")
file_.dtypes
I get:
Year int64
City object
Return office city center float64
Average return logistics float64
Inverse return houses float64
DAX object
MFI Interest Rate Germany float64
Inflation Rate float64
GDP (EUR) object
Size of City (km square) object
Total Population (Number) object
Population under 15 (Number) object
Population 15 to under 65 (Number) object
Population above 65 (Number) object
Total private households (Number) object
1 Person households (Number) object
2 Person households (Number) object
3 Person households (Number) object
4 Person households (Number) object
5 and more person households (Number) object
Total unemployment rate (Rate) float64
Total employment (Number) object
Available income per inhabitant (Eur) object
Total residential building (Number) object
Total Apartments (Number) object
Total new residential building approvals (Number) object
Total new residential building completions (Number) object
Total Migration object
Returns float64
Class float64
dtype: object
Basically, I want to convert to float64 the dtype of the columns DAX, GDP (EUR) through 5 and more person households (Number), and Total employment (Number) through Total Migration.
Thanks.
Answer 0 (score: 2)
If you want to change the data type of columns c1 and c3 of a DataFrame df to float64, do the following:
import pandas as pd
import numpy as np
df = pd.DataFrame([["1.2","dan","3"],["1.9","joe","5"]], columns = ["c1","c2","c3"])
print(df)
# c1 c2 c3
# 0 1.2 dan 3
# 1 1.9 joe 5
print(df.dtypes)
# c1 object
# c2 object
# c3 object
# dtype: object
df[['c1','c3']] = df[['c1','c3']].astype(np.float64)
print(df)
# c1 c2 c3
# 0 1.2 dan 3.0
# 1 1.9 joe 5.0
print(df.dtypes)
# c1 float64
# c2 object
# c3 float64
# dtype: object
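Note that astype only works when every string is already a valid number. The error in the question (invalid literal for float(): 11,535,309,570.00) suggests the object columns contain comma thousands separators, which astype would reject in the same way. A minimal sketch of one way to handle that, assuming the commas are purely thousands separators; the sample values and the column list below are made up for illustration and are not taken from the asker's file:

```python
import pandas as pd

# Hypothetical sample data imitating the comma-formatted numbers in the question.
df = pd.DataFrame({
    "City": ["Berlin", "Munich"],
    "DAX": ["11,535,309,570.00", "9,805.55"],
    "GDP (EUR)": ["1,234.5", "678"],
})

# Columns to convert (an assumption; list the real column names here).
cols = ["DAX", "GDP (EUR)"]

for col in cols:
    # Remove the thousands separators, then cast the cleaned strings to float64.
    df[col] = df[col].str.replace(",", "").astype("float64")

print(df.dtypes)
```

The loop leaves the other columns untouched, so text columns such as City keep their original dtype.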
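Answer 1 (score: 1)
I believe what you missed is that the data type does not need to be quoted when you specify it while reading from the CSV, for example: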
import pandas as pd
import numpy as np
file_ = pd.read_csv("/home/rahul/yearly_data_no_ecb.csv", dtype = {"DAX":np.float64})
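The value in the error message (11,535,309,570.00) contains comma thousands separators, so changing how the dtype is written may not be enough by itself. read_csv has a thousands parameter for this case. A minimal sketch, assuming the comma-formatted fields are quoted in the file; the in-memory CSV text below is a made-up stand-in for yearly_data_no_ecb.csv:

```python
import io

import pandas as pd

# Hypothetical stand-in for the real file; fields containing commas are quoted.
csv_text = (
    "Year,City,DAX\n"
    '2015,Berlin,"11,535,309,570.00"\n'
    '2016,Munich,"9,805.55"\n'
)

# thousands="," tells the parser to treat commas inside numbers as separators,
# so the DAX column can be inferred as a float column instead of object.
file_ = pd.read_csv(io.StringIO(csv_text), thousands=",")

print(file_.dtypes)
```

With the real file, the same keyword would be passed alongside the path, e.g. pd.read_csv("/home/rahul/yearly_data_no_ecb.csv", thousands=",").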