Handling object dtype columns in a pandas DataFrame

Date: 2016-04-25 09:23:16

Tags: python csv pandas

I'm tearing my hair out a bit with this one. I've imported two CSVs into pandas DataFrames. Both have a column called SiteReference, and I want to use pd.merge to join the DataFrames using SiteReference as the key.

The initial merge failed because pd.read_csv interpreted the SiteReference values differently: 380500145.0 in one DataFrame and 380500145 in the other, both stored as objects. I ran a regex to clean the columns and then pd.to_numeric, which resulted in one value of 380500145.0 and another of 3.805001e+10. They should both be 380500145. I then attempted:

df['SiteReference'] = df['SiteReference'].astype(int).astype('str')  

But got back:

ValueError: cannot convert float NaN to integer

How can I control how pandas handles these values, preferably on import?

2 answers:

Answer 0 (score: 0)

Following the discussion in the comments, if you want to format floats as integer strings, you can use this:

df['SiteReference'] = df['SiteReference'].map('{:.0f}'.format)

This should handle null values gracefully: NaN is formatted as the string 'nan' instead of raising an error.
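
As a rough illustration of that approach, here is a minimal sketch on made-up data (the sample values and the NaN row are assumptions, not the asker's actual file). It uses the format spec `{:.0f}`; adding a `,` to the spec would insert thousands separators (`380,500,145`), which would stop the keys from matching a plain `380500145` in the other DataFrame. It also assumes the column is already float-typed; string values would need converting first.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: float-typed keys with one missing value.
df = pd.DataFrame({'SiteReference': [380500145.0, 380500146.0, np.nan]})

# Format each float with zero decimals; NaN becomes the string 'nan'
# rather than raising, unlike astype(int).
df['SiteReference'] = df['SiteReference'].map('{:.0f}'.format)

print(df['SiteReference'].tolist())
# ['380500145', '380500146', 'nan']
```

Note that the missing value ends up as the literal text 'nan', so it may be worth filtering those rows out before merging.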

Answer 1 (score: 0)

Perhaps the best solution is to prevent pd.read_csv from inferring a type for this field:

df = pd.read_csv('data.csv', sep=',', dtype={'SiteReference': str})
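
To show how this might fit into the merge from the question, here is a small sketch. The two in-memory CSVs, the extra columns `name` and `value`, and the step that strips a trailing ".0" are all assumptions for illustration; that last step is only needed if one file literally contains text such as `380500145.0`.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the two CSV files.
csv_a = io.StringIO("SiteReference,name\n380500145.0,alpha\n380500146.0,beta\n")
csv_b = io.StringIO("SiteReference,value\n380500145,10\n380500147,20\n")

# Read the key column as text so pandas does not infer a numeric type.
df_a = pd.read_csv(csv_a, dtype={'SiteReference': str})
df_b = pd.read_csv(csv_b, dtype={'SiteReference': str})

# Drop a trailing ".0" so both files spell the key the same way.
df_a['SiteReference'] = df_a['SiteReference'].str.replace(r'\.0$', '', regex=True)

# Inner join on the now-consistent string key.
merged = pd.merge(df_a, df_b, on='SiteReference')
print(merged)
```

Empty fields are still read as NaN even with dtype str, so missing keys would need handling separately.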