Question

I'm tearing my hair out a bit with this one. I've imported two CSVs into pandas DataFrames. Both have a column called SiteReference, and I want to use pd.merge to join the DataFrames using SiteReference as the key.

The initial merge failed because pd.read_csv interpreted the SiteReference values differently: in one DataFrame the value was 380500145.0, in the other 380500145, both stored as objects. I ran a regex to clean the columns and then pd.to_numeric, which resulted in one value of 380500145.0 and another of 3.805001e+10. They should both be 380500145. I then attempted:

df['SiteReference'] = df['SiteReference'].astype(int).astype('str')

But got back:

ValueError: cannot convert float NaN to integer

How can I control how pandas handles these values, preferably on import?

Answer 1

Following the discussion in the comments, if you want to format floats as integer strings, you can use this:

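df['SiteReference'] = df['SiteReference'].map('{:.0f}'.format)

This formats each float with no decimal places and no thousands separator, so 380500145.0 becomes '380500145'. Null values do not raise an error: NaN is formatted as the string 'nan'.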
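As a minimal sketch of that formatting step, the snippet below builds a small float column with one missing value and maps it to integer-like strings. The sample values are hypothetical, and the format string '{:.0f}' (without a thousands separator) is assumed here so that the result contains plain digits only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: a float column containing one missing value.
df = pd.DataFrame({'SiteReference': [380500145.0, np.nan, 380500146.0]})

# Format every float with zero decimals; NaN becomes the string 'nan'
# instead of raising, unlike astype(int).
df['SiteReference'] = df['SiteReference'].map('{:.0f}'.format)

print(df['SiteReference'].tolist())
```

Note that the missing value is now the literal string 'nan', so rows with missing keys in both DataFrames would match each other in a merge. Dropping or masking those rows first may be worth considering.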

Answer 2

Perhaps the best solution is to prevent pd.read_csv from inferring a type for this field at all:

df = pd.read_csv('data.csv', sep=',', dtype={'SiteReference': str})
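To illustrate how reading the key as a string might play out for the merge in the question, here is a rough sketch using two in-memory CSVs. The file contents and the Name and Value columns are made up for illustration. It also assumes that one file literally contains a trailing ".0", which dtype=str would preserve, so the sketch strips it before merging.

```python
import io
import pandas as pd

# Hypothetical stand-ins for the two CSV files; the second one is assumed
# to store the key with a trailing ".0".
csv_a = io.StringIO("SiteReference,Name\n380500145,Alpha\n380500146,Beta\n")
csv_b = io.StringIO("SiteReference,Value\n380500145.0,10\n380500147.0,20\n")

# Read the key column as text so pandas does not infer int or float.
key_dtype = {'SiteReference': str}
left = pd.read_csv(csv_a, dtype=key_dtype)
right = pd.read_csv(csv_b, dtype=key_dtype)

# Strip a literal trailing ".0" so both keys have the same text form.
for frame in (left, right):
    frame['SiteReference'] = frame['SiteReference'].str.replace(
        r'\.0$', '', regex=True
    )

merged = pd.merge(left, right, on='SiteReference', how='inner')
print(merged)
```

If neither file contains a literal ".0" (that is, the float only appeared because pandas inferred it from missing values), the stripping loop is unnecessary and dtype=str alone should be enough.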

In pandas dataframe handling object data type
