Question

df = pd.DataFrame( data = {"CouponCode": ["1","2","3","Winter14","5"] } )

I want to take the above dataframe and get the dataframe below. All the numbers in the CouponCode column should be converted to floats or ints, and if a value is not a number I want to put it in a new column named CouponName. (I wrote "nan" as strings in the example, but of course I want them to be real nulls.)

df_new = pd.DataFrame(data = {"CouponCode": [1,2,3,"Winter14",5], "CouponName": ["nan", "nan", "nan", "Winter14", "nan"]} )

Answer 1

plan

Apply pd.to_numeric to every individual element of 'CouponCode'. This is slow! Why do I do it? Because it's the only way I know of to ensure that int gets mapped to int, float to float, and str to str. The dtype of the resulting column will be object. I could have called pd.to_numeric on the entire column with errors='coerce', but the NaN it introduces would force the ints to float.
Use .str.isnumeric to determine which values are non-numeric strings, and copy those over to the new column.
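
To illustrate the int-to-float point in the plan, here is a minimal sketch of what whole-column coercion does to the question's data. The variable names `s` and `coerced` are only for illustration.

```python
import pandas as pd

# Explicit object dtype so the behaviour matches the question's column.
s = pd.Series(["1", "2", "3", "Winter14", "5"], dtype=object)

# Whole-column conversion: the unparseable entry becomes NaN,
# and NaN forces the integer values into float64.
coerced = pd.to_numeric(s, errors="coerce")

print(coerced.dtype)     # float64
print(coerced.tolist())  # [1.0, 2.0, 3.0, nan, 5.0]
```

This is why the plan converts element by element instead: each value keeps its own type, at the cost of an object-dtype column.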

df.loc[~df.CouponCode.astype(str).str.isnumeric(), 'CouponName'] = df.CouponCode
df.loc[:, 'CouponCode'] = df.CouponCode.apply(pd.to_numeric, errors='ignore')
df

  CouponCode CouponName
0          1        NaN
1          2        NaN
2          3        NaN
3   Winter14   Winter14
4          5        NaN

df.to_dict('list')

{'CouponCode': [1, 2, 3, 'Winter14', 5],
 'CouponName': [nan, nan, nan, 'Winter14', nan]}
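
Newer pandas releases deprecate errors='ignore' in pd.to_numeric and recommend catching the exception explicitly instead. A hedged sketch of the same element-wise idea without that option follows; `to_number_or_keep` is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def to_number_or_keep(value):
    """Hypothetical helper: return a number if `value` parses as one, else `value` unchanged."""
    try:
        return pd.to_numeric(value)
    except (ValueError, TypeError):
        return value


df = pd.DataFrame(
    {"CouponCode": pd.Series(["1", "2", "3", "Winter14", "5"], dtype=object)}
)

# Convert each element on its own so ints stay ints and strings stay strings.
converted = df["CouponCode"].map(to_number_or_keep)

# Anything still a str after conversion is a coupon name, not a number.
is_name = converted.map(lambda v: isinstance(v, str))

df["CouponName"] = df["CouponCode"].where(is_name)  # NaN where not a name
df["CouponCode"] = converted
```

Like the original, this is slow on large frames because it calls Python code once per row.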

Answer 2

Use the .str accessor with isnumeric():

import numpy as np
df['CouponName'] = np.where(~df.CouponCode.str.isnumeric(), df.CouponCode, np.nan)

Output:

  CouponCode CouponName
0          1        NaN
1          2        NaN
2          3        NaN
3   Winter14   Winter14
4          5        NaN
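
One caveat worth knowing: str.isnumeric() only accepts strings made up entirely of numeric characters, so values such as "2.5" or "-3" are treated as non-numeric. If the column can contain those, a possible variation is to build the mask from pd.to_numeric with errors='coerce'. The sample values below are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical data that includes a decimal and a negative number.
df = pd.DataFrame(
    {"CouponCode": pd.Series(["1", "2.5", "-3", "Winter14", "5"], dtype=object)}
)

# isnumeric() rejects the decimal point and the minus sign.
print(df["CouponCode"].str.isnumeric().tolist())
# [True, False, False, False, True]

# Alternative mask: a value counts as a number if pd.to_numeric can parse it.
is_number = pd.to_numeric(df["CouponCode"], errors="coerce").notna()

# Keep only the non-numeric values in the new column; the rest become NaN.
df["CouponName"] = df["CouponCode"].where(~is_number)
```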

If string is a number convert to float otherwise put in new column
