import pandas as pd
path1 = "/home/supertramp/Desktop/100&life_180_data.csv"
mydf = pd.read_csv(path1)
numcigar = {"Never":0 ,"1-5 Cigarettes/day" :1,"10-20 Cigarettes/day":4}
print(mydf['Cigarettes'])
mydf['CigarNum'] = mydf['Cigarettes'].apply(numcigar.get).astype(float)
print(mydf['CigarNum'])
mydf.to_csv('/home/supertramp/Desktop/powerRangers.csv')
The CSV file "100&life_180_data.csv" contains columns such as Age, BMI, Cigarettes, and Alcohol.
No int64
Age int64
BMI float64
Alcohol object
Cigarettes object
dtype: object
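The Cigarettes column contains "Never", "1-5 Cigarettes/day", "10-20 Cigarettes/day". I want to assign weights to these objects (Never, 1-5 Cigarettes/day, ...).
The expected output is a new appended column, CigarNum, which contains only the numbers 0, 1, 2. CigarNum is as expected up to about row 8, and then shows NaN through to the last row of the CigarNum column.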
0 Never
1 Never
2 1-5 Cigarettes/day
3 Never
4 Never
5 Never
6 Never
7 Never
8 Never
9 Never
10 Never
11 Never
12 10-20 Cigarettes/day
13 1-5 Cigarettes/day
14 Never
...
167 Never
168 Never
169 10-20 Cigarettes/day
170 Never
171 Never
172 Never
173 Never
174 Never
175 Never
176 Never
177 Never
178 Never
179 Never
180 Never
181 Never
Name: Cigarettes, Length: 182, dtype: object
The output I get shows NaN after the first few rows.
0 0
1 0
2 1
3 0
4 0
5 0
6 0
7 0
8 0
9 0
10 NaN
11 NaN
12 NaN
13 NaN
14 0
...
167 NaN
168 NaN
169 NaN
170 NaN
171 NaN
172 NaN
173 NaN
174 NaN
175 NaN
176 NaN
177 NaN
178 NaN
179 NaN
180 NaN
181 NaN
Name: CigarNum, Length: 182, dtype: float64
Answer 0: (score: 33)
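OK, the first problem is that some values contain stray spaces, so the dictionary lookup fails for them and the function is applied incorrectly.
Fix this using the vectorised str methods:
# strip only leading/trailing whitespace; removing every space would also break keys such as "1-5 Cigarettes/day"
mydf['Cigarettes'] = mydf['Cigarettes'].str.strip()
Now creating the new column should just work: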
mydf['CigarNum'] = mydf['Cigarettes'].apply(numcigar.get).astype(float)
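To make the failure mode concrete, here is a minimal sketch on made-up data (the sample values are hypothetical, and it assumes the stray spaces are trailing ones). It shows the lookup producing NaN before the cleanup and the expected numbers after it:

```python
import pandas as pd

# Hypothetical values: two of them carry a trailing space, as a CSV export might.
cigarettes = pd.Series(
    ["Never", "Never ", "1-5 Cigarettes/day ", "10-20 Cigarettes/day"]
)
numcigar = {"Never": 0, "1-5 Cigarettes/day": 1, "10-20 Cigarettes/day": 4}

# Printing a plain list shows the quotes, which makes stray spaces visible.
print(cigarettes.tolist())

# dict.get returns None for unknown keys, which ends up as NaN in a float column.
before = cigarettes.apply(numcigar.get).astype(float)

# Strip leading/trailing whitespace, then repeat the same lookup.
after = cigarettes.str.strip().apply(numcigar.get).astype(float)

print(before)
print(after)
```

Inspecting the raw strings this way is usually the quickest check when a mapping matches some rows and silently misses others.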
Update
Thanks to @Jeff, as always, for pointing out a superior way to do things:
So you can call replace instead of calling apply:
mydf['CigarNum'] = mydf['Cigarettes'].replace(numcigar)
# now convert the types
mydf['CigarNum'] = mydf['CigarNum'].convert_objects(convert_numeric=True)
You could also use the factorize method.
Thinking about it, why not just set the dict values to be floats and avoid the type conversion altogether?
So:
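As a rough illustration of the factorize route, the sketch below runs pd.factorize on a few made-up values (the sample data is hypothetical). Note that the codes are assigned in order of first appearance, so they are plain labels 0, 1, 2 rather than the custom weights in numcigar:

```python
import pandas as pd

# Hypothetical sample standing in for the Cigarettes column.
cigarettes = pd.Series(
    ["Never", "Never", "1-5 Cigarettes/day", "Never", "10-20 Cigarettes/day"]
)

# codes: one integer label per row, numbered in order of first appearance.
# uniques: the distinct values, positioned by their code.
codes, uniques = pd.factorize(cigarettes)

print(codes)
print(list(uniques))
```

This is convenient when any consistent numbering will do; when specific weights matter, an explicit dictionary is still needed.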
numcigar = {"Never":0.0 ,"1-5 Cigarettes/day" :1.0,"10-20 Cigarettes/day":4.0}
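A small sketch of how the float-valued dict might be used, on made-up data. It swaps in Series.map for the lookup (an assumption on my part, not something the answer prescribes), since map returns a float column directly and leaves unmatched values as NaN:

```python
import pandas as pd

# Hypothetical sample standing in for the Cigarettes column.
cigarettes = pd.Series(
    ["Never", "1-5 Cigarettes/day", "10-20 Cigarettes/day", "Never"]
)
numcigar = {"Never": 0.0, "1-5 Cigarettes/day": 1.0, "10-20 Cigarettes/day": 4.0}

# Each value is looked up in the dict; values missing from the dict become NaN.
cigar_num = cigarettes.map(numcigar)

print(cigar_num)
```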
Version 0.17.0 or newer
convert_objects has been deprecated since 0.17.0 and has been replaced by to_numeric:
mydf['CigarNum'] = pd.to_numeric(mydf['CigarNum'], errors='coerce')
Here errors='coerce' returns NaN wherever a value cannot be converted to a number; without it, an exception is raised.
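The difference between the two behaviours can be sketched as follows, using a made-up column that contains one non-numeric entry:

```python
import pandas as pd

# Hypothetical column with one value that is not a number.
raw = pd.Series(["0", "1", "4", "unknown"])

# errors="coerce": anything unparseable becomes NaN.
coerced = pd.to_numeric(raw, errors="coerce")
print(coerced)

# Default (errors="raise"): the same input raises ValueError instead.
try:
    pd.to_numeric(raw)
    raised = False
except ValueError:
    raised = True
print(raised)
```

Coercing is handy for messy input, but it can also hide data problems, so it is worth counting the resulting NaN values afterwards.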
Answer 1: (score: 4)
Try using this function for all problems of this kind:
import numpy as np

def get_series_ids(x):
    '''Function returns a pandas series consisting of ids,
    corresponding to objects in input pandas series x
    Example:
    get_series_ids(pd.Series(['a','a','b','b','c']))
    returns Series([0,0,1,1,2], dtype=int)'''
    # Sorted distinct values of x
    values = np.unique(x)
    # Map each distinct value to its position in the sorted order
    values2nums = dict(zip(values, range(len(values))))
    return x.replace(values2nums)
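For comparison, the sketch below numbers the sorted distinct values by hand (using Series.map for the lookup, which is my substitution) and then does the same with pd.Categorical, whose codes follow the sorted categories. The sample data is made up, and the Categorical variant is offered as a possible alternative rather than as part of the answer:

```python
import numpy as np
import pandas as pd

# Hypothetical input, deliberately not in sorted order.
x = pd.Series(["b", "a", "b", "c", "a"])

# Same idea as get_series_ids: number the sorted distinct values.
values = np.unique(x)
ids_manual = x.map(dict(zip(values, range(len(values)))))

# Built-in alternative: categories are sorted, codes are their positions.
ids_categorical = pd.Series(pd.Categorical(x).codes, index=x.index)

print(ids_manual.tolist())
print(ids_categorical.tolist())
```

Unlike factorize with its default settings, both variants here number values by sorted order rather than by first appearance.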