Here is my data frame, about 5,000 entries in total:
data.head(3)
    filename        date    var1    var2   age  sex
0  file1.jpg  2012-01-17  132.32  199.17  31.0  2.0
1  file2.jpg  2012-01-17  134.88  196.50  31.0  2.0
2  file3.jpg  2012-01-17  151.19  209.07  31.0  2.0
3  ...
I want to split this dataset into 10 quantile groups based on var1. No problem:
data['var1_groups'] = pd.qcut(data['var1'], 10)
data.head(3)
    filename        date    var1    var2   age  sex         var1_groups
0  file1.jpg  2012-01-17  132.32  199.17  31.0  2.0  (129.488, 133.659]
1  file2.jpg  2012-01-17  134.88  196.50  31.0  2.0  (133.659, 138.176]
2  file3.jpg  2012-01-17  151.19  209.07  31.0  2.0   (148.196, 153.09]
3  ...
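For anyone who wants to reproduce this step without the real data, here is a minimal sketch on synthetic values. The random var1 series is only an assumption standing in for the actual column; it shows that pd.qcut with 10 bins yields ten roughly equal-sized interval groups.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real var1 column (values are made up).
rng = np.random.default_rng(0)
var1 = pd.Series(rng.normal(140, 10, size=5000), name="var1")

# Cut into 10 quantile-based bins; the result is a categorical Series
# whose categories are Interval objects such as (129.488, 133.659].
var1_groups = pd.qcut(var1, 10)

# Each decile should hold about a tenth of the rows.
counts = var1_groups.value_counts(sort=False)
print(counts)
```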
Now, within each var1 group, I want to subdivide further into age quantile groups.
So I tried this:
data['age_groups'] = data.groupby(['var1_groups'])['age'].transform(lambda x: pd.qcut(x, 3))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-164-998f598868ed> in <module>()
----> 1 data['age_groups'] = data.groupby(['var1_groups'])['age'].transform(lambda x: pd.qcut(x, 3))
/usr/lib/miniconda3/envs/python3.5/lib/python3.5/site-packages/pandas/core/groupby.py in transform(self, func, *args, **kwargs)
2762
2763 indexer = self._get_index(name)
-> 2764 result[indexer] = res
2765
2766 result = _possibly_downcast_to_dtype(result, dtype)
ValueError: could not convert string to float: '(53, 69]'
What is going on here? Is pandas trying to cast the resulting categories back to the original column's dtype, or something else?
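The traceback suggests that transform writes each group's result into an array that keeps the dtype of the age column (the failing line is result[indexer] = res), so interval labels such as '(53, 69]' cannot be stored there. One possible workaround, sketched here on synthetic data rather than the real data frame, is to ask qcut for integer bin codes with labels=False so that the result stays numeric. The random var1 and age values below are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the real data (values are made up for illustration).
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "var1": rng.normal(140, 10, size=5000),
    "age": rng.uniform(18, 90, size=5000),
})

# First level: 10 quantile groups on var1, as in the question.
data["var1_groups"] = pd.qcut(data["var1"], 10)

# Second level: within each var1 group, cut age into 3 quantile bins.
# labels=False makes qcut return integer codes (0, 1, 2) instead of
# Interval categories, so transform only has to store numbers.
data["age_groups"] = (
    data.groupby("var1_groups", observed=True)["age"]
        .transform(lambda x: pd.qcut(x, 3, labels=False))
)

print(data.groupby(["var1_groups", "age_groups"], observed=True).size().head(6))
```

The codes are relative to each var1 group, so code 0 in one group covers a different age range than code 0 in another. If the real age column has many repeated values, qcut may complain about non-unique bin edges; its duplicates='drop' option is one way to handle that, at the cost of possibly fewer than three bins in some groups.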