I'm grouping the columns by dtype, like this:
groups = frame.columns.to_series().groupby(frame.dtypes).groups
I get the error:
TypeError: data type not understood
What is the correct way to group the columns by dtype without running into this error?
Edit: sample input
0 0 0 1985 ATL NL barkele01 870000 428.0 428.0 1955.0 ... Leonard Harold 225.0 77.0 R R 1976-09-14 1987-09-26 barkl001 barkele01 both
1 1 1 1985 ATL NL bedrost01 550000 559.0 559.0 1957.0 ... Stephen Wayne 200.0 75.0 R R 1981-08-14 1995-08-09 bedrs001 bedrost01 both
2 2 2 1985 ATL NL benedbr01 545000 614.0 614.0 1955.0 ... Bruce Edwin 175.0 73.0 R R 1978-08-18 1989-09-11 beneb001 benedbr01 both
3 3 3 1985 ATL NL campri01 633333 NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN left_only
4 4 4 1985 ATL NL ceronri01 625000 1466.0 1466.0 1954.0 ... Richard Aldo 192.0 71.0 R R 1975-08-17 1992-07-10 ceror001 ceronri01 both
5 5 5 1985 ATL NL chambch01 800000 1481.0 1481.0 1948.0 ... Carroll Christopher 195.0 73.0 L R 1971-05-28 1988-05-08 chamc001 chambch01 both
The expected output would look like:
{float: [columns], int:[columns], string:[columns]}
Answer:
You can use groupby with axis=1:
type_dct = {str(k): list(v) for k, v in df.groupby(df.dtypes, axis=1)}
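Note that the axis=1 form of DataFrame.groupby has been deprecated since pandas 2.1. A minimal sketch of a variant that stays close to the question's original approach: group the column labels by the string name of each column's dtype (df.dtypes.astype(str)) instead of by the dtype objects themselves. The assumption here is that grouping on plain strings sidesteps the "data type not understood" error seen in some older pandas/NumPy combinations. The frame below is a hypothetical four-column subset of the sample data, with made-up column names.

```python
import pandas as pd

# Hypothetical stand-in for the question's frame (column names are illustrative).
df = pd.DataFrame({
    "yearID": [1985, 1985],
    "salary": [870000, 550000],
    "weight": [225.0, 200.0],
    # Explicit object dtype, which is what pandas has traditionally used for text.
    "nameFirst": pd.Series(["Leonard", "Stephen"], dtype=object),
})

# Group column labels by the *string* name of each dtype ("int64", "float64", ...),
# without using the deprecated groupby(..., axis=1).
groups = df.columns.to_series().groupby(df.dtypes.astype(str)).groups

# .groups maps each key to an Index of labels; turn those into plain lists.
type_dct = {k: list(v) for k, v in groups.items()}
print(type_dct)
```

Because the keys are ordinary strings, the resulting dict is also easy to serialize or compare.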
For your sample DataFrame, this gives:
{'int64': [0, 1, 2, 3, 7],
'float64': [8, 9, 10, 14, 15],
'object': [4, 5, 6, 11, 12, 13, 16, 17, 18, 19, 20, 21, 22]}
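Note that there is no such thing as a "string" Series with a pandas dtype. The object dtype represents pointers to arbitrary Python objects, including strings. For more details, see Strings in a DataFrame, but dtype is object.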
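If you still want the {int: [...], float: [...], string: [...]} shape from the question, one possible sketch uses DataFrame.select_dtypes with NumPy's abstract scalar types. The keys "int", "float" and "string" are only labels chosen to match the question. "string" here simply means "object columns", which, as explained above, are not guaranteed to hold str values. Newer pandas versions (1.0+) also offer an opt-in "string" dtype; this sketch deliberately looks only at object columns. The data is a hypothetical subset of the sample input.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame; the NaN forces "weight" to float.
df = pd.DataFrame({
    "yearID": [1985, 1985, 1985],
    "salary": [870000, 550000, 633333],
    "weight": [225.0, 200.0, np.nan],
    "nameFirst": pd.Series(["Leonard", "Stephen", None], dtype=object),
})

# np.integer / np.floating match every integer / float width (int32, int64, ...).
type_dct = {
    "int": list(df.select_dtypes(include=np.integer).columns),
    "float": list(df.select_dtypes(include=np.floating).columns),
    # Label for object columns only: object means "arbitrary Python objects".
    "string": list(df.select_dtypes(include=object).columns),
}
print(type_dct)

# An object column can mix types freely, so the dtype alone does not prove it holds strings.
mixed = pd.Series(["a", 1, None])
print(mixed.dtype)
```

Any column whose dtype falls outside these three groups (bool, datetime64, category, ...) is simply left out, so you may need extra keys for real data.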