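Group a pandas DataFrame by dtype

Time: 2018-09-28 19:25:44

Tags: python pandas dataframe

I have a pandas DataFrame df with a column named A that holds values of several data types. I want to select all rows of df where A has a specific data type.

For example, suppose A contains values of type int and str. I would like to do something like df[type(df[A])==int].

4 answers: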

Answer 0: (score: 4)

Setup

import pandas as pd

df = pd.DataFrame({'A': ['hello', 1, 2, 3, 'bad']})

This assigns dtype object to the whole column. If you only want the numeric values:

pd.to_numeric(df.A, errors='coerce').dropna() 

1    1.0
2    2.0
3    3.0
Name: A, dtype: float64

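However, this also lets floats, string representations of numbers, and so on into the mix. If you really want only the elements whose type is int, you can use a list comprehension: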

df.loc[[isinstance(val, int) for val in df.A], 'A']

1    1
2    2
3    3
Name: A, dtype: object

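Note, however, that the dtype is still object.

If the column contains Boolean values, they are kept as well, because bool is a subclass of int. If you do not want this behavior, use type instead of isinstance.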
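A minimal sketch of the difference between the two checks. The sample data here is hypothetical (it adds a bool and a float to the column), and the names loose and strict are only for illustration:

```python
import pandas as pd

# Hypothetical sample data: a bool and a float alongside ints and strings
df = pd.DataFrame({'A': ['hello', 1, True, 2.5, 'bad', 3]})

# isinstance treats True as an int, because bool subclasses int
loose = df.loc[[isinstance(val, int) for val in df.A], 'A']

# An exact type check excludes bools (and any other int subclass)
strict = df.loc[[type(val) is int for val in df.A], 'A']

print(list(loose.index))   # rows kept by isinstance
print(list(strict.index))  # rows kept by the exact type check
```

With this data, the isinstance version keeps the row holding True, while the exact type check keeps only the rows holding 1 and 3.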

Answer 1: (score: 4)

Group by type

dod = dict(tuple(df.groupby(df['A'].map(type), sort=False)))

Setup

import pandas as pd

df = pd.DataFrame(dict(A=[1, 'one', {1}, [1], (1,)] * 2))

Verification

for t, d in dod.items():
    print(t, d, sep='\n')
    print()

<class 'int'>
   A
0  1
5  1

<class 'str'>
     A
1  one
6  one

<class 'set'>
     A
2  {1}
7  {1}

<class 'list'>
     A
3  [1]
8  [1]

<class 'tuple'>
      A
4  (1,)
9  (1,)
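Once the dictionary is built, the rows for a single type can be looked up by the type object itself. A short sketch, assuming the same df and dod as above; the names ints and floats, and the empty-frame fallback, are illustrative choices rather than part of the answer:

```python
import pandas as pd

df = pd.DataFrame(dict(A=[1, 'one', {1}, [1], (1,)] * 2))

# Same construction as in the answer: one sub-DataFrame per Python type
dod = dict(tuple(df.groupby(df['A'].map(type), sort=False)))

# Select the rows whose value in A is an int
ints = dod[int]
print(ints)

# dict.get avoids a KeyError for a type that never occurs in the column;
# an empty slice of df is used here as the fallback
floats = dod.get(float, df.iloc[0:0])
print(len(floats))
```

Because the keys are the type objects themselves, dod[int] answers the original question directly, and the other types remain available without scanning the column again.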

Answer 2: (score: 4)

Using the data from user3483203's answer.

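Answer 3: (score: 0)

import pandas as pd

a = [2, 'B',3.0, 'c', 1, 'a', 2.0, 'b',3, 'C', 'A', 1.0]

df = pd.DataFrame({"a": a})

# The .str methods return NaN for elements that are not strings
df['upper'] = df['a'].str.isupper()
df['lower'] = df['a'].str.islower()
# args is passed as a tuple of extra positional arguments to isinstance
df['int'] = df['a'].apply(isinstance, args=(int,))
df['float'] = df['a'].apply(isinstance, args=(float,))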
print(df)

    a   upper   lower   int     float
0   2   NaN      NaN    True    False
1   B   True    False   False   False
2   3   NaN      NaN    False   True
3   c   False   True    False   False
4   1   NaN      NaN    True    False
5   a   False   True    False   False
6   2   NaN      NaN    False   True
7   b   False   True    False   False
8   3   NaN      NaN    True    False
9   C   True    False   False   False
10  A   True    False   False   False
11  1   NaN      NaN    False   True

integer = df[df['int']]['a']

print(integer)

0    2
4    1
8    3
Name: a, dtype: object
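If only the selection is needed, the helper columns can be skipped by building the Boolean mask directly. This is a sketch of one alternative, not the answer author's method; the name is_int is hypothetical:

```python
import pandas as pd

a = [2, 'B', 3.0, 'c', 1, 'a', 2.0, 'b', 3, 'C', 'A', 1.0]
df = pd.DataFrame({"a": a})

# Boolean mask: True where the element's exact Python type is int
is_int = df['a'].map(lambda v: type(v) is int)

# Select the matching rows of column a in a single .loc call
integer = df.loc[is_int, 'a']
print(integer)
```

This selects the same rows (index 0, 4 and 8) as the version above, and the resulting Series still has dtype object.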