Question

I have a DataFrame:

name    col1
satya    12
satya    abc
satya    109.12
alex     apple
alex     1000

Now I need to show only the rows whose column 'col1' holds an int value. The output should look like:

name    col1
satya    12
alex     1000

And if searching for string values:

name    col1
satya    abc
alex     apple

and similarly for other types. Please suggest a few lines of code (possibly using a regex).

Answer 1

Let's start with a simple regular expression that evaluates to True if you have an integer and to False otherwise:

import re
regexp = re.compile('^-?[0-9]+$')
bool(regexp.match('1000'))   # True
bool(regexp.match('abc'))    # False

Once you have such a regex, you can proceed as follows:

mask = df['col1'].map(lambda x: bool(regexp.match(x)) )
df.loc[mask]

    name    col1
0   satya   12
4   alex    1000

To search for strings instead:

regexp_str = re.compile('^[a-zA-Z]+$')
mask_str = df['col1'].map(lambda x: bool(regexp_str.match(x)))
df.loc[mask_str]

    name    col1
1   satya   abc
3   alex    apple
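A self-contained sketch of the two masks above, assuming every value in col1 is a string (as it would be after pd.read_clipboard()); the frame is rebuilt by hand here instead of being read from the clipboard.

```python
import re

import pandas as pd

# Assumption: all col1 values are strings, as pd.read_clipboard() would produce.
df = pd.DataFrame({'name': ['satya', 'satya', 'satya', 'alex', 'alex'],
                   'col1': ['12', 'abc', '109.12', 'apple', '1000']})

regexp = re.compile(r'^-?[0-9]+$')        # optional minus sign, then digits only
regexp_str = re.compile(r'^[a-zA-Z]+$')   # ASCII letters only

int_rows = df.loc[df['col1'].map(lambda x: bool(regexp.match(x)))]
str_rows = df.loc[df['col1'].map(lambda x: bool(regexp_str.match(x)))]

print(int_rows)
print(str_rows)
```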

EDIT

The code above works if the DataFrame was created with:

df = pd.read_clipboard()

(or, more generally, if all values are stored as strings).

Whether the regex approach works depends on how df was created. For example, if it was created with:
df = pd.DataFrame({'name': ['satya','satya','satya', 'alex', 'alex'], 'col1': [12,'abc',109.12,'apple',1000] }, columns=['name','col1'])

the code above fails with TypeError: expected string or bytes-like object.
To make it work in every case, explicitly cast the values to str:

mask = df['col1'].astype('str').map(lambda x: bool(regexp.match(x)))
df.loc[mask]

    name    col1
0   satya   12
4   alex    1000

The same for strings:

regexp_str = re.compile('^[a-zA-Z]+$')
mask_str = df['col1'].astype('str').map(lambda x: bool(regexp_str.match(x)))
df.loc[mask_str]

    name    col1
1   satya   abc
3   alex    apple
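As an alternative to map with a lambda, pandas' vectorized Series.str.fullmatch (available since pandas 1.1) can apply the same patterns after the astype(str) conversion. A sketch on the mixed-type frame from above; fullmatch anchors both ends, so the ^ and $ are dropped:

```python
import pandas as pd

# Mixed-type column: ints, a float and strings stored as Python objects.
df = pd.DataFrame({'name': ['satya', 'satya', 'satya', 'alex', 'alex'],
                   'col1': [12, 'abc', 109.12, 'apple', 1000]},
                  columns=['name', 'col1'])

as_text = df['col1'].astype(str)                 # '12', 'abc', '109.12', ...
int_mask = as_text.str.fullmatch(r'-?[0-9]+')    # whole value must be an integer
str_mask = as_text.str.fullmatch(r'[a-zA-Z]+')   # whole value must be letters

print(df.loc[int_mask])
print(df.loc[str_mask])
```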

EDIT2

To find a float:

regexp_float = re.compile(r'^[-+]?[0-9]*(\.[0-9]+)$')
mask_float = df['col1'].astype('str').map(lambda x: bool(regexp_float.match(x)))
df.loc[mask_float]

    name    col1
2   satya   109.12
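The three patterns can also be combined to label every row in one pass. In this sketch, the helper name classify and the 'other' fallback for values that match none of the patterns are assumptions, not part of the answer:

```python
import re

import pandas as pd

df = pd.DataFrame({'name': ['satya', 'satya', 'satya', 'alex', 'alex'],
                   'col1': [12, 'abc', 109.12, 'apple', 1000]})

# Patterns are tried in order; the first match decides the label.
PATTERNS = [
    ('int', re.compile(r'^-?[0-9]+$')),
    ('float', re.compile(r'^[-+]?[0-9]*\.[0-9]+$')),
    ('str', re.compile(r'^[a-zA-Z]+$')),
]

def classify(value):
    """Return the label of the first pattern matching str(value), else 'other'."""
    text = str(value)
    for label, pattern in PATTERNS:
        if pattern.match(text):
            return label
    return 'other'

df['kind'] = df['col1'].map(classify)
print(df[df['kind'] == 'float'])
```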

Answer 2

In pandas you would do something like this:

mask = df.col1.apply(lambda x: type(x) == int)
print(df[mask])

This produces your expected output, provided col1 holds actual int objects rather than strings such as '12'.
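The same idea extends to strings with isinstance(x, str). The sketch below uses isinstance for integers too, which also accepts NumPy integer scalars (e.g. numpy.int64) and explicitly excludes bool, a subclass of int; treating bools as non-integers is an assumption, and is_int_value is a hypothetical helper name.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'name': ['satya', 'satya', 'satya', 'alex', 'alex'],
                   'col1': [12, 'abc', 109.12, 'apple', 1000]})

def is_int_value(x):
    # Accept Python and NumPy integers; reject bool, which subclasses int.
    return isinstance(x, (int, np.integer)) and not isinstance(x, bool)

int_rows = df[df['col1'].map(is_int_value)]
str_rows = df[df['col1'].map(lambda x: isinstance(x, str))]

print(int_rows)
print(str_rows)
```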

Answer 3

You can check whether a value consists only of digits:

In [104]: df
Out[104]:
    name    col1
0  satya      12
1  satya     abc
2  satya  109.12
3   alex   apple
4   alex    1000

Integers:

In [105]: df[~df.col1.str.contains(r'\D')]
Out[105]:
    name  col1
0  satya    12
4   alex  1000

Non-integers:

In [106]: df[df.col1.str.contains(r'\D')]
Out[106]:
    name    col1
1  satya     abc
2  satya  109.12
3   alex   apple
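The .str checks above assume col1 holds strings. On a column with mixed Python types (like the one built with pd.DataFrame in Answer 1), .str methods yield NaN for the non-string entries, so converting with astype(str) first is a safer variant. This sketch swaps in Series.str.isdigit, which, like the \D check, accepts only unsigned digit strings:

```python
import pandas as pd

df = pd.DataFrame({'name': ['satya', 'satya', 'satya', 'alex', 'alex'],
                   'col1': [12, 'abc', 109.12, 'apple', 1000]})

# True only for '12' and '1000'; '109.12' fails because of the dot.
digits_only = df['col1'].astype(str).str.isdigit()

print(df[digits_only])    # integers
print(df[~digits_only])   # everything else
```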

If you want to filter for all numeric values (integers/floats/decimals), you can use pd.to_numeric(..., errors='coerce'):

In [75]: df
Out[75]:
    name    col1
0  satya      12
1  satya     abc
2  satya  109.12
3   alex   apple
4   alex    1000

In [76]: df[pd.to_numeric(df.col1, errors='coerce').notnull()]
Out[76]:
    name    col1
0  satya      12
2  satya  109.12
4   alex    1000

In [77]: df[pd.to_numeric(df.col1, errors='coerce').isnull()]
Out[77]:
    name   col1
1  satya    abc
3   alex  apple
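pd.to_numeric treats 12 and 109.12 alike. To split the numeric rows further, one option (an assumption, not part of the answer) is to test whether the coerced value has no fractional part. Note that a string such as '12.0' would then also count as whole.

```python
import pandas as pd

df = pd.DataFrame({'name': ['satya', 'satya', 'satya', 'alex', 'alex'],
                   'col1': ['12', 'abc', '109.12', 'apple', '1000']})

num = pd.to_numeric(df['col1'], errors='coerce')   # NaN where conversion fails
is_number = num.notna()
is_whole = is_number & (num % 1 == 0)               # no fractional part

print(df[is_whole])               # 12 and 1000
print(df[is_number & ~is_whole])  # 109.12
print(df[~is_number])             # abc and apple
```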

Answer 4

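def is_integer(element):
    try:
        int(element)  # raises ValueError for strings such as 'abc' or '109.12'
        return True
    except (TypeError, ValueError):
        # note: a float object like 109.12 does NOT land here; int() truncates it
        return False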

You can then define the functions below and use a for loop to list your items.

def list_str(list_of_data):
    str_list=[]
    for item in list_of_data:  # item[2] is assumed to hold the col1 value; adjust the index to your row layout
        if not is_integer(item[2]):
            str_list.append(item)
    return str_list

def list_int(list_of_data):
    int_list=[]
    for item in list_of_data:
        if is_integer(item[2]):
            int_list.append(item)
    return int_list

Hope this helps.
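The same is_integer helper can filter a DataFrame directly instead of looping over lists. This sketch converts the values to str first so that a float such as 109.12 is rejected (int(109.12) would otherwise succeed and truncate to 109); that conversion is an addition to the answer.

```python
import pandas as pd

def is_integer(element):
    try:
        int(element)  # raises ValueError for strings such as 'abc' or '109.12'
        return True
    except (TypeError, ValueError):
        return False

df = pd.DataFrame({'name': ['satya', 'satya', 'satya', 'alex', 'alex'],
                   'col1': [12, 'abc', 109.12, 'apple', 1000]})

mask = df['col1'].astype(str).map(is_integer)
print(df[mask])    # integer rows
print(df[~mask])   # non-integer rows
```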

Answer 5

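You can use df.applymap(np.isreal) (in pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map):

import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [12,'abc',109.12,'apple',1000], 'name': ['satya','satya','satya', 'alex', 'alex']})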
df
     col1   name
0      12  satya
1     abc  satya
2  109.12  satya
3   apple   alex
4    1000   alex

df2 = df[df.applymap(np.isreal)]
df2
     col1 name
0      12  NaN
1     NaN  NaN
2  109.12  NaN
3     NaN  NaN
4    1000  NaN

df2 = df2[df2.col1.notnull()]
df2
     col1 name
0      12  NaN
2  109.12  NaN
4    1000  NaN

index_list = df2.index.tolist()
index_list
[0, 2, 4]

df = df.iloc[index_list]
df
     col1   name
0      12  satya
2  109.12  satya
4    1000   alex
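The NaN round-trip through index_list can be avoided by building a boolean mask on col1 alone. This sketch swaps np.isreal for an isinstance check against numbers.Real from the standard library (a substitution, not the answer's method); bool values would also pass, since bool is a subclass of int.

```python
import numbers

import pandas as pd

df = pd.DataFrame({'col1': [12, 'abc', 109.12, 'apple', 1000],
                   'name': ['satya', 'satya', 'satya', 'alex', 'alex']})

# True for ints and floats (including NumPy numeric scalars), False for strings.
is_real = df['col1'].map(lambda x: isinstance(x, numbers.Real))

print(df[is_real])   # rows 0, 2 and 4, selected by label rather than position
```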

How to find rows in a Pandas DataFrame whose column values have a specific data type
