Question

In my dataset I have a column like this:

hist = ['A','FAT',nan,'TAH']

I need to use a loop to get the cells that contain 'A'. This is my code:

    import numpy as np
    import pandas as pd
    import math
    from numpy import nan

    for rowId in np.arange(dt.shape[0]):
        for hist in np.arange(10):
            if math.isnan(dt.iloc[rowId,hist])!=True:
                if 'A' in dt.iloc[rowId,hist]:
                    print("A found in: "+str(dt.iloc[rowId,hist]))

On the line if 'A' in dt.iloc[rowId,hist], when the value of dt.iloc[rowId,hist] is NaN, it fails with TypeError: argument of type 'float' is not iterable

So I decided to add the check math.isnan(dt.iloc[rowId,hist])!=True: but that raises the following error instead:

TypeError: must be real number, not str

How can I find the values that contain 'A'?

Answer 1

Rather than iterating, you can simply use .str.contains [pandas-doc] on the column, for example:

>>> df
     0
0    A
1  FAT
2  NaN
3  TAH
>>> df[0].str.contains('A')
0    True
1    True
2     NaN
3    True
Name: 0, dtype: object

You can then filter the rows or get the matching index, for example:

>>> df[df[0].str.contains('A') == True]
     0
0    A
1  FAT
3  TAH
>>> df.index[df[0].str.contains('A') == True]
Int64Index([0, 1, 3], dtype='int64')

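Or we can use .notna() in place of == True. Note that this only drops the rows where the result is NaN: it gives the same rows here because every non-missing value happens to contain 'A', but it would also keep a string that does not contain 'A':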

>>> df[df[0].str.contains('A').notna()]
     0
0    A
1  FAT
3  TAH
>>> df.index[df[0].str.contains('A').notna()]
Int64Index([0, 1, 3], dtype='int64')

Or handle the missing values inside .contains() itself, as @Erfan says:

>>> df[df[0].str.contains('A', na=False)]
     0
0    A
1  FAT
3  TAH
>>> df.index[df[0].str.contains('A', na=False)]
Int64Index([0, 1, 3], dtype='int64')
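The question loops over several columns, so here is a minimal sketch of applying the same na=False filter column by column. The frame dt and the column names hist1 and hist2 are made-up stand-ins for the asker's data, and regex=False is added on the assumption that a literal substring match is what is wanted.

```python
import numpy as np
import pandas as pd

# Hypothetical two-column frame standing in for the asker's `dt`.
dt = pd.DataFrame({
    'hist1': ['A', 'FAT', np.nan, 'TAH'],
    'hist2': [np.nan, 'B', 'CAT', 'DOG'],
})

found = []
for col in dt.columns:
    # na=False counts missing values as "no match";
    # regex=False makes 'A' a literal substring rather than a pattern.
    mask = dt[col].str.contains('A', na=False, regex=False)
    for row_id, val in dt.loc[mask, col].items():
        found.append((row_id, col, val))
        print('A found in: {} (row {}, column {})'.format(val, row_id, col))
```

Only the columns are looped over here; the per-row test stays vectorized inside .str.contains.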

You can then print the matching values with:

for val in df[df[0].str.contains('A') == True][0]:
    print('A found in {}'.format(val))

This gives us:

>>> for val in df[df[0].str.contains('A') == True][0]:
...     print('A found in {}'.format(val))
... 
A found in A
A found in FAT
A found in TAH
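If an explicit cell-by-cell loop really is required, the two errors in the question can be avoided by testing the type of each cell first. math.isnan accepts only numbers, so it fails on strings, while 'A' in ... fails on the float NaN. The following is only a sketch: dt is a one-column stand-in for the asker's frame, and it assumes every non-missing cell is a str.

```python
import numpy as np
import pandas as pd

# Stand-in for the asker's `dt`; the real frame has more columns.
dt = pd.DataFrame({0: ['A', 'FAT', np.nan, 'TAH']})

found = []
for row_id in range(dt.shape[0]):
    for col_id in range(dt.shape[1]):
        cell = dt.iloc[row_id, col_id]
        # NaN is a float, so isinstance(cell, str) is False for it
        # and the substring test is never reached for missing cells.
        if isinstance(cell, str) and 'A' in cell:
            found.append(cell)
            print("A found in: " + cell)
```

Using pd.isna(cell) as the guard would also work, since unlike math.isnan it accepts strings.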
