Question

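I'm cleaning up some human-classified data with Python 2.7, mostly using pandas, but I also use numpy.isreal() to check for floats, because some people apparently entered floats into fields like 'background_color'. Anyway, I've posted an example of what my current setup looks like for one color. It works, but it just doesn't look very Pythonic. At the end of the loop, the result is a list of all the indices whose background_color is blue, ignoring case: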

'BLUE'

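It seems like I could use the map function to make this more Pythonic and prettier. Like I said, it works as intended, but it reads too much like... C or Java written in Python. Thanks in advance for any replies.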

-Edit: I removed the count, since it was a relic from an older loop.

Answer 1

You can create a new column holding the upper-cased values:

imageData['background_color_2'] = map(lambda x: x.upper(), imageData['background_color'].astype(str))

subset = imageData[imageData['background_color_2']=='BLUE']

For the count:

len(subset['background_color'])
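
For anyone who wants to try this without the real data, here is a minimal, self-contained sketch of the same idea. The sample frame is made up for illustration, and it swaps map/lambda for the pandas `.str.upper()` accessor, which behaves the same way on Python 2 and Python 3. The original column is left untouched.

```python
import pandas as pd

# Hypothetical sample data; the stray float mimics the bad entries
# described in the question.
imageData = pd.DataFrame({'background_color': ['blue', 'BLUE', 3.5, 'Red', 'Blue']})

# Cast everything to str first so .str.upper() never sees a float,
# then store the normalized values in a separate column.
imageData['background_color_2'] = imageData['background_color'].astype(str).str.upper()

subset = imageData[imageData['background_color_2'] == 'BLUE']
blue_count = len(subset)          # number of matching rows
blue_index = list(subset.index)   # their index labels
```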

Answer 2

You can define a lambda function that returns the indices of the rows holding a particular string value:

import numpy as np

# Skip non-string (real-valued) entries, then compare case-insensitively.
getRowIndexWithStringColor = lambda df, color: [i for i in df.index if (not np.isreal(df.loc[i,'background_color'])) and df.loc[i,'background_color'].upper()==color]
rowIndexWithBlue = getRowIndexWithStringColor(imageData, 'BLUE')

Answer 3

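As a general rule, if you're looping in pandas, you're doing it wrong.

It should look something like this (untested, though, so you'll need to tweak it!):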

strings = (~imageData.background_color.apply(np.isreal))
blue = (imageData.background_color.str.upper()=="BLUE")
blueshapes = imageData[strings & blue].index
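
Since the snippet above is flagged as untested, here is a small runnable sketch of the same two-mask idea on made-up data. It assumes Python 3 and uses `isinstance(v, str)` in place of `np.isreal` to detect string entries (on Python 2.7 you would check against `basestring` instead). The frame contents are hypothetical.

```python
import pandas as pd

# Hypothetical sample data with one float entry mixed in.
imageData = pd.DataFrame({'background_color': ['blue', 'BLUE', 3.5, 'Red', 'Blue']})

col = imageData['background_color']

# True where the entry is a string at all.
is_string = col.apply(lambda v: isinstance(v, str))

# .str.upper() yields NaN for non-strings, and NaN == 'BLUE' is False.
is_blue = col.str.upper() == 'BLUE'

blueshapes = imageData[is_string & is_blue].index
invalid = imageData[~is_string].index  # rows holding non-string entries
```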

Answer 4

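Thanks, everyone! I used a small modification of Steven G's answer. I have all of this backed up in a master .csv, so I have no qualms about overwriting the background_color column with its string equivalent. Any non-string entries are invalid anyway, but they aren't unique, so I'll find them as the leftover indices after concatenating the indices for all of the colors. Each list will be extracted like this: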

imageData['background_color']=map(lambda x: x.upper(), imageData['background_color'].astype(str))

blueShapes=imageData[imageData['background_color']=='BLUE'].index
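
The "leftover indices" step could be sketched roughly as follows. This is only an illustration: the sample data and the `colors` list are hypothetical, and it uses `.str.upper()` rather than map so that it also runs on Python 3.

```python
import pandas as pd

# Hypothetical sample data; 3.5 and 7.0 stand in for invalid entries.
imageData = pd.DataFrame({'background_color': ['blue', 'RED', 3.5, 'Red', 'Blue', 7.0]})
imageData['background_color'] = imageData['background_color'].astype(str).str.upper()

colors = ['BLUE', 'RED']  # hypothetical list of valid color names

# One index per color, extracted the same way as blueShapes above.
color_index = {c: imageData[imageData['background_color'] == c].index for c in colors}

# Concatenate all per-color indices into one Index.
matched = color_index[colors[0]].append([color_index[c] for c in colors[1:]])

# Whatever is not matched by any color is treated as invalid.
leftover = imageData.index.difference(matched)
```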

Answer 5

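I would turn it into a function and return a list.

Google: The Zen of Python

For quick reference: prefer Python list/dict/set comprehensions over map/filter.

They give you better readability and cleaner code.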

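import numpy as np

def colorShapes(color):
    # Keep only rows whose entry is a string (not a real number)
    # and whose upper-cased value matches the requested color.
    return [i
            for i in range(imageData.shape[0])
            if not np.isreal(imageData.loc[i, 'background_color'])
            and imageData.loc[i, 'background_color'].upper() == color]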
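
To make the comprehension-versus-filter point concrete, here is a rough side-by-side on made-up data. The helper name `color_shapes`, its parameters, and the sample frame are all hypothetical. It takes the frame as an argument instead of relying on a global, and it uses `isinstance(value, str)` (Python 3) rather than `np.isreal` to skip non-string entries.

```python
import pandas as pd

def color_shapes(df, color, column='background_color'):
    """Return index labels whose entry in `column` is a string equal to `color`, ignoring case."""
    return [label
            for label, value in df[column].items()
            if isinstance(value, str) and value.upper() == color.upper()]

# Hypothetical sample data with one float entry mixed in.
imageData = pd.DataFrame({'background_color': ['blue', 'BLUE', 3.5, 'Red', 'Blue']})

blue_shapes = color_shapes(imageData, 'blue')

# The same selection written with filter and a lambda, for comparison.
pairs = imageData['background_color'].items()
via_filter = [label for label, value in
              filter(lambda p: isinstance(p[1], str) and p[1].upper() == 'BLUE', pairs)]
```

Both produce the same labels; the comprehension just keeps the condition next to the thing being collected.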

Making this data-cleaning loop more Pythonic
