In Python, what is a clear, efficient way of counting things in regions?

Time: 2015-09-30 14:35:19

Tags: python count range histogram binning

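I am looping over objects called events. Each event holds a particular object, and I am calculating the fraction of those objects that have a particular characteristic. Imagine the approach as something like the following: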

countCars = 0
countCarsBlue = 0
for event in events:
    countCars += 1
    if event.car.isBlue():
        countCarsBlue += 1

print("fraction of cars that are blue: {fraction}".format(
    fraction = countCarsBlue / countCars))

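Now, imagine that I want to calculate the fraction of objects that have a particular characteristic within regions of another characteristic of the objects. In my example, I am counting the fraction of cars that are blue. Now I want the fraction of blue cars among cars with a length from 0 m to 1 m, the fraction of blue cars among cars from 1 m to 2 m, from 2 m to 3 m, from 3 m to 4 m, and so on.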

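Given that I am dealing with a lot of statistics and with many more bins than the 4 in my simple example, what would be a good way to structure the code for this type of calculation, assuming a constant bin width?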

(Would there also be a sensible way to handle a variable bin width?)

2 answers:

Answer 0: (score: 3)

If you are working with Python 3.4+, enums are actually quite useful for this. Here are a few examples of what you can do:

import random
from enum import Enum
from collections import namedtuple


class Color(Enum):
    blue = 'blue'
    red = 'red'
    green = 'green'

Car = namedtuple('Car', ('color', 'length'))

cars = [Car(Color.blue, 10),
        Car(Color.blue, 3),
        Car(Color.blue, 9),
        Car(Color.red, 9),
        Car(Color.red, 7),
        Car(Color.red, 8),
        Car(Color.green, 3),
        Car(Color.green, 7),
        Car(Color.green, 2),
        Car(Color.green, 8),
        ]

print('# of blue cars:', sum(1 for car in cars if car.color == Color.blue))
print('# of cars with length between 3 and 7:',
      sum(1 for car in cars if 3 <= car.length <= 7))

random_color = random.choice(tuple(Color))
lower_limit = random.randint(1,10)
upper_limit = random.randint(lower_limit,10)
print('# of {} cars with length {} to {} (inclusive):'.format(random_color.name,
                                                              lower_limit,
                                                              upper_limit),
      sum(1 for car in cars if car.color == random_color
                            and lower_limit <= car.length <= upper_limit))

important_colors = (Color.blue, Color.green)
important_lengths = (1,2,3,5,7)

print('Number of cars that match some contrived criteria:',
      sum(1 for car in cars if car.color in important_colors
                            and car.length in important_lengths))

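If you are talking about continuous ranges, a chained comparison such as lower < value < upper (or <= for inclusive bounds, as in the examples above) is a good way to check. If you have discrete values (like colors), you can create a collection of the interesting colors and test for membership in that collection. Also note that you can easily use variable bin sizes.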
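As a minimal sketch of the variable bin size idea (not part of the original answer), the standard library's bisect module can map each length to a bin defined by a sorted list of edges. The helper name fraction_per_bin, the string-valued color field, and the edge values are assumptions made for illustration; the code is written for Python 3, where / is true division.

```python
from bisect import bisect_right
from collections import namedtuple

# Hypothetical stand-in for the Car records above; color is a plain string here.
Car = namedtuple('Car', ('color', 'length'))


def fraction_per_bin(items, edges, key, predicate):
    """For each half-open bin [edges[i], edges[i+1]), return the fraction of
    items in that bin that satisfy predicate (None for an empty bin)."""
    n_bins = len(edges) - 1
    totals = [0] * n_bins
    matches = [0] * n_bins
    for item in items:
        # bisect_right returns the position of the first edge strictly greater
        # than the value, so subtracting 1 gives the bin that contains it.
        i = bisect_right(edges, key(item)) - 1
        if 0 <= i < n_bins:  # ignore values that fall outside every bin
            totals[i] += 1
            if predicate(item):
                matches[i] += 1
    return [matches[i] / totals[i] if totals[i] else None
            for i in range(n_bins)]


cars = [Car('blue', 10), Car('blue', 3), Car('blue', 9),
        Car('red', 9), Car('red', 7), Car('red', 8),
        Car('green', 3), Car('green', 7), Car('green', 2), Car('green', 8)]

edges = [0, 4, 8, 11]  # sorted bin edges of unequal width (illustrative values)
fractions = fraction_per_bin(cars, edges,
                             key=lambda car: car.length,
                             predicate=lambda car: car.color == 'blue')

for low, high, fraction in zip(edges, edges[1:], fractions):
    print('blue fraction for length in [{}, {}): {}'.format(low, high, fraction))
```

Because the bins are described only by their edges, switching between constant and variable widths means changing the edges list and nothing else.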

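If you are only interested in simple counts, you can also use itertools.groupby. Note that if your items are reference objects, changing an item through one collection also changes it in every other collection that holds it: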

In [15]: class Simple:
   ....:     def __init__(self, name):
   ....:         self.name = name
   ....:     def __repr__(self):
   ....:         return 'Simple(name={!r})'.format(self.name)
   ....:

In [16]: values = [Simple('one'), Simple('two'), Simple('three')]

In [17]: one = (values[0], values[-1])

In [18]: two = tuple(values[:2])

In [19]: one
Out[19]: (Simple(name='one'), Simple(name='three'))

In [20]: two
Out[20]: (Simple(name='one'), Simple(name='two'))

In [21]: one[0].name = '**changed**'

In [22]: one
Out[22]: (Simple(name='**changed**'), Simple(name='three'))

In [23]: two
Out[23]: (Simple(name='**changed**'), Simple(name='two'))
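The itertools.groupby suggestion is not demonstrated above, so here is a minimal sketch of one way it could be used for simple counts per bin. The BIN_WIDTH value and the bin_index helper are assumptions for illustration. groupby only merges adjacent items with equal keys, so the data is sorted by the same key first; collections.Counter gives the same counts without sorting.

```python
from collections import Counter
from itertools import groupby

BIN_WIDTH = 3  # hypothetical constant bin width

# Lengths of the ten example cars defined earlier in this answer.
lengths = [10, 3, 9, 9, 7, 8, 3, 7, 2, 8]


def bin_index(length):
    """Index of the constant-width bin [i * BIN_WIDTH, (i + 1) * BIN_WIDTH)."""
    return int(length // BIN_WIDTH)


# groupby only groups consecutive items, hence the sort by the same key.
counts = {index: sum(1 for _ in group)
          for index, group in groupby(sorted(lengths, key=bin_index),
                                      key=bin_index)}

# Counter produces the same tallies in a single unsorted pass.
counter_counts = Counter(bin_index(length) for length in lengths)

for index in sorted(counts):
    low, high = index * BIN_WIDTH, (index + 1) * BIN_WIDTH
    print('cars with length in [{}, {}): {}'.format(low, high, counts[index]))
```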

Answer 1: (score: 0)

First, some code to recreate your example:

import random

class Event(object):
    def __init__(self):
        self.car = None

class Car(object):
    def __init__(self, isBlue, length):
        self._isBlue = isBlue
        self._length = length

    def isBlue(self):
        return self._isBlue

    def length(self):
        return self._length

    def __str__(self):
        return '{} car of {} m long.'.format('blue' if self.isBlue() else 'non-blue ', self.length())

OK, now I randomly create ten car objects and add each of them to an event:

totalNumberOfCars = 10
events = []
for _ in range(totalNumberOfCars):
    car = Car(random.choice([True, False]), random.randrange(5, 40)/10.)
    print car
    event = Event()
    event.car = car
    events.append(event)

For me, the output was as follows (yours will of course differ):

blue car of 0.5 m long.
non-blue  car of 2.3 m long.
non-blue  car of 3.8 m long.
blue car of 2.1 m long.
non-blue  car of 0.6 m long.
blue car of 0.8 m long.
blue car of 0.5 m long.
blue car of 2.3 m long.
blue car of 3.3 m long.
blue car of 2.1 m long.

Now, if we want to count our events by region, you could do it as follows:

allBlueCars = sum(1 for event in events if event.car.isBlue())
print "Number of blue cars: {}".format(allBlueCars)

maxCarLen = 4
for region in zip(range(maxCarLen ), range(1, maxCarLen +1)):
    minlen, maxlen = region
    print "Cars between {} and {} m that are blue:".format(minlen, maxlen)
    blueCarsInRegion = [str(event.car) for event in events if event.car.isBlue() and minlen <= event.car.length() < maxlen]
    if blueCarsInRegion:
        print '\n'.join(['\t{}'.format(car) for car in blueCarsInRegion])
    else:
        print 'no blue cars in this region'
    fraction = float(len(blueCarsInRegion)) / allBlueCars
    print "fraction of cars that are blue and between {} and {} m long: {}".format(minlen, maxlen, fraction)
    print

For the example data above, this prints:

Number of blue cars: 7
Cars between 0 and 1 m that are blue:
    blue car of 0.5 m long.
    blue car of 0.8 m long.
    blue car of 0.5 m long.
fraction of cars that are blue and between 0 and 1 m long: 0.428571428571

Cars between 1 and 2 m that are blue:
no blue cars in this region
fraction of cars that are blue and between 1 and 2 m long: 0.0

Cars between 2 and 3 m that are blue:
    blue car of 2.1 m long.
    blue car of 2.3 m long.
    blue car of 2.1 m long.
fraction of cars that are blue and between 2 and 3 m long: 0.428571428571

Cars between 3 and 4 m that are blue:
    blue car of 3.3 m long.
fraction of cars that are blue and between 3 and 4 m long: 0.142857142857
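Note that the loop above divides by the total number of blue cars, so it reports how the blue cars are distributed over the length regions. The question asks for the fraction of cars within each region that are blue. A minimal sketch of that variant, assuming a constant bin width and a single pass over the data, could look like this (Python 3; the function name blue_fraction_by_bin and the (is_blue, length) tuples are illustrative, with values copied from the sample output above):

```python
from collections import defaultdict


def blue_fraction_by_bin(cars, bin_width):
    """Return {bin index: fraction of cars in that bin that are blue}.

    Bin i covers lengths in [i * bin_width, (i + 1) * bin_width).
    Bins that contain no cars are left out of the result.
    """
    totals = defaultdict(int)
    blues = defaultdict(int)
    for is_blue, length in cars:
        index = int(length // bin_width)  # constant-width bin lookup
        totals[index] += 1
        if is_blue:
            blues[index] += 1
    return {index: blues[index] / totals[index] for index in sorted(totals)}


# (is_blue, length in m) pairs taken from the sample output above.
sample_cars = [(True, 0.5), (False, 2.3), (False, 3.8), (True, 2.1),
               (False, 0.6), (True, 0.8), (True, 0.5), (True, 2.3),
               (True, 3.3), (True, 2.1)]

result = blue_fraction_by_bin(sample_cars, bin_width=1)
for index, fraction in result.items():
    print('cars between {} and {} m: {:.2f} are blue'.format(
        index, index + 1, fraction))
```

Each car is visited once regardless of the number of bins, which matters when there are many bins, whereas the loop over regions rescans all events for every region.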