Question

The data has two columns, City and People. I need to group by city and rank the cities by their summed People.

The table looks like this (but with about a million rows):

City, People
Boston, 1000
Boston, 2000
New York, 2500
Chicago, 2000

In this case, Boston would be first with 3000 people. I need to return the top 5% of cities along with their people counts (the sums).

What is the most efficient way to do this? Does pandas scale well for this? Should I keep track of the top 5% as I go, or sort at the end?

Answer 1

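If you would rather use plain Python without external libraries, you can do the following. First, I open the file with csv. Then we can use the built-in sorted function to sort the rows on a custom key (basically, looking at the second element). Finally, we use [] slicing to grab the part we want.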

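import csv, math
from collections import defaultdict

totals = defaultdict(int)
with open("data.csv", "r", newline="") as fi:
    inCsv = csv.reader(fi, delimiter=',')
    next(inCsv)  # skip the header row
    for row in inCsv:
        city, people = (col.strip() for col in row)
        totals[city] += int(people)  # sum People per city, as integers
# sort cities by total (numerically), largest first, and keep the top 5%
top = math.ceil(len(totals) * .05)
print(sorted(totals.items(), key=lambda a: a[1], reverse=True)[:top])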
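On the question of tracking the top 5% versus sorting at the end: a city's total is not final until every row has been read, so the selection most likely has to happen after the grouping pass. The sketch below is one possible variation, not the answerer's code. It uses heapq.nlargest to pick the top cities without sorting all of them. The function name top_cities and the fraction parameter are made up for illustration.

```python
import heapq
import math
from collections import defaultdict

def top_cities(rows, fraction=0.05):
    """Return the top `fraction` of cities by total people, largest first.

    `rows` is any iterable of (city, people) pairs; the name and signature
    are hypothetical, chosen only for this example.
    """
    totals = defaultdict(int)
    for city, people in rows:
        totals[city] += int(people)  # one pass: accumulate the sum per city
    # Round up, so at least one city is returned whenever there is any data.
    k = math.ceil(len(totals) * fraction)
    # nlargest keeps only k candidates in a heap instead of sorting every city.
    return heapq.nlargest(k, totals.items(), key=lambda item: item[1])

rows = [("Boston", 1000), ("Boston", 2000), ("New York", 2500), ("Chicago", 2000)]
print(top_cities(rows))  # [('Boston', 3000)]
```

With only a few thousand distinct cities, the difference from a full sort is probably negligible. The grouping pass over the million rows is likely to dominate either way.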

Answer 2

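Use sum to get the totals, then rank to get the percentiles.

import pandas as pd

# "data.csv" is assumed here (the file name used in Answer 1); the original call gave no path
df = pd.read_csv("data.csv", skipinitialspace=True)
d1 = df.groupby('City').People.sum()   # total People per city
d1.loc[d1.rank(pct=True) >= .95]       # keep cities at or above the 95th percentile

City
Boston    3000
Name: People, dtype: int64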
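To make the percentile filter more concrete, here is a plain-Python sketch of what a percentile rank computes: each value's rank divided by the number of values, with tied values sharing the average of their ranks (which is the default tie handling for pandas rank). The helper name pct_rank is hypothetical and this is only an illustration, not how pandas implements it.

```python
from bisect import bisect_left, bisect_right

def pct_rank(values):
    """Percentile rank per value: average rank among ties, divided by the count."""
    ordered = sorted(values)
    n = len(ordered)
    result = []
    for v in values:
        first = bisect_left(ordered, v) + 1   # 1-based rank of the first tied value
        last = bisect_right(ordered, v)       # 1-based rank of the last tied value
        result.append((first + last) / 2 / n)
    return result

# Per-city totals from the sample table in the question.
totals = {"Boston": 3000, "New York": 2500, "Chicago": 2000}
pct = pct_rank(list(totals.values()))
top = {city: total for (city, total), p in zip(totals.items(), pct) if p >= 0.95}
print(top)  # {'Boston': 3000}
```

With three cities the percentile ranks are 1.0, 2/3, and 1/3, so only Boston clears the 0.95 cutoff.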


Most efficient way to group, count, then sort?
