Counter based on other data and a range

Date: 2015-12-02 06:47:57

Tags: python pandas matrix

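The problem is that I need a function that counts how many people like each console, broken down by their civil status, their gender, and the range of their ages (I thought range() might work). The code so far is written with pandas, but somehow I can't find any command that helps.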

    CustomerNo       Name LastName  Age CivilState Gender FavouriteConsole
0            1        Joe    Smith   48          M      M                W
1            2   Jonathan     Cage   20          S      M                X
2            3       Lucy    Chang   26          S      F                P
99         100  Alexander   Levine   41          M      M                X

In the FavouriteConsole column, W stands for Wii, P for PS3 and X for Xbox.

What I need is something like this:

For Wii there are x1 people, of which x2 are male and x3 female, x4 are married and x5 are single, the range of ages is [x6 to x7]

...and the same for each console.

3 Answers:

Answer 0: (score: 1)

Assuming your data is in a DataFrame named df:

dfW = df[(df.FavouriteConsole == 'W')]   # select Wii lovers
male = (dfW.Gender == 'M').sum()         # count males
female = (dfW.Gender == 'F').sum()       # count female
min_age = dfW.Age.min()                  # minimum age
# etc...
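To try the selection above end to end, here is a minimal sketch that rebuilds only the four rows shown in the question (the rows between 2 and 99 are not shown there, so they are left out) and computes every statistic for the Wii. The variable names `total`, `married`, `single` and `max_age` are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Only the four rows visible in the question; the real data has more.
df = pd.DataFrame(
    {
        "CustomerNo": [1, 2, 3, 100],
        "Name": ["Joe", "Jonathan", "Lucy", "Alexander"],
        "LastName": ["Smith", "Cage", "Chang", "Levine"],
        "Age": [48, 20, 26, 41],
        "CivilState": ["M", "S", "S", "M"],
        "Gender": ["M", "M", "F", "M"],
        "FavouriteConsole": ["W", "X", "P", "X"],
    },
    index=[0, 1, 2, 99],
)

dfW = df[df.FavouriteConsole == "W"]           # select Wii lovers
total = len(dfW)                               # number of people
male = int((dfW.Gender == "M").sum())          # count males
female = int((dfW.Gender == "F").sum())        # count females
married = int((dfW.CivilState == "M").sum())   # count married
single = int((dfW.CivilState == "S").sum())    # count single
min_age = int(dfW.Age.min())                   # minimum age
max_age = int(dfW.Age.max())                   # maximum age

print(total, male, female, married, single, min_age, max_age)
```

Comparing a column to a value gives a boolean Series, and summing it counts the `True` entries, which is why `.sum()` works as a counter here.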

Edit: just wanted to follow up on how to summarize this data in a DataFrame and then access whatever you want:

import pandas as pd

cons = []
# Iterating over a groupby yields (group key, sub-DataFrame) pairs
for console, grp in df.groupby('FavouriteConsole'):
    cons.append([console,                          # Console type
                 grp.shape[0],                     # Count = number of rows
                 (grp.Gender == 'M').sum(),        # Males
                 (grp.Gender == 'F').sum(),        # Females
                 (grp.CivilState == 'M').sum(),    # Married
                 (grp.CivilState == 'S').sum(),    # Single
                 grp.Age.min(),                    # Min age
                 grp.Age.max()])                   # Max age
summary = pd.DataFrame(cons,
                       columns=['Console', 'Count', 'Male', 'Female',
                                'Married', 'Single', 'Min_Age', 'Max_Age'])
summary.set_index('Console', inplace=True)
print(summary)

         Count  Male  Female  Married  Single  Min_Age  Max_Age
Console                                                        
P            1     0       1        0       1       26       26
W            1     1       0        1       0       48       48
X            2     2       0        1       1       20       41

You can then access any particular value:

In [20]: summary.loc['X','Male']
Out[20]: 2

In [21]: summary.loc['P','Single']
Out[21]: 1
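Once such a per-console summary exists, the sentence requested in the question can be produced row by row. The sketch below is one possible way to do that, not the answerer's code: it builds an equivalent summary without an explicit loop, by adding boolean helper columns and using named aggregation (available in pandas 0.25 and later), and it assumes a hypothetical `names` mapping from the console codes to the names given in the question. Only the four sample rows from the question are used.

```python
import pandas as pd

# The four rows visible in the question (relevant columns only).
df = pd.DataFrame({
    "Age": [48, 20, 26, 41],
    "CivilState": ["M", "S", "S", "M"],
    "Gender": ["M", "M", "F", "M"],
    "FavouriteConsole": ["W", "X", "P", "X"],
})

# Boolean helper columns: summing them per group counts the matches.
flags = df.assign(
    Male=df.Gender == "M",
    Female=df.Gender == "F",
    Married=df.CivilState == "M",
    Single=df.CivilState == "S",
)
summary = flags.groupby("FavouriteConsole").agg(
    Count=("Age", "size"),
    Male=("Male", "sum"),
    Female=("Female", "sum"),
    Married=("Married", "sum"),
    Single=("Single", "sum"),
    Min_Age=("Age", "min"),
    Max_Age=("Age", "max"),
).astype(int)

# Hypothetical lookup table for readable console names.
names = {"W": "Wii", "P": "PS3", "X": "Xbox"}

sentences = {}
for row in summary.itertuples():
    sentences[row.Index] = (
        f"For {names[row.Index]} there are {row.Count} people, "
        f"of which {row.Male} are male and {row.Female} female, "
        f"{row.Married} are married and {row.Single} are single, "
        f"the range of ages is [{row.Min_Age} to {row.Max_Age}]"
    )
    print(sentences[row.Index])
```

`itertuples()` exposes the index as `row.Index` and each column as an attribute, which keeps the formatting code short.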

Answer 1: (score: 0)

Here is a general-purpose function that does not rely on pandas.

Suppose you have the data in some file, say a.txt. The code below scans the file and prints the output in the form described above.

#!/usr/bin/env python3

def calc(console_name):
    # Read the data rows, skipping the header line
    with open("a.txt") as f:
        lines = f.readlines()[1:]

    num_people = 0
    num_males = 0
    num_females = 0
    num_married = 0
    num_unmarried = 0
    min_age = None   # stays None until the first matching row is seen
    max_age = None

    for line in lines:
        # Columns after split(): index, CustomerNo, Name, LastName,
        # Age, CivilState, Gender, FavouriteConsole
        fields = line.split()
        if not fields or fields[-1] != console_name:
            continue

        num_people += 1

        # Age
        age = int(fields[4])
        if min_age is None or age < min_age:
            min_age = age
        if max_age is None or age > max_age:
            max_age = age

        # Civil status
        if fields[5] == 'M':
            num_married += 1
        elif fields[5] == 'S':
            num_unmarried += 1

        # Gender
        if fields[6] == 'M':
            num_males += 1
        else:
            num_females += 1

    print("For %s there are %s people, which %s are male and %s female, "
          "%s are married and %s are unmarried, age range [ %s - %s]"
          % (console_name, num_people, num_males, num_females,
             num_married, num_unmarried, min_age, max_age))

c = 'W'
calc(c)

calc is the function that does the actual work. I have tested it for console W.

   Elite-MT-PC:~/Documents/programs$ python a.py 
    For W there are 1 people, which 1 are male and 0 female, 1 are married and 0 are unmarried, age range [ 48 - 48]

Answer 2: (score: 0)

Group by console (used for the age range):

g = df.groupby('FavouriteConsole')

Counts for all consoles:

counts = g.size().loc

Male and female counts:

gender_group = df.groupby(['FavouriteConsole', 'Gender']).size().loc

Civil status:

civil_group = df.groupby(['FavouriteConsole', 'CivilState']).size().loc

And the final string:

fmt = "For {} there are {} people, which {} is male and {} female, {} are married and {} are single, the range of ages is [{} to {}]"

min_age = g.Age.min()   # Series indexed by console
max_age = g.Age.max()

for console in ['P', 'X', 'W']:
    print(fmt.format(console,
                     counts[console],
                     gender_group[console].loc['M'],
                     gender_group[console].loc['F'],
                     civil_group[console].loc['M'],
                     civil_group[console].loc['S'],
                     min_age[console],
                     max_age[console]))

This crashes on missing values, for example if there are no male Wii players.
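One way to avoid that crash, sketched here as an assumption rather than as part of the original answer, is to count with `pd.crosstab` and then `reindex` the columns with `fill_value=0`, so that a console with no males (or no females, married or single people) gets an explicit 0 instead of a missing key. The sample uses only the four rows shown in the question, where the Wii has no female fans and the PS3 has no male fans.

```python
import pandas as pd

# The four rows visible in the question (relevant columns only).
df = pd.DataFrame({
    "Age": [48, 20, 26, 41],
    "CivilState": ["M", "S", "S", "M"],
    "Gender": ["M", "M", "F", "M"],
    "FavouriteConsole": ["W", "X", "P", "X"],
})

by_console = df.groupby("FavouriteConsole")
counts = by_console.size()
ages = by_console["Age"].agg(["min", "max"])

# crosstab puts 0 where a console/value pair never occurs; reindex also
# guarantees that both expected columns exist even if a value is absent
# from the whole data set.
gender = pd.crosstab(df["FavouriteConsole"], df["Gender"]).reindex(
    columns=["M", "F"], fill_value=0)
civil = pd.crosstab(df["FavouriteConsole"], df["CivilState"]).reindex(
    columns=["M", "S"], fill_value=0)

fmt = ("For {} there are {} people, of which {} are male and {} female, "
       "{} are married and {} are single, the range of ages is [{} to {}]")

report = {}
for console in counts.index:
    report[console] = fmt.format(
        console,
        counts.loc[console],
        gender.loc[console, "M"],
        gender.loc[console, "F"],
        civil.loc[console, "M"],
        civil.loc[console, "S"],
        ages.loc[console, "min"],
        ages.loc[console, "max"],
    )
    print(report[console])
```

Looping over `counts.index` rather than a hard-coded list means a console that nobody picked is simply skipped, since it has no age range to report.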