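The problem is that I need a function that counts how many people like each console, broken down by civil state, gender, and age range (I think range() could work for the ages). I have already written code using pandas, but somehow I can't find any command that helps with this.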
    CustomerNo  Name       LastName  Age  CivilState  Gender  FavouriteConsole
0   1           Joe        Smith     48   M           M       W
1   2           Jonathan   Cage      20   S           M       X
2   3           Lucy       Chang     26   S           F       P
99  100         Alexander  Levine    41   M           M       X
In FavouriteConsole, W stands for Wii, P for PS3, and X for Xbox.
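Because the requested sentence says "For Wii" rather than "For W", the single-letter codes have to be turned into names at some point. A minimal sketch using Series.map, assuming a hypothetical lookup dict CONSOLE_NAMES built from the codes above and a small frame holding the four sample rows:

```python
import pandas as pd

# Hypothetical lookup table built from the codes described in the question
CONSOLE_NAMES = {'W': 'Wii', 'P': 'PS3', 'X': 'Xbox'}

# The four sample rows from the question, with the same index values
df = pd.DataFrame({
    'CustomerNo': [1, 2, 3, 100],
    'Name': ['Joe', 'Jonathan', 'Lucy', 'Alexander'],
    'LastName': ['Smith', 'Cage', 'Chang', 'Levine'],
    'Age': [48, 20, 26, 41],
    'CivilState': ['M', 'S', 'S', 'M'],
    'Gender': ['M', 'M', 'F', 'M'],
    'FavouriteConsole': ['W', 'X', 'P', 'X'],
}, index=[0, 1, 2, 99])

# New column with the full console name; codes missing from the dict become NaN
df['ConsoleName'] = df['FavouriteConsole'].map(CONSOLE_NAMES)
print(df[['Name', 'FavouriteConsole', 'ConsoleName']])
```

Any code not present in the dict (for example a lowercase 'x') would map to NaN, so it is worth checking df['ConsoleName'].isna().any() on real data.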
What I need is something like this:
For Wii there are x people, of which x2 are male and x3 female, x4 are married and x5 are single, and the range of ages is [x6 to x7]
...and the same for every console.
Answer 0 (score: 1)
Assuming your data is in a DataFrame called df:
dfW = df[df.FavouriteConsole == 'W']   # select Wii lovers
male = (dfW.Gender == 'M').sum()       # count males
female = (dfW.Gender == 'F').sum()     # count females
min_age = dfW.Age.min()                # minimum age
# ... and similarly for the married/single counts and dfW.Age.max()
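Wrapping the snippet above in a function gives the per-console sentence the question asks for. This is a sketch, not part of the original answer: describe_console and TEMPLATE are hypothetical names, and the frame assumes Gender uses 'M'/'F' and CivilState uses 'M'/'S' as in the sample data.

```python
import pandas as pd

# Sentence template from the question (typos fixed)
TEMPLATE = ("For {} there are {} people, of which {} are male and {} female, "
            "{} are married and {} are single, the range of ages is [{} to {}]")


def describe_console(df, code, name=None):
    """Return the summary sentence for one console code (hypothetical helper)."""
    fans = df[df.FavouriteConsole == code]      # only this console's fans
    return TEMPLATE.format(
        name or code,                           # label shown in the sentence
        len(fans),                              # number of fans
        (fans.Gender == 'M').sum(),             # males
        (fans.Gender == 'F').sum(),             # females
        (fans.CivilState == 'M').sum(),         # married
        (fans.CivilState == 'S').sum(),         # single
        fans.Age.min(),                         # youngest fan
        fans.Age.max())                         # oldest fan


df = pd.DataFrame({
    'Age': [48, 20, 26, 41],
    'CivilState': ['M', 'S', 'S', 'M'],
    'Gender': ['M', 'M', 'F', 'M'],
    'FavouriteConsole': ['W', 'X', 'P', 'X'],
})
print(describe_console(df, 'X', 'Xbox'))
```

For a code with no fans at all, the count is 0 and the age bounds come out as nan, so a caller may want to skip empty consoles.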
Edit: to follow up, here is how to collect all of these statistics in one summary DataFrame and then access whatever you need:
import pandas as pd

cons = []
g = df.groupby('FavouriteConsole')
for console, grp in g:                             # one (key, sub-DataFrame) pair per console
    cons.append([console,                          # console type
                 grp.shape[0],                     # count = number of rows
                 (grp.Gender == 'M').sum(),        # males
                 (grp.Gender == 'F').sum(),        # females
                 (grp.CivilState == 'M').sum(),    # married
                 (grp.CivilState == 'S').sum(),    # single
                 grp.Age.min(),                    # min age
                 grp.Age.max()])                   # max age
summary = pd.DataFrame(cons,
                       columns=['Console', 'Count', 'Male', 'Female', 'Married',
                                'Single', 'Min_Age', 'Max_Age'])
summary.set_index('Console', inplace=True)
print(summary)
         Count  Male  Female  Married  Single  Min_Age  Max_Age
Console
P            1     0       1        0       1       26       26
W            1     1       0        1       0       48       48
X            2     2       0        1       1       20       41
You can then access any individual statistic:
In [20]: summary.loc['X','Male']
Out[20]: 2
In [21]: summary.loc['P','Single']
Out[21]: 1
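The same summary table can also be built without the explicit loop, using groupby(...).agg with named aggregation (available in pandas 0.25 and later). This swaps in a different technique from the answer above; it is a sketch that first adds boolean helper columns so every aggregation is a plain 'sum', 'size', 'min' or 'max'. The helper column names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [48, 20, 26, 41],
    'CivilState': ['M', 'S', 'S', 'M'],
    'Gender': ['M', 'M', 'F', 'M'],
    'FavouriteConsole': ['W', 'X', 'P', 'X'],
})

# Boolean helper columns; summing a boolean column counts the True values
flags = df.assign(
    Male=df.Gender.eq('M'),
    Female=df.Gender.eq('F'),
    Married=df.CivilState.eq('M'),
    Single=df.CivilState.eq('S'),
)

# Named aggregation: output_column=(input_column, aggregation)
summary = flags.groupby('FavouriteConsole').agg(
    Count=('Age', 'size'),
    Male=('Male', 'sum'),
    Female=('Female', 'sum'),
    Married=('Married', 'sum'),
    Single=('Single', 'sum'),
    Min_Age=('Age', 'min'),
    Max_Age=('Age', 'max'),
)
summary.index.name = 'Console'
print(summary)
```

Lookups such as summary.loc['X', 'Male'] work exactly as in the answer.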
Answer 1 (score: 0)
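Here is a general function that does not use pandas.
Suppose your data is in a file, say a.txt.
The code below scans the file and prints the output as described above: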
#!/usr/bin/python


def calc(console_name):
    # Read the data file, skipping the header line
    with open("a.txt") as f:
        lines = f.readlines()[1:]

    num_people = 0
    num_males = 0
    num_females = 0
    num_married = 0
    num_unmarried = 0
    min_age = None
    max_age = None

    for line in lines:
        # Fields: index, CustomerNo, Name, LastName, Age, CivilState, Gender, FavouriteConsole
        fields = line.split()
        if fields and fields[-1] == console_name:
            num_people += 1
            # Age
            age = int(fields[4])
            if min_age is None or age < min_age:
                min_age = age
            if max_age is None or age > max_age:
                max_age = age
            # Civil state
            if fields[5] == 'M':
                num_married += 1
            elif fields[5] == 'S':
                num_unmarried += 1
            # Gender
            if fields[6] == 'M':
                num_males += 1
            else:
                num_females += 1

    print("For %s there are %s people, which %s are male and %s female, "
          "%s are married and %s are unmarried, age range [ %s - %s]"
          % (console_name, num_people, num_males, num_females,
             num_married, num_unmarried, min_age, max_age))


c = 'W'
calc(c)
calc is the function that does the actual work. I have tested it for W:
Elite-MT-PC:~/Documents/programs$ python a.py
For W there are 1 people, which 1 are male and 0 female, 1 are married and 0 are unmarried, age range [ 48 - 48]
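calc re-reads the file once per console. A possible refinement, sketched here under the assumption that the file has the same layout as a.txt above (a header line, then whitespace-separated fields with Age at index 4, CivilState at 5, Gender at 6 and the console code last), collects the statistics for every console in a single pass. summarize is a hypothetical name, and the sample lines are the four rows from the question.

```python
def summarize(lines):
    """Single pass over the data lines; returns {console: stats dict}."""
    stats = {}
    for line in lines[1:]:                      # skip the header line
        fields = line.split()
        if not fields:
            continue                            # ignore blank lines
        age, civil, gender, console = int(fields[4]), fields[5], fields[6], fields[-1]
        # First row for a console initialises its counters and age bounds
        s = stats.setdefault(console, {'people': 0, 'male': 0, 'female': 0,
                                       'married': 0, 'single': 0,
                                       'min_age': age, 'max_age': age})
        s['people'] += 1
        s['male' if gender == 'M' else 'female'] += 1
        if civil == 'M':
            s['married'] += 1
        elif civil == 'S':
            s['single'] += 1
        s['min_age'] = min(s['min_age'], age)
        s['max_age'] = max(s['max_age'], age)
    return stats


data = """CustomerNo Name LastName Age CivilState Gender FavouriteConsole
0 1 Joe Smith 48 M M W
1 2 Jonathan Cage 20 S M X
2 3 Lucy Chang 26 S F P
99 100 Alexander Levine 41 M M X""".splitlines()

for console, s in sorted(summarize(data).items()):
    print(console, s)
```

With a real file, summarize(open("a.txt").readlines()) (or the same call inside a with block) would replace the hard-coded sample.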
Answer 2 (score: 0)
Group by console (also used for the age range):
g = df.groupby('FavouriteConsole')
Counts for all consoles:
counts = g.size().loc
Male/female:
gender_group = df.groupby(['FavouriteConsole', 'Gender']).size().loc
Civil state:
civil_group = df.groupby(['FavouriteConsole', 'CivilState']).size().loc
And finally the string:
fmt = "For {} there are {} people, of which {} are male and {} female, {} are married and {} are single, the range of ages is [{} to {}]"
for console in ['P', 'X', 'W']:
    print(fmt.format(console,
                     counts[console],
                     gender_group[console].loc['M'],
                     gender_group[console].loc['F'],
                     civil_group[console].loc['M'],
                     civil_group[console].loc['S'],
                     g.Age.min()[console],    # youngest fan of this console
                     g.Age.max()[console]))   # oldest fan of this console
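This crashes with a KeyError when a combination is missing, for example if there are no male Wii players (with the sample data above there are already no male PS3 players, so it fails on 'P').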
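One way around the missing-combination problem, offered as a substitute technique rather than the answer's own code, is to unstack the grouped counts into a console-by-category table with fill_value=0, then reindex so both expected categories always exist as columns. A sketch, assuming the sample data and the codes 'M'/'F' and 'M'/'S':

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [48, 20, 26, 41],
    'CivilState': ['M', 'S', 'S', 'M'],
    'Gender': ['M', 'M', 'F', 'M'],
    'FavouriteConsole': ['W', 'X', 'P', 'X'],
})

g = df.groupby('FavouriteConsole')
counts = g.size()
# unstack moves the inner level into columns; absent combinations become 0
gender = df.groupby(['FavouriteConsole', 'Gender']).size().unstack(fill_value=0)
civil = df.groupby(['FavouriteConsole', 'CivilState']).size().unstack(fill_value=0)
# Guarantee both categories exist even if one never occurs in the data
gender = gender.reindex(columns=['M', 'F'], fill_value=0)
civil = civil.reindex(columns=['M', 'S'], fill_value=0)
min_age, max_age = g.Age.min(), g.Age.max()

fmt = ("For {} there are {} people, of which {} are male and {} female, "
       "{} are married and {} are single, the range of ages is [{} to {}]")
lines = []
for console in counts.index:
    lines.append(fmt.format(console, counts[console],
                            gender.loc[console, 'M'], gender.loc[console, 'F'],
                            civil.loc[console, 'M'], civil.loc[console, 'S'],
                            min_age[console], max_age[console]))
print('\n'.join(lines))
```

Iterating over counts.index instead of a hard-coded list also avoids a KeyError for a console code that has no fans.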