I have a CSV file named 'salaries.csv'. Its contents are as follows:
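City,Job,Salary
Delhi,Doctors,500
Delhi,Lawyers,400
Delhi,Plumbers,100
London,Doctors,800
London,Lawyers,700
London,Plumbers,300
Tokyo,Doctors,900
Tokyo,Lawyers,800
Tokyo,Plumbers,400
Lawyers,Doctors,300
Lawyers,Lawyers,400
Lawyers,Plumbers,500
Hong Kong,Doctors,1800
Hong Kong,Lawyers,1100
Hong Kong,Plumbers,1000
Moscow,Doctors,300
Moscow,Lawyers,200
Moscow,Plumbers,100
Berlin,Doctors,800
Berlin,Plumbers,900
Paris,Doctors,900
Paris,Lawyers,800
Paris,Plumbers,500
Paris,Dog catchers,400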
I need to print the median salary for each profession. I tried some code, but it raises an error.
My code is:
from StringIO import StringIO
import sqlite3
import csv
import operator #from operator import itemgetter, attrgetter

data = open('sal.csv', 'r').read()
string = ''.join(data)
f = StringIO(string)
reader = csv.reader(f)
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('''create table data (City text, Job text, Salary real)''')
conn.commit()
count = 0
for e in reader:
    if count==0:
        print ""
    else:
        e[0]=str(e[0])
        e[1]=str(e[1])
        e[2] = float(e[2])
        c.execute("""insert into data values (?,?,?)""", e)
    count=count+1
conn.commit()
labels = []
counts = []
count = 0
c.execute('''select count(Salary),Job from data group by Job''')
for row in c:
    for i in row:
        if count==0:
            counts.append(i)
            count=count+1
        else:
            count=0
            labels.append(i)
c.execute('''select Salary,Job from data order by Job''')
count = 1
count1 = 1
temp = 0
pri = 0
lis = []
for row in c:
    lis.append(row)
for cons in counts:
    if cons%2 == 0:
        pri = cons/2
    else:
        pri = (cons+1)/2
    if count1 == 1:
        for li in lis:
            if count == pri:
                print "Median is ",li
            count = count + 1
        count = 0
        temp = pri+cons
    else:
        for li in lis:
            if count == temp:
                print "Median is",li
            count = count+1
        count = 0
        temp = temp + pri
    count1 = count1 + 1
However, it raises an error:
IndentationError('expected an indented block', ('', 28, 2, 'if count==0:\n'))
How can I fix this error?
Answer 0 (score: 3)
You can use a defaultdict to collect all the salaries for each profession, and then take the median of each list.
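# Python 2 code (reader.next(), dict.iteritems(), print statement)
import csv
from collections import defaultdict

with open("C:/Users/jimenez/Desktop/a.csv","r") as f:
    d = defaultdict(list)   # maps each job to the list of its salaries
    reader = csv.reader(f)
    reader.next()           # skip the header row
    for row in reader:
        d[row[1]].append(float(row[2]))
for k,v in d.iteritems():
    # len(v) // 2 is the middle index; for an even count this is the upper of the two middle values
    print "{} median is {}".format(k,sorted(v)[len(v) // 2])
    print "{} average is {}".format(k,sum(v)/len(v))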
Output:
Plumbers median is 500.0
Plumbers average is 475.0
Lawyers median is 700.0
Lawyers average is 628.571428571
Dog catchers median is 400.0
Dog catchers average is 400.0
Doctors median is 800.0
Doctors average is 787.5
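Note that `sorted(v)[len(v) // 2]` returns the upper of the two middle values when a profession has an even number of salaries, which is why Plumbers shows 500.0 here. If the conventional median is wanted, the following is a minimal Python 3 sketch, assuming the standard-library `statistics` module is acceptable. The inline `SAMPLE` text and the `medians_by_job` name are illustrative and not part of the original answer:

```python
import csv
import io
import statistics
from collections import defaultdict

# Illustrative sample in the same City,Job,Salary layout (not the full file).
SAMPLE = """City,Job,Salary
Delhi,Doctors,500
Delhi,Plumbers,100
London,Doctors,800
London,Plumbers,300
Tokyo,Doctors,900
Tokyo,Plumbers,400
Paris,Plumbers,500
"""


def medians_by_job(lines):
    """Return {job: median salary} for an iterable of CSV lines with a header row."""
    salaries = defaultdict(list)
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    for _city, job, salary in reader:
        salaries[job].append(float(salary))
    # statistics.median averages the two middle values for even-length lists
    return {job: statistics.median(values) for job, values in salaries.items()}


result = medians_by_job(io.StringIO(SAMPLE))
for job in sorted(result):
    print("{} median is {}".format(job, result[job]))
# Doctors median is 800.0
# Plumbers median is 350.0
```

To run it against a real file, pass an open file object (for example `open("sal.csv", newline="")`) instead of the `io.StringIO` wrapper.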
Answer 1 (score: 1)
If you use pandas (http://pandas.pydata.org):
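import pandas as pd

# names= assumes test.csv has no header row; if the file starts with
# a header line, also pass header=0 so it is not read as data
df = pd.read_csv('test.csv', names=['City', 'Job', 'Salary'])
# select the Salary column explicitly so the non-numeric City column is not aggregated
df.groupby('Job')[['Salary']].median()
#               Salary
# Job
# Doctors          800
# Dog catchers     400
# Lawyers          700
# Plumbers         450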
If you want the mean instead of the median:
# df is the DataFrame read with pd.read_csv; select Salary so City is not aggregated
df.groupby('Job')[['Salary']].mean()
#                   Salary
# Job
# Doctors       787.500000
# Dog catchers  400.000000
# Lawyers       628.571429
# Plumbers      475.000000
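Answer 2 (score: 0)
If your problem is just computing the median, there is no need to insert everything into an SQL database and shuffle it around. It is enough to read all the rows, group the salaries by profession, and take the median of each group. That reduces your hundred-line script to: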
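import csv

professions = {}
with open("sal.csv") as data:
    reader = csv.reader(data)
    next(reader)  # skip the header row (City,Job,Salary)
    for city, profession, salary in reader:
        # collect every salary under its profession
        professions.setdefault(profession.strip(), []).append(int(salary.strip()))

for profession, salaries in sorted(professions.items()):
    # len(salaries) // 2 is the middle index of the sorted list
    print ("{}: {}".format(profession, sorted(salaries)[len(salaries) // 2]))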
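(Give or take 1 on the index to get the exact median from the sorted salaries: for an even number of values, the median is the average of the two middle ones.)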
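The index adjustment mentioned here comes down to simple arithmetic: with an odd number of sorted salaries the median sits at index `n // 2`, while with an even number it is the average of the values at `n // 2 - 1` and `n // 2`. A minimal Python 3 sketch of that rule, using the Plumbers and Lawyers salaries from the question (the `median` helper name is illustrative, not part of the original answer):

```python
def median(values):
    """Return the median of a non-empty sequence of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median() of an empty sequence")
    mid = n // 2
    if n % 2 == 1:
        # odd count: the single middle element
        return ordered[mid]
    # even count: average of the two middle elements
    return (ordered[mid - 1] + ordered[mid]) / 2


plumbers = [100, 300, 400, 500, 1000, 100, 900, 500]  # 8 values
lawyers = [400, 700, 800, 400, 1100, 200, 800]        # 7 values

print(median(plumbers))  # 450.0
print(median(lawyers))   # 700
```

In the script above, this helper could replace the `sorted(salaries)[len(salaries) // 2]` expression, which would otherwise report 500 for Plumbers.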