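I used SQL and SAS before and have recently switched to Python. I have run into a lot of problems. I am still working through them, and a new one is driving me crazy: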
import pandas as pd
import numpy as np
data = {'Period':['2016-02','2016-02','2016-02','2016-02','2016-03','2016-03','2016-04'],
        'Name':['a','b','c','c','d','e','v'],
        'amount':[2,3,41,1,8,43,20],
        'Credit':[5,2,45,50,9,21,32]}
How can I get the same result as this SQL:
select *,case when Period = '2016-02' then (amount/Credit)*1.2
when Period = '2016-03' then (amount/Credit)*1.1
else (amount/Credit)*1.0 end as Rate from data
Or as in SAS:
data data;
set data;
if Period = '2016-02' then rate=(amount/Credit)*1.2;
else if Period = '2016-03' then rate=(amount/Credit)*1.1;
else rate=(amount/Credit)*1.0;
run;
And going further:
select Period,min(amount,credit) from data group by Period;
Answer 0 (score: 0)
Option 1: pandas
Use map
m = lambda x: 1.2 if x == '2016-02' else 1.1 if x == '2016-03' else 1.
df.amount / df.Credit * df.Period.map(m)
Option 2: numpy
Use numpy.where
p = df.Period.values
multiplier = np.where(p == '2016-02', 1.2, np.where(p == '2016-03', 1.1, 1.))
df.amount / df.Credit * multiplier
Both yield
0 0.480000
1 1.800000
2 1.093333
3 0.024000
4 0.977778
5 2.252381
6 0.625000
dtype: float64
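For completeness, here is a self-contained sketch that builds `df` from the question's `data` and stores the result as a `Rate` column, mirroring the SQL `CASE WHEN`. It uses `numpy.select`, which is swapped in here for the nested `np.where` and is not part of the original answer; the names `conditions` and `multipliers` are illustrative only.

```python
import numpy as np
import pandas as pd

data = {'Period': ['2016-02', '2016-02', '2016-02', '2016-02', '2016-03', '2016-03', '2016-04'],
        'Name': ['a', 'b', 'c', 'c', 'd', 'e', 'v'],
        'amount': [2, 3, 41, 1, 8, 43, 20],
        'Credit': [5, 2, 45, 50, 9, 21, 32]}
df = pd.DataFrame(data)

# One boolean mask per WHEN branch; np.select checks them in order.
conditions = [df['Period'] == '2016-02', df['Period'] == '2016-03']
# Multiplier paired with each mask (illustrative name).
multipliers = [1.2, 1.1]

# `default` plays the role of the ELSE branch.
df['Rate'] = df['amount'] / df['Credit'] * np.select(conditions, multipliers, default=1.0)
print(df)
```

Adding another period then means appending one mask and one multiplier, rather than nesting a further `np.where`.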
Answer 1 (score: 0)
# Assumes pandas is imported as pd.
a = pd.DataFrame.from_dict(data)

def func(row):
    # Compute the rate for one row; the multiplier depends on Period.
    ratio = row['amount'] / row['Credit']
    if row['Period'] == '2016-02':
        return ratio * 1.2
    elif row['Period'] == '2016-03':
        return ratio * 1.1
    else:
        return ratio * 1.0

# axis=1 passes each row to func, so the comparisons are on scalars, not on a whole Series.
a['rate'] = a.apply(func, axis=1)
Answer 2 (score: 0)
The first way is probably the most straightforward: iterate over the lists, take each value, and append it to a result list.
import pprint # used here to output results nicely
data = {
    'Period':['2016-02','2016-02','2016-02','2016-02','2016-03','2016-03','2016-04'],
    'Name':['a','b','c','c','d','e','v'],
    'amount':[2,3,41,1,8,43,20],
    'Credit':[5,2,45,50,9,21,32]
}
res = []
count = len(data['Period'])
for i in range(count):
    period = data['Period'][i]
    name = data['Name'][i]
    amount = data['amount'][i]
    credit = data['Credit'][i]
    if period == '2016-02':
        rate = amount / credit * 1.2
    elif period == '2016-03':
        rate = amount / credit * 1.1
    else:
        rate = amount / credit * 1.0
    res.append( ( period, name, amount, credit, rate ) )
pprint.pprint(res)
A more compact solution is:
res = [ [(p, n, a, c,
          a/c * 1.2 if p == '2016-02' else a/c * 1.1 if p == '2016-03' else a/c * 1.0)]
        for p, n, a, c in zip(data['Period'], data['Name'], data['amount'], data['Credit']) ]
pprint.pprint(res)
# Output
[[('2016-02', 'a', 2, 5, 0.48)],
[('2016-02', 'b', 3, 2, 1.7999999999999998)],
[('2016-02', 'c', 41, 45, 1.0933333333333333)],
[('2016-02', 'c', 1, 50, 0.024)],
[('2016-03', 'd', 8, 9, 0.9777777777777779)],
[('2016-03', 'e', 43, 21, 2.2523809523809524)],
[('2016-04', 'v', 20, 32, 0.625)]]
Note: if you want a numpy solution, see piRSquared's answer; for a pandas solution, see S Ringne's answer.
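The question's last query, `select Period, min(amount, credit) from data group by Period`, is not covered by the answers. One possible reading, and this is an assumption about the intent, is "the smallest value found in either column within each period". A minimal pandas sketch under that reading follows; the names `row_min` and `min_value` are made up for illustration.

```python
import pandas as pd

data = {'Period': ['2016-02', '2016-02', '2016-02', '2016-02', '2016-03', '2016-03', '2016-04'],
        'Name': ['a', 'b', 'c', 'c', 'd', 'e', 'v'],
        'amount': [2, 3, 41, 1, 8, 43, 20],
        'Credit': [5, 2, 45, 50, 9, 21, 32]}
df = pd.DataFrame(data)

# Row-wise minimum of the two columns (assumed meaning of min(amount, credit)).
row_min = df[['amount', 'Credit']].min(axis=1)

# Smallest of those row-wise minima within each Period.
result = row_min.groupby(df['Period']).min().rename('min_value').reset_index()
print(result)
```

If the intent was instead a separate minimum per column, `df.groupby('Period')[['amount', 'Credit']].min()` would give that.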