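How to implement CASE WHEN and a min across columns in Python

Date: 2017-01-23 03:23:06

Tags: python pandas

I used SQL and SAS before and have recently switched to Python. I have run into a lot of problems. I am still working through them, and a new one is driving me crazy: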

import pandas as pd
import numpy as np
data = {'Period':['2016-02','2016-02','2016-02','2016-02','2016-03','2016-03','2016-04'],
'Name':['a','b','c','c','d','e','v'],
'amount':[2,3,41,1,8,43,20],
'Credit':[5,2,45,50,9,21,32]}
df = pd.DataFrame(data)  # the answers below refer to this frame as df

How can I get the same result as this SQL:

select *,case when Period = '2016-02' then (amount/Credit)*1.2
              when Period = '2016-03' then (amount/Credit)*1.1
         else (amount/Credit)*1.0 end as Rate from data

Or as in SAS:

data data;
set data;
if Period = '2016-02' then rate=(amount/Credit)*1.2;
else if Period = '2016-03' then rate=(amount/Credit)*1.1;
else rate=(amount/Credit)*1.0;
run;

And additionally:

select Period,min(amount,credit) from data group by Period;

3 answers:

Answer 0 (score: 0)

Option 1: pandas
Using map

m = lambda x: 1.2 if x == '2016-02' else 1.1 if x == '2016-03' else 1.
df.amount / df.Credit * df.Period.map(m)

Option 2: numpy
Using numpy.where

p = df.Period.values
multiplier = np.where(p == '2016-02', 1.2, np.where(p == '2016-03', 1.1, 1.))
df.amount / df.Credit * multiplier

Both options yield:

0    0.480000
1    1.800000
2    1.093333
3    0.024000
4    0.977778
5    2.252381
6    0.625000
dtype: float64
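
To keep the result as a `Rate` column, as the SQL query does, the nested `np.where` can also be written with `np.select`. This is only a minimal sketch, assuming `df` is built from the question's `data` dictionary; the names `conditions` and `multipliers` are illustrative and not part of the answer.

```python
import numpy as np
import pandas as pd

data = {'Period': ['2016-02', '2016-02', '2016-02', '2016-02', '2016-03', '2016-03', '2016-04'],
        'Name': ['a', 'b', 'c', 'c', 'd', 'e', 'v'],
        'amount': [2, 3, 41, 1, 8, 43, 20],
        'Credit': [5, 2, 45, 50, 9, 21, 32]}
df = pd.DataFrame(data)

# One condition per WHEN branch; `default` plays the role of ELSE.
conditions = [df['Period'] == '2016-02', df['Period'] == '2016-03']
multipliers = [1.2, 1.1]

# Store the result as a new column, like "... end as Rate" in the SQL.
df['Rate'] = df['amount'] / df['Credit'] * np.select(conditions, multipliers, default=1.0)
print(df)
```

Adding another period then means appending one entry to each list rather than nesting another `np.where`.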

Answer 1 (score: 0)

import pandas as pd

a = pd.DataFrame.from_dict(data)

def func(row):
    # Compute the rate for a single row, mirroring the SQL CASE WHEN.
    ratio = row['amount'] / row['Credit']
    if row['Period'] == '2016-02':
        return ratio * 1.2
    elif row['Period'] == '2016-03':
        return ratio * 1.1
    else:
        return ratio * 1.0

# axis=1 passes each row to func; filtering by Name would mix the two 'c' rows.
a['rate'] = a.apply(func, axis=1)

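Answer 2 (score: 0)

The first way is probably the most straightforward: iterate over the lists, take each value, and append the computed row to a result list.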

import pprint # used here to output results nicely

data = {
    'Period':['2016-02','2016-02','2016-02','2016-02','2016-03','2016-03','2016-04'],
    'Name':['a','b','c','c','d','e','v'],
    'amount':[2,3,41,1,8,43,20],
    'Credit':[5,2,45,50,9,21,32]
}

res = []
count = len(data['Period'])

for i in range(count):

    period = data['Period'][i]
    name = data['Name'][i]
    amount = data['amount'][i]
    credit = data['Credit'][i]

    if period == '2016-02':
        rate = amount / credit * 1.2
    elif period == '2016-03':
        rate = amount / credit * 1.1
    else:
        rate = amount / credit * 1.0

    res.append( ( period, name, amount, credit, rate ) )

pprint.pprint(res)

A more compact solution is:

res = [ [(p, n, a, c,
    a/c * 1.2 if p == '2016-02' else a/c * 1.1 if p == '2016-03' else a/c * 1.0)]
   for p, n, a, c in zip(data['Period'], data['Name'], data['amount'], data['Credit']) ]

pprint.pprint(res)

# Output
[[('2016-02', 'a', 2, 5, 0.48)],
 [('2016-02', 'b', 3, 2, 1.7999999999999998)],
 [('2016-02', 'c', 41, 45, 1.0933333333333333)],
 [('2016-02', 'c', 1, 50, 0.024)],
 [('2016-03', 'd', 8, 9, 0.9777777777777779)],
 [('2016-03', 'e', 43, 21, 2.2523809523809524)],
 [('2016-04', 'v', 20, 32, 0.625)]]
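
The chained conditional inside the comprehension can also be replaced by a dictionary lookup. The sketch below assumes Python 3 (true division) and uses a hypothetical `multipliers` dictionary that is not part of the answer; it also produces a flat list of tuples rather than the nested single-element lists shown above.

```python
data = {
    'Period': ['2016-02', '2016-02', '2016-02', '2016-02', '2016-03', '2016-03', '2016-04'],
    'Name': ['a', 'b', 'c', 'c', 'd', 'e', 'v'],
    'amount': [2, 3, 41, 1, 8, 43, 20],
    'Credit': [5, 2, 45, 50, 9, 21, 32]
}

# Multiplier per period; periods not listed fall back to 1.0 (the ELSE branch).
multipliers = {'2016-02': 1.2, '2016-03': 1.1}

res = [
    (p, n, a, c, a / c * multipliers.get(p, 1.0))
    for p, n, a, c in zip(data['Period'], data['Name'], data['amount'], data['Credit'])
]

for row in res:
    print(row)
```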

Note: if you want a numpy solution, see piRSquared's answer; for a pandas solution, see S Ringne's answer.
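
The second query in the question (`min(amount, credit)` grouped by `Period`) is not covered by the answers above. The following is a hedged sketch, assuming the intent is the smallest value found in either column within each period; the variable names `row_min` and `result` are illustrative. If only a row-wise minimum was meant, the first step alone gives that.

```python
import pandas as pd

data = {'Period': ['2016-02', '2016-02', '2016-02', '2016-02', '2016-03', '2016-03', '2016-04'],
        'Name': ['a', 'b', 'c', 'c', 'd', 'e', 'v'],
        'amount': [2, 3, 41, 1, 8, 43, 20],
        'Credit': [5, 2, 45, 50, 9, 21, 32]}
df = pd.DataFrame(data)

# Step 1: row-wise minimum of the two columns.
row_min = df[['amount', 'Credit']].min(axis=1)

# Step 2: smallest of those values within each Period.
result = row_min.groupby(df['Period']).min()
print(result)
```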