python pandas中需要的Assitance减少代码行和循环时间

时间:2017-03-03 12:23:53

标签: python pandas

我有一个DF,我在计算填充字段中的emi值

account Total   Start Date  End Date    EMI
211829  107000  05/19/17    01/22/19    5350
320563  175000  08/04/17    10/30/18    12500
648336  246000  02/26/17    08/25/19    8482.7586206897
109996  175000  11/23/17    11/27/19    7291.6666666667
121213  317000  09/07/17    04/12/18    45285.7142857143

然后根据日期范围我创建新的字段,如1月17日,2月17日,3月17日等,并填写下面的代码。

jant17 = pd.to_datetime('2017-01-01')
febt17 = pd.to_datetime('2017-02-01')
mart17 = pd.to_datetime('2017-03-01')

jan17 = pd.to_datetime('2017-01-31')
feb17 = pd.to_datetime('2017-02-28')
mar17 = pd.to_datetime('2017-03-31')

df.ix[(df['Start Date'] <= jan17) & (df['End Date'] >= jant17) , 'Jan17'] = df['EMI']

但缺点是当我必须做一个预测到2019年或2020年他们变得太多的代码行写,当有任何更新我需要修改太多的代码行。为了减少代码行,我尝试了一种使用for循环的替代方法,但代码开始需要很长时间才能执行。

monthend = { 'Jan17' : pd.to_datetime('2017-01-31'),
            'Feb17' : pd.to_datetime('2017-02-28'),
            'Mar17' : pd.to_datetime('2017-03-31')}

monthbeg = { 'Jant17' : pd.to_datetime('2017-01-01'),
            'Febt17' : pd.to_datetime('2017-02-01'),
            'Mart17' : pd.to_datetime('2017-03-01')}

for mend in monthend.values():
    for mbeg in monthbeg.values():
        for coln in colnames:
            df.ix[(df['Start Date'] <= mend) & (df['End Date'] >= mbeg) , coln] = df['EMI']

这大大减少了代码行的数量,但是从执行时间增加到3-4分钟到1小时。是否有更好的方法来使用更少的线和更少的处理时间对其进行编码

1 个答案:

答案 0 :(得分:3)

我认为您可以使用dfstart日期和end列创建帮助names,循环行并创建原始df的新列:

dates = pd.DataFrame({'start':pd.date_range('2017-01-01', freq='MS', periods=10),
                      'end':pd.date_range('2017-01-01', freq='M', periods=10)})
dates['names'] = dates.start.dt.strftime('%b%y')
print (dates)
         end      start  names
0 2017-01-31 2017-01-01  Jan17
1 2017-02-28 2017-02-01  Feb17
2 2017-03-31 2017-03-01  Mar17
3 2017-04-30 2017-04-01  Apr17
4 2017-05-31 2017-05-01  May17
5 2017-06-30 2017-06-01  Jun17
6 2017-07-31 2017-07-01  Jul17
7 2017-08-31 2017-08-01  Aug17
8 2017-09-30 2017-09-01  Sep17
9 2017-10-31 2017-10-01  Oct17

#if necessary convert to datetimes
df['Start Date'] = pd.to_datetime(df['Start Date'])
df['End Date'] = pd.to_datetime(df['End Date'])

def f(x):
    df.loc[(df['Start Date'] <= x.start) & (df['End Date'] >= x.end) , x.names] = df['EMI']
dates.apply(f, axis=1)
print (df)
   account   Total Start Date   End Date           EMI  Jan17  Feb17  \
0   211829  107000 2017-05-19 2019-01-22   5350.000000    NaN    NaN   
1   320563  175000 2017-08-04 2018-10-30  12500.000000    NaN    NaN   
2   648336  246000 2017-02-26 2019-08-25   8482.758621    NaN    NaN   
3   109996  175000 2017-11-23 2019-11-27   7291.666667    NaN    NaN   
4   121213  317000 2017-09-07 2018-04-12  45285.714286    NaN    NaN   

         Mar17        Apr17        May17        Jun17        Jul17  \
0          NaN          NaN          NaN  5350.000000  5350.000000   
1          NaN          NaN          NaN          NaN          NaN   
2  8482.758621  8482.758621  8482.758621  8482.758621  8482.758621   
3          NaN          NaN          NaN          NaN          NaN   
4          NaN          NaN          NaN          NaN          NaN   

         Aug17         Sep17         Oct17  
0  5350.000000   5350.000000   5350.000000  
1          NaN  12500.000000  12500.000000  
2  8482.758621   8482.758621   8482.758621  
3          NaN           NaN           NaN  
4          NaN           NaN  45285.714286