Create a column based on values in another DataFrame

Time: 2015-11-02 04:47:35

Tags: python datetime numpy pandas dataframe

import pandas as pd
import io
import numpy as np
import datetime

data = """
    date          id
    2015-10-31    50230
    2015-10-31    48646
    2015-10-31    48748
    2015-10-31    46992
    2015-11-01    46491
    2015-11-01    45347
    2015-11-01    45681
    2015-11-01    46430
    """

df = pd.read_csv(io.StringIO(data), delimiter='\s+', index_col=False, parse_dates = ['date'])

df2 = pd.DataFrame(index=df.index)

df2['Check'] = np.where(datetime.datetime.strftime(df['date'],'%B')=='October',0,1)

I have this example that I am working with. What df2['Check'] is meant to do: if the month of df['date'] is October, assign 0, otherwise assign 1.

np.where works fine with other conditions, but datetime.datetime.strftime does not accept a Series, which causes this error:

Traceback (most recent call last):
  File "C:/Users/Leb/Desktop/Python/test2.py", line 22, in <module>
    df2['Check'] = np.where(datetime.datetime.strftime(df['date'],'%B')=='October',0,1)
TypeError: descriptor 'strftime' requires a 'datetime.date' object but received a 'Series'

If I loop, it takes a long time on my actual data, which is about 1M rows. How can I do this efficiently?

df2['Check'] should look like this:

  Check
0     0
1     0
2     0
3     0
4     1
5     1
6     1
7     1

2 Answers:

Answer 0: (score: 3)

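Here is a slightly simpler version that uses the month attribute of each datetime object. Test whether it equals 10, then map the resulting True/False values to the 0/1 pair you want: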

df2['Check']=df.date.apply(lambda x: x.month==10).map({True:0,False:1})
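As a minimal sketch, here is the line above applied to the sample data from the question. The frame is rebuilt inline so the snippet runs on its own, and it assumes the date column is parsed as datetimes:

```python
import io

import pandas as pd

# Sample data copied from the question.
data = """date id
2015-10-31 50230
2015-10-31 48646
2015-10-31 48748
2015-10-31 46992
2015-11-01 46491
2015-11-01 45347
2015-11-01 45681
2015-11-01 46430
"""

df = pd.read_csv(io.StringIO(data), sep=r"\s+", parse_dates=["date"])
df2 = pd.DataFrame(index=df.index)

# Each element is a Timestamp, so .month is available inside the lambda;
# the boolean result is then mapped to 0 (October) or 1 (any other month).
df2["Check"] = df["date"].apply(lambda x: x.month == 10).map({True: 0, False: 1})

print(df2["Check"].tolist())  # [0, 0, 0, 0, 1, 1, 1, 1]
```

This still calls the lambda once per row, so it is convenient rather than fast on large frames.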

Answer 1: (score: 0)

@ako's answer is on the money, but based on the comments from @Kartik and @EdChum, this is what I came up with:

import pandas as pd
import io
import numpy as np

data = """
    2015-10-31    50230
    2015-10-31    48646
    2015-10-31    48748
    2015-10-31    46992
    2015-11-01    46491
    2015-11-01    45347
    2015-11-01    45681
    2015-11-01    46430
    """

df = pd.read_csv(io.StringIO(data*125000), delimiter='\s+', index_col=False, names=['date','id'], parse_dates = ['date'])

df2 = pd.DataFrame(index=df.index)

df.shape
(1125000, 2)

%timeit df2['Check']=df.date.apply(lambda x: x.month==10).map({True:0,False:1})
1 loops, best of 3: 2.56 s per loop

%timeit df2['Check'] = np.where(df['date'].dt.month==10,0,1)
10 loops, best of 3: 80.5 ms per loop
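For reference, a self-contained sketch of the vectorized approach on the small sample from the question, assuming the date column has a datetime64 dtype. The names check_by_name and check_astype are illustrative and not from the original post. The second variant stays close to the original strftime attempt by going through the Series .dt accessor. Note that '%B' depends on the locale, so comparing against 'October' assumes an English locale.

```python
import io

import numpy as np
import pandas as pd

# Sample data copied from the question.
data = """date id
2015-10-31 50230
2015-10-31 48646
2015-10-31 48748
2015-10-31 46992
2015-11-01 46491
2015-11-01 45347
2015-11-01 45681
2015-11-01 46430
"""

df = pd.read_csv(io.StringIO(data), sep=r"\s+", parse_dates=["date"])
df2 = pd.DataFrame(index=df.index)

# Vectorized: .dt.month yields an integer Series, compared in one pass.
df2["Check"] = np.where(df["date"].dt.month == 10, 0, 1)

# Closest to the original attempt: strftime through the .dt accessor.
# Assumes an English locale, since '%B' is locale dependent.
check_by_name = np.where(df["date"].dt.strftime("%B") == "October", 0, 1)

# Same result without NumPy: cast the boolean mask to integers.
check_astype = (df["date"].dt.month != 10).astype(int)

print(df2["Check"].tolist())
```

Comparing the integer month avoids building a string per row, so it is likely the cheaper of the two .dt variants, though that was not measured here.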