Pandas new column based on old column, with a condition for None values

Time: 2019-01-09 20:28:11

Tags: python pandas numpy

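I'm trying to create a new column in pandas based on an existing column. The existing column contains a year and a quarter, e.g. "201901", or is blank. If the original column has a valid entry, the new column should contain the full timestamp; otherwise it should be empty.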

For example:

Input
  201901
  201902
  None
  201901

Desired output
  datetime(2019, 1, 1)
  datetime(2019, 3, 1)
  None
  datetime(2019, 1, 1)

My attempt


    df['stamp'] = np.where(df['quarter'].astype(str).str.len() == 8,\
       datetime( df['quarter'].astype(str).str[0:4].astype(int), \
                 df['quarter'].astype(str).str[4:6].astype(int)*3,1), \
    None)

Result:

ValueError: invalid literal for int() with base 10: ''

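It looks to me as though the code in the True branch of the condition is being evaluated for every row. Note: the condition itself is correct; it identifies the valid entries properly.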
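
That suspicion is easy to confirm: np.where(cond, x, y) is an ordinary function call, so Python evaluates x and y for every row before np.where ever looks at cond. Below is a minimal sketch, assuming the column holds 'YYYYQQ' strings with '' for blanks (as the error message suggests) and that each quarter maps to its first month (Q1 to January, Q2 to April). The question's own examples are not consistent on that mapping, so adjust it as needed. The sketch converts only the rows that pass the check and lets index alignment fill the rest with NaT; the sample data is hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 'YYYYQQ' strings, with '' marking a missing entry
df = pd.DataFrame({'quarter': ['201901', '201902', '', '201901']})
valid = df['quarter'].str.len() == 6

# np.where receives both branches already computed, so the int cast still hits ''
try:
    np.where(valid, df['quarter'].str[:4].astype(int), None)
except ValueError as err:
    print('eager evaluation:', err)

# Convert only the valid rows; the mask keeps '' away from astype(int)
codes = df.loc[valid, 'quarter']
year = codes.str[:4].astype(int)
quarter = codes.str[4:6].astype(int)

# Assumption: a quarter maps to its first month (Q1 -> 1, Q2 -> 4, ...)
parts = pd.DataFrame({'year': year, 'month': 3 * (quarter - 1) + 1, 'day': 1})

# The result is indexed only by the valid rows, so assignment leaves NaT elsewhere
df['stamp'] = pd.to_datetime(parts)
print(df)
```

Passing a DataFrame with year, month and day columns lets pd.to_datetime assemble all the dates in one vectorized call instead of calling datetime() row by row.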

2 answers:

Answer 0 (score: 1)

pandas.to_datetime will parse quarters automatically, but the strings need to be in the format 2019Q3: the year, followed by Q and the quarter number.

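Since your column holds integers alongside None, it's hard to know whether the underlying values are truly integers or have been cast to float (pandas stores an integer column with missing values as float64). Floats become strings like '201901.0' under astype(str), which throws off the string slicing unless you strip the '.0' first.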
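
To see the float problem concretely, here is a small illustration with hypothetical values: a Series built from integers plus None is stored as float64, so converting it to strings appends '.0', and a regex replace restores the 'YYYYQQ' shape before slicing.

```python
import pandas as pd

# Integers mixed with None: pandas upcasts to float64 so it can hold NaN
codes = pd.Series([201901, 201902, None, 201901])
print(codes.dtype)  # float64

# Floats turn into strings such as '201901.0'
as_text = codes.astype(str)
print(as_text.tolist())

# Strip a trailing '.0' so that str[0:4] and str[-1] see 'YYYYQQ' again
cleaned = as_text.str.replace(r'\.0$', '', regex=True)
print(cleaned.tolist())
```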

import pandas as pd

s = df['quarter'].astype(str).replace(r'\.0', '', regex=True) # Remove .replace if truly integer
df['stamp'] = pd.to_datetime(s.str[0:4] + 'Q' + s.str[-1], errors='coerce')
print(df['stamp'])

#0   2019-01-01
#1   2019-04-01
#2          NaT
#3   2019-01-01
#Name: stamp, dtype: datetime64[ns]

The missing rows produce garbage strings such as 'nanQn' or 'NoneQe', but errors='coerce' turns them into NaT anyway, so it's probably not a big deal.
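
If you would rather not build the garbage strings at all, a variant of the same idea keeps only the non-missing codes, turns them into 2019Q1-style labels, and lets PeriodIndex.to_timestamp() return each quarter's start date. This is a sketch that swaps to_datetime's quarter parsing for pandas' Period machinery; the quarter_start helper and the sample column are hypothetical.

```python
import pandas as pd


def quarter_start(codes: pd.Series) -> pd.Series:
    """Hypothetical helper: map 'YYYYQQ' codes to quarter-start timestamps."""
    # '' and None become NaN; '201901' and '201901.0' both become 201901.0
    numeric = pd.to_numeric(codes, errors='coerce')
    valid = numeric.dropna().astype(int)
    # Build 'YYYYQn' labels only for rows that actually hold a code
    labels = [f"{c // 100}Q{c % 100}" for c in valid]
    starts = pd.PeriodIndex(labels, freq='Q').to_timestamp()
    # Re-attach the original row labels; assignment fills the gaps with NaT
    return pd.Series(starts, index=valid.index)


df = pd.DataFrame({'quarter': ['201901', '201902', None, '201901']})
df['stamp'] = quarter_start(df['quarter'])
print(df)
```

Going through pd.to_numeric first means it no longer matters whether the column arrived as strings, integers or floats.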

Answer 1 (score: 0)

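Could you do something like this...

import numpy as np

df['quarter'] = df['quarter'].replace('', np.nan)  # assign the result: with inplace=True, replace() returns None

and then do the rest of the calculation?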