Pandas append/concat replaces existing values in a DataFrame

Time: 2019-10-19 01:27:13

Tags: python-3.x pandas

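I am creating a new DataFrame called dfLostBusiness to hold the orders from an original DataFrame called df that meet certain criteria and are therefore considered "lost business". I use boolean indexing on df and then append each result to dfLostBusiness. I expected the masked selections to be appended one after another, producing a dfLostBusiness of roughly 1,500 rows, which is what the equivalent SQL query returns. Instead, each mask step seems to replace every row other than the O,X rows. I have also tried changing the order of the mask steps. I am working in an IPython environment that I have restarted several times with no change in the result, so something must be happening that I do not understand.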
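For comparison, the four conditions described above can also be combined into a single boolean mask with `|` and applied in one selection, which is closer to a SQL `WHERE ... OR ...` clause. The following is only a minimal sketch: the sample rows, the variable names (`m_ox`, `m_c`, `m_q_old`, `m_qr`) and the use of `pd.Timestamp.now()` are assumptions for illustration, since the real df is not shown. Note that a row matching more than one condition appears once here, whereas appending each selection separately would include it once per matching condition.

```python
import pandas as pd

now = pd.Timestamp.now()

# Hypothetical sample data; the real df is not shown in the question.
df = pd.DataFrame({
    "OrderType":   ["O", "o", "C", "Q", "Q", "Q", "O"],
    "OrderStatus": ["X", "x", "S", "R", "Q", "R", "C"],
    # Age of each order in days, converted to an order date.
    "OrderDate":   [now - pd.Timedelta(days=d) for d in (1, 1, 1, 45, 45, 1, 1)],
})

order_type = df["OrderType"].str.lower()
order_status = df["OrderStatus"].str.lower()
age = now - df["OrderDate"]

# One mask per condition from the question.
m_ox = (order_type == "o") & (order_status == "x")
m_c = order_type == "c"
m_q_old = (order_type == "q") & (age > pd.Timedelta(days=30))
m_qr = (order_type == "q") & (order_status == "r")

# OR the masks together and select once; each matching row appears once.
dfLostBusiness = df[m_ox | m_c | m_q_old | m_qr].reset_index(drop=True)
print(dfLostBusiness[["OrderType", "OrderStatus"]])
```

In this sample, only the last row (type O, status C) matches none of the conditions and is left out.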

Using append:

dfLostBusiness = pd.DataFrame()
m = (df['OrderType'].str.lower() == 'o') & (df['OrderStatus'].str.lower() == 'x')
dfLostBusiness = df[m].reset_index(drop=True)
dfLostBusiness[['OrderType', 'OrderStatus']].shape: (421, 2)
dfLostBusiness Preview:
  OrderType OrderStatus
0         O           X
1         O           X
2         O           X
3         O           X
4         O           X

m = (df['OrderType'].str.lower() == 'c')
dfLostBusiness.append(df[m], ignore_index=True)
dfLostBusiness[['OrderType', 'OrderStatus']].shape: (594, 2)
dfLostBusiness Preview:
  OrderType OrderStatus
0         O           X
1         O           X
2         C           S
3         C           S
4         C           C

m = ((df['OrderType'].str.lower() == 'q') & ((datetime.datetime.now() - df['OrderDate']) > pd.Timedelta(30, 'D')))
dfLostBusiness.append(df[m], ignore_index=True)
dfLostBusiness[['OrderType', 'OrderStatus']].shape: (1442, 2)
At this point, dfLostBusiness[dfLostBusiness['OrderType'].str.lower() == 'c'] returns an empty DataFrame.
dfLostBusiness Preview:
  OrderType OrderStatus
0         O           X
1         O           X
2         Q           X
3         Q           X
4         Q           Q

m = ((df['OrderType'].str.lower() == 'q') & (df['OrderStatus'].str.lower() == 'r'))
dfLostBusiness.append(df[m], ignore_index=True)
dfLostBusiness[['OrderType', 'OrderStatus']].shape: (425, 2)
Here the rows drop to 425 from 1442, and there are only O,X and Q,R
dfLostBusiness Preview:
  OrderType OrderStatus
0         O           X
1         O           X
2         O           X
3         O           X
4         Q           R

Using concat, I get similarly unexpected results:

dfLostBusiness = pd.DataFrame()
m = (df['OrderType'].str.lower() == 'o') & (df['OrderStatus'].str.lower() == 'x')
dfLostBusiness = df[m].reset_index(drop = True)
m = (df['OrderType'].str.lower() == 'c')
pd.concat([dfLostBusiness, df[m]], ignore_index = True)
m = ((df['OrderType'].str.lower() == 'q') &  ((datetime.datetime.now() - df['OrderDate']) > pd.Timedelta(30, 'D')))
pd.concat([dfLostBusiness, df[m]], ignore_index = True)
m = ((df['OrderType'].str.lower() == 'q') & (df['OrderStatus'].str.lower() == 'r'))
pd.concat([dfLostBusiness, df[m]], ignore_index = True)
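One possible reading of these results is that both `pd.concat` and the older `DataFrame.append` return a new DataFrame and leave their inputs unchanged, so a call whose return value is not assigned has no lasting effect. That would match the row counts reported in the question, where each step looks like the 421 O,X rows plus only the latest selection. (`DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so `pd.concat` is the call to use on current versions.) The sketch below illustrates this with small hypothetical frames (`part_a`, `part_b`, `part_c`) standing in for the `df[m]` selections.

```python
import pandas as pd

# Hypothetical stand-ins for df[m] under three different masks.
part_a = pd.DataFrame({"OrderType": ["O", "O"], "OrderStatus": ["X", "X"]})
part_b = pd.DataFrame({"OrderType": ["C"], "OrderStatus": ["S"]})
part_c = pd.DataFrame({"OrderType": ["Q"], "OrderStatus": ["R"]})

result = part_a

# The return value is discarded here, so `result` still holds only part_a.
pd.concat([result, part_b], ignore_index=True)
rows_after_discarded_call = len(result)

# Assigning the return value back is what makes the rows accumulate.
result = pd.concat([result, part_b], ignore_index=True)
result = pd.concat([result, part_c], ignore_index=True)

# Same result with a single call: collect the pieces, then concatenate once.
combined = pd.concat([part_a, part_b, part_c], ignore_index=True)

print(rows_after_discarded_call, len(result), len(combined))
```

Collecting the selections in a list and calling `pd.concat` once avoids copying the growing frame at every step, which matters more as the number of pieces grows.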

0 answers:

No answers