How do I fill in a column's missing values when the value is 0?

Time: 2019-05-20 11:39:36

Tags: python pandas

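I have a dataframe built from 2 Excel files. They are system dumps of the same data. My script compares the two sheets against each other, creates a difference column, and then returns only the rows where there is a difference. My problem is that my dataframes are merged on "product code", which is exactly the same on both sheets. However, I also have another column called "description". It has a different header on each sheet (I have already renamed them to match). Although the product codes match, the descriptions sometimes differ slightly. My df is joined on product code, but how do I also bring in the description without creating 2 instances of each product code?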
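For illustration, here is a minimal sketch of one way to carry a description from both sheets through a merge on the product code alone and then collapse the two into a single column. The frames `sheet1` and `sheet2`, their values, and the `_s1`/`_s2` suffixes are hypothetical stand-ins, not the real files; only `pd.merge` and `Series.combine_first` are actual pandas calls.

```python
import pandas as pd

# Hypothetical stand-ins for the two sheets (column names and values are assumptions).
sheet1 = pd.DataFrame({
    "Material": [1001, 1002, 1003],
    "Description": ["Widget", "Gadget (blue)", "Sprocket"],
    "stock": [5, 7, 2],
})
sheet2 = pd.DataFrame({
    "Material": [1001, 1002],
    "Description": ["Widget", "Gadget blue"],
    "numberoff": [5, 6],
})

# Merge on the product code only. Because 'Description' exists on both sides,
# pandas keeps both copies and appends the given suffixes.
merged = pd.merge(sheet2, sheet1, on="Material", how="outer", suffixes=("_s2", "_s1"))

# Prefer sheet 2's description and fall back to sheet 1's where sheet 2 has none.
merged["Description"] = merged["Description_s2"].combine_first(merged["Description_s1"])
merged = merged.drop(columns=["Description_s2", "Description_s1"])

print(merged)
```

Since the description is not part of the merge key, each product code still appears once even when the two sheets word the description differently.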

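At the moment it returns the description from sheet 2 (source2_df), but the problem I have is that if something does not appear in sheet 2 but does appear in sheet 1, my script returns the product code but the description shows as 0.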

I need a way to say: if joined_df 'Description' = 0, then return the source2_df description.
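A rough sketch of that "if the description is 0, look it up elsewhere" rule, assuming the fallback text lives in another frame keyed by the same product code. The frames `joined` and `other` and their contents are made up for the example, and which sheet actually holds a 'Material Description' column is an assumption to check against the real files.

```python
import pandas as pd

# Hypothetical result of the merge, where a missing description was filled with 0.
joined = pd.DataFrame({
    "Material": [1001, 1002, 1003],
    "Description": ["Widget", 0, "Sprocket"],
})

# Hypothetical sheet that holds a description for every product code.
other = pd.DataFrame({
    "Material": [1001, 1002, 1003],
    "Material Description": ["Widget A", "Gadget", "Sprocket X"],
})

# Build a product code -> description lookup (one entry per code).
lookup = other.drop_duplicates("Material").set_index("Material")["Material Description"]

# fillna(0) writes the integer 0, so test for both 0 and the string '0'.
is_missing = joined["Description"].isin([0, "0"])

# Fill only the flagged rows, matching on product code rather than row position.
joined.loc[is_missing, "Description"] = joined.loc[is_missing, "Material"].map(lookup)

print(joined)
```

Mapping through the product code matters here: assigning a whole column from another frame lines rows up by index label, which generally does not correspond to the same product.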

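I tried using some kind of lookup, and someone also suggested creating a "missing description" column, but I couldn't get anything to work.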

import pandas as pd
import datetime
import os
path="C:\\Users\\*******\\PythonScripts"
os.chdir(path)

# gets todays date
today = datetime.date.today()

# read excel file named str(today) + 'STLRS_3PL.xlsx' with str(today) meaning format todays date as a string
# sheetname='Sheet1' means read sheet named 'Sheet1' and skiprows=4 means skip the first 4 rows of the sheet
source1_df = pd.read_excel(str(today) + ' STLRS_3PL.xlsx', sheet_name='Sheet1', skiprows=4)

# read excel file named str(today) + ' Hit Daily Report.xlsx' with str(today) meaning format todays date as a string
# sheet_name='CURRENT STOCK' means read sheet named 'CURRENT STOCK' (no rows are skipped for this file)
source2_df = pd.read_excel(str(today) + ' Hit Daily Report.xlsx', sheet_name='CURRENT STOCK')


# this assigns the column '   Available stock' of source1_df as the '   Available stock' column with commas removed
# and converted to a number from a string
source1_df['   Available stock']=source1_df['   Available stock'].str.replace(',', '').astype('int64')


#Create a new dataframe
# rows are grouped by productcode and numberoff is summed. 
# reset_index() I think is needed to allow the summed column to be named.
#stlr_df = source2_df.groupby('productcode')['description','numberoff'].agg('sum').reset_index()
stlr_df = source2_df.groupby(['productcode', 'description'])['numberoff'].agg('sum').reset_index()

# similar to above but for the hit dataframe
hit_df = source1_df.groupby('Material')['   Available stock'].agg('sum').reset_index()

# rename productcode column as 'Material' on stlr_df
stlr_df.rename(columns={'productcode':'Material'}, inplace=True)


# This blanks out any rows where the 'Material' field is not numeric, then drops them from stlr_df
stlr_df = stlr_df.mask(~stlr_df["Material"].str.isnumeric()).dropna()
# format 'Material' column as a number instead of string
stlr_df['Material']=stlr_df['Material'].astype('int64')


# format '   Available stock' column as integer instead of a string
hit_df['   Available stock']=hit_df['   Available stock'].astype('int64')

# merge the two dataframe called stlr_df and hit_df using a outer join using the Material column
joined_df = pd.merge(stlr_df,hit_df,on='Material',how='outer')
joined_df = joined_df.fillna(0)

# convert joined_df column '   Available stock' as a float instead of a string
joined_df['   Available stock']=joined_df['   Available stock'].astype('float')
joined_df['difference']=joined_df['numberoff']-joined_df['   Available stock']

ind = joined_df.difference!=0
joined_df = joined_df[ind]
joined_df.columns=['Material','Description','ATMS','202','Difference']
# attempted lookup for rows whose description is 0 (this is the line that raises the KeyError)
joined_df.loc[joined_df.Description == '0', 'Description'] = source2_df['Material Description']


print(joined_df)

I tried creating a kind of lookup for when the value is 0:


joined_df.loc[joined_df.Description == '0', 'Description'] = source2_df['Material Description']

This returns a KeyError, even though that column header does appear in source2_df. I suspect I'm not going about this the right way, though.
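Two things may be worth checking here, sketched below under stated assumptions. First, a header that looks right in Excel can carry stray spaces (the script's own '   Available stock' column does), and then the exact name is not found. Second, `fillna(0)` writes the integer 0, so a comparison against the string '0' matches nothing. The frame `sheet` and its padded header are invented for the demonstration; whether the real file has padding is a guess.

```python
import pandas as pd

# Hypothetical sheet whose description header has a leading space (an assumption).
sheet = pd.DataFrame({"Material": [1001], " Material Description": ["Widget"]})

# The exact name is not found until the header whitespace is stripped.
has_column_before = "Material Description" in sheet.columns
sheet.columns = sheet.columns.str.strip()
has_column_after = "Material Description" in sheet.columns
print(has_column_before, has_column_after)

# fillna(0) inserts the integer 0, which is not equal to the string '0'.
filled = pd.Series(["Widget", None], dtype=object).fillna(0)
matches_string_zero = int((filled == "0").sum())
matches_integer_zero = int((filled == 0).sum())
print(matches_string_zero, matches_integer_zero)
```

Printing `source2_df.columns.tolist()` on the real data would show the exact header strings pandas read from the file.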

I'm still very new to Python, so apologies if this makes anyone wince.

I can provide the reference files if needed.

Thanks in advance.

Edit

I have looked at similar answers but couldn't find any that specifically address merging 2 dataframes with slightly different data.

0 answers:

No answers