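I am tracking a fairly large inventory database covering a variety of instruments, and I need a good way to update it. The system consists of many spreadsheets, basically one per instrument. The main ways I organize things are by instrument and by part number. Right now I have a script, using the pandas package, that takes a spreadsheet and checks it against a master file on two categories, part number and instrument, and updates the master data by dropping duplicates. For example, if I have four 5-ohm resistors and that number is updated to seven 5-ohm resistors, I run the program and it updates the master with the new value of 7.
What I need to do now is remove omitted entries entirely. In other words, suppose I go from four 5-ohm resistors to zero 5-ohm resistors, meaning there is no entry at all. I need the program to edit the master file and delete that entry completely. I would also like a way to check the master against x user-supplied files, rather than only one at a time. I'm not sure I'm proficient enough in Python or pandas to implement this, hence the question on Stack Overflow!
Any ideas or suggestions are appreciated! Here is the program so far: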
import os
import subprocess
import sys

import pandas as pd


# CSV IMPORT DEFINED FUNCTION
def csvImport(ftype, fpath):
    """Read a CSV file; ftype 1 is the master file, ftype 2 an updated file."""
    try:
        if ftype == 1:
            masterdata = pd.read_csv(fpath)
            return masterdata
        if ftype == 2:
            updateddata = pd.read_csv(fpath)
            # Tag every row with the name of the file it came from.
            updateddata['originfile'] = os.path.basename(fpath)
            return updateddata
    except Exception as e:
        print("\nUnable to import CSV file. Error {}".format(e))
        sys.exit(1)


# EXCEL IMPORT DEFINED FUNCTION
def xlImport(ftype, fpath):
    """Read the first sheet of an Excel file; ftype as in csvImport."""
    try:
        if ftype == 1:
            masterdata = pd.read_excel(fpath, sheet_name=0)
            return masterdata
        if ftype == 2:
            updateddata = pd.read_excel(fpath, sheet_name=0)
            # Same column name as in csvImport so both sources line up.
            updateddata['originfile'] = os.path.basename(fpath)
            return updateddata
    except Exception as e:
        print("\nUnable to import Excel file. Error {}".format(e))
        sys.exit(1)


# MASTER FILE USER INPUT DEFINED FUNCTION
def masterfile():
    while True:
        path = input("Enter the path to the master file: ")
        if path.endswith(".csv"):
            return csvImport(1, path)
        elif path.endswith(".xlsx"):
            return xlImport(1, path)
        else:
            print("\nPlease enter a proper CSV or xlsx file.")


# UPDATED FILE USER INPUT DEFINED FUNCTION
def updatefile():
    while True:
        path = input("\nEnter the path to the updated file: ")
        if path.endswith(".csv"):
            return csvImport(2, path)
        elif path.endswith(".xlsx"):
            return xlImport(2, path)
        else:
            print("\nPlease enter a proper CSV or xlsx file.")


# CALLING OPENING FUNCTIONS
masterdata = masterfile()
updateddata = updatefile()
# CONCATENATING DATA FRAMES
# (updated rows come first, so they win when duplicates are dropped)
combineddata = pd.concat([updateddata, masterdata])
# REMOVING DUPLICATES
finaldata = combineddata.drop_duplicates(subset=['Item'])
# SETTING FINAL PATH BY USER INPUT
while True:
    final = input("\nWhere do you want the file, and what do you want to name it? "
                  "(e.g., C:\\path_to_file\\name_of_file.xlsx): ")
    if final.endswith(".xlsx"):
        break
    else:
        print("\nPlease enter a proper Excel file in xlsx format.")
# OUTPUTTING DATA FRAME TO FILE
finaldata.to_excel(final)
print("\nSuccessfully outputted appended data frame to Excel!")
# OPENING OUTPUTTED FILE
# (NOTE: PYTHON STILL RUNS UNTIL SPREADSHEET IS CLOSED)
subprocess.call(final, shell=True)
Answer 0 (score: 0)
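An idea for the first question: this behaves like a SQL SELECT statement:
nozeros_finaldata = finaldata[finaldata['ColumnName'] != 0]
Replace 'ColumnName' with the name of the column whose value goes from 4 to zero; the expression returns a new data frame. Then use nozeros_finaldata.to_excel(final).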
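Note that the filter above only helps when the item is still listed with a quantity of 0. If the row disappears from the updated spreadsheet altogether, one option is to throw away every master row that came from that spreadsheet and put the new rows in their place. This is only a sketch: `apply_update`, its parameters and the `Quantity` column name are made up for illustration, and it assumes the master file still carries the `originfile` column written by the import functions.

```python
import pandas as pd


def apply_update(master, update, source_name, qty_col="Quantity"):
    """Replace all master rows that came from `source_name` with `update`.

    Hypothetical helper: assumes `master` has an 'originfile' column and
    that both frames share a quantity column named `qty_col`.
    """
    update = update.copy()
    update["originfile"] = source_name
    # Rows previously imported from this spreadsheet are discarded wholesale,
    # so items that no longer appear in it vanish from the master.
    kept = master[master["originfile"] != source_name]
    merged = pd.concat([update, kept], ignore_index=True)
    # Also drop rows whose quantity is recorded as an explicit zero.
    return merged[merged[qty_col] != 0].reset_index(drop=True)
```

Unlike the `drop_duplicates(['Item'])` step in the question, this does not collapse identical items that appear in different spreadsheets, so adjust it if that is the behaviour you want.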
For the second question: you can use a while loop and ask the user whether there are more files.
# CALLING OPENING FUNCTIONS
# (reuses masterfile() and updatefile() from the question)
more_files = True
while more_files:
    masterdata = masterfile()
    updateddata = updatefile()
    # CONCATENATING DATA FRAMES
    combineddata = pd.concat([updateddata, masterdata])
    # REMOVING DUPLICATES
    finaldata = combineddata.drop_duplicates(subset=['Item'])
    # Keep only rows tagged with an origin file; rows without one are dropped.
    finaldata = finaldata.dropna(subset=['originfile'])
    # SETTING FINAL PATH BY USER INPUT
    while True:
        final = input("\nWhere do you want the file, and what do you want to name it? "
                      "(e.g., C:\\path_to_file\\name_of_file.xlsx): ")
        if final.endswith(".xlsx"):
            break
        else:
            print("\nPlease enter a proper Excel file in xlsx format.")
    # OUTPUTTING DATA FRAME TO FILE
    finaldata.to_excel(final)
    print("\nSuccessfully outputted appended data frame to Excel!")
    # OPENING OUTPUTTED FILE
    # (NOTE: PYTHON STILL RUNS UNTIL SPREADSHEET IS CLOSED)
    subprocess.call(final, shell=True)
    user_run_again = input("\nWould you like to run another file? ")
    # Anything other than exactly "Yes" ends the loop.
    more_files = user_run_again == "Yes"
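If you would rather gather all the updated files first and touch the master only once, the paths can be collected up front. A minimal sketch, where `collect_update_paths` and its `ask` parameter (there so the prompt can be swapped out for testing) are hypothetical names, not part of the original script:

```python
def collect_update_paths(ask=input, extensions=(".csv", ".xlsx")):
    """Prompt until a blank line is entered; return the accepted paths in order."""
    paths = []
    while True:
        path = ask("Enter the path to an updated file (blank to finish): ").strip()
        if not path:
            return paths
        # str.endswith accepts a tuple, so one check covers both file types.
        if path.lower().endswith(extensions):
            paths.append(path)
        else:
            print("Skipping {!r}: expected a .csv or .xlsx file.".format(path))
```

Each path can then go through `csvImport(2, path)` or `xlImport(2, path)`, and all the resulting frames can be passed to a single `pd.concat` ahead of the master before `drop_duplicates`, which keeps the first occurrence of each item by default.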
You may want to add some validation around the final "run another file?" prompt; this is just meant to offer some ideas.
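For instance, a small helper can keep asking until it gets a recognisable answer instead of treating anything other than exactly "Yes" as a no. This is just one way to do it; `ask_yes_no` and its `ask` parameter are names invented here:

```python
def ask_yes_no(question, ask=input):
    """Return True for yes, False for no; re-prompt on anything else."""
    while True:
        reply = ask(question + " [y/n]: ").strip().lower()
        if reply in ("y", "yes"):
            return True
        if reply in ("n", "no"):
            return False
        print("Please answer yes or no.")
```

The end of the loop then becomes `more_files = ask_yes_no("Would you like to run another file?")`.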