Updating Excel spreadsheets with Python

Date: 2016-01-04 11:23:24

Tags: python excel pandas

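I'm keeping track of a fairly large inventory database of various instruments, and I need a good way to update that inventory system. The system consists of many spreadsheets, basically one per instrument. The main keys I have been organizing by are instrument and part number. Right now I have a script, using the pandas package, that takes a spreadsheet, references it against a master file on two categories (part number and instrument), and updates the master data by removing duplicates. For example, if I have four 5-ohm resistors and that count is updated to seven 5-ohm resistors, I run the program and it updates the master with the new value of 7.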
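A minimal sketch of that update-by-deduplication step, using made-up in-memory rows instead of files. Only the `Item` key comes from the script below; the `Instrument` and `Quantity` column names and the row values are assumptions for illustration.

```python
import pandas as pd

# Made-up rows. 'Item' is the key the script deduplicates on;
# 'Instrument' and 'Quantity' are assumed column names.
master = pd.DataFrame({
    "Item": ["5 ohm resistor", "10 uF capacitor"],
    "Instrument": ["Scope A", "Scope A"],
    "Quantity": [4, 12],
})
update = pd.DataFrame({
    "Item": ["5 ohm resistor"],
    "Instrument": ["Scope A"],
    "Quantity": [7],
})

# Put the update first: drop_duplicates keeps the first row per Item by default,
# so the new count (7) replaces the stale master count (4).
combined = pd.concat([update, master], ignore_index=True)
final = combined.drop_duplicates(subset=["Item"], keep="first").reset_index(drop=True)

print(final)
```

This step only changes or adds rows: an item that is missing from the update file keeps its old master row, which is exactly the gap described next.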

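What I need to do now is remove omissions entirely. In other words, I go from four 5-ohm resistors to zero 5-ohm resistors, meaning there is no entry for them at all. I need a way for the program to edit the master file and delete that entry completely. I would also like to be able to reference the master against any number of files entered by the user, instead of only one at a time. But I'm not sure I'm proficient enough in Python or pandas to do this, hence this question on Stack Overflow!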

Any ideas or suggestions are appreciated! Here's the program so far:

import subprocess
import pandas as pd
import numpy as np
import os, sys
from os.path import basename

# CSV IMPORT DEFINED FUNCTION
# ftype 1 = master file, ftype 2 = updated file
def csvImport(ftype, fpath):
    try:
        if ftype == 1:
            masterdata = pd.read_csv(fpath)
            return masterdata

        if ftype == 2:
            updateddata = pd.read_csv(fpath)
            # Tag every row with the name of the file it came from
            updateddata['originfile'] = pd.Series(os.path.basename(fpath),
                                                  index=updateddata.index)
            return updateddata

    except Exception as e:
        print "\nUnable to import CSV file. Error {}".format(e)
        sys.exit(1)

# EXCEL IMPORT DEFINED FUNCTION
# ftype 1 = master file, ftype 2 = updated file
def xlImport(ftype, fpath):
    try:
        if ftype == 1:
            masterdata = pd.read_excel(fpath, 0)
            return masterdata

        if ftype == 2:
            updateddata = pd.read_excel(fpath, 0)
            # Tag every row with the name of the file it came from
            # (same column name as in csvImport)
            updateddata['originfile'] = pd.Series(os.path.basename(fpath),
                                                  index=updateddata.index)
            return updateddata

    except Exception as e:
        print "\nUnable to import Excel file. Error {}".format(e)
        sys.exit(1)

# MASTER FILE USER INPUT DEFINED FUNCTION
def masterfile():
    while True:
        masterfile = raw_input("Enter the path to the master file: ")
        if masterfile.endswith(".csv"):
            return csvImport(1, masterfile)
        elif masterfile.endswith(".xlsx"):
            return xlImport(1, masterfile)
        else:
            print "\nPlease enter a CSV file or an Excel file in xlsx format."

# UPDATED FILE USER INPUT DEFINED FUNCTION
def updatefile():
    while True:
        updatedfile = raw_input("\nEnter the path to the updated file: ")
        if updatedfile.endswith(".csv"):
            return csvImport(2, updatedfile)
        elif updatedfile.endswith(".xlsx"):
            return xlImport(2, updatedfile)
        else:
            print "\nPlease enter a CSV file or an Excel file in xlsx format."

# CALLING OPENING FUNCTIONS
masterdata = masterfile()
updateddata = updatefile()

# CONCATENATING DATA FRAMES
combineddata = pd.concat([updateddata, masterdata])

# REMOVING DUPLICATES
# (keeps the first occurrence of each Item, i.e. the row from the updated file)
finaldata = combineddata.drop_duplicates(['Item'])

# SETTING FINAL PATH BY USER INPUT
while True:
    final = raw_input("\nWhere do you want the file, and what do you want to name it? "
                      "(e.g., C:\\path_to_file\\name_of_file.xlsx): ")
    if final.endswith(".xlsx"):
        break
    else:
        print "\nPlease enter a proper Excel file in xlsx format."

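# OUTPUTTING DATA FRAME TO FILE
# (index=False keeps pandas' row index out of the sheet, so the output can
#  be read back in as the next master file)
finaldata.to_excel(final, index=False)
print "\nSuccessfully outputted appended data frame to Excel!"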

# OPENING OUTPUTTED FILE
# (NOTE: PYTHON STILL RUNS UNTIL SPREADSHEET IS CLOSED)
subprocess.call(final, shell=True)

1 Answer

Answer (score: 0)

An idea for the first question: this line works like a SQL SELECT statement with a WHERE clause:

nozeros_finaldata = finaldata[finaldata['ColumnName'] != 0]

Replace 'ColumnName' with the name of the column whose value went from four to zero; this returns a new DataFrame. Then write it out with nozeros_finaldata.to_excel(final).
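A small self-contained sketch of that boolean-mask filter on made-up data, assuming the count column is called `Quantity` (a stand-in for `'ColumnName'`). Writing the result with `to_excel` needs an Excel writer engine such as openpyxl, so that step is left out here.

```python
import pandas as pd

# Made-up merged inventory; 'Quantity' stands in for the answer's 'ColumnName'.
finaldata = pd.DataFrame({
    "Item": ["5 ohm resistor", "10 uF capacitor", "1k resistor"],
    "Quantity": [0, 12, 3],
})

# Boolean mask: True where the count is not zero. Indexing with it keeps only
# those rows and returns a new DataFrame; finaldata itself is unchanged.
nozeros_finaldata = finaldata[finaldata["Quantity"] != 0]

# The same filter written as a query string.
same_rows = finaldata.query("Quantity != 0")

print(nozeros_finaldata)
```

Note that the mask only removes rows whose count is literally 0. If an item is simply absent from the updated file, its old master row survives the drop_duplicates step and this filter does not catch it.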

For the second question: you can use a while loop and ask the user whether there are more files.

# CALLING OPENING FUNCTIONS
more_files = True
while more_files:
    masterdata = masterfile()
    updateddata = updatefile()

    # CONCATENATING DATA FRAMES
    combineddata = pd.concat([updateddata, masterdata])

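    # REMOVING DUPLICATES
    # (the updated row wins because updateddata comes first in the concat)
    finaldata = combineddata.drop_duplicates(['Item'])
    # Rows that survive only from the master have no 'originfile' value, so this
    # drops entries missing from the updated file (assumes the master file
    # itself has no 'originfile' column)
    finaldata.dropna(subset=['originfile'], inplace=True)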

    # SETTING FINAL PATH BY USER INPUT
    while True:
        final = raw_input("\nWhere do you want the file, and what do you want to name it? "
                          "(e.g., C:\\path_to_file\\name_of_file.xlsx): ")
        if final.endswith(".xlsx"):
            break
        else:
            print "\nPlease enter a proper Excel file in xlsx format."

    # OUTPUTTING DATA FRAME TO FILE
    # (index=False keeps pandas' row index out of the sheet)
    finaldata.to_excel(final, index=False)
    print "\nSuccessfully outputted appended data frame to Excel!"

    # OPENING OUTPUTTED FILE
    # (NOTE: PYTHON STILL RUNS UNTIL SPREADSHEET IS CLOSED)
    subprocess.call(final, shell=True)

    user_run_again = raw_input("\nWould you like to run another file? (yes/no) ")
    # Accept "y" or "yes" in any capitalization; anything else ends the loop
    more_files = user_run_again.strip().lower() in ("y", "yes")

You may want to add some exception handling around the last raw_input; I'm just trying to give you some ideas.
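As an alternative to running the loop once per file, all update files can be concatenated with the master in one `pd.concat`, followed by the same `dropna(subset=['originfile'])` idea. The sketch below works on in-memory DataFrames; the helper name `merge_updates`, the file names and the `Quantity` column are hypothetical, and reading the frames from user-supplied paths is left out.

```python
import pandas as pd


def merge_updates(master, updates, keys=("Item",)):
    """Merge several update frames into a master inventory (hypothetical helper).

    `updates` maps a file name to the DataFrame read from that file.
    Rows from the updates replace master rows with the same key; master rows
    whose key appears in no update are dropped.
    """
    tagged = []
    for name, frame in updates.items():
        frame = frame.copy()
        frame["originfile"] = name  # same tagging idea as the question's import functions
        tagged.append(frame)

    # A master saved by an earlier run may already carry 'originfile'; discard it so
    # master-only rows end up with NaN there after the concat.
    base = master.drop(columns=["originfile"], errors="ignore")

    combined = pd.concat(tagged + [base], ignore_index=True)
    final = combined.drop_duplicates(subset=list(keys), keep="first")
    # NaN in 'originfile' means the row came only from the master, i.e. it was
    # omitted from every update file, so it is removed.
    return final.dropna(subset=["originfile"]).reset_index(drop=True)


master = pd.DataFrame({"Item": ["5 ohm resistor", "10 uF capacitor"], "Quantity": [4, 12]})
updates = {
    "scope_a.xlsx": pd.DataFrame({"Item": ["5 ohm resistor"], "Quantity": [7]}),
    "scope_b.xlsx": pd.DataFrame({"Item": ["1k resistor"], "Quantity": [3]}),
}
final = merge_updates(master, updates)
print(final)
```

Two consequences of this design are worth keeping in mind: every instrument's file has to be passed in, because any master item not found in some update is dropped; and if two update files contain the same key, the file listed first wins. Passing `keys=("Instrument", "Item")` (assuming such a column exists) would deduplicate on both organizing keys instead of `Item` alone.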