VBA vs. pandas: having trouble matching VBA logic in pandas

Time: 2019-02-22 23:51:06

Tags: python excel vba pandas

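Hello, please see the working VBA code below. I am trying to rewrite it in pandas, but my pandas script does not behave correctly (my attempted pandas script is below the VBA). Can someone help me finish this? I think I am close.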

Sub mymacro()
    Columns(19).Replace "DFHD", "SFD"
    Columns(19).Replace "DFBG", "SFD"
    Columns(19).Replace "DFVD", "SFD"
    Columns(19).Replace "MFUB", "BFD"
    Columns(19).Replace "MFBD", "BFD"
    Columns(19).Replace "DFBD", "BFD"
    Columns(19).Replace "UFNC", "CFD"
    Columns(19).Replace "BFYD", "BFD"
    'Having trouble starting below here
Columns("T:AC").Select
    Selection.EntireColumn.Hidden = True
    ActiveSheet.Range("$A$1:$AS$1000000").AutoFilter Field:=19, Criteria1:=Array( _
        "U*"), Operator:=xlFilterValues
    ActiveWindow.SmallScroll Down:=-100
    ActiveSheet.Range("$A$1:$AS$1000000").AutoFilter Field:=30, Criteria1:=Array( _
        "350", "B*"), Operator:=xlFilterValues
    ActiveWindow.SmallScroll Down:=-100
    Range("S3").Select
    ActiveCell.FormulaR1C1 = "BD"
    Range("S3").Select
    Selection.Copy
    Range(Selection, Selection.End(xlDown)).Select
    ActiveSheet.Paste
    Range("S3").Select
    Application.CutCopyMode = False
    ActiveSheet.ShowAllData
    ActiveSheet.Range("$A$1:$AS$1000000").AutoFilter Field:=19, Criteria1:="=UND", Operator:=xlOr, Criteria2:="=UNH"
    ActiveWindow.SmallScroll Down:=-21
    ActiveSheet.Range("$A$1:$AS$1000000").AutoFilter Field:=30, Criteria1:=Array( _
     "DR9", "DV0", "DV5", "DV8", "DV9", "DVG", "DV*"), Operator:=xlFilterValues
    ActiveWindow.SmallScroll Down:=-36
    Range("S11").Select
    ActiveCell.FormulaR1C1 = "SD"
    Range("S11").Select
    Selection.Copy
    Range(Selection, Selection.End(xlDown)).Select
    ActiveSheet.Paste
    Range("S11").Select
    Application.CutCopyMode = False
    ActiveSheet.ShowAllData
    ActiveWindow.SmallScroll Down:=-10
    ActiveSheet.Range("$A$1:$AS$1000000").AutoFilter Field:=19, Criteria1:="UNH"
    ActiveWindow.SmallScroll Down:=-27
    Range("S1815").Select
    ActiveCell.FormulaR1C1 = "FUHD"
    Range("S1815").Select
    Selection.Copy
    Range(Selection, Selection.End(xlDown)).Select
    ActiveSheet.Paste
    Range("S1815").Select
    Application.CutCopyMode = False
    ActiveWindow.SmallScroll Down:=-30
    ActiveSheet.ShowAllData
    ActiveWindow.SmallScroll Down:=-240
End Sub

Below is my pandas script. Note where I start running into trouble: the first 12 lines of code work fine.

import pandas as pd
import numpy as np
data = pd.read_excel("orsthrufirstarticledeltion.xlsx", dtype=object)  # read_excel does not take an encoding argument
data.loc[data.Format == 'DFHD', 'Format'] = 'SFD'
data.loc[data.Format == 'DFBG', 'Format'] = 'SFD'
data.loc[data.Format == 'DFVD', 'Format'] = 'SFD'
data.loc[data.Format == 'MFUB', 'Format'] = 'BFD'
data.loc[data.Format == 'MFBD', 'Format'] = 'BFD'
data.loc[data.Format == 'DFBD', 'Format'] = 'BFD'
data.loc[data.Format == 'UFNC', 'Format'] = 'CFD'
data.loc[data.Format == 'BFYD', 'Format'] = 'BFD'

# Trouble starts below
data.loc[(data["Fmt"] != str) & (data["Format"] == "UN*"), "Format"] = 'BD' # the UN* did not work 
#data.loc[(data["Fmt"] == '350') & (data["Format"] == "UNB"), "Format"] = 'BD'
#data.loc[(data["Fmt"] != str) & (data[data.Format.str.startswith('UN',na=False)]), "Format"] = 'BD'
#
# The context manager saves and closes the file (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('mstrplc2.xlsx', engine='xlsxwriter') as writer:
    data.to_excel(writer, sheet_name='Sheet1')
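For comparison, here is a rough pandas sketch of the three filter-then-fill steps in the macro above. It assumes that `Format` holds column S (AutoFilter `Field:=19`) and `Fmt` holds column AD (`Field:=30`), and that the intent is to change every row that passes each filter. The recorded macro actually pastes from fixed start cells (S3, S11, S1815) down to `End(xlDown)`, so this is not guaranteed to reproduce it exactly. `apply_vba_filters` is a hypothetical helper name. Each Excel wildcard becomes a boolean mask built with `str.startswith` instead of a literal `"UN*"` comparison, and the steps run in the macro's order because steps 2 and 3 depend on what earlier steps left behind.

```python
import pandas as pd


def apply_vba_filters(data: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical pandas version of the macro's three AutoFilter + fill steps.

    Assumes 'Format' = column S (Field 19) and 'Fmt' = column AD (Field 30).
    """
    out = data.copy()
    # str() so numeric cells such as 350 compare as the text '350'
    fmt_ad = out["Fmt"].astype(str)

    # Step 1: Format like "U*" and Fmt is "350" or like "B*" -> "BD"
    format_s = out["Format"].astype(str)
    step1 = format_s.str.startswith("U") & ((fmt_ad == "350") | fmt_ad.str.startswith("B"))
    out.loc[step1, "Format"] = "BD"

    # Step 2: Format is "UND" or "UNH" and Fmt is "DR9" or like "DV*" -> "SD"
    format_s = out["Format"].astype(str)  # re-read, step 1 may have changed values
    step2 = format_s.isin(["UND", "UNH"]) & ((fmt_ad == "DR9") | fmt_ad.str.startswith("DV"))
    out.loc[step2, "Format"] = "SD"

    # Step 3: any Format still equal to "UNH" -> "FUHD"
    out.loc[out["Format"] == "UNH", "Format"] = "FUHD"
    return out


sample = pd.DataFrame({"Format": ["UNB", "UND", "UNH", "UNH", "SFD"],
                       "Fmt": [350, "DV5", "DVG", "X1", "B2"]})
print(apply_vba_filters(sample)["Format"].tolist())
# ['BD', 'SD', 'SD', 'FUHD', 'SFD']
```

Because step 3 runs last, it only catches `UNH` rows that step 2 did not already turn into `SD`, which mirrors the order of the filters in the macro.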

----- New attempt at getting a solution -----

Please see a sample DataFrame below with the raw data we will start from; I can export it to Excel if you like.

import pandas as pd

startdf = pd.DataFrame({'Column_A':['DFHD', 'DFBG', 'DFVD', 'MFUB', 'MFBD', 'DFBD', 'UFNC', 'UFNC', 'BFYD',
                                    'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX'],

'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})



with pd.ExcelWriter('testdf.xlsx', engine='xlsxwriter') as writer:
    startdf.to_excel(writer, sheet_name='Sheet1')

The first step is to take all the values in Column_A and replace them with the new values listed below (so only Column_A is edited):

  • "DFHD" -> "SFD"
  • "DFBG" -> "SFD"
  • "DFVD" -> "SFD"
  • "MFUB" -> "BFD"
  • "MFBD" -> "BFD"
  • "DFBD" -> "BFD"
  • "UFNC" -> "CFD"
  • "BFYD" -> "BFD"
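The value-for-value mapping in this list can be expressed as a single dictionary passed to `Series.replace`, which avoids one `.loc` line per pair. A minimal sketch on a shortened version of the sample data; `FORMAT_MAP` is a name chosen for illustration.

```python
import pandas as pd

# Old Column_A value -> new value, taken from the list above
FORMAT_MAP = {
    "DFHD": "SFD", "DFBG": "SFD", "DFVD": "SFD",
    "MFUB": "BFD", "MFBD": "BFD", "DFBD": "BFD", "BFYD": "BFD",
    "UFNC": "CFD",
}

startdf = pd.DataFrame({"Column_A": ["DFHD", "MFUB", "UFNC", "UNT"],
                        "Column_B": ["test", "test", "test", "DVG"]})

# With a dict, Series.replace swaps whole cell values (no substring matching unless regex=True)
startdf["Column_A"] = startdf["Column_A"].replace(FORMAT_MAP)
print(startdf["Column_A"].tolist())  # ['SFD', 'BFD', 'CFD', 'UNT']
```

Values that are not keys in the dictionary, such as `UNT`, pass through unchanged.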

Once this logic is applied, the data should look like this:

df2 = pd.DataFrame({'Column_A':['SFD', 'SFD', 'SFD', 'BFD', 'BFD', 'BFD', 'CFD', 'CFD', 'BFD',
                            'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX'],
'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})

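Next, we still edit only Column_A, but the value in Column_B decides the new value in Column_A, so think of it row by row. First filter SFD, BFD, and CFD out of Column_A, so the remaining values are 'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX'. For these remaining rows, look at Column_B to decide how to change Column_A, using the following logic: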

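  1. If the Column_B value starts with B or is a number, the matching row in Column_A should change to BFD.
  2. If the Column_B value starts with D or OPT, the matching row in Column_A should change to SFD.
  3. If the Column_B value starts with U, the matching row in Column_A should change to UHFD.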
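The three rules can be written directly as prefix tests with pandas string methods, so no per-value `.loc` line is needed. This is a sketch under a few assumptions: "is a number" is read as "the text is all digits" (`str.isdigit`), so a float read from Excel as `350.0` would not count; the OPT rule is included even though the sample data has no OPT values; and `apply_column_b_rules` is a hypothetical helper name. The prefixes do not overlap, so the order of the three assignments does not change the result.

```python
import pandas as pd


def apply_column_b_rules(df: pd.DataFrame) -> pd.DataFrame:
    """Set Column_A from Column_B for rows not already mapped to SFD/BFD/CFD."""
    out = df.copy()
    pending = ~out["Column_A"].isin(["SFD", "BFD", "CFD"])
    # astype(str) lets numeric cells such as 350 use the .str methods
    b = out["Column_B"].astype(str)

    # Rule 1: starts with B, or is all digits -> BFD
    out.loc[pending & (b.str.startswith("B") | b.str.isdigit()), "Column_A"] = "BFD"
    # Rule 2: starts with D or OPT -> SFD
    out.loc[pending & (b.str.startswith("D") | b.str.startswith("OPT")), "Column_A"] = "SFD"
    # Rule 3: starts with U -> UHFD
    out.loc[pending & b.str.startswith("U"), "Column_A"] = "UHFD"
    return out


demo = pd.DataFrame({"Column_A": ["SFD", "UNFZ", "UNT", "UNIX", "UNFZ"],
                     "Column_B": ["test", "B50", "DVG", "U66", 350]})
print(apply_column_b_rules(demo)["Column_A"].tolist())
# ['SFD', 'BFD', 'SFD', 'UHFD', 'BFD']
```

Applied to `df2` from the question, this produces the Column_A values listed in `resultdf`.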

Following this logic, the final output DataFrame should be:

resultdf = pd.DataFrame({'Column_A':['SFD', 'SFD', 'SFD', 'BFD', 'BFD', 'BFD', 'CFD', 'CFD', 'BFD',
                                     'BFD', 'SFD', 'SFD', 'SFD', 'SFD', 'BFD','UHFD', 'UHFD', 'BFD', 'BFD', 'BFD', 'SFD','BFD', 'UHFD', 'SFD'],
                         'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})

with pd.ExcelWriter('finalresult.xlsx', engine='xlsxwriter') as writer:
    resultdf.to_excel(writer, sheet_name='Sheet1')

2 answers:

Answer 0 (score: 0)

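Right now your conditional filter compares the Format column against the literal string "UN*". To use the asterisk as a wildcard, you can use the fnmatch module.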

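import fnmatch

# fnmatch needs strings: str() keeps numeric cells (e.g. 350) from raising TypeError.
# The original data["Fmt"] != str test does not check for strings, so it is left out.
mask = data["Format"].apply(lambda x: fnmatch.fnmatch(str(x), 'UN*'))
data.loc[mask, "Format"] = 'BD'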
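A vectorized variant of the same wildcard idea: `fnmatch.translate` converts a shell-style pattern into a regular expression, which pandas' `Series.str.match` can then apply to the whole column without a Python-level `apply`. This is a sketch; `glob_mask` is a hypothetical helper name, and the sample column is made up to show a numeric cell.

```python
import fnmatch

import pandas as pd


def glob_mask(s: pd.Series, pattern: str) -> pd.Series:
    """Boolean mask of cells matching a shell-style pattern such as 'UN*'."""
    # translate() yields a regex that must match the whole string under re.match,
    # which is what Series.str.match uses
    regex = fnmatch.translate(pattern)
    # astype(str) so numeric cells (e.g. 350) are compared as text
    return s.astype(str).str.match(regex)


data = pd.DataFrame({"Format": ["UNB", "UND", "SFD", 350]})
mask = glob_mask(data["Format"], "UN*")
print(mask.tolist())  # [True, True, False, False]
data.loc[mask, "Format"] = "BD"
```

Unlike `fnmatch.fnmatch`, which normalizes case on Windows, the translated regex is always case-sensitive.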

Answer 1 (score: 0)

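One problem remains: with live data from Excel, my Column_B is imported into the DataFrame with dtype object. It holds mostly strings but also some numeric values such as 350, and the logic does not work for those int values. What is the reason?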
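The likely reason is that an `object` column can hold a mix of Python `str` and `int` values, and `350 == "350"` is `False` in Python, so an equality test against the text `"350"` never matches the numeric cell. Casting the column with `astype(str)` first makes both kinds compare as text. A small illustration with made-up values; note that if a cell comes back as a float, `str(350.0)` is `'350.0'`, not `'350'`.

```python
import pandas as pd

# Object column mixing text and an int, like a Column_B read from Excel
col = pd.Series(["B50", 350, "DVG"], dtype=object)

print((col == "350").tolist())              # [False, False, False]
print((col.astype(str) == "350").tolist())  # [False, True, False]
```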

I got it working with the following code:

data.loc[data.Fmt.astype(str) == '350', 'Fm'] = 'test'

All, below is an answer that seems to work (the order of the lines matters).

But is there a more Pythonic way to do this, i.e. with wildcards? The wildcard answer offered above did not work, so please see the verbose solution below:

import pandas as pd

startdf = pd.DataFrame({'Column_A':['DFHD', 'DFBG', 'DFVD', 'MFUB', 'MFBD', 'DFBD', 'UFNC', 'UFNC', 'BFYD',
                                    'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX'],

'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})
#writer = pd.ExcelWriter('testdf.xlsx', engine='xlsxwriter')
#df.to_excel(writer, sheet_name='Sheet1')
#writer.save()

df2 = pd.DataFrame({'Column_A':['SFD', 'SFD', 'SFD', 'BFD', 'BFD', 'BFD', 'CFD', 'CFD', 'BFD',
                            'UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX', 'UNFZ', 'UNT', 'UNIX','UNFZ', 'UNT', 'UNIX'],
'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})


resultdf = pd.DataFrame({'Column_A':['SFD', 'SFD', 'SFD', 'BFD', 'BFD', 'BFD', 'CFD', 'CFD', 'BFD',
                                 'BFD', 'SFD', 'SFD', 'SFD', 'SFD', 'BFD','UHFD', 'UHFD', 'BFD', 'BFD', 'BFD', 'SFD','BFD', 'UHFD', 'SFD'],
'Column_B':['test','test','test','test','test','test','test','test','test','B50','DVG','DV9','DV5','DV0','B25','U66','U1C','350','357','BVG','DBG','BUG','UVG','DV8']})

test = startdf.copy()  # copy so startdf itself is not modified

test.loc[test.Column_A == 'DFHD', 'Column_A'] = 'SFD'
test.loc[test.Column_A == 'DFBG', 'Column_A'] = 'SFD'
test.loc[test.Column_A == 'DFVD', 'Column_A'] = 'SFD'
test.loc[test.Column_A == 'MFUB', 'Column_A'] = 'BFD'
test.loc[test.Column_A == 'MFBD', 'Column_A'] = 'BFD'
test.loc[test.Column_A == 'DFBD', 'Column_A'] = 'BFD'
test.loc[test.Column_A == 'UFNC', 'Column_A'] = 'CFD'
test.loc[test.Column_A == 'BFYD', 'Column_A'] = 'BFD'

test.loc[test.Column_B == '357', 'Column_A'] = 'BFD'
test.loc[test.Column_B == '350', 'Column_A'] = 'BFD'
test.loc[test.Column_B == 'B50', 'Column_A'] = 'BFD'
test.loc[test.Column_B == 'B25', 'Column_A'] = 'BFD'
test.loc[test.Column_B == 'BVG', 'Column_A'] = 'BFD'
test.loc[test.Column_B == 'BUG', 'Column_A'] = 'BFD'
test.loc[test.Column_B == 'DVG', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'DV9', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'DV5', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'DV8', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'DV0', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'DBG', 'Column_A'] = 'SFD'
test.loc[test.Column_B == 'U66', 'Column_A'] = 'UHFD'
test.loc[test.Column_B == 'U1C', 'Column_A'] = 'UHFD'
test.loc[test.Column_B == 'UVG', 'Column_A'] = 'UHFD'

finaldf = test
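The fifteen `Column_B` equality lines above can be collapsed into one lookup table: `Series.map` with a dict returns the new value for listed keys and `NaN` for everything else, and `fillna` then keeps the existing `Column_A` value for those rows. This is a sketch that reuses exactly the pairs from the verbose solution; `B_TO_A` and `remap_from_column_b` are hypothetical names.

```python
import pandas as pd

# Column_B value -> new Column_A value, same pairs as the .loc lines above
B_TO_A = {
    "357": "BFD", "350": "BFD", "B50": "BFD", "B25": "BFD", "BVG": "BFD", "BUG": "BFD",
    "DVG": "SFD", "DV9": "SFD", "DV5": "SFD", "DV8": "SFD", "DV0": "SFD", "DBG": "SFD",
    "U66": "UHFD", "U1C": "UHFD", "UVG": "UHFD",
}


def remap_from_column_b(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # astype(str) so an int 350 read from Excel still hits the "350" key
    new_a = out["Column_B"].astype(str).map(B_TO_A)
    # Rows whose Column_B is not in the table (e.g. 'test') keep their Column_A value
    out["Column_A"] = new_a.fillna(out["Column_A"])
    return out


demo = pd.DataFrame({"Column_A": ["SFD", "UNFZ", "UNT", "UNIX"],
                     "Column_B": ["test", 350, "DVG", "U66"]})
print(remap_from_column_b(demo)["Column_A"].tolist())  # ['SFD', 'BFD', 'SFD', 'UHFD']
```

Since no key appears twice in the dictionary, the result no longer depends on the order of the lines.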