I am extracting a value from an XML file and using it to check whether that value appears in a PDF file.
The XML I have:
<RealTimeLetter>
<Customer>
<RTLtr_Acct>0163426</RTLtr_Acct>
<RTLtr_CustomerName>LSIH JHTWVZ</RTLtr_CustomerName>
<RTLtr_CustomerAddress1>887 YPCLY THYZO SU</RTLtr_CustomerAddress1>
<RTLtr_CustomerAddress2 />
<RTLtr_CustomerCity>WOODSTOCK,</RTLtr_CustomerCity>
<RTLtr_CustomerState>GA</RTLtr_CustomerState>
<RTLtr_CustomerZip>30188</RTLtr_CustomerZip>
<RTLtr_ADAPreference>NONE</RTLtr_ADAPreference>
<RTLtr_Addressee>0</RTLtr_Addressee>
</Customer>
</RealTimeLetter>
The PDF file contains the customer name and address:
LSIH JHTWVZ
887 YPCLY THYZO SU
WOODSTOCK, GA 30188
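I am using the pdf-reader and Nokogiri gems to read the text from the PDF, extract the customer name from the XML, and check whether the PDF contains that name.
The PDF reader is set up as: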
require 'pdf_reader'
def parse_pdf
PDF::Reader.new(@filename)
end
@reader = file('C:\Users\ecz560\Desktop\30004_Standard.pdf').parse_pdf
require 'nokogiri'
@xml = Nokogiri::XML(File.open('C:\Users\ecz560\Desktop\30004Standard.xml'))
@CustName = @xml.xpath("//Customer[RTLtr_Loancust='0163426']//RTLtr_CustomerName").map(&:text).to_s
page_index = 0
@reader.pages.each do |page|
page_index = page_index+1
if expect(page.text).to include @CustName
valid_text = "Given text is present in -- #{page_index}"
puts valid_text
end
end
But I get an error:
RSpec::Expectations::ExpectationNotMetError: expected "LSIH JHTWVZ\n 887 YPCLY THYZO SU\n WOODSTOCK, GA 30188\n Page 1 of 1" to include "[\"LSIH JHTWVZ\"]"
Diff:
@@ -1,2 +1,80 @@
-["LSIH JHTWVZ"]
+ LSIH JHTWVZ
+ 887 YPCLY THYZO SU
+ WOODSTOCK, GA 30188
./features/step_definitions/Letters/Test1_Letters.rb:372:in `block (2 levels) in <top (required)>'
./features/step_definitions/Letters/Test1_Letters.rb:370:in `each'
./features/step_definitions/Letters/Test1_Letters.rb:370:in `/^I validate the PDF content$/'
C:\Users\ecz560\Documents\GitHub\ATDD Local\features\FeatureFiles\Letters\Test1_Letters.feature:72:in `Then I validate the PDF content'
I understand the problem has to do with how I am comparing @CustName. How can I fix this?
Answer 0 (score: 0)
if expect(page.text).to include @CustName
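expect is not meant to be used this way.
An expectation is used in a test to verify that your code behaves correctly. It should not be used as a condition in ordinary code.
A failed expectation raises an exception and halts everything that follows. It is not a true/false check you can branch on: when the match fails it raises RSpec::Expectations::ExpectationNotMetError (correctly, for a test), so your loop stops at the first page that does not match and does not resume.
What you probably want is simply to check the value like this:
if page.text.include?(@CustName)
(Note: I have not tested this against your files, so you may need to adjust it. String#include? is the standard Ruby method for a substring check.)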
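As a rough sketch of that idea, the loop below uses String#include? to collect the pages that contain the name. The helper name pages_containing and the page_texts array are hypothetical; the array stands in for the page.text values that @reader.pages would yield, with the sample text taken from the error message in the question.

```ruby
# Hypothetical helper: returns the 1-based numbers of the pages
# whose text contains +needle+.
def pages_containing(page_texts, needle)
  hits = []
  page_texts.each_with_index do |text, index|
    # include? returns true or false, so it is safe to branch on
    hits << index + 1 if text.include?(needle)
  end
  hits
end

# Stand-in for the text of each PDF page (assumption: one page).
page_texts = [
  "LSIH JHTWVZ\n 887 YPCLY THYZO SU\n WOODSTOCK, GA 30188\n Page 1 of 1"
]

hits = pages_containing(page_texts, "LSIH JHTWVZ")
hits.each { |n| puts "Given text is present in -- #{n}" }

# The value from the question is an Array converted with to_s,
# i.e. the literal text ["LSIH JHTWVZ"], which no page contains.
misses = pages_containing(page_texts, ["LSIH JHTWVZ"].to_s)
puts "Pages matching the stringified array: #{misses.length}"
```

The second call shows why the original comparison could not succeed even with a valid check: the search string includes the brackets and quotes of the array's printed form.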
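Answer 1 (score: 0)
One thing I see is that your XPath selector does not match anything. The sample XML has no RTLtr_Loancust element; the account number is in RTLtr_Acct.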
//Customer[RTLtr_Loancust='0163426']//RTLtr_CustomerName
Testing it:
require 'nokogiri'
doc = Nokogiri::XML(<<EOT)
<RealTimeLetter>
<Customer>
<RTLtr_Acct>0163426</RTLtr_Acct>
<RTLtr_CustomerName>LSIH JHTWVZ</RTLtr_CustomerName>
<RTLtr_CustomerAddress1>887 YPCLY THYZO SU</RTLtr_CustomerAddress1>
<RTLtr_CustomerAddress2 />
<RTLtr_CustomerCity>WOODSTOCK,</RTLtr_CustomerCity>
<RTLtr_CustomerState>GA</RTLtr_CustomerState>
<RTLtr_CustomerZip>30188</RTLtr_CustomerZip>
<RTLtr_ADAPreference>NONE</RTLtr_ADAPreference>
<RTLtr_Addressee>0</RTLtr_Addressee>
</Customer>
</RealTimeLetter>
EOT
doc.search("//Customer[RTLtr_Loancust='0163426']//RTLtr_CustomerName").to_xml # => ""
A small modification finds the <Customer> node:
doc.search('//Customer/RTLtr_Acct/text()[contains(., "0163426")]/../..').to_xml
# => "<Customer>\n <RTLtr_Acct>0163426</RTLtr_Acct>\n <RTLtr_CustomerName>LSIH JHTWVZ</RTLtr_CustomerName>\n <RTLtr_CustomerAddress1>887 YPCLY THYZO SU</RTLtr_CustomerAddress1>\n <RTLtr_CustomerAddress2/>\n <RTLtr_CustomerCity>WOODSTOCK,</RTLtr_CustomerCity>\n <RTLtr_CustomerState>GA</RTLtr_CustomerState>\n <RTLtr_CustomerZip>30188</RTLtr_CustomerZip>\n <RTLtr_ADAPreference>NONE</RTLtr_ADAPreference>\n <RTLtr_Addressee>0</RTLtr_Addressee>\n</Customer>"
At that point it is easy to get the content of the elements inside <Customer>:
customer = doc.search('//Customer/RTLtr_Acct/text()[contains(., "0163426")]/../..')
customer.at('RTLtr_Acct').text # => "0163426"
customer.at('RTLtr_CustomerAddress1').text # => "887 YPCLY THYZO SU"