Querying a dataset using numbers from a file

Time: 2018-11-04 15:29:36

Tags: vbscript

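I am trying to use VBScript to query a file containing a list of people's data, using only their identification numbers. At the moment I have a dataset file that holds all the people's data and a query file that holds the ID numbers to look up in that dataset. Whenever an ID matches, I want to write the matching row to a results file.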

Here is a general example of the data contained in the dataset file and the query file.

Dataset:

ID,Name,City,State,Zipcode,Phone
1885529946,Hall,Louisville,KY,40208,5026366683
1886910320,Brown,Sacramento,CA,95814,5302981550
1953250581,Rios,Sterling,OK,73567,5803658077
1604767393,Barner,Irvine,CA,92714,9494768597
1713746771,Herrera,Stotts City,MO,65756,4172852393
1022686106,Moore,Ocala,FL,34471,3526032811
1579121274,Beyer,Alexandria,MD,22304,3013838430
1288569655,Rondeau,Augusta,GA,30904,7066671404
1954615404,Angel,Los Angeles,CA,90014,5622961806
1408747874,Lagasse,Traverse City,MI,49686,2318182792

Query file:

1885529946
1713746771
1408747874

I am able to read all the lines in the query file and display the ID numbers with WScript.Echo. No errors are produced, the script never ends, and no results file is generated. The results file should contain only the rows from the dataset that match the ID numbers. For example:

1885529946,Hall,Louisville,KY,40208,5026366683
1713746771,Herrera,Stotts City,MO,65756,4172852393
1408747874,Lagasse,Traverse City,MI,49686,2318182792

This is the script I am trying to use:

WScript.Echo

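2 answers:

Answer 0: (score: 3)

The problem in your code is that those files are opened as streams. Once such a stream reaches its end (i.e. .AtEndOfStream becomes True, for example after repeated calls to .ReadLine()), it does not magically rewind to the beginning of the file. Your "nested loop" approach would need to rewind the query file in order to work.

That could be done by closing and reopening the stream, but it is not very efficient. Comparing all the numbers against every line of the input file is not very efficient either. I suggest storing the numbers from the query file in a Dictionary object. Dictionaries store key-value pairs and are optimized for fast key lookup (via .Exists(someKey)), which makes them a good fit for this task.

That way you can decide very quickly whether a line should be written to the output file: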

Const intForReading = 1
Const intForWriting = 2
Const intForAppending = 8

strQueryFile = "C:\numbers_test.txt"
strDataSetFile = "C:\data_test.csv"
strOutputFile = "C:\results_test.csv"

Set objFSO = CreateObject("Scripting.FileSystemObject")

' first import the query file into a dictionary for easy lookup
Set numbers = CreateObject("Scripting.Dictionary")    
With objFSO.OpenTextFile(strQueryFile, intForReading)
    Do Until .AtEndOfStream
        ' we are only interested in the key for this task, the value is completely irrelevant.
        numbers.Add .ReadLine(), ""
    Loop
    .Close
End With

Set objFileToWrite = objFSO.OpenTextFile(strOutputFile, intForWriting, true)    
With objFSO.OpenTextFile(strDataSetFile, intForReading)
    Do Until .AtEndOfStream
        line = .ReadLine()
        columns = Split(line, ",")
        currentNumber = columns(0)
        If numbers.Exists(currentNumber) Then objFileToWrite.WriteLine(line)
    Loop
    .Close
End With

objFileToWrite.Close
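
One caveat about the loading loop above: Dictionary.Add raises a run-time error if the same key is added twice, so a query file that lists an ID more than once would stop the script. The following is a minimal sketch of a more tolerant loader, not part of the original answer. The function name LoadIds is hypothetical, and the path is assumed to be the same one used for strQueryFile.

```vbscript
Option Explicit

Const ForReading = 1

' Hypothetical helper: reads one ID per line into a Dictionary,
' skipping blank lines and IDs that have already been added.
Function LoadIds(fso, path)
    Dim ids, id
    Set ids = CreateObject("Scripting.Dictionary")
    With fso.OpenTextFile(path, ForReading)
        Do Until .AtEndOfStream
            ' Trim removes leading and trailing spaces only
            id = Trim(.ReadLine())
            If Len(id) > 0 Then
                ' Guard against duplicates, because Add fails on an existing key
                If Not ids.Exists(id) Then ids.Add id, ""
            End If
        Loop
        .Close
    End With
    Set LoadIds = ids
End Function

Dim objFSO, numbers
Set objFSO = CreateObject("Scripting.FileSystemObject")
' Assumed path, taken from the script above
Set numbers = LoadIds(objFSO, "C:\numbers_test.txt")
WScript.Echo numbers.Count & " unique IDs loaded"
```

The returned dictionary can be used with numbers.Exists(currentNumber) exactly as in the main loop.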

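Answer 1: (score: 2)

I like to use ADODB for this kind of task and treat the input files as a database. The trick is usually to find the right connection string for your system and, where necessary, to use a Schema.ini file.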

option explicit

Const adClipString = 2

dim ado: set ado = CreateObject("ADODB.Connection")
' data files are in this folder
' using the old JET driver
ado.ConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=.\;Extended Properties=""text;HDR=Yes;FMT=Delimited"";"
' or maybe use ACE if installed
' ado.ConnectionString = "Driver=Microsoft Access Text Driver (*.txt, *.csv);Dbq=.\;Extensions=asc,csv,tab,txt;"
ado.open

' query is in a CSV too, so we can access as a table
' the column names are given in Schema.ini
const QUERY = "SELECT * FROM [data_test.csv] WHERE ID IN (SELECT ID FROM [query_test.csv])"
' or literals 
' const QUERY = "SELECT * FROM [data_test.csv] WHERE ID IN ('1885529946', '1713746771', '1408747874')"

dim rs: set rs = ado.Execute(QUERY)

' convenient GetString() method allows formatting the result
' this could be written to file instead of outputting to console
WScript.Echo rs.GetString(adClipString, , vbTab, vbNewLine, "[NULL]")

'or create a new table!
'delete results table if exists
' catch an error if the table does not exist
on error resume next
' for some reason you need to use #csv not .csv here
ado.Execute "DROP TABLE result#csv"
if err then
    WScript.Echo err.description
end if
on error goto 0

ado.Execute("SELECT * INTO [result.csv] FROM [data_test.csv] WHERE ID IN (SELECT ID FROM [query_test.csv])")

rs.close
ado.close
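
The comment in the script notes that the GetString() output could be written to a file instead of the console. A minimal sketch of that variant follows, assuming the same Jet connection string and table names as above. The output name results_test.csv is an assumption. GetString does not quote fields, so this only suits data without embedded commas.

```vbscript
Option Explicit

Const adClipString = 2
Const ForWriting = 2

Dim ado, rs, fso, outFile
Set ado = CreateObject("ADODB.Connection")
' Same connection string as above; data files are in the current folder
ado.ConnectionString = "Provider=Microsoft.Jet.OLEDB.4.0;Data Source=.\;Extended Properties=""text;HDR=Yes;FMT=Delimited"";"
ado.Open

Set rs = ado.Execute("SELECT * FROM [data_test.csv] WHERE ID IN (SELECT ID FROM [query_test.csv])")

Set fso = CreateObject("Scripting.FileSystemObject")
' Assumed output file name; True creates the file if it does not exist
Set outFile = fso.OpenTextFile("results_test.csv", ForWriting, True)

' GetString raises an error on an empty recordset, so check EOF first
If Not rs.EOF Then
    ' Comma between columns, new line between rows, empty string for NULLs
    outFile.Write rs.GetString(adClipString, , ",", vbNewLine, "")
End If

outFile.Close
rs.Close
ado.Close
```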

Schema.ini file

[data_test.csv]
Format=CSVDelimited
ColNameHeader=True

Col1=ID Text
Col2=Name Text
Col3=City Text
Col4=State Text
Col5=Zipcode Text
Col6=Phone Text


[query_test.csv]
Format=CSVDelimited
ColNameHeader=False 

Col1=ID Text