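Background
I am not a programmer or a technical person. I have a project in which I need to convert large text files into an Access database. The text files are not in a traditional flat-file format, so I need some help preprocessing them. The files are large (millions of records), between 100 MB and 1 GB, and they seem to choke every editor I have tried (WordPad, Notepad, vim, EmEditor).
Here is a sample of the source text file: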
product/productId: B000H9LE4U
product/title: Copper 122-H04 Hard Drawn Round Tubing, ASTM B75, 1/2" OD, 0.436" ID, 0.032" Wall, 96" Length
product/price: 22.14
review/userId: ABWHUEYK6JTPP
review/profileName: Robert Campbell
review/helpfulness: 0/0
review/score: 1.0
review/time: 1339113600
review/summary: Either 1 or 5 Stars. Depends on how you look at it.
review/text: Either 1 or 5 Stars. Depends on how you look at it.1 Star because they sent 6 feet of 2" OD copper pipe.0 Star because they won't accept returns on it.5 stars because I figure it's actually worth $12-15/foot and since they won't take a return I figure I can sell it and make $40-50 on this deal
product/productId: B000LDNH8I
product/title: Bacharach 0012-7012 Sling Psychrometer, 25?F to 120?F, red spirit filled
product/price: 84.99
review/userId: A19Y7ZIICAKM48
review/profileName: T Foley "computer guy"
review/helpfulness: 3/3
review/score: 5.0
review/time: 1248307200
review/summary: I recommend this Sling Psychrometer
review/text: Not too much to say. This instrument is well built, accurate (compared) to a known good source. It's easy to use, has great instructions if you haven't used one before and stores compactly.I compared prices before I purchased and this is a good value.
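Each line represents one attribute of a product review, and each record starts with "product/productId:".
What I need
I need to convert this file into character-delimited fields (I think the @ symbol would work) by stripping out each code (i.e. product/productId:, product/title:, etc.), replacing it with @, and replacing the line feeds.
I also want to remove the review/text: lines.
The output would look like this: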
B000H9LE4U@Copper 122-H04 Hard Drawn Round Tubing, ASTM B75, 1/2" OD, 0.436" ID, 0.032" Wall, 96" Length@22.14@ABWHUEYK6JTPP@Robert Campbell@0/0@1.0@1339113600@Either 1 or 5 Stars. Depends on how you look at it.
B000LDNH8I@Bacharach 0012-7012 Sling Psychrometer, 25?F to 120?F, red spirit filled@84.99@A19Y7ZIICAKM48@T Foley "computer guy"@3/3@5.0@1248307200@I recommend this Sling Psychrometer
B000LDNH8I@Bacharach 0012-7012 Sling Psychrometer, 25?F to 120?F, red spirit filled@84.99@A3683PMJPFMAAS@Spencer L. Cullen@1/1@5.0@1335398400@A very useful tool
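I would then have a flat file delimited by "@" that I can easily import into Access.
Sorry for rambling. I am open to suggestions, but I do not understand programming well enough to write something in an editor's scripting language. Thanks in advance.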
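Answer 0 (score: 0)
I am not sure I understand how you want to map the text file to database fields.
That is the first thing you need to decide.
Once you have done that, I suggest arranging your text file into columns that correspond to the database columns. You should then be able to import it into Access.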
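One way to make that mapping concrete is to write down the column order first and then fill each row by field name. The following is only a minimal Python sketch, not something from the original posts: the `COLUMNS` list is an assumption taken from the field prefixes in the question's sample, and `parse_line` and `record_to_row` are hypothetical helper names.

```python
# Assumed column order, copied from the field prefixes in the question's sample.
COLUMNS = [
    "product/productId", "product/title", "product/price",
    "review/userId", "review/profileName", "review/helpfulness",
    "review/score", "review/time", "review/summary",
]


def parse_line(line):
    """Split 'key: value' on the first colon; return (key, value) or None."""
    key, colon, value = line.partition(":")
    if not colon:
        return None  # no colon, so this is not a 'key: value' line
    return key.strip(), value.strip()


def record_to_row(lines, sep="@"):
    """Map the lines of one record onto COLUMNS and join them with sep."""
    fields = {}
    for line in lines:
        parsed = parse_line(line)
        # Keys that are not listed in COLUMNS (e.g. review/text) are ignored.
        if parsed and parsed[0] in COLUMNS:
            fields[parsed[0]] = parsed[1]
    # A field missing from the record becomes an empty column.
    return sep.join(fields.get(col, "") for col in COLUMNS)
```

Because fields are looked up by name, a record with a missing field yields an empty column instead of shifting the remaining values, and `review/text` is dropped simply because it is not listed in `COLUMNS`.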
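Answer 1 (score: 0)
Here is one approach I just put together; it comes with no guarantees. It reads the data (the sample you provided) and writes it out in the required format.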
Public Sub ReadFileAndSave(filePath As String, breakIdentity As String, Optional sepStr As String = "@")
'******************************************************************************
' Opens a large TXT File, reads the data until EOF on the Source,
' then reformats the data to be saved on the Destination
' Arguments:
' ``````````
' 1. The Source File Path - "C:\Users\SO\FileName.Txt" (or) D:\Data.txt
' 2. The element used to identify new row - "new row" (or) "-" (or) "sam"
' 3. (Optional) Separator - The separator you wish to use. Defaults to '@'
'*******************************************************************************
    Dim newFilePath As String, strIn As String, tmpStr As String
    Dim srcNum As Integer, dstNum As Integer

    'The Destination file is stored next to the source file, with a suffix added to its name.
    newFilePath = Replace(filePath, ".txt", "-ReFormatted.txt")

    'Open the SOURCE file for Read, using the next free file number.
    srcNum = FreeFile
    Open filePath For Input As #srcNum

    'Open/Create the DESTINATION file for Write.
    dstNum = FreeFile
    Open newFilePath For Output As #dstNum

    'Loop the SOURCE till the last line.
    Do While Not EOF(srcNum)
        'Read one line at a time.
        Line Input #srcNum, strIn

        'If it is a blank/empty line SKIP.
        If Len(strIn) > 1 Then
            'The BREAK IDENTITY marks a new row: dump the finished row first.
            If InStr(strIn, breakIdentity) <> 0 And Len(tmpStr) > 0 Then
                'Print adds the line break itself, so only the trailing separator is removed.
                Print #dstNum, Left(tmpStr, Len(tmpStr) - Len(sepStr))
                'Prepare the NEXT ROW
                tmpStr = ""
            End If

            'Append the text after the first ":" followed by the separator.
            tmpStr = tmpStr & Trim(Mid(strIn, InStr(strIn, ":") + 1)) & sepStr
        End If
    Loop

    'Print the last line
    If Len(tmpStr) > 0 Then Print #dstNum, Left(tmpStr, Len(tmpStr) - Len(sepStr))

    'Close the files.
    Close #srcNum
    Close #dstNum
End Sub
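Again, this code works on my system, but I have not tested it on large volumes of data, so it may be somewhat slow. Please let me know whether it works for you.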
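For comparison, the same line-by-line idea can be sketched in Python, this time also dropping the review/text lines that the question asks to remove. This is a rough sketch under stated assumptions rather than a tested solution for the real files: `convert`, `record_start` and `skip` are hypothetical names, every field key is assumed to begin with `product/` or `review/`, and fields are assumed to appear in the same order in every record.

```python
import io


def convert(src, dst, record_start="product/productId",
            skip=("review/text",), sep="@"):
    """Stream 'key: value' lines from src; write one sep-delimited row per record to dst.

    Returns the number of rows written.
    """
    row, written = [], 0
    for line in src:  # iterating reads one line at a time
        key, colon, value = line.partition(":")
        key = key.strip()
        # Skip blank lines and anything that is not a recognised 'key: value' field.
        if not colon or not key.startswith(("product/", "review/")):
            continue
        # A new record begins: flush the row collected so far.
        if key == record_start and row:
            dst.write(sep.join(row) + "\n")
            written += 1
            row = []
        if key not in skip:
            row.append(value.strip())
    if row:  # flush the final record
        dst.write(sep.join(row) + "\n")
        written += 1
    return written


if __name__ == "__main__":
    # Tiny in-memory demonstration; a real run would pass opened files instead.
    demo = io.StringIO(
        "product/productId: B000LDNH8I\n"
        "product/price: 84.99\n"
        "review/text: dropped\n"
    )
    out = io.StringIO()
    convert(demo, out)
    print(out.getvalue(), end="")
```

For a real file, `src` and `dst` would be file objects returned by `open()`. Since the loop only ever holds one line and one row, memory use should not grow with the size of the file. If any value can itself contain "@", a different separator would be needed.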