Converting a large text file into a database

Time: 2015-04-08 19:25:49

Tags: database vba replace ms-access-2010 file-conversion

Background

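I'm not a programmer or a technical person. I have a project where I need to convert a large text file into an Access database. The text file is not in a traditional flat-file format, so I need some help preprocessing it. The files are large (millions of records), between 100 MB and 1 GB, and they seem to choke every editor I've tried (WordPad, Notepad, Vim, EmEditor).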

Here is a sample of the source text file:

product/productId:B000H9LE4U

product/title: Copper 122-H04 Hard Drawn Round Tubing, ASTM B75, 1/2" OD, 0.436" ID, 0.032" Wall, 96" Length

product/price: 22.14

review/userId: ABWHUEYK6JTPP

review/profileName: Robert Campbell

review/helpfulness: 0/0

review/score: 1.0

review/time: 1339113600

review/summary: Either 1 or 5 Stars. Depends on how you look at it.

review/text: Either 1 or 5 Stars. Depends on how you look at it.1 Star because they sent 6 feet of 2" OD copper pipe.0 Star because they won't accept returns on it.5 stars because I figure it's actually worth $12-15/foot and since they won't take a return I figure I can sell it and make $40-50 on this deal

product/productId: B000LDNH8I

product/title: Bacharach 0012-7012 Sling Psychrometer, 25?F to 120?F, red spirit filled

product/price: 84.99

review/userId: A19Y7ZIICAKM48

review/profileName: T Foley "computer guy"

review/helpfulness: 3/3

review/score: 5.0

review/time: 1248307200

review/summary: I recommend this Sling Psychrometer

review/text: Not too much to say. This instrument is well built, accurate (compared) to a known good source. It's easy to use, has great instructions if you haven't used one before and stores compactly.I compared prices before I purchased and this is a good value.

Each line represents a specific attribute of a product, and each product record starts with "product/productId:".

What I need

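I need to convert this file into a character-delimited file (I think the @ symbol would work) by stripping each field code (i.e., product/productId:, product/title:, and so on) and replacing the line feeds between fields with @, so that each product ends up on a single line.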

I also want to drop the review/text: line.
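To illustrate the transformation described above on a single record, here is a minimal VBA sketch (VBA because that is what the answers below use). `RecordToLine` is a hypothetical helper, not part of either answer; it assumes the record's lines have already been collected into an array and that each field tag ends at the first colon.

```vba
' Hypothetical helper (not from either answer): joins one record's
' "tag: value" lines into a single delimited line.
' Assumes each tag ends at the first ":" and that review/text is dropped.
Public Function RecordToLine(recordLines As Variant, _
                             Optional sepStr As String = "@") As String
    Dim i As Long, lineText As String, colonPos As Long
    Dim result As String, isFirst As Boolean

    isFirst = True
    For i = LBound(recordLines) To UBound(recordLines)
        lineText = Trim$(recordLines(i))
        ' Skip blank lines and the review/text field
        If Len(lineText) > 0 And Left$(lineText, 12) <> "review/text:" Then
            colonPos = InStr(lineText, ":")
            ' Keep only the value after the first colon
            If Not isFirst Then result = result & sepStr
            result = result & Trim$(Mid$(lineText, colonPos + 1))
            isFirst = False
        End If
    Next i
    RecordToLine = result
End Function
```

For example, `RecordToLine(Array("product/productId: B000LDNH8I", "product/price: 84.99", "review/text: Not too much to say."))` returns `B000LDNH8I@84.99`. A full conversion would still need to read the file line by line and detect where each record starts, which is what answer 1 below does.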

The output would look like this:

B000H9LE4U@Copper 122-H04 Hard Drawn Round Tubing, ASTM B75, 1/2" OD, 0.436" ID, 0.032" Wall, 96" Length@22.14@ABWHUEYK6JTPP@Robert Campbell@0/0@1.0@1339113600@Either 1 or 5 Stars. Depends on how you look at it.

B000LDNH8I@Bacharach 0012-7012 Sling Psychrometer, 25?F to 120?F, red spirit filled@84.99@A19Y7ZIICAKM48@T Foley "computer guy"@3/3@5.0@1248307200@I recommend this Sling Psychrometer

B000LDNH8I@Bacharach 0012-7012 Sling Psychrometer, 25?F to 120?F, red spirit filled@84.99@A3683PMJPFMAAS@Spencer L. Cullen@1/1@5.0@1335398400@A very useful tool

I would then have an "@"-delimited flat file that I can easily import into Access.

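Sorry for rambling. I'm open to suggestions, but I don't understand programming well enough to write this in an editor's scripting language. Thanks in advance.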

2 answers:

Answer 0 (score: 0)

I'm not sure I understand how you want to map the text file to the database fields.

That's the first thing you need to decide.

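Once you've done that, I suggest reshaping your text file into columns that correspond to the database columns. Then you should be able to import it into Access.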
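To make these two steps concrete, here is a minimal sketch that assumes the nine fields in the question's sample output map to one table. The table name `Reviews`, the column names and types, the specification name `ReviewsImportSpec`, and the file path are all hypothetical. For `DoCmd.TransferText acImportDelim` to split on "@", an import specification has to be saved first; one way is the Text Import Wizard's Advanced... dialog, with "@" as the field delimiter and the text qualifier set to {none}, since titles such as `1/2" OD` contain double quotes.

```vba
' Hypothetical table layout for the nine fields in the question's output.
' Table and column names are illustrative only. Price is CURRENCY here;
' if any price value is not numeric, a TEXT column would be safer.
Public Function ReviewsTableDDL() As String
    ReviewsTableDDL = "CREATE TABLE Reviews (" & _
        "ProductId TEXT(20), Title MEMO, Price CURRENCY, " & _
        "UserId TEXT(20), ProfileName TEXT(255), Helpfulness TEXT(20), " & _
        "Score DOUBLE, ReviewTime LONG, Summary MEMO)"
End Function

' Creates the table, then imports the "@"-delimited file into it.
' Assumes an import specification named "ReviewsImportSpec" (delimiter "@")
' was saved beforehand through the Text Import Wizard and that its field
' order matches the table; the path below is a placeholder.
Public Sub CreateAndImportReviews()
    CurrentDb.Execute ReviewsTableDDL(), dbFailOnError
    DoCmd.TransferText acImportDelim, "ReviewsImportSpec", "Reviews", _
                       "D:\Data-ReFormatted.txt", False
End Sub
```

Trying the import on a small excerpt of the file first is a cheap way to confirm that the specification and the table line up before committing to the full 1 GB run.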

Answer 1 (score: 0)

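Here is an approach I just put together; it comes with no guarantees. It reads the data (the sample you provided) and writes it to a new file in the format you need.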

Public Sub ReadFileAndSave(filePath As String, breakIdentity As String, _
                           Optional sepStr As String = "@", _
                           Optional skipIdentity As String = "review/text:")
'******************************************************************************
'   Opens a large TXT file, reads it line by line until EOF on the source,
'       then reformats the data and saves it to the destination file
'   Arguments:
'   ``````````
'       1. The source file path - "C:\Users\SO\FileName.Txt" (or) "D:\Data.txt"
'       2. The tag that starts a new row - e.g. "product/productId:"
'       3. (Optional) Separator - the separator you wish to use. Defaults to "@"
'       4. (Optional) Lines starting with this tag are dropped. Defaults to "review/text:"
'*******************************************************************************
    Dim newFilePath As String, strIn As String, rowStr As String
    Dim inFile As Integer, outFile As Integer
    Dim dotPos As Long, colonPos As Long, hasRow As Boolean

    'The destination file is stored next to the source, with a suffix added to its name
    '(InStrRev avoids the case-sensitive Replace on ".txt", which misses ".Txt")
    dotPos = InStrRev(filePath, ".")
    If dotPos > InStrRev(filePath, "\") Then
        newFilePath = Left(filePath, dotPos - 1) & "-ReFormatted.txt"
    Else
        newFilePath = filePath & "-ReFormatted.txt"
    End If

    'Open the SOURCE file for reading and the DESTINATION file for writing
    inFile = FreeFile
    Open filePath For Input As #inFile
    outFile = FreeFile
    Open newFilePath For Output As #outFile

    'Loop through the SOURCE until the last line
    Do While Not EOF(inFile)
        'Read one line at a time
        Line Input #inFile, strIn
        strIn = Trim(strIn)
        'Skip blank lines and lines that start with skipIdentity
        If Len(strIn) > 0 And Not (Len(skipIdentity) > 0 And _
                                   Left(strIn, Len(skipIdentity)) = skipIdentity) Then
            'A line starting with the BREAK IDENTITY begins a new record,
            'so write the record collected so far to the DESTINATION
            If hasRow And Left(strIn, Len(breakIdentity)) = breakIdentity Then
                Print #outFile, rowStr
                rowStr = ""
                hasRow = False
            End If
            'Keep only the value after the first ":" (drops the field tag)
            colonPos = InStr(strIn, ":")
            If hasRow Then rowStr = rowStr & sepStr
            rowStr = rowStr & Trim(Mid(strIn, colonPos + 1))
            hasRow = True
        End If
    Loop

    'Write the last record (Print # adds the line break itself)
    If hasRow Then Print #outFile, rowStr

    'Close the files
    Close #inFile
    Close #outFile
End Sub

Again, this code works on my system, but I haven't tested it on a large file, so it may be a bit slow. Let me know whether it works for you.
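One thing worth checking before running this on a 1 GB file: the VBA documentation for `Line Input #` says it reads until it meets a carriage return (Chr(13)) or a CR-LF pair, so a file that uses bare line feeds (Unix-style) would be read as one enormous line. Below is a minimal pre-check plus a call, assuming `ReadFileAndSave` is in the same module. `UsesCrLineEndings`, `RunConversion`, and the path `D:\Data.txt` are hypothetical, and sampling only the first 64 KB assumes the line endings are consistent throughout the file.

```vba
' Hypothetical helper: reads at most the first 64 KB of the file in
' Binary mode and reports whether it contains a carriage return,
' which Line Input # needs in order to split lines.
Public Function UsesCrLineEndings(filePath As String) As Boolean
    Dim f As Integer, n As Long, buf As String
    f = FreeFile
    Open filePath For Binary Access Read As #f
    n = LOF(f)
    If n > 65536 Then n = 65536
    buf = Space$(n)
    If n > 0 Then Get #f, 1, buf
    Close #f
    UsesCrLineEndings = (InStr(buf, vbCr) > 0)
End Function

' Hypothetical driver: checks the line endings, then runs the conversion.
Public Sub RunConversion()
    Const SRC As String = "D:\Data.txt"   ' placeholder path
    If Not UsesCrLineEndings(SRC) Then
        MsgBox "No carriage returns found; Line Input # would treat the file as a single line."
        Exit Sub
    End If
    ReadFileAndSave SRC, "product/productId:"
End Sub
```

If the check fails, the file's line endings would need converting to CR-LF (or the file read in Binary mode) before `ReadFileAndSave` can process it line by line.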