附加具有相同类型数据但列名不同的CSV文件

时间:2016-06-02 23:02:14

标签: excel etl business-intelligence powerbi powerquery

希望你做得好。 我需要一些帮助。例如,我有3个csv文件:

1)File1.csv有2条记录/行

firstname | lastname | city | country | emailaddress
-----------------------------------------------------
alexf     sdfsd    mumbai india sdf@sdf.com
----------
asfd      sdfsdf   toronto canada dfsd@sdf.com

2)secondfile.csv,包含2条记录/行

first-name | last-name | currentcity | currentcountry | email-address
----------------------------------------------------------------------
asdf        sdfkjwl  sydney      australia      sdf@dsffwe.com
----------
lskjdf     sdlfkjlkj delhi       india           sdflkj@sdf.com

3)userfile.csv,包含2条记录/行

fname | lname | usercity | usercountry | email
-----------------------------------------------
sdf   sdflj auckland new zealand sdf@sdf.com
----------
sdfsdf sdf  venice   italy       sdf@dsf.com

现在我想创建一个单独的csv或excel或mysql或任何数据库表,其中我希望所有这些记录来自具有不同列/标题名称但具有相同类型数据的所有不同csv文件。像这样:

singlecsvfile.csv

first_name | last_name | city | country |     email_address
--------------------------------------------------------
alexf        sdfsd       mumbai   india       sdf@sdf.com
asfd         sdfsdf      toronto canada       dfsd@sdf.com
asdf        sdfkjwl       sydney  australia   sdf@dsffwe.com
lskjdf      sdlfkjlkj     delhi    india      sdflkj@sdf.com
sdf         sdflj         auckland new zealand sdf@sdf.com
sdfsdf      sdf           venice   italy       sdf@dsf.com

实际上,由于数据源的类型不同,我有50多个具有不同列名但具有相同类型数据的文件。你建议我做什么,你会建议什么策略或方式,我应该如何实现这一点。如果可能的话,请建议我简单的方法(excel / powerquery / powerBI)或代码(php / sql)。我需要快速或自动化的解决方案,如数据映射。我搜索了很多,但找不到任何解决方案。建议将不胜感激。感谢

2 个答案:

答案 0 :(得分:2)

我会使用Power Query。每个输入文件布局都需要一个单独的Query。这些只会重命名列以匹配您的 singlecsvfile.csv 列名称。我会将每个设置为加载到/仅创建连接

然后最终的 singlecsvfile 查询将使用附加查询来添加输入查询中的所有数据。 Power Query使用列名来组合Append中的数据 - 列的顺序(从左到右)无关紧要。

如果您的50多个文件中有任何常见布局,我会将它们分成子文件夹。然后,您可以使用单个输入查询翻录子文件夹中的所有文件 - 使用从文件/从文件夹开始

Power Query会将输出传递到Excel表格。如果您确实需要CSV输出,只需录制宏以刷新Power Query和另存为CSV。

答案 1 :(得分:-1)

SuperUser并不是一个真正的代码编写服务。话虽如此,我已经得到了一段基本上应该在vba中做你想要的代码。它有一些评论所以应该是可管理的。可能需要一些调整,具体取决于您的文件。

Option Explicit
Global first_sheet As Boolean
Global append As Boolean


Sub process_folder()
Dim book_counter As Integer
Dim folder_path As String
Dim pWB As Workbook, sWB As Workbook, sWB_name As String
Dim pWS As Worksheet

    book_counter = 0
    first_sheet = True

    'Flag between appending in one sheet and copying into individual sheets
    append = True

    Set pWB = ActiveWorkbook
    Set pWS = pWB.ActiveSheet

    folder_path = "O:\Active\_2010\1193\10-1193-0015 Kennecott eagle\Phase 8500 - DFN Modelling\4. Analysis & Modelling\Phase 2 - DFN building\Export\fracture_properties\20140205"

    folder_path = verify_folder(folder_path)
    If folder_path = "NULL" Then
        Exit Sub
    End If

    'Get first file to open
    sWB_name = Dir(folder_path, vbNormal)

    'Loop through files
    Do While sWB_name <> ""

        'Open each file
        Workbooks.Open Filename:=folder_path & sWB_name
        Set sWB = Workbooks(sWB_name)

        Call process_workbook(pWB, sWB)

        'close file increment counter
        sWB_name = Dir()
        book_counter = book_counter + 1
    Loop

    'Number of files processed
    MsgBox ("Number of Fragment Files processed: " & book_counter)


End Sub

Sub process_workbook(pWB As Workbook, sWB As Workbook)

       If append Then
        Call append_all_sheets(pWB, sWB, 1)
       Else
        Call copy_all_sheets(pWB, sWB)
       End If

End Sub

Sub copy_all_sheets(pWB As Workbook, sWB As Workbook)
Dim ws As Worksheet

    For Each ws In sWB.Worksheets
        ws.Move After:=pWB.Sheets(pWB.Sheets.count)
    Next ws

End Sub

Sub append_all_sheets(pWB As Workbook, sWB As Workbook, headerRows As Long)

Dim lastCol As Long, lastRow As Long, pasteRow As Long
Dim count As Integer
Dim ws As Worksheet

    For Each ws In sWB.Worksheets
        lastCol = find_last_col(ws)
        lastRow = find_last_row(ws)
        pasteRow = find_last_row(pWB.Sheets(1))


        'Copy entire data range if its the first sheet otherwise leave of the header row
        If first_sheet Then
         '   ws.Range("A1").Resize(lastRow, lastCol).Copy

            pWB.Sheets(1).Range("A" & pasteRow).Resize(lastRow, lastCol).Formula = ws.Range("A1").Resize(lastRow, lastCol).Formula
                'Destination:=pWB.Sheets(1).pasteRow
        Else
            'pWB.Sheets(1).Formula = ws.Range("A1").Offset(headerRows, 0).Resize(lastRow - headerRows, lastCol).Formula
            pWB.Sheets(1).Range("A" & pasteRow).Resize(lastRow - headerRows, lastCol).Formula = ws.Range("A1").Offset(headerRows, 0).Resize(lastRow - headerRows, lastCol).Formula
        End If

        first_sheet = False
    Next ws

    sWB.Close (False)

End Sub

Function find_last_row(ws As Worksheet) As Long

    With ws
        If Application.WorksheetFunction.CountA(.Cells) <> 0 Then
            find_last_row = .Cells.Find(What:="*", _
                          After:=.Range("A1"), _
                          Lookat:=xlPart, _
                          LookIn:=xlFormulas, _
                        SearchOrder:=xlByRows, _
                        SearchDirection:=xlPrevious, _
                        MatchCase:=False).Row
        Else
          find_last_row = 1
        End If
    End With


End Function

Function find_last_col(ws As Worksheet) As Long

    With ws
        If Application.WorksheetFunction.CountA(.Cells) <> 0 Then
            find_last_col = .Cells.Find(What:="*", _
                          After:=.Range("A1"), _
                          Lookat:=xlPart, _
                          LookIn:=xlFormulas, _
                        SearchOrder:=xlByColumns, _
                        SearchDirection:=xlPrevious, _
                        MatchCase:=False).Column
        Else
          find_last_col = 1
        End If
    End With


End Function

Function verify_folder(path As String) As String

    If path = "" Then
        MsgBox ("Enter the Directory of the Fragment simulation files to process")
        verify_folder = "NULL"
        Exit Function
    End If

    If Not PathExists(path) Then
        MsgBox ("Directory does not exist")
        verify_folder = "NULL"
        Exit Function

    End If

    If Right(path, 1) <> "\" Then
            verify_folder = path & "\"
    End If

End Function

Function PathExists(pName) As Boolean
On Error Resume Next
    PathExists = (GetAttr(pName) And vbDirectory) = vbDirectory
End Function