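I have a large CSV file with more than 1 million entries containing my company's user information. I used the Recsv editor to remove the extra columns from the file. Now I have the following columns: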
ID NAME EMAIL SUB_STATUS SUB_DATE SMS_RECEIVED MEMBER
1 John abc@abc.com true 01.01.2018 true true
2 David abc@abc.com false 01.01.2018 true true
3 Raza abc@abc.com true 01.01.2018 true false
4 Syed abc@abc.com false 01.01.2018 false false
5 Eidi abc@abc.com true 01.01.2018 false false
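I have more than 1 million records, but I need to extract data from them based on specific conditions. For example, here is the sample logic: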
Extract all users where (SUB_STATUS=true and SMS_RECEIVED=false and MEMBER=true) OR
(SUB_STATUS=false and SMS_RECEIVED=false and MEMBER=false)
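I would then like to get the output as a CSV file based on the sample conditions above.
How can I achieve this? I am a Windows user and have tried PowerShell and the Recsv editor. The file is too large to open in Excel.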
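The sample logic above can be read as a row-by-row predicate, so the file never has to be loaded whole. The following is a minimal sketch in Python, not something from the original post. It assumes the file is comma-separated, has the header row shown above, and stores the flags as the literal strings true/false. The names `matches`, `filter_csv`, and `SAMPLE` are hypothetical.

```python
import csv
import io

# Sample rows from the question, assumed here to be comma-separated.
SAMPLE = """ID,NAME,EMAIL,SUB_STATUS,SUB_DATE,SMS_RECEIVED,MEMBER
1,John,abc@abc.com,true,01.01.2018,true,true
2,David,abc@abc.com,false,01.01.2018,true,true
3,Raza,abc@abc.com,true,01.01.2018,true,false
4,Syed,abc@abc.com,false,01.01.2018,false,false
5,Eidi,abc@abc.com,true,01.01.2018,false,false
"""


def is_true(value):
    """Assumption: flags are stored as the text 'true' or 'false'."""
    return value.strip().lower() == "true"


def matches(row):
    """Sample condition from the question, applied to one row (a dict)."""
    sub = is_true(row["SUB_STATUS"])
    sms = is_true(row["SMS_RECEIVED"])
    member = is_true(row["MEMBER"])
    return (sub and not sms and member) or (not sub and not sms and not member)


def filter_csv(src, dst, delimiter=","):
    """Stream rows from src to dst, keeping matches; return how many were kept."""
    reader = csv.DictReader(src, delimiter=delimiter)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames, delimiter=delimiter)
    writer.writeheader()
    kept = 0
    for row in reader:  # one row in memory at a time
        if matches(row):
            writer.writerow(row)
            kept += 1
    return kept


if __name__ == "__main__":
    out = io.StringIO()
    print(filter_csv(io.StringIO(SAMPLE), out), "row(s) kept")
    print(out.getvalue())
```

For a real file, the two `io.StringIO` objects would be replaced by files opened with `open(path, newline="", encoding=...)`, where the path and encoding depend on your data. Because rows are processed one at a time, memory use should stay roughly constant regardless of file size.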
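Answer 0 (score: 1)
Importing a large file into Excel is not a problem; you just need to split the data. Once it is split, you can apply filters.
The only issue is time. I have used this macro on a CSV file with 50 million rows and it works. It just takes a while to copy. The separator is ","; check which separator your file uses.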
Sub ReadCSVFiles()
    'Declare each variable explicitly; "Dim i, j As Long" would leave i a Variant
    Dim i As Long, j As Long, k As Long, l As Long, m As Long
    Dim Length As Long, ws1Col As Long, ws1Row As Long
    Dim ws1 As Worksheet, ws2 As Worksheet
    Dim UserFileName As String
    Dim strTextLine As String
    Dim iFile As Integer: iFile = FreeFile
    Dim Word() As String
    Dim Items(1 To 16384) As Long
    UserFileName = Application.GetOpenFilename
    If UserFileName = "False" Then Exit Sub 'The dialog was cancelled
    Set ws1 = ThisWorkbook.Worksheets(1)
    'Pass 1: copy each raw line of the file into one cell of ws1
    Open UserFileName For Input As #iFile
    i = 1
    j = 1
    Do Until EOF(iFile)
        Line Input #iFile, strTextLine
        If i > 1048576 Then 'Column is full: continue at the top of the next one
            i = 1
            j = j + 1
        End If
        ws1.Cells(i, j) = strTextLine
        i = i + 1
    Loop
    Close #iFile
    'Add the output sheet after ws1 so that ws1 keeps its position
    Set ws2 = ThisWorkbook.Worksheets.Add(After:=ws1)
    ws1Col = ws1.UsedRange.SpecialCells(xlCellTypeLastCell).Column
    ws1Row = ws1.UsedRange.SpecialCells(xlCellTypeLastCell).Row
    k = 0
    l = 0
    'Pass 2: split every raw line into fields and write them to ws2
    For i = 1 To ws1Col
        For j = 1 To ws1Row
            'Change the separator here
            Word() = Split(ws1.Cells(j, i).Value2, ",", , vbBinaryCompare)
            Length = UBound(Word)
            If Length > k Then
                k = Length
            End If
            'Loop only over the fields this line actually has
            For m = 0 To Length
                ws2.Cells(j, i + l + m).Value2 = Word(m)
            Next
        Next
        'Items(i) is the running column offset for the next block of lines
        If i = 1 Then
            Items(i) = k
        Else
            Items(i) = k + Items(i - 1)
        End If
        k = 0
        l = Items(i)
    Next
End Sub
Answer 1 (score: 0)
You can try q. This tool lets you run SQL queries directly on CSV files to extract a subset of the data: https://harelba.github.io/q/
You can also try Excel PowerPivot, or MS Access!
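To illustrate the SQL-over-CSV idea in general terms, here is a minimal sketch that uses Python's standard `sqlite3` module. It is not q's own interface. It assumes a comma-separated file with a header row and flags stored as the strings true/false. The table name `users` and the names `query_csv`, `QUERY`, and `SAMPLE` are hypothetical.

```python
import csv
import io
import sqlite3

# Sample rows from the question, assumed here to be comma-separated.
SAMPLE = """ID,NAME,EMAIL,SUB_STATUS,SUB_DATE,SMS_RECEIVED,MEMBER
1,John,abc@abc.com,true,01.01.2018,true,true
2,David,abc@abc.com,false,01.01.2018,true,true
3,Raza,abc@abc.com,true,01.01.2018,true,false
4,Syed,abc@abc.com,false,01.01.2018,false,false
5,Eidi,abc@abc.com,true,01.01.2018,false,false
"""

# The sample condition from the question, written as SQL.
QUERY = """
SELECT * FROM users
WHERE (SUB_STATUS = 'true'  AND SMS_RECEIVED = 'false' AND MEMBER = 'true')
   OR (SUB_STATUS = 'false' AND SMS_RECEIVED = 'false' AND MEMBER = 'false')
"""


def query_csv(src, query=QUERY):
    """Load CSV text into an in-memory SQLite table and run one query on it."""
    reader = csv.reader(src)
    header = next(reader)  # first line holds the column names
    conn = sqlite3.connect(":memory:")
    try:
        # Every column is stored as TEXT, so 'true'/'false' stay strings.
        columns = ", ".join('"{}" TEXT'.format(name) for name in header)
        conn.execute("CREATE TABLE users ({})".format(columns))
        placeholders = ", ".join("?" for _ in header)
        conn.executemany(
            "INSERT INTO users VALUES ({})".format(placeholders), reader
        )
        rows = conn.execute(query).fetchall()
    finally:
        conn.close()
    return header, rows


if __name__ == "__main__":
    header, rows = query_csv(io.StringIO(SAMPLE))
    print(header)
    for row in rows:
        print(row)
```

For a file with millions of rows, passing a file path to `sqlite3.connect` instead of `":memory:"` would keep the table on disk, at the cost of some import time.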