我有3组不同的数据(在不同的列中)
通过这3个数据集合,我希望获得5×1000×10,总共50k对应的元素。 E F G(每种动物与每种水果和每个国家相对应)。
可以通过手动复制和粘贴值来完成,但这需要很长时间。有没有办法通过VBA代码或
自动化它是否存在上述无限数据集的通用公式?如果不清楚,请告诉我。
以下是一个较小的数据示例以及结果的结果:
答案 0 :(得分:15)
我通过通用收集,您希望这可以容纳任意数量的列和每个列中的任意数量的条目。一些变体数组应该提供计算每个值的重复周期所必需的维度。
test4
将列标题标签放在第2行,从第A列开始,将数据直接放在第A行。
我添加了一些错误控件来警告超出工作表上的行数。这通常不是可能考虑的因素,但是将未确定数量的列中的值的数量相乘可以快速产生大量结果。你不会超过1,048,576行。
答案 1 :(得分:14)
非连接选择SQL语句的经典示例,它返回所列表的所有组合结果的笛卡尔积。
SQL数据库解决方案
只需将Animals,Fruit,Country作为单独的表导入任何SQL数据库,如MS Access,SQLite,MySQL等,并列出没有连接的列表,包括隐式(WHERE
)和显式(JOIN
)加入:
SELECT Animals.Animal, Fruits.Fruit, Countries.Country
FROM Animals, Countries, Fruits;
Excel解决方案
使用ODBC连接到包含动物,国家和水果范围的工作簿,在VBA中运行非连接SQL语句的相同概念。例如,每个数据分组都在自己的同名工作表中。
Sub CrossJoinQuery()
Dim conn As Object
Dim rst As Object
Dim sConn As String, strSQL As String
Set conn = CreateObject("ADODB.Connection")
Set rst = CreateObject("ADODB.Recordset")
sConn = "Driver={Microsoft Excel Driver (*.xls, *.xlsx, *.xlsm, *.xlsb)};" _
& "DBQ=C:\Path To\Excel\Workbook.xlsx;"
conn.Open sConn
strSQL = "SELECT * FROM [Animals$A1:A3], [Fruits$A1:A3], [Countries$A1:A3] "
rst.Open strSQL, conn
Range("A1").CopyFromRecordset rst
rst.Close
conn.Close
Set rst = Nothing
Set conn = Nothing
End Sub
答案 2 :(得分:11)
我对此问题的第一种方法类似于@Jeeped发布的方法:
使用MicroTimer我计算了上述算法各部分的平均时间。对于更大的输入数据,第3部分占总执行时间的90%-93%。
以下是我尝试提高将数据写入工作表的速度。我已经定义了一个常量iMinRSize=17
。一旦可以使用相同的值填充超过iMinRSize
个连续行,代码就会停止填充数组并直接写入工作表范围。
Sub CrossJoin(rSrc As Range, rTrg As Range)
Dim vSrc() As Variant, vTrgPart() As Variant
Dim iLengths() As Long
Dim iCCnt As Integer, iRTrgCnt As Long, iRSrcCnt As Long
Dim i As Integer, j As Long, k As Long, l As Long
Dim iStep As Long
Const iMinRSize As Long = 17
Dim iArrLastC As Integer
On Error GoTo CleanUp
Application.ScreenUpdating = False
Application.EnableEvents = False
vSrc = rSrc.Value2
iCCnt = UBound(vSrc, 2)
iRSrcCnt = UBound(vSrc, 1)
iRTrgCnt = 1
iArrLastC = 1
ReDim iLengths(1 To iCCnt)
For i = 1 To iCCnt
j = iRSrcCnt
While (j > 0) And IsEmpty(vSrc(j, i))
j = j - 1
Wend
iLengths(i) = j
iRTrgCnt = iRTrgCnt * iLengths(i)
If (iRTrgCnt < iMinRSize) And (iArrLastC < iCCnt) Then iArrLastC = iArrLastC + 1
Next i
If (iRTrgCnt > 0) And (rTrg.row + iRTrgCnt - 1 <= rTrg.Parent.Rows.Count) Then
ReDim vTrgPart(1 To iRTrgCnt, 1 To iArrLastC)
iStep = 1
For i = 1 To iArrLastC
k = 0
For j = 1 To iRTrgCnt Step iStep
k = k + 1
If k > iLengths(i) Then k = 1
For l = j To j + iStep - 1
vTrgPart(l, i) = vSrc(k, i)
Next l
Next j
iStep = iStep * iLengths(i)
Next i
rTrg.Resize(iRTrgCnt, iArrLastC) = vTrgPart
For i = iArrLastC + 1 To iCCnt
k = 0
For j = 1 To iRTrgCnt Step iStep
k = k + 1
If k > iLengths(i) Then k = 1
rTrg.Resize(iStep).Offset(j - 1, i - 1).Value2 = vSrc(k, i)
Next j
iStep = iStep * iLengths(i)
Next i
End If
CleanUp:
Application.ScreenUpdating = True
Application.EnableEvents = False
End Sub
Sub test()
CrossJoin Range("a2:f10"), Range("k2")
End Sub
如果我们将iMinRSize
设置为Rows.Count
,则所有数据都会写入数组。以下是我的样本测试结果:
如果具有最高行数的输入列首先出现,则代码效果最佳,但修改代码以对列进行排序并按正确顺序处理不是一个大问题。
答案 3 :(得分:7)
您可以使用工作表公式执行此操作。 如果您有NAME'd范围 - 动物,水果和国家,那么“技巧”就是在该数组中生成索引以提供所有各种组合。
例如:
=CEILING(ROWS($1:1)/(ROWS(Fruits)*ROWS(Countries)),1)
将生成一系列基于1的数字,这些数字会在Fruits * Countries中的数字条目中重复出现 - 这样可以为每只动物提供所需的行数。
=MOD(CEILING(ROWS($1:1)/ROWS(Countries),1)-1,ROWS(Fruits))+1
将生成一个基于1的系列,该系列会针对多个国家/地区重复每个Fruit。
=MOD(ROWS($1:1)-1,ROWS(Countries))+1))
生成1..n的重复序列,其中n是国家/地区的数量。
将这些放入公式(带有一些错误检查)
D3: =IFERROR(INDEX(Animals,CEILING(ROWS($1:1)/(ROWS(Fruits)*ROWS(Countries)),1)),"")
E3: =IF(E3="","",INDEX(Fruits,MOD(CEILING(ROWS($1:1)/ROWS(Countries),1)-1,ROWS(Fruits))+1))
F3: =IF(E3="","",INDEX(Countries,MOD(ROWS($1:1)-1,ROWS(Countries))+1))
答案 4 :(得分:3)
实际上,我想修改我的旧答案。但是,我的新答案与旧答案完全不同。因为,旧答案是针对特定列的,这个是针对通用列的。在回答了旧的答案之后,提问者说了他想要普遍做到的新要求。对于固定列,我们可以认为固定循环和无限列,我们需要从另一种方式思考。所以,我也这样做。 SO用户也可以看到代码差异,我认为这对初学者很有帮助。
这个新代码并不像旧代码那么简单。如果你想清楚地了解代码,我建议逐行调试代码。
不要担心代码。我已经逐步测试了它。它对我来说非常有用。如果不适合你,请告诉我。有一件事是这段代码可能导致空行(没有数据)的错误。因为,目前我没有添加检查。
以下是我解决问题的通用方法:
Public Sub matchingCell()
Dim startRawColumn, endRawColumn, startResultColumn, endResultColumn, startRow As Integer
Dim index, row, column, containerIndex, tempIndex As Integer
Dim columnCount, totalCount, timesCount, matchingCount, tempCount As Integer
Dim isExist As Boolean
Dim arrayContainer() As Variant
'Actually, even it is for universal, we need to know start column and end column of raw data.
'And also start row. And start column for write result.
'I set them for my test data.
'You need to modify them(startRawColumn, endRawColumn, startRow, startResultColumn).
'Set the start column and end column for raw data
startRawColumn = 1
endRawColumn = 3
'Set the start row for read data and write data
startRow = 2
'Set the start column for result data
startResultColumn = 4
'Get no of raw data column
columnCount = endRawColumn - startRawColumn
'Set container index
containerIndex = 0
'Re-create array container for count of column
ReDim arrayContainer(0 To columnCount)
With Sheets("sheetname")
'Getting data from sheet
'Loop all column for getting data of each column
For column = startRawColumn To endRawColumn Step 1
'Create tempArray for column
Dim tempArray() As Variant
'Reset startRow
row = startRow
'Reset index
index = 0
'Here is one things. I looped until to blank.
'If you want anymore, you can modify the looping type.
'Don't do any changes to main body of looping.
'Loop until the cell is blank
Do While .Cells(row, column) <> ""
'Reset isExist flag
isExist = False
'Remove checking for no data
If index > 0 Then
'Loop previous data for duplicate checking
For tempIndex = 0 To index - 1 Step 1
'If found, set true to isExist and stop loop
If tempArray(tempIndex) = .Cells(row, column) Then
isExist = True
Exit For
End If
Next tempIndex
End If
'If there is no duplicate data, store data
If Not isExist Then
'Reset tempArray
ReDim Preserve tempArray(index)
tempArray(index) = .Cells(row, column)
'Increase index
index = index + 1
End If
'Increase row
row = row + 1
Loop
'Store column with data
arrayContainer(containerIndex) = tempArray
'Increase container index
containerIndex = containerIndex + 1
Next column
'Now, we got all data column including data which has no duplicate
'Show result data on sheet
'Getting the result row count
totalCount = 1
'Get result row count
For tempIndex = 0 To UBound(arrayContainer) Step 1
totalCount = totalCount * (UBound(arrayContainer(tempIndex)) + 1)
Next tempIndex
'Reset timesCount
timesCount = 1
'Get the last column for result
endResultColumn = startResultColumn + columnCount
'Loop array container
For containerIndex = UBound(arrayContainer) To 0 Step -1
'Getting the counts for looping
If containerIndex = UBound(arrayContainer) Then
duplicateCount = 1
timesCount = totalCount / (UBound(arrayContainer(containerIndex)) + 1)
Else
duplicateCount = duplicateCount * (UBound(arrayContainer(containerIndex + 1)) + 1)
timesCount = timesCount / (UBound(arrayContainer(containerIndex)) + 1)
End If
'Reset the start row
row = startRow
'Loop timesCount
For countIndex = 1 To timesCount Step 1
'Loop data array
For index = 0 To UBound(arrayContainer(containerIndex)) Step 1
'Loop duplicateCount
For tempIndex = 1 To duplicateCount Step 1
'Write data to cell
.Cells(row, endResultColumn) = arrayContainer(containerIndex)(index)
'Increase row
row = row + 1
Next tempIndex
Next index
Next countIndex
'Increase result column index
endResultColumn = endResultColumn - 1
Next containerIndex
End With
End Sub
答案 5 :(得分:2)
这是一个递归版本。它假定数据不包含任何内部选项卡,因为核心功能返回以制表符分隔的产品字符串。主子需要传递一个范围,该范围由数据和输出范围的左上角单元组成。这可能会稍微调整一下,但足以用于测试目的。
ColumnProducts Range("A:C"), Range("E1")
是解决OP问题的电话吗?这是代码:
'the following function takes a collection of arrays of strings
'and returns a variant array of tab-delimited strings which
'comprise the (tab-delimited) cartesian products of
'the arrays in the collection
Function CartesianProduct(ByVal Arrays As Collection) As Variant
Dim i As Long, j As Long, k As Long, m As Long, n As Long
Dim head As Variant
Dim tail As Variant
Dim product As Variant
If Arrays.Count = 1 Then
CartesianProduct = Arrays.Item(1)
Exit Function
Else
head = Arrays.Item(1)
Arrays.Remove 1
tail = CartesianProduct(Arrays)
m = UBound(head)
n = UBound(tail)
ReDim product(1 To m * n)
k = 1
For i = 1 To m
For j = 1 To n
product(k) = head(i) & vbTab & tail(j)
k = k + 1
Next j
Next i
CartesianProduct = product
End If
End Function
Sub ColumnProducts(data As Range, output As Range)
Dim Arrays As New Collection
Dim strings As Variant, product As Variant
Dim i As Long, j As Long, n As Long, numRows As Long
Dim col As Range, cell As Range
Dim outRange As Range
numRows = Range("A:A").Rows.Count
For Each col In data.Columns
n = col.EntireColumn.Cells(numRows).End(xlUp).Row
i = col.Cells(1).Row
ReDim strings(1 To n - i + 1)
For j = 1 To n - i + 1
strings(j) = col.Cells(i + j - 1)
Next j
Arrays.Add strings
Next col
product = CartesianProduct(Arrays)
n = UBound(product)
Set outRange = Range(output, output.Offset(n - 1))
outRange.Value = Application.WorksheetFunction.Transpose(product)
outRange.TextToColumns Destination:=output, DataType:=xlDelimited, Tab:=True
End Sub
答案 6 :(得分:1)
好的,所以你只想要一份所有可能组合的清单。这就是我要做的事情:
答案 7 :(得分:1)
这是我解决问题的方法。
Public Sub matchingCell()
Dim animalRow, fruitRow, countryRow, checkRow, resultRow As Long
Dim isExist As Boolean
'Set the start row
animalRow = 2
resultRow = 2
'Work with data sheet
With Sheets("sheetname")
'Loop until animals column is blank
Do While .Range("A" & animalRow) <> ""
'Set the start row
fruitRow = 2
'Loop until fruits column is blank
Do While .Range("B" & fruitRow) <> ""
'Set the start row
countryRow = 2
'Loop until country column is blank
Do While .Range("C" & countryRow) <> ""
'Set the start row
checkRow = 2
'Reset flag
isExist = False
'Checking for duplicate row
'Loop all result row until D is blank
Do While .Range("D" & checkRow) <> ""
'If duplicate row found
If .Range("D" & checkRow) = .Range("A" & animalRow) And _
.Range("E" & checkRow) = .Range("B" & fruitRow) And _
.Range("F" & checkRow) = .Range("C" & countryRow) Then
'Set true for exist flag
isExist = True
End If
checkRow = checkRow + 1
Loop
'If duplicate row not found
If Not isExist Then
.Range("D" & resultRow) = .Range("A" & animalRow)
.Range("E" & resultRow) = .Range("B" & fruitRow)
.Range("F" & resultRow) = .Range("C" & countryRow)
'Increase resultRow
resultRow = resultRow + 1
End If
'Increase countryRow
countryRow = countryRow + 1
Loop
'Increase fruitRow
fruitRow = fruitRow + 1
Loop
'Increase fruitRow
animalRow = animalRow + 1
Loop
End With
End Sub
我已经测试过了。它运作良好。祝你有愉快的一天。