我最近写了一篇question求助于如何计算人口中每一对独特过敏症的发生次数。我得到的解决方案很棒,但我现在需要看看3+过敏的组合,并且使用Excel表格完成这一切将需要永远。
我决定编写一个VBA脚本来执行此操作,这对于成对很有用。它也更快,因为我回去并更改了源数据的格式,以便每个ExceptionID的相关AllergenID存储在一个以逗号分隔的字符串中。
我现在正在考虑升级到3D或更高的阵列,因为我们不知道我们可能需要多少维度(可能是10或15)我宁愿避免使用一系列{ {1}}或嵌套的Case
语句。
我的研究出现了this article,其中我收集的是我所要求的实际上是不可能的,但我想询问OP的陈述
我认为如果我可以在运行时将Redim语句构造为字符串并执行字符串,那么这是可行的。但这似乎不可能。
我基本上有同样的想法。下面的代码会生成类型不匹配错误,但是没有可能有效的变体吗?我们不能在If/Then
内传递其他功能(例如join
)吗?
ReDim
This article听起来就像我在Java中所做的那样,但我不会说任何爪哇语,所以我无法确定这与我想要达到的目标有多相似,或者如果有办法将此方法应用于VBA ...
====== UPDATE ============
以下是我正在使用的数据示例(在单独的列中,为了清晰起见,我添加了破折号)
ExceptionID - ExcAllergens
035 - 100380
076 - 100107,100392,100345,100596,100141,100151,100344
200 - 100123,100200
325 - 100381
354 - 100381,100123
355 - 100381,100123
360 - 100586
390 - 100151,100344,100345,100349
441 - 100380,100368
448-100021,100181,100345,100200,100344,100295
491 - 100381
499 - 100333
503 - 100333
507 - 100331,100346,100596,100345,100344,100269,100283
以下是过敏原定义表的摘录(过敏原键是我刚刚添加的,以便使用较小的数字,6位数字是我们数据库中使用的数字。)
AllergenKey - AllergenID - AllergenTag
01 - 100011 - AçaiBerry
02 - 100012 - 乙酸
03 - 100013 - 琼脂琼脂 04 - 100014 - 龙舌兰 05 - 100015 - 酒精
06 - 100016 - Allspice
07 - 100017 - 碳酸氢铵
08 - 100018 - 淀粉酶
09 - 100019 - 胭脂树 10 - 100020 - Apple
11 - 100021 - Apple,Raw
12 - 100022 - 杏子 13 - 100023 - 葛根
14 - 100025 - 抗坏血酸
15 - 100027 - 芦笋
16 - 100028 - 鳄梨
17 - 100029 - 细菌培养
18 - 100030 - 发酵粉
请注意,有6810个例外情况,包括1到51个单独的过敏症(平均约4或5个),以及451种不同的过敏原。以下是我对过敏原对的分析结果(当我说“过敏原”它还包括像素食者这样的饮食偏好时):
前10对 - 对数 - 过敏原1 - 过敏原2
1 - 245 - 乳制品 - 面筋
2 - 232 - 鸡蛋 - 坚果
3 - 190 - 乳制品 - 鸡蛋
4 - 173 - 面筋 - 燕麦
5 - 146 - 大豆(可能含有) - 大豆
6 - 141 - 乳制品 - 坚果
7 - 136 - 牛肉 - 猪肉
8 - 120 - 乳制品 - 大豆
9 - 114 - 芝麻(可能含有) - 坚果
10 - 111 - 素食1 - 猪肉
答案 0 :(得分:1)
我不会担心与您的中型数据集的最大可能组合。你无法做出所有可能的组合。您将拥有许多样本群体中不会出现的组合。不要尝试计算它们,然后计算出现次数。
相反,请完成示例填充,并在工作表“数组”上创建元组作为数据条目。我建议使用3位数的过敏原密钥作为标识符编号,并将元组中的数字组合成一个长整数(对于较大的数字可能需要十进制)。
我建议的方法是将元组组合为长的元素,以后可以很容易地分解。然后使用频率函数计算每个元组“数字”的出现次数。因此,如果有过敏原使用密钥:1,17,451 - 它们形成一个由1,017,451组成的长组(与451,17和1相同) - 我们确保任何元组都强制将最小键的顺序强制为最大键。所以最大三倍是449,450,451,最小值是1,002,003。请注意,您永远不会有3,002,001,因为它会复制1,002,003。
我玩的模块如下: 编辑 - 更好的代码
Option Explicit
Option Base 1
Public Function concID(paramArr() As Variant) As Variant
' this function takes an array of numbers and arranges the array into
' one long code number - with order of smallest to largest
' the code number generated has each individual array entry as a 3-digit component
Dim wsf As WorksheetFunction
Dim decExp As Integer
Dim i As Long, j As Long
Dim bigNum As Variant ' may need to cast to Decimal??
Set wsf = WorksheetFunction
'may use cDec if necessary here??
For i = 1 To UBound(paramArr)
'determine the position of the component by multiplying by a multiple of 10^3
decExp = 3 * (UBound(paramArr) - i)
bigNum = bigNum + wsf.Small(paramArr, i) * 10 ^ decExp
Next i
concID = bigNum
End Function
Public Sub runAllergen()
Dim ws As Worksheet
Dim dataRange As Range, tupleRange As Range, uniqueList As Range, freqRange As Range, r As Range
Dim i As Long, j As Long, counter As Long
Dim dataArray As Variant, arr As Variant, tempholder As Long
Dim bigArray(1 To 10 ^ 6, 1 To 1) As Variant ' the array which will hold all the generated combinations from the data
Dim tuple As Long
tuple = 3
'this will come in as a user input.
Set ws = Sheet1
Set dataRange = ws.Range("A2:A10001") 'I have 10k people in my dataset, and this is just the allergen data vector
Application.ScreenUpdating = False 'IMPORTANT for efficiency
tempholder = 1 'this is the array index which the next combi entry is to be put into bigArray
dataArray = dataRange.Value 'write entire worksheet column to internal array for efficiency
For i = 1 To UBound(dataArray)
'obtain array of allergen values in each data row to obtain tuples from
arr = Split(dataArray(i, 1), ",")
If UBound(arr) + 1 >= tuple Then
'give over the array of row data to make tuples from and write to bigArray
'return the next available index of bigArray to store data
tempholder = printCombinations(arr, tuple, bigArray(), tempholder)
End If
Next i
Set r = ws.Range("B2")
'write entire list of tuples from data population to worksheet for efficiency - MASSIVE performance boost
r.Resize(tempholder - 1, 1).Value = bigArray
'copy tuple output over to another column to remove duplicates and get unique list
Set tupleRange = ws.Range(r, r.End(xlDown))
tupleRange.Copy
Set r = ws.Range("D2")
r.PasteSpecial Paste:=xlPasteValues, Operation:=xlNone, SkipBlanks _
:=False, Transpose:=False
'remove duplicates from copied tuple output to get a unique list of codes to serve as bins in FREQUENCY function
ws.Range(r, r.End(xlDown)).RemoveDuplicates Columns:=1, Header:=xlNo
Set uniqueList = ws.Range(r, r.End(xlDown))
Application.CutCopyMode = False
'set the frquency output range which is always 1 more row than the bins array
Set freqRange = uniqueList.Offset(0, 1).Resize(uniqueList.Rows.Count + 1, 1)
'get the frequency of each tuple
freqRange.FormulaArray = "=FREQUENCY(R2C" & tupleRange.Column & ":R" & tupleRange.Rows.Count + 1 & _
"C" & tupleRange.Column & _
",R2C" & uniqueList.Column & ":R" & uniqueList.Rows.Count + 1 & "C" & uniqueList.Column & ")"
Application.ScreenUpdating = True
End Sub
Public Function printCombinations(pool As Variant, r As Long, printVector As Variant, tempPosition As Long) As Long
'this function writes the data row arrays as tuples/combis to the bigArray,
'and returns the next available index in bigArray
Dim i As Long, j As Long, n As Long
Dim tempholder() As Variant
Dim idx() As Long
ReDim tempholder(1 To r)
ReDim idx(1 To r)
n = UBound(pool) - LBound(pool) + 1
For i = 1 To r
idx(i) = i
Next i
Do
For j = 1 To r
tempholder(j) = CLng(pool(idx(j) - 1))
Next j
'we now have an array of size tuple from the row data, so construct our code number,
'and write to the next available index in bigArray
printVector(tempPosition, 1) = concID(tempholder)
tempPosition = tempPosition + 1
' Locate last non-max index
i = r
While (idx(i) = n - r + i)
i = i - 1
If i = 0 Then
'the algorithm has ended with the last index exhausted
'return the next available index of bigArray
printCombinations = tempPosition
Exit Function
End If
Wend
idx(i) = idx(i) + 1
For j = i + 1 To r
idx(j) = idx(i) + j - i
Next j
Loop
End Function
初始设置:
您还可以将频率范围复制粘贴到值等等....
答案 1 :(得分:0)
为了扩展我的评论,这里有一些修改过的代码,用于根据提供的N_tuple
变量使用数组数组。我很难想象一个不适合你的场景:
Sub testroutine()
Dim x As Integer, y As Integer 'just a counter
Dim ArrayTemp() As Variant
Dim PairCount() As Variant
Dim AllergenRef As Object 'Object to store a reference to each AllergenID using AllergenKey as key
Set AllergenRef = CreateObject("Scripting.Dictionary")
For x = 1 To 20
AllergenRef.Add x, (x * 10) + (2 ^ x) 'dummy data for my dictionary
Next x
Dim N_tuple As Integer
N_tuple = 5 'this value would be provided by a user form at runtime
'Now that you have your N_tuple, redim your paircount array
ReDim PairCount(1 To N_tuple)
'For each N_tuple, create an array and add it to the PairCount array
'Note that you could easily have a 2-dimensional array for a table of values as ArrayTemp
For x = 1 To N_tuple
ReDim ArrayTemp(1 To AllergenRef.Count)
PairCount(x) = ArrayTemp
Next x
'Now you have an array of arrays, which can be easily accessed.
'For example: PairCount(2)(3)
'Or if the subarrays are 2-dimensional: PairCount(4)(6, 12)
'This simply loops through the PairCount array and shows the ubound of its subarrays
For x = 1 To UBound(PairCount)
MsgBox UBound(PairCount(x))
Next x
End Sub