Seed to control the sequence of random functions in MATLAB

Time: 2019-02-13 18:00:39

Tags: matlab random-seed

I used the MATLAB kmeans function to cluster two datasets, data1 and data2. I have three main files, which contain the following code, respectively:

% File 1
result1 = kmeans(data1, 4);
result2 = kmeans(data2, 4);

% File 2
r1 = kmeans(data1,4);

% File 3
r2 = kmeans(data2,4);

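I noticed that result1 and r1 are identical, but result2 and r2 differ slightly. I believe this is caused by the randomness in the kmeans algorithm. In the first and second files, data1 is processed first, so kmeans starts from the same "seed". In the first and third files, data2 is processed at different stages: in the first file, the kmeans call that produces result1 has already advanced the random number generator, which affects the kmeans call that follows it.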

My question is: can we set the seed somehow so that r2 and result2 are the same?

2 answers:

Answer 0: (score: 2)

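You can control random number generation in MATLAB with the rng function. With it, you can capture the state of the random number generator before running your code, then set the generator back to that state before running the code again, which ensures you get the same results. For example: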

rngState1 = rng;  % Capture state before processing data1
result1 = kmeans(data1, 4);
rngState2 = rng;  % Capture state before processing data2
result2 = kmeans(data2, 4);

...

rng(rngState1);  % Restore state previously used for processing data1
r1 = kmeans(data1,4);

...

rng(rngState2);  % Restore state previously used for processing data2
r2 = kmeans(data2,4);

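Since you are processing the data in separate files, this would likely mean saving the state variables to a MAT-file and loading them back from it in order to do what I outlined above. Another option is to simply set the seed to a given value before processing each dataset: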

rng(1);  % Set seed to 1 for data1
result1 = kmeans(data1, 4);
rng(2);  % Set seed to 2 for data2
result2 = kmeans(data2, 4);

...

rng(1);
r1 = kmeans(data1,4);

...

rng(2);
r2 = kmeans(data2,4);
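For the save-and-restore route across separate files mentioned above, a minimal sketch might look like the following. The file name rngState2.mat and the placeholder data are assumptions made for illustration; kmeans requires the Statistics and Machine Learning Toolbox, and both files must cluster exactly the same data2.

```matlab
% Placeholder data so the sketch runs on its own (assumption, not from the question)
data2 = [randn(50, 2); randn(50, 2) + 4];

% --- In the first file ---
rngState2 = rng;                      % capture generator state right before clustering data2
save('rngState2.mat', 'rngState2');   % hypothetical file name
result2 = kmeans(data2, 4);

% --- In the third file ---
loaded = load('rngState2.mat', 'rngState2');
rng(loaded.rngState2);                % restore the state captured in the first file
r2 = kmeans(data2, 4);
```

Running isequal(result2, r2) afterwards is a quick way to check that the two runs now agree.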

Answer 1: (score: 0)

Another option is to use a non-random initialization, that is, to supply the initial means yourself instead of letting kmeans choose them at random. A good strategy for doing this depends on what your data looks like. For example, for 2D data inside a rectangular domain, you could pick the four corners of the domain.
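The four-corners idea can be sketched with the 'Start' name-value argument of kmeans, which accepts a k-by-p matrix of initial centroid positions. This is only an illustration under stated assumptions: the data here are placeholder 2D points, and the corners are taken from the bounding box of the data rather than from any known domain.

```matlab
% Placeholder 2D data (assumption, not from the question)
data = rand(200, 2);

% Bounding box of the data
lo = min(data, [], 1);   % [xmin ymin]
hi = max(data, [], 1);   % [xmax ymax]

% One initial centroid per corner of the bounding box (4-by-2 matrix)
startCentroids = [lo(1) lo(2);
                  lo(1) hi(2);
                  hi(1) lo(2);
                  hi(1) hi(2)];

% Pass the fixed starting positions instead of a random initialization
idx = kmeans(data, 4, 'Start', startCentroids);
```

Because the starting centroids are fixed, the random choice of initial means is taken out of the picture, so the order in which datasets are processed no longer affects the initialization.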