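I am trying to implement this algorithm, and I am stuck on how it actually works.
The Dissociated Press algorithm: http://en.wikipedia.org/wiki/Dissociated_press
N-gram: http://en.wikipedia.org/wiki/N-gram
It makes it possible to build random text out of consecutive strings of the original, so it should be implementable.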
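Actually, I don't know how it terminates. The Dissociated Press algorithm starts by printing a random n-gram. It then takes the last n-1 words it printed and chooses a random n-gram that starts with these n-1 words. It prints the last word of this n-gram and repeats. So every n consecutive words of the output text form an n-gram of the original text. Sometimes the original text contains no n-gram that starts with the n-1 words just printed; in that case the algorithm stops.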
ngram(1,2)ngram(2,3)ngram(3,4)........ T T
Could someone give me an example? I can't understand it from the text alone.
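For what it's worth, the quoted description can be sketched in Java roughly as follows. This is only a minimal word-based sketch, not a reference implementation: the names `DissociatedPressSketch`, `generate`, and `maxWords` are made up for illustration, and the `maxWords` cap is an added assumption so that a text with loops cannot run forever. The termination case from the description is the `break`: no n-gram starts with the last n-1 words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class DissociatedPressSketch {

    // Generates words until no n-gram continues the last n-1 words,
    // or until maxWords words have been produced (hypothetical safety cap).
    static List<String> generate(List<String> words, int n, int maxWords, Random random) {
        if (n < 2 || words.size() < n) {
            throw new IllegalArgumentException("need n >= 2 and at least n words");
        }
        // Map every (n-1)-word prefix to all words that follow it in the source.
        Map<List<String>, List<String>> followers = new HashMap<>();
        for (int i = 0; i + n <= words.size(); i++) {
            List<String> prefix = new ArrayList<>(words.subList(i, i + n - 1));
            followers.computeIfAbsent(prefix, k -> new ArrayList<>()).add(words.get(i + n - 1));
        }
        // Step 1: start with a random n-gram of the source.
        int start = random.nextInt(words.size() - n + 1);
        List<String> out = new ArrayList<>(words.subList(start, start + n));
        // Step 2: extend by one word at a time.
        while (out.size() < maxWords) {
            List<String> tail = new ArrayList<>(out.subList(out.size() - (n - 1), out.size()));
            List<String> candidates = followers.get(tail);
            if (candidates == null) {
                break; // No n-gram starts with the last n-1 words: the algorithm stops.
            }
            out.add(candidates.get(random.nextInt(candidates.size())));
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the cat sat on the mat and the cat ran".split(" "));
        System.out.println(String.join(" ", generate(words, 2, 20, new Random())));
    }
}
```

With n = 2 and the sample sentence in `main`, the only word with no follower is the final "ran", so a run either ends on "ran" or hits the 20-word cap.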
Answer 0 (score: 1)
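Well, first you split the text into n-grams:
The Dissociated Press algorithm starts by printing a random n-gram.
becomes (for n = 4)
etc. Then you start from any n-gram you like and keep appending words that complete the last n-1 words of the text built so far into a known n-gram. The text you create this way looks almost readable; the larger n is, the more readable your text gets.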
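To make the splitting step concrete, here is a minimal Java sketch that lists the word 4-grams of a sentence. The class and method names (`NGramSplit`, `ngrams`) are hypothetical, whitespace splitting is an assumption, and the sample string is an English rendering of the sentence used in this answer with the final period dropped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class NGramSplit {

    // Returns every run of n consecutive words, in order of appearance.
    static List<String> ngrams(String text, int n) {
        String[] words = text.trim().split("\\s+");
        List<String> result = new ArrayList<>();
        for (int i = 0; i + n <= words.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(words, i, i + n)));
        }
        return result;
    }

    public static void main(String[] args) {
        String sentence = "The Dissociated Press algorithm starts by printing a random n-gram";
        for (String gram : ngrams(sentence, 4)) {
            System.out.println(gram);
        }
    }
}
```

For the sample sentence this prints seven 4-grams, from "The Dissociated Press algorithm" through "printing a random n-gram". Each one shares three words with the next, which is what lets the chaining step glue them together.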
Answer 1 (score: 0)
This is not a very complicated algorithm. The version below works reasonably well:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.ArrayList;
import java.util.Random;
import java.util.StringTokenizer;

public class Dissociator {
    // Required size of the overlap
    int overlapSize = 8;
    // Size of the fragment
    int fragmentSize = overlapSize;
    // The initial sequence to dissociate, characters or words (could also dissociate some other objects).
    ArrayList<String> initial;
    boolean space;
    boolean wordMode;
    Random r = new Random(System.currentTimeMillis());
    // Dissociate the given string.
    public String dissociate(String in) {
        ArrayList<String> a;
        if (wordMode)
            a = wordBased(in);
        else
            a = charBased(in);
        ArrayList<String> out = dissociate(a);
        StringBuilder b = new StringBuilder(out.size());
        for (String s : out) {
            b.append(s);
            if (wordMode)
                b.append(' ');
        }
        return b.toString();
    }
    /**
     * Run dissociation algorithm
     *
     * @param input the initial sequence
     * @return the dissociated sequence.
     */
    public ArrayList<String> dissociate(ArrayList<String> input) {
        initial = input;
        ArrayList<String> out = new ArrayList<String>();
        if (input.size() < 2)
            return new ArrayList<String>(input); // Too short to dissociate.
        while (out.size() < input.size()) {
            // Random overlap length in the range 1..overlapSize-1.
            int size = Math.max(1, r.nextInt(overlapSize));
            ArrayList<String> tail = getTailOf(out, size);
            // Find a random position in the input that matches the tail.
            int p = r.nextInt(input.size() - 1) + 1; // Avoid zero.
            int was = p - 1; // Stop position: reaching it means the scan wrapped around
                             // without finding an acceptable continuation.
            boolean ok = tail.isEmpty(); // Nothing to match before the first fragment.
            while (!ok && p != was) {
                // Compare the whole tail against the input at position p.
                ok = p + tail.size() <= input.size();
                for (int j = 0; ok && j < tail.size(); j++)
                    ok = tail.get(j).equals(input.get(p + j));
                if (!ok)
                    p = (p + 1) % input.size();
            }
            // Continue right after the matched tail; without a match, start at p.
            int from = ok ? p + tail.size() : p;
            if (from >= input.size())
                from = r.nextInt(input.size()); // Matched at the very end: restart anywhere.
            for (int j = from; j < Math.min(from + fragmentSize, input.size()); j++)
                out.add(input.get(j));
        }
        return out;
    }
    // Get the tail of the given size.
    private ArrayList<String> getTailOf(ArrayList<String> out, int size) {
        if (size >= out.size())
            return out;
        else {
            ArrayList<String> r = new ArrayList<String>(size);
            for (int p = out.size() - size; p < out.size(); p++) {
                r.add(out.get(p));
            }
            return r;
        }
    }
    // Split the input into single-character strings.
    private static ArrayList<String> charBased(String in) {
        ArrayList<String> is = new ArrayList<String>();
        for (int i = 0; i < in.length(); i++)
            is.add(in.substring(i, i + 1));
        return is;
    }

    // Split the input into words on spaces and common punctuation.
    private static ArrayList<String> wordBased(String in) {
        ArrayList<String> is = new ArrayList<String>();
        StringTokenizer st = new StringTokenizer(in, " ,:()?!\"'");
        while (st.hasMoreTokens())
            is.add(st.nextToken());
        return is;
    }
    // Usage: java Dissociator <input file>
    public static void main(String[] args) throws Exception {
        File f = new File(args[0]);
        StringBuilder bb = new StringBuilder((int) f.length());
        // Read the whole file, joining lines with single spaces.
        try (BufferedReader r = new BufferedReader(new FileReader(f))) {
            String sr;
            while ((sr = r.readLine()) != null) {
                bb.append(sr);
                bb.append(' ');
            }
        }
        String in = bb.toString();
        Dissociator d = new Dissociator();
        String b = d.dissociate(in);
        System.out.println(b);
    }
}
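The subtle part of the code above is the search for a place in the input where the current tail occurs, starting from a random position and wrapping around. A standalone sketch of that one step, which can be tested in isolation, might look like this. `TailSearch` and `findFrom` are hypothetical names, not part of the class above, and unlike the loop above this version checks every position once before giving up.

```java
import java.util.Arrays;
import java.util.List;

final class TailSearch {

    // Returns the first index at or after start (wrapping around) where
    // tail occurs in input, or -1 if it occurs nowhere. A match may not
    // itself wrap past the end of the input.
    static int findFrom(List<String> input, List<String> tail, int start) {
        int n = input.size();
        if (n == 0 || tail.size() > n) {
            return -1;
        }
        for (int k = 0; k < n; k++) {
            int p = (start + k) % n;
            if (p + tail.size() <= n && input.subList(p, p + tail.size()).equals(tail)) {
                return p;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "c", "a", "b", "d");
        List<String> tail = Arrays.asList("a", "b");
        System.out.println(findFrom(input, tail, 1)); // 3
        System.out.println(findFrom(input, tail, 4)); // 0 (search wrapped around)
    }
}
```

A return value of -1 corresponds to the dead end in the algorithm: no continuation exists, so the caller has to stop or restart somewhere else.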