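Sorry. I removed the code and edited the post...
The real problem is that I am trying to measure the degree of similarity, or plagiarism, between two texts or files. How can I do this? If you could guide me...
I need to include the code for the above algorithm in my project.
Using Visual Studio 2013, C#.
EDITED: OK, so far I have done this...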
    int i = 0;
    int j = 0;
    long lena1 = txtFile1.Text.Length;
    long lenb1 = lena1;
    long len2 = txtFile2.Text.Length;
    string str1 = txtFile1.Text;
    string str2 = txtFile2.Text;
    string str3;
    bool match = false;
    int count = 0;
    int nowords1 = 0;
    int nowords2 = 0;
    string str4;
    int k = 0;
    int m = 0;
    int nowords_match = 0;
    char[] array1 = str1.ToArray();
    char[] array2 = str2.ToArray();
    int[] loc1 = new int[1048576];
    int[] loc2 = new int[1048576];
    while (i < array1.Length)
    {
        if (array1[i] == ' ')
        {
            nowords1++;
            loc1[j] = i;
            j++;
        }
        i++;
    }
    i = j = 0;
    while (i < array2.Length)
    {
        if (array2[i] == ' ')
        {
            nowords2++;
            loc2[j] = i;
            j++;
        }
        i++;
    }
    i = j = 0;
    m = 0;
    for (k = 0; k < loc1.Length-2; k++)
    {
        str3 = str1.Substring(loc1[m], loc1[m + 1] - loc1[m]);
        match = true;
        if (match == true && count > 3)
        {
            txtPlagiarism.Text += " " + loc1[i-3] + loc1[i-2] + " " + loc1[i];
        }
        else
        {
            count = 0;
            match = false;
        }
        j = 0;
        i = 0;
        while (i < nowords2)
        {
            if (j != nowords2)
            {
                str4 = str2.Substring(loc2[j], loc2[j + 1] - (loc2[j]));
            }
            else
            {
                break;
            }
            if (str4.Equals(str3))
            {
                nowords_match++;
                count++;
            }
            j++;
            i++;
        }
        m++;
    }
I am just counting the number of matching words, so that I can select those words from the first_file text into the copy-case text. But I get a runtime error.
**System.ArgumentOutOfRangeException was unhandled
HResult=-2146233086
Message=Length cannot be less than zero.
Parameter name: length
Source=mscorlib
ParamName=length
StackTrace:
at System.String.InternalSubStringWithChecks(Int32 startIndex, Int32 length, Boolean fAlwaysCopy)
at System.String.Substring(Int32 startIndex, Int32 length)
at Calculate_File_Checksum.Form1.btnDetectPlagiairism_Click(Object sender, EventArgs e) in c:\Users\BLOOM\Documents\Visual Studio 2013\App2Test\Calculate_File_Checksum\Calculate_File_Checksum\Form1.cs:line 363
at System.Windows.Forms.Control.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnClick(EventArgs e)
at System.Windows.Forms.Button.OnMouseUp(MouseEventArgs mevent)
at System.Windows.Forms.Control.WmMouseUp(Message& m, MouseButtons button, Int32 clicks)
at System.Windows.Forms.Control.WndProc(Message& m)
at System.Windows.Forms.ButtonBase.WndProc(Message& m)
at System.Windows.Forms.Button.WndProc(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.OnMessage(Message& m)
at System.Windows.Forms.Control.ControlNativeWindow.WndProc(Message& m)
at System.Windows.Forms.NativeWindow.DebuggableCallback(IntPtr hWnd, Int32 msg, IntPtr wparam, IntPtr lparam)
at System.Windows.Forms.UnsafeNativeMethods.DispatchMessageW(MSG& msg)
at System.Windows.Forms.Application.ComponentManager.System.Windows.Forms.UnsafeNativeMethods.IMsoComponentManager.FPushMessageLoop(IntPtr dwComponentID, Int32 reason, Int32 pvLoopData)
at System.Windows.Forms.Application.ThreadContext.RunMessageLoopInner(Int32 reason, ApplicationContext context)
at System.Windows.Forms.Application.ThreadContext.RunMessageLoop(Int32 reason, ApplicationContext context)
at System.Windows.Forms.Application.Run(Form mainForm)
at Calculate_File_Checksum.Program.Main() in c:\Users\BLOOM\Documents\Visual Studio 2013\App2Test\Calculate_File_Checksum\Calculate_File_Checksum\Program.cs:line 19
at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
at System.AppDomain.ExecuteAssembly(String assemblyFile, Evidence assemblySecurity, String[] args)
at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
at System.Threading.ThreadHelper.ThreadStart_Context(Object state)
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
at System.Threading.ThreadHelper.ThreadStart()
InnerException:**
I don't understand why this happens, since I have passed the correct values... Can anyone please help?
Answer 0 (score: 1)
There are numerous ways to compare strings for similarity. Here is an algorithm that Martin put together for Levenshtein distance.
Answer 1 (score: 0)
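In one of my projects I had to detect changes in a set of objects and determine which objects had been inserted and which had been deleted. This algorithm can probably be used for your task. Here is some pseudocode that you can adapt to C#.
The simplest approach is to compare the strings character by character. If you find that this does not work well, you can try comparing word by word, line by line, or paragraph by paragraph.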
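For the character-by-character case, one standard measure is the Levenshtein (edit) distance. The following is only a minimal sketch, not the code from my project: the class name `EditDistance` and the `Similarity` normalization (1 minus distance divided by the longer length) are assumptions chosen for illustration. It avoids newer language features so it should compile in Visual Studio 2013.

```csharp
using System;

public static class EditDistance
{
    // Levenshtein distance via dynamic programming, keeping only two rows.
    // Returns the minimum number of single-character insertions,
    // deletions, and substitutions needed to turn a into b.
    public static int Levenshtein(string a, string b)
    {
        if (a == null) throw new ArgumentNullException("a");
        if (b == null) throw new ArgumentNullException("b");

        int[] prev = new int[b.Length + 1];
        int[] curr = new int[b.Length + 1];

        // Distance from the empty prefix of a to each prefix of b.
        for (int j = 0; j <= b.Length; j++) prev[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            curr[0] = i; // delete all i characters of a's prefix
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.Min(Math.Min(insert, delete), substitute);
            }
            // Swap rows for the next iteration.
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.Length];
    }

    // Hypothetical normalization to a score in [0, 1]; 1.0 means identical.
    public static double Similarity(string a, string b)
    {
        int max = Math.Max(a.Length, b.Length);
        if (max == 0) return 1.0;
        return 1.0 - (double)Levenshtein(a, b) / max;
    }
}
```

Something like `EditDistance.Similarity(txtFile1.Text, txtFile2.Text)` would then give a rough score for the two text boxes. The cost grows with the product of the two lengths, so for large files a word-level or line-level comparison is likely to be more practical.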
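The idea is:
Note that a character-by-character search can end up matching isolated single characters, so the results may look odd. In any case, comparing arbitrary strings is not an easy task.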
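To avoid the single-character matches, the comparison can be done word by word. Below is a rough sketch of one way to find inserted and deleted words, using a longest-common-subsequence (LCS) table. This is a standard technique that I am substituting here for illustration; it is not necessarily the exact algorithm from my project, and the names `WordDiff`, `Tokenize`, and `Diff` are made up for this example.

```csharp
using System;
using System.Collections.Generic;

public static class WordDiff
{
    // Splits text into words on whitespace. Change the separators to
    // compare line by line or paragraph by paragraph instead.
    public static string[] Tokenize(string text)
    {
        return text.Split(new[] { ' ', '\t', '\r', '\n' },
                          StringSplitOptions.RemoveEmptyEntries);
    }

    // Returns one entry per token:
    //   "  word" = present in both, "- word" = deleted, "+ word" = inserted.
    public static List<string> Diff(string[] oldTokens, string[] newTokens)
    {
        int n = oldTokens.Length, m = newTokens.Length;

        // lcs[i, j] = LCS length of oldTokens[i..] and newTokens[j..].
        int[,] lcs = new int[n + 1, m + 1];
        for (int i = n - 1; i >= 0; i--)
        {
            for (int j = m - 1; j >= 0; j--)
            {
                lcs[i, j] = oldTokens[i] == newTokens[j]
                    ? lcs[i + 1, j + 1] + 1
                    : Math.Max(lcs[i + 1, j], lcs[i, j + 1]);
            }
        }

        // Walk the table from the start, emitting matches and edits.
        var result = new List<string>();
        int a = 0, b = 0;
        while (a < n && b < m)
        {
            if (oldTokens[a] == newTokens[b])
            {
                result.Add("  " + oldTokens[a]); a++; b++;
            }
            else if (lcs[a + 1, b] >= lcs[a, b + 1])
            {
                result.Add("- " + oldTokens[a]); a++;
            }
            else
            {
                result.Add("+ " + newTokens[b]); b++;
            }
        }
        while (a < n) result.Add("- " + oldTokens[a++]);
        while (b < m) result.Add("+ " + newTokens[b++]);
        return result;
    }
}
```

Calling `WordDiff.Diff(WordDiff.Tokenize(str1), WordDiff.Tokenize(str2))` gives the common words in order, and the share of entries marked as common can serve as a rough similarity figure. The table needs memory proportional to the product of the two word counts, so very large files may have to be compared line by line first.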