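My idea is to populate a Dictionary<string, string> with pairs <x, y>, where x is the base64-encoded string generated from a file in a directory and y is the file name including its extension (fileName.extension).
I am iterating over a set of files in a local directory and getting the base64 string and the filename.extension of each file. Along the way I check whether a value already exists for the given key. If it does, I do not add the key/value pair; otherwise I add it to the dictionary.
My code is below. I am a beginner and I am struggling to solve this.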
Dictionary<string, string> d = new Dictionary<string, string>();
string[] attachmentPaths = Directory.GetFiles("someLocalFilePathPopulatedWithFiles");
Byte[] attachmentBytes;
string base64EncodedString;
string attachmentFileName;
foreach (string attachment in attachmentPaths)
{
    //Base 64 conversion process
    attachmentBytes = File.ReadAllBytes(attachment);
    base64EncodedString = Convert.ToBase64String(attachmentBytes);
    attachmentFileName = Path.GetFileName(attachment);
    if (d.TryGetValue(base64EncodedString, out attachmentFileName))
    {
        Console.WriteLine("exists");
        //trying to get a value for a key that does not exist, on the first iteration, then the compiler jumps to the else{}
    }
    else
    {
        Console.WriteLine("!exists");
        //Since the <key, value> does not exist, go ahead and populate the dictionary
        d.Add(base64EncodedString, attachmentFileName);
    }
}
//Print out the key value pair.
//The value is not being printed.
foreach (KeyValuePair<string, string> pair in d)
{
    Console.WriteLine("Key: " + pair.Key + " " + "Value: " + pair.Value);
}
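The problem is that the value is not printed.
I also tried simpler code to test the logic, and it seems to work: I can populate the dictionary with the data I want and check along the way whether a key already exists.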
Dictionary<string, string> d = new Dictionary<string, string>();
d.Add("B", "fileB");
d.Add("C", "fileC");
d.Add("D", "fileD");
//the dictionary does not contain "A", so try to get the value of a key that is NOT part of the dictionary
string val = "fileA";
if (d.TryGetValue("A", out val))
{
    Console.WriteLine("exists");
    //do not add a key, since the <key, value> exists
}
else
{
    //"A" is missing, so execution always reaches this branch and adds the <key, value>
    Console.WriteLine("!exists");
    d.Add("A", "fileA");
}
foreach (KeyValuePair<string, string> pair in d)
{
    Console.WriteLine("Key: " + pair.Key + " " + "Value: " + pair.Value);
}
I am sure there is a difference between the two, but I cannot spot it, as I am still learning.
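Answer 0 (score: 1)
The culprit is d.TryGetValue(base64EncodedString, out attachmentFileName). When the key does not exist, this call overwrites attachmentFileName with null (the default value for string), so in your else branch you are adding null as the value. Your simpler test works only because the else branch adds the literal "fileA" rather than the overwritten variable. You need a different variable for the existence test. Use a temporary variable for the check: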
foreach (string attachment in attachmentPaths)
{
    //Base 64 conversion process
    attachmentBytes = File.ReadAllBytes(attachment);
    base64EncodedString = Convert.ToBase64String(attachmentBytes);
    attachmentFileName = Path.GetFileName(attachment);
    //Temporary variable that receives the lookup result, so attachmentFileName is left untouched
    string filename;
    if (d.TryGetValue(base64EncodedString, out filename))
    {
        //A file with identical content was already added under the name held in filename
        Console.WriteLine("exists");
    }
    else
    {
        Console.WriteLine("!exists");
        //Since the <key, value> does not exist, go ahead and populate the dictionary
        d.Add(base64EncodedString, attachmentFileName);
    }
}
//Print out each key value pair; the value now holds the file name.
foreach (KeyValuePair<string, string> pair in d)
{
    Console.WriteLine("Key: " + pair.Key + " " + "Value: " + pair.Value);
}
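To make the behavior of the out parameter concrete, here is a minimal sketch. The class name OutParameterDemo, the method name, and the sample strings are made up for illustration; only Dictionary<TKey, TValue>.TryGetValue comes from the code above.

```csharp
using System;
using System.Collections.Generic;

public static class OutParameterDemo
{
    // Hypothetical helper: returns whatever the variable holds after a failed lookup.
    public static string ValueAfterFailedLookup()
    {
        var d = new Dictionary<string, string>();
        string fileName = "report.pdf";

        // The key is not in the dictionary, so TryGetValue returns false
        // and assigns default(string), which is null, to fileName.
        bool found = d.TryGetValue("missing-key", out fileName);

        return found ? "unexpected hit" : fileName;
    }

    public static void Main()
    {
        string after = ValueAfterFailedLookup();
        Console.WriteLine(after == null ? "fileName is null" : "fileName is " + after);
    }
}
```

The original "report.pdf" is gone after the call, which is exactly what happened to attachmentFileName in the question.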
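Answer 1 (score: 0)
Edit: completely rewritten after the OP's comment.
If the point is to find files whose content duplicates that of another file, then @Thangadurai's answer should be enough to solve your immediate problem.
I would still suggest computing a digest/hash of each file's contents and using its Base64-encoded form as the dictionary key. That way you do not have to keep the full contents of every file in memory while the method runs. Something like this (only very quickly tested):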
//Requires: using System; using System.Collections.Generic; using System.IO;
Dictionary<string, string> d = new Dictionary<string, string>();
string[] attachmentPaths = Directory.GetFiles("c:\\temp");
//SHA1.Create() is used instead of SHA1Managed, which is marked obsolete in newer .NET versions
using (System.Security.Cryptography.SHA1 sha1 = System.Security.Cryptography.SHA1.Create())
{
    foreach (string attachment in attachmentPaths)
    {
        byte[] oneFileHash;
        //Hash the file as a stream so its full contents are never held in memory
        using (FileStream oneFileStream = File.OpenRead(attachment))
        {
            oneFileHash = sha1.ComputeHash(oneFileStream);
        }
        string base64EncodedHash = Convert.ToBase64String(oneFileHash);
        string attachmentFileName = Path.GetFileName(attachment);
        if (d.ContainsKey(base64EncodedHash))
        {
            //A file with the same hash was already added, so skip this one
            Console.WriteLine("exists");
        }
        else
        {
            Console.WriteLine("!exists");
            //Since the <key, value> does not exist, go ahead and populate the dictionary
            d.Add(base64EncodedHash, attachmentFileName);
        }
    }
}
foreach (KeyValuePair<string, string> pair in d)
{
    Console.WriteLine("Key: " + pair.Key + " " + "Value: " + pair.Value);
}
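If I were you, I would not spend any thought on the relative performance of ContainsKey and TryGetValue. Neither of them is the bottleneck here. If you want to check whether the dictionary already contains a key, use the method that states most clearly that this is what it is doing for you.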
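For comparison, the two checks can be written side by side. This is only a sketch: the class DictionaryChecks and both helper names are hypothetical, and the second variant assumes a compiler that supports the C# 7 discard (out _).

```csharp
using System;
using System.Collections.Generic;

public static class DictionaryChecks
{
    // Hypothetical helper: adds the pair only if the key is absent; returns true if it was added.
    public static bool AddIfAbsent(Dictionary<string, string> d, string key, string value)
    {
        if (d.ContainsKey(key))
        {
            return false;
        }
        d.Add(key, value);
        return true;
    }

    // Same behavior via TryGetValue; the discard makes it explicit
    // that the stored value is not needed and nothing gets overwritten.
    public static bool AddIfAbsentViaTryGetValue(Dictionary<string, string> d, string key, string value)
    {
        if (d.TryGetValue(key, out _))
        {
            return false;
        }
        d.Add(key, value);
        return true;
    }

    public static void Main()
    {
        var d = new Dictionary<string, string>();
        Console.WriteLine(AddIfAbsent(d, "A", "fileA"));
        Console.WriteLine(AddIfAbsentViaTryGetValue(d, "A", "other"));
        Console.WriteLine(d["A"]);
    }
}
```

Both helpers behave identically here. ContainsKey reads more directly when the stored value is not needed, while TryGetValue is the natural choice when you also want to report which file was seen first.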
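Note: in theory, two files with different contents can end up with the same hash value. For files that were not deliberately crafted to collide, the probability of this is so small that you can ignore it completely in practice...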
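If you would rather not rely on the hash alone, one option is to confirm a hash match by comparing the two files directly. The sketch below is an assumption about how that could look, not part of the tested code above; the class DuplicateConfirmation and the method HaveSameContent are hypothetical names, and the method reads both files fully into memory, so it is meant to run only for the rare pairs whose hashes already match.

```csharp
using System;
using System.IO;
using System.Linq;

public static class DuplicateConfirmation
{
    // Hypothetical helper: true only if both files have exactly the same bytes.
    public static bool HaveSameContent(string pathA, string pathB)
    {
        // Cheap check first: files of different length cannot be identical.
        if (new FileInfo(pathA).Length != new FileInfo(pathB).Length)
        {
            return false;
        }
        byte[] a = File.ReadAllBytes(pathA);
        byte[] b = File.ReadAllBytes(pathB);
        return a.SequenceEqual(b);
    }

    public static void Main()
    {
        // Temporary files stand in for two attachments with identical content.
        string first = Path.GetTempFileName();
        string second = Path.GetTempFileName();
        try
        {
            File.WriteAllBytes(first, new byte[] { 1, 2, 3 });
            File.WriteAllBytes(second, new byte[] { 1, 2, 3 });
            Console.WriteLine(HaveSameContent(first, second));
        }
        finally
        {
            File.Delete(first);
            File.Delete(second);
        }
    }
}
```

In the loop above, this check would go in the "exists" branch, which means the dictionary value would need to hold the full path of the first file rather than just its name.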