Question

I have a text file, e.g. "file_with_email.txt", that contains the following email addresses:

crp.edu
src.net
abc.edu

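I need to search for an email in the given text file. The problem with my code shows up as follows. When I enter an email address with the complete domain name, e.g. "abc.edu", it returns the message "email found", which is correct.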

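However, if I enter an email address with an incomplete or partial domain name, e.g. "abc.ed", which is not contained in the given file, it produces the same output, "email found", even though there is no such email.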

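Moreover, in some cases the user enters an email like "abc.edu.net". In that case my code also prints "email found", although it is not contained in the given text file. I would appreciate any help in solving this problem.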

Here is the function I have tried so far to search for an email in the text file:

int search_mail(char *email)
{
    FILE *fp;
    int line = 1;
    int number_of_match = 0;
    char temp[512];
    char *fname = "/home/file_with_email.txt";

    if ((fp = fopen(fname, "r")) == NULL)
    {
        return(-1);
    }

    while (fgets(temp, 512, fp) != NULL)
    {
        fprintf(stdout, "Just read: %s\n", temp);
        if (strstr(temp, email) != NULL)
        {
            printf("\n The match is found in the file\n ");
            //printf("\n %s \n", temp);
            number_of_match++;
        }
        //line++;
    }

    if (number_of_match == 0)
        printf("\n No result found");

    //close the file if it is open.
    if (fp)
    {
        fclose(fp);
    }
}

Answer 1

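Building on @grek40's answer, on a POSIX-compliant system you can mmap() the file and search it (note that I have left out the required header files and all error checking, in order to try to avoid scroll bars on the code pane):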

int startsWithWhitespace = 0;
int endsWithWhitespace = 0;
int fd = open( filename, O_RDONLY );
struct stat sb;
fstat( fd, &sb );
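// size + 1 is meant to ensure the mapped file ends with at least one \0 character
// (caveat: this only holds when the file size is not an exact multiple of the
// page size; otherwise the extra byte lies beyond the last mapped file page)
char *data = mmap( NULL, sb.st_size + 1, PROT_READ, MAP_PRIVATE, fd, 0 );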
close( fd );
char *match = strstr( data, string );
// found a potential match
if ( match )
{
    // check if it's the first char in the file else check the first character *before* the match
    if ( match == data )
    {
        startsWithWhitespace = 1;
    }
    else
    {
        startsWithWhitespace = isspace( ( unsigned char ) *( match - 1 ) );
    }
    // get the character one past the end of the matched string
    char *end = match + strlen( string );
    // ensure that the end char is not \0 else the string is at the end of the file
    if ( *end )
    {
        endsWithWhitespace = isspace( ( unsigned char ) *end );
    }
    else
    {
        endsWithWhitespace = 1;
    }
}
...

At the end, if match is non-NULL and both startsWithWhitespace and endsWithWhitespace are nonzero, you have matched the complete string.

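Edit: To be thorough, you would also need to check the preceding and following characters against a list of punctuation characters that you do not consider to be part of a longer string.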
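The punctuation check could be folded into a single boundary test. The following is only a sketch: the helper name is_boundary and the contents of DELIMITERS are assumptions made for illustration, not part of the original code. Note that '.' is deliberately left out of the set, since it occurs inside the addresses themselves.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical delimiter set; adjust it to the data being searched.
 * '.' is intentionally absent because it is part of an address. */
static const char DELIMITERS[] = ",;:<>()[]\"'";

/* Returns nonzero if c may legitimately sit right before or right after
 * a complete match: end of string, whitespace, or a listed delimiter. */
int is_boundary(char c)
{
    if (c == '\0')
        return 1;   /* handled first: strchr() would also "find" the terminator */
    if (isspace((unsigned char)c))
        return 1;
    return strchr(DELIMITERS, c) != NULL;
}
```

With such a helper, the two isspace() calls in the snippet above would become is_boundary( *( match - 1 ) ) and is_boundary( *end ).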

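Assuming you are going to search the file multiple times, this is a perfect use of mmap(). The code to search the file is simple: you can treat the file as one long string and not worry about reading partial files or how to check whether your string spans two successive read buffers. Searching a huge file this way might be slower than it would be with carefully tuned IO operations, but it is so simple to use that it is probably still the best way to do it.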

Answer 2

It seems you basically want to find a string such that it is surrounded by whitespace or located at the beginning/end of the text.

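So what you need is to keep track of the character just before the start of a possible result and the character just after its end. Then, when you have a possible result, check whether it sits at the beginning or end of the text, and otherwise use isspace(char) to check the characters before the start and after the end. If the first match is a false match (not surrounded by whitespace), you also need to keep looking for a real match further on in the string.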
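A minimal sketch of that loop, assuming the text to search is already in memory as one NUL-terminated string. The function name find_whole_word is made up for this example, and treating an empty search string as "no match" is an arbitrary choice.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns a pointer to the first occurrence of needle
 * in text that is delimited by whitespace or by the start/end of text,
 * or NULL if there is no such occurrence. */
const char *find_whole_word(const char *text, const char *needle)
{
    size_t len = strlen(needle);
    const char *p = text;

    if (len == 0)
        return NULL;            /* assumption: an empty needle never matches */

    while ((p = strstr(p, needle)) != NULL) {
        /* boundary before the match: start of text or whitespace */
        int start_ok = (p == text) || isspace((unsigned char)p[-1]);
        /* boundary after the match: end of text or whitespace */
        int end_ok = (p[len] == '\0') || isspace((unsigned char)p[len]);

        if (start_ok && end_ok)
            return p;
        p++;                    /* false match: resume one character later */
    }
    return NULL;
}
```

In the question's function this could replace the strstr(temp, email) test. With the sample file, "abc.edu" would be found, while "abc.ed" and "abc.edu.net" would not.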

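Another problem with your current approach is that a match may start within one 512-character read and end in the next (this happens when a line is longer than the buffer, because fgets() then returns it in pieces). Currently you will not find a result in such a position.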
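One way to sidestep the buffer boundary entirely is to compare character by character while reading, so that no fixed-size buffer is involved. This is a sketch of a different technique from the one in the question, and the name count_exact_tokens is hypothetical. It assumes entries in the file are separated by whitespace.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Counts the whitespace-delimited tokens in fp that are exactly equal to
 * word. Input is consumed one character at a time, so a token can never
 * be split across two reads. */
long count_exact_tokens(FILE *fp, const char *word)
{
    size_t len = strlen(word);
    size_t pos = 0;     /* length of the current token so far */
    int matching = 1;   /* does the token still agree with word? */
    long count = 0;
    int c;

    if (len == 0)
        return 0;       /* assumption: an empty word never matches */

    for (;;) {
        c = fgetc(fp);
        if (c == EOF || isspace(c)) {
            /* token ended: count it if every character agreed
             * and the lengths are equal */
            if (matching && pos == len)
                count++;
            pos = 0;
            matching = 1;
            if (c == EOF)
                break;
        } else {
            if (pos >= len || (unsigned char)word[pos] != c)
                matching = 0;
            pos++;
        }
    }
    return count;
}
```

A caller would open the file as in the question and test count_exact_tokens(fp, email) > 0. Because stdio buffers the stream internally, reading with fgetc() does not mean one system call per character.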

Searching for an email in a text file