Why does fscanf fail during multithreaded reads?

Date: 2016-05-07 19:26:19

Tags: c multithreading gdb valgrind

I'm new to pthreads. I'm running a thread that does a binary search for a set of random keys in a sorted file and then reads each key's value:

int binary_search_in_disk(int k_level, int key){

  if (!file_exist(level_fname)){
    return -1;
  }

  char *line = NULL;
  size_t len = 0;
  ssize_t read;

  int num, file_charsize, low_key, mid_key, high_key, value, op;

  // file positions are measured in bytes, as a long int (4 bytes here), and
  // point to the first char of a line
  int low, mid, high;

  FILE* level_file = fopen(level_fname, "r");
  printf("fname: %s ,", level_fname);

  // find low key 
  low = 0;
  printf("file exists? %d ,", file_exist(level_fname)); // returns 1: file exists

  num = fscanf(level_file, "%d%d%d\n", &low_key, &value, &op); // seg fault, file does not exist
 ... 
}

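I have another thread that reads key-value pairs from the same file, then deletes the file and renames another file into its place. I added printf calls to the functions that do the reading and the deleting, and their output shows that the segfault happens while both threads are reading. GDB also shows both threads reading from the file at the moment of the segfault. Each time I run the program, the segfault happens at a different value being read. What causes level_file to suddenly become NULL?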
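For context on the last question: on POSIX systems, deleting a file or renaming another file over its path does not invalidate a stream that was already opened on it; the open stream keeps reading the original file. So in this code the FILE* does not change after fopen returns; a NULL stream reaching fscanf means fopen itself returned NULL. The sketch below simulates the two threads sequentially, assuming a POSIX filesystem; the file names and both helper functions are hypothetical, not taken from the question.

```c
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: writes one "key value op" line; returns 0 on success. */
static int write_line(const char *path, const char *line) {
    FILE *w = fopen(path, "w");
    if (w == NULL) return -1;
    fputs(line, w);
    return fclose(w) == 0 ? 0 : -1;
}

/* Hypothetical demo: a reader opens the level file before a merge step
   deletes it and renames a replacement into its place.
   Returns the key the reader sees, or -1 on any setup error. */
static int key_seen_after_replace(void) {
    const char *path = "level_demo.txt";      /* hypothetical file names */
    const char *next = "level_demo.txt.new";

    if (write_line(path, "1 10 0\n") != 0) return -1;
    FILE *reader = fopen(path, "r");          /* the reader opens first */
    if (reader == NULL) return -1;

    /* Merge step: build the new file, delete the old one, rename. */
    if (write_line(next, "2 20 0\n") != 0 ||
        unlink(path) != 0 || rename(next, path) != 0) {
        fclose(reader);
        return -1;
    }

    int key = -1, value = 0, op = 0;
    if (fscanf(reader, "%d%d%d", &key, &value, &op) != 3) key = -1;
    fclose(reader);
    unlink(path);                             /* clean up the demo file */
    return key;   /* 1: the open stream still reads the old file */
}
```

Calling key_seen_after_replace() returns 1, the key from the deleted file, rather than failing.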

GDB:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb73ffb40 (LWP 13213)]
__isoc99_fscanf (stream=0x0, format=0x804b641 "%d%d%d\n") at isoc99_fscanf.c:30
30  isoc99_fscanf.c: No such file or directory.
(gdb) bt
#0  __isoc99_fscanf (stream=0x0, format=0x804b641 "%d%d%d\n") at isoc99_fscanf.c:30
#1  0x0804ae60 in binary_search_in_disk (k_level=1, key=89) at lib.c:894
#2  0x0804ac04 in search (k_level=1, key=89) at lib.c:809
#3  0x080490da in get (key=89) at lsm.c:56
#4  0x08048dc9 in run_get (args=0x804e0c8) at concurrent_main.c:181
#5  0xb7f71f70 in start_thread (arg=0xb73ffb40) at pthread_create.c:312
#6  0xb7ea7bee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:129
(gdb) thread apply all bt

Thread 3 (Thread 0xb73ffb40 (LWP 13213)):
#0  __isoc99_fscanf (stream=0x0, format=0x804b641 "%d%d%d\n") at isoc99_fscanf.c:30
#1  0x0804ae60 in binary_search_in_disk (k_level=1, key=89) at lib.c:894
#2  0x0804ac04 in search (k_level=1, key=89) at lib.c:809
#3  0x080490da in get (key=89) at lsm.c:56
#4  0x08048dc9 in run_get (args=0x804e0c8) at concurrent_main.c:181
#5  0xb7f71f70 in start_thread (arg=0xb73ffb40) at pthread_create.c:312
#6  0xb7ea7bee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:129

Thread 2 (Thread 0xb7dbab40 (LWP 13212)):
#0  0xb7fdd428 in __kernel_vsyscall ()
#1  0xb7eb5151 in __lll_unlock_wake_private ()
    at ../nptl/sysdeps/unix/sysv/linux/i386/i686/../i486/lowlevellock.S:349
#2  0xb7e21809 in _L_unlock_156 () from /lib/i386-linux-gnu/libc.so.6
#3  0xb7e21768 in _IO_acquire_lock_fct (p=<synthetic pointer>) at libioP.h:905
#4  _IO_puts (str=0x804b738 "Read from disk ... ") at ioputs.c:37
#5  0x0804a39e in merge_in_memory_disk () at lib.c:556
#6  0x080497c7 in lsm_merge (k_level=0) at lib.c:162
#7  0x08049097 in put (key=256, value=256, op=0) at lsm.c:26
#8  0x08048cfa in run_put (args=0x804e028) at concurrent_main.c:128
#9  0xb7f71f70 in start_thread (arg=0xb7dbab40) at pthread_create.c:312
#10 0xb7ea7bee in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:129

Thread 1 (Thread 0xb7dbb700 (LWP 13208)):
#0  0xb7fdd428 in __kernel_vsyscall ()
#1  0xb7f73178 in pthread_join (threadid=3084626752, thread_return=0x0) at pthread_join.c:92
#2  0x08049029 in main (argc=6, argv=0xbfffefd4) at concurrent_main.c:270
(gdb) info threads
  Id   Target Id         Frame 
* 3    Thread 0xb73ffb40 (LWP 13213) "concurrent_main" __isoc99_fscanf (stream=0x0, 
    format=0x804b641 "%d%d%d\n") at isoc99_fscanf.c:30
  2    Thread 0xb7dbab40 (LWP 13212) "concurrent_main" 0xb7fdd428 in __kernel_vsyscall ()
  1    Thread 0xb7dbb700 (LWP 13208) "concurrent_main" 0xb7fdd428 in __kernel_vsyscall ()

Valgrind:

==13748== Thread 3:
==13748== Invalid read of size 4
==13748==    at 0x40FE195: __isoc99_fscanf (isoc99_fscanf.c:30)
==13748==    by 0x804AE69: binary_search_in_disk (lib.c:892)
==13748==    by 0x804AC0D: search (lib.c:809)
==13748==    by 0x80490E3: get (lsm.c:56)
==13748==    by 0x8048DCD: run_get (concurrent_main.c:183)
==13748==    by 0x4092F6F: start_thread (pthread_create.c:312)
==13748==    by 0x4193BED: clone (clone.S:129)
==13748==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==13748== 
==13748== 
==13748== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==13748==  Access not within mapped region at address 0x0
==13748==    at 0x40FE195: __isoc99_fscanf (isoc99_fscanf.c:30)
==13748==    by 0x804AE69: binary_search_in_disk (lib.c:892)
==13748==    by 0x804AC0D: search (lib.c:809)
==13748==    by 0x80490E3: get (lsm.c:56)
==13748==    by 0x8048DCD: run_get (concurrent_main.c:183)
==13748==    by 0x4092F6F: start_thread (pthread_create.c:312)
==13748==    by 0x4193BED: clone (clone.S:129)
==13748==  If you believe this happened as a result of a stack
==13748==  overflow in your program's main thread (unlikely but
==13748==  possible), you can try to increase the size of the
==13748==  main thread stack using the --main-stacksize= flag.
==13748==  The main thread stack size used in this run was 8388608.
==13748== 
==13748== HEAP SUMMARY:
==13748==     in use at exit: 482,004 bytes in 2,046 blocks
==13748==   total heap usage: 2,679 allocs, 633 frees, 576,804 bytes allocated
==13748== 
==13748== LEAK SUMMARY:
==13748==    definitely lost: 122,280 bytes in 1,019 blocks
==13748==    indirectly lost: 0 bytes in 0 blocks
==13748==      possibly lost: 272 bytes in 2 blocks
==13748==    still reachable: 359,452 bytes in 1,025 blocks
==13748==         suppressed: 0 bytes in 0 blocks
==13748== Rerun with --leak-check=full to see details of leaked memory
==13748== 
==13748== For counts of detected and suppressed errors, rerun with: -v
==13748== Use --track-origins=yes to see where uninitialised values come from
==13748== ERROR SUMMARY: 188 errors from 6 contexts (suppressed: 0 from 0)
Killed

2 Answers:

Answer 0 (score: 3)

This:

FILE* level_file = fopen(level_fname, "r");
printf("fname: %s ,", level_fname);
printf("file exists? %d ,", file_exist(level_fname));

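is not the correct way to handle errors: the file may exist but not be readable, or it may exist now but not have existed when fopen was called, or you may have run out of memory or file descriptors, or ...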

What you should do is:

FILE *level_file = fopen(level_fname, "r");
if (level_file == NULL) {
  /* strerror() is declared in <string.h>, errno in <errno.h> */
  fprintf(stderr, "Unable to open '%s' for reading: %s\n", level_fname, strerror(errno));
  return -1;
}

Had you done this, the error would have been obvious.
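To see one failure mode the answer describes (the file existed when it was checked but is gone by the time fopen runs), here is a minimal single-threaded sketch, assuming a POSIX system. open_level_file and check_then_open_race are hypothetical names; access(F_OK) stands in for the question's file_exist, and unlink() stands in for the other thread deleting the file between the check and the open.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: opens a level file and reports why fopen failed.
   Returns the stream, or NULL with errno preserved from fopen. */
static FILE *open_level_file(const char *fname) {
    FILE *f = fopen(fname, "r");
    if (f == NULL) {
        int saved = errno;                   /* fprintf may change errno */
        fprintf(stderr, "Unable to open '%s' for reading: %s\n",
                fname, strerror(saved));
        errno = saved;
    }
    return f;
}

/* Hypothetical demo of check-then-open: the existence check passes, the
   file is removed, and only then does fopen run.
   Returns the errno from the failed open (0 if it succeeded, -1 on setup error). */
static int check_then_open_race(void) {
    const char *fname = "level_race_demo.txt";  /* hypothetical name */
    FILE *w = fopen(fname, "w");
    if (w == NULL) return -1;
    fclose(w);

    int existed = (access(fname, F_OK) == 0);   /* the "file exists" check */
    unlink(fname);                              /* the other thread's delete */

    FILE *f = open_level_file(fname);
    if (f != NULL) { fclose(f); return 0; }
    return existed ? errno : -1;
}
```

Here the existence check succeeds, yet check_then_open_race() returns ENOENT, and the message printed by open_level_file names the real cause instead of leaving a NULL stream for fscanf to crash on.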

Answer 1 (score: -1)

Found the problem: Too many open files. fopen was returning NULL because the process had run out of file descriptors.
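"Too many open files" is the glibc message for EMFILE: each process has a limit on open descriptors (RLIMIT_NOFILE), and a stream that is opened on every search but never closed leaks one descriptor per call. The excerpt in the question elides the end of binary_search_in_disk, so whether it calls fclose is not shown; the sketch below assumes a leak of that kind. open_many is a hypothetical function, the soft limit of 32 is arbitrary, and /dev/null is used only as a file that always exists on POSIX systems.

```c
#include <errno.h>
#include <stdio.h>
#include <sys/resource.h>

#define MAX_DEMO_FILES 256

/* Hypothetical demo: temporarily lowers the soft RLIMIT_NOFILE to 32, then
   calls fopen(fname) `count` times. If leak is nonzero the streams are kept
   open (like a search function that returns without fclose); otherwise each
   one is closed right away. Returns errno from the first failed fopen, 0 if
   all opens succeeded, or -1 if the limit could not be changed. */
static int open_many(const char *fname, int count, int leak) {
    struct rlimit old, lim;
    if (getrlimit(RLIMIT_NOFILE, &old) != 0) return -1;
    lim = old;
    lim.rlim_cur = 32;                        /* arbitrary small limit */
    if (setrlimit(RLIMIT_NOFILE, &lim) != 0) return -1;

    FILE *open_files[MAX_DEMO_FILES];
    int nopen = 0, result = 0;
    if (count > MAX_DEMO_FILES) count = MAX_DEMO_FILES;

    for (int i = 0; i < count; i++) {
        FILE *f = fopen(fname, "r");
        if (f == NULL) { result = errno; break; }  /* EMFILE once leaked */
        if (leak) open_files[nopen++] = f;
        else fclose(f);
    }

    for (int i = 0; i < nopen; i++) fclose(open_files[i]);  /* demo cleanup */
    setrlimit(RLIMIT_NOFILE, &old);           /* restore the original limit */
    return result;
}
```

With leak set, open_many("/dev/null", 100, 1) fails with EMFILE after roughly 30 opens; with each stream closed, all 100 opens succeed. In the question's code, the corresponding fix would be to fclose(level_file) on every return path of the search function.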