Segmentation fault in a pthread loop

Date: 2016-04-06 19:51:05

Tags: c pthreads

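Lately I've been doing a lot of numerical analysis, mostly on small data sets for relatively simple concepts. In anticipation of an upcoming project, I've started working with more complex systems that require exponentially more computation. My run times have gone from tens of seconds to tens of minutes. To speed things up, I decided to learn how to write code with pthreads.

So I've been working on a program that fills a matrix using both a serial approach and pthreads. The program performs each fill n times and computes the average time per run. When I run it with a single pthread_t, it works as expected. When I add a second thread, I get a "Segmentation fault" error.

My code is as follows: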

fill.h

#ifndef FILL_H_
#define FILL_H_

#include <pthread.h>//Allows access to pthreads                                  
#include <sys/time.h>//Allows the ability to pull the system time                
#include <stdio.h>//Allows input and output                                      
#include <stdlib.h>//Allows for several fundamental calls                        

#define NUM_THREADS 2                                                            
#define MAT_DIM 50                                                             
#define RUNS 1      

pthread_t threads[NUM_THREADS];                                              
pthread_mutex_t mutexmat;  

typedef struct{                                                                  
   int id;                                                                      
   int column;                                                                     
   int* matrix[NUM_THREADS];                                                                  
}WORKER;  
#endif

fill.c

/*This routine will fill an array both in serial and parallel with random
 *numbers. It will also display the real time it took to accomplish each task*/

/* C includes */
#include "fill.h"
/* Fills a matrix */
void fill(int start, int stop, int** matrix)
{
    int i, j;
    for(i = start; i < stop; i++)
    {
        for(j = 0; j < MAT_DIM; j++)
            matrix[i][j] = rand() % 10;
    }
}

void* work(void* threadarg)
{
    /* Creates a pointer to a worker type variable*/
    WORKER *this_worker;
    /* Points this_worker at what thread arg is pointing to*/
    this_worker = (WORKER*) threadarg;
    /* Calculates my stopping point for this thread*/
    int stop = this_worker-> column + (MAT_DIM / NUM_THREADS);
    /* Used to drive matrix */
    int i,j;
    /* Fills portion of Matrix */
    for( i = this_worker-> column; i < stop; i++)
    {
        /* Prints the column that matrix is working on */
        printf("Worker %d working on column %d\n", this_worker->id, i);

        for( j = 0; j < MAT_DIM; j++)
        {
            this_worker-> matrix[i][j] = rand() % 10;
        }
    }
    /* Signals thread is done */
    printf("Thread %d done.\n", this_worker-> id);
    /* Terminates thread */
    pthread_exit(NULL);
}

int main()
{
/* Seeding rand */
    srand (time(NULL)); 
/* These will be used for loops */
    int i, j, r, t; 
/* Creating my matrices */
    int* matrix_serial[MAT_DIM];
    int* matrix_thread[MAT_DIM];
/* creating timeval variables */
    struct timeval t0, t1;

/* Beginning serial solution */
    /* Creating timer for serial solution */
    gettimeofday(&t0, 0);
    /* Creating serial matrix */
    for(i = 0; i < MAT_DIM; i++)
        matrix_serial[i] = (int*)malloc( MAT_DIM * sizeof(int));

    /* Filling the matrix */    
    for(r = 0; r < RUNS; r++)
        fill(0, MAT_DIM, matrix_serial);
    /* Calculating how long it took to run */
    gettimeofday(&t1, 0);
    unsigned long long int delta_t = (t1.tv_sec * 1000000 + t1.tv_usec)
                                   - (t0.tv_sec * 1000000 + t0.tv_usec);
    double t_dbl = (double)delta_t/1000000.0;
    double dt_avg = t_dbl / (double)r;
    printf("\nSerial Run Time for %d runs: %f\t Average:%f\n",r, t_dbl, dt_avg);

/* Begin multithread solution */
    /* Creating the offset where each matrix will start */
    int offset = MAT_DIM / NUM_THREADS;
    /* Creating a variable to store a return code */
    int rc;
    /* Creates a WORKER type variable named mat_work_t */
    WORKER mat_work_t[NUM_THREADS];

    /* Allocating a chunk of memory for my matrix */
    for( i = 0; i < MAT_DIM; i++)
        matrix_thread[i] = (int*)malloc( MAT_DIM * sizeof(int));

    /* Begin main loop */
    for(r = 0; r < RUNS; r++)
    {
    /* Begin multithread population of matrix */    
        for(t = 0; t < NUM_THREADS; t++)
        {
    /* Sets the values for mat_work_t[t] */
            mat_work_t[t].id = t;
            mat_work_t[t].column = t * offset;
            /* Points the mat_work_t[t].matrix at the matrix_thread */
            for(i = 0; i < MAT_DIM; i++)
                mat_work_t[t].matrix[i] = &matrix_thread[i][0];

    /* Creates thread placing its return value into rc */
            rc = pthread_create(&threads[t],
                                NULL,
                                work,
                                (void*) &mat_work_t[t]);
    /* Prints that a thread was successfully created */ 
            printf("Thread %d created.\n", mat_work_t[t].id);
    /* Checks to see if a return code was sent. If it was it will print it. */
            if (rc) 
                {
                printf("ERROR: return code from pthread_create() is %d\n", rc);
                return(-1);
                }
        }
    /* Makes sure all threads are done doing work before progressing */
        printf("Waiting for workers to finish.\n");
        for(i = 0; i < NUM_THREADS; i++)
            pthread_join(threads[i], NULL);

        printf("Work complete!\n");

    }

    /* Prints out the last matrix that was created by the loop */
    for(i = 0; i < MAT_DIM; i++)
        {
            for(j = 0; j < MAT_DIM; j++)
                printf("%d ",matrix_thread[i][j]);
            printf("\n");
        }
    /* Terminates thread */
    pthread_exit(NULL);
}

When I run it under gdb, I get:

[New Thread 0x7ffff7fd3700 (LWP 27907)]
Thread 0 created.
Worker 0 working on column 0
Worker 0 working on column 1
Worker 0 working on column 2

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff7fd3700 (LWP 27907)]
0x0000000000400924 in work (threadarg=0x7fffffffd9c0) at fill.c:35
35              this_worker-> matrix[i][j] = rand() % 10;

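My understanding of segmentation faults is fairly textbook: a segmentation fault occurs when you try to access memory that isn't yours to access. From that, I gather the code is having trouble accessing the memory where the matrix is stored.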

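My questions:

  1. Does my reasoning match the nature of the problem?
  2. Why does adding a thread cause this program to crash?
  3. How can I troubleshoot this kind of thing in the future (any tips are much appreciated)?
  4. Finally, how do I fix it (hints or solutions would be appreciated)?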

1 answer:

Answer 0 (score: 0)

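Are you sure the matrix member of struct WORKER should have only NUM_THREADS elements?

You are accessing locations beyond the bounds of the array you declared, in two places.

The first is here in main, where NUM_THREADS (only 2) is far too small compared with MAT_DIM (50):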

for(i = 0; i < MAT_DIM; i++)
                mat_work_t[t].matrix[i] = &matrix_thread[i][0];

The second is here, in the work function:

 for( i = this_worker-> column; i < stop; i++)
    {
        /* Prints the column that matrix is working on */
        printf("Worker %d working on column %d\n", this_worker->id, i);

        for( j = 0; j < MAT_DIM; j++)
        {
            this_worker-> matrix[i][j] = rand() % 10;
        }
    }

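The loop runs fine through matrix[1][j]. The segmentation fault occurs when you try to access matrix[2][j], because you declared the array with size 2 and are now trying to access a third element (that is, matrix[2][j]).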