I am trying to optimize the performance of writing a double to shared memory. I have one program that writes to shared memory and another program that reads from it.
I used this post to help isolate the CPUs that the two programs run on, by including the following line in my /etc/default/grub file:
GRUB_CMDLINE_LINUX_DEFAULT="quiet splash intel_idle.max_cstate=1 isolcpus=6,7"
I am using taskset -c 6 writer and taskset -c 7 reader to set these programs to run on those CPUs.
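Pinning can also be done from inside each program rather than via taskset; a minimal sketch, assuming the writer pins itself to core 6:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

cpu_set_t mask;
CPU_ZERO(&mask);
CPU_SET(6, &mask);   // pin the calling process to core 6
if (sched_setaffinity(0, sizeof(mask), &mask) == -1)
    perror("sched_setaffinity failed");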
Using this man page on sched_setscheduler, I set both programs to have the highest scheduling priority using the following code:
struct sched_param param;
param.sched_priority = sched_get_priority_max(SCHED_FIFO);
if(sched_setscheduler(0, SCHED_FIFO, &param) == -1)
{
perror("sched_setscheduler failed");
exit(-1);
}
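One addition worth trying alongside the real-time priority: locking the process's pages into RAM so that a minor page fault cannot stall the critical path. A minimal sketch, run once at startup:

#include <sys/mman.h>
#include <stdio.h>

// lock all current and future pages into RAM to avoid page-fault latency
if (mlockall(MCL_CURRENT | MCL_FUTURE) == -1)
    perror("mlockall failed");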
I have defined a struct to be placed in shared memory that contains the required synchronization tools, as well as a timespec struct and a double to pass between the two programs, as follows:
typedef struct
{
// Synchronization objects
pthread_mutex_t ipc_mutex;
sem_t ipc_sem;
// Shared data
double value;
volatile int read_cond;
volatile int end_cond;
double start_time;
struct timespec ts;
} shared_data_t;
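Note that with this layout the mutex, the semaphore, and the data can share cache lines, so the reader's polling may contend with the writer's stores. A hypothetical variant that pads each part onto its own cache line (the 64-byte line size is an assumption about the target CPU):

#include <stdalign.h>

typedef struct
{
// Synchronization objects, each on an (assumed) 64-byte cache line
alignas(64) pthread_mutex_t ipc_mutex;
alignas(64) sem_t ipc_sem;
// Shared data
alignas(64) double value;
volatile int read_cond;
volatile int end_cond;
double start_time;
struct timespec ts;
} shared_data_t;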
Shared memory initialization:
Writer:
// ftok to generate unique key
key_t key = ftok("shmfile",65);
// shmget returns an identifier in shmid; the segment is sized here, since
// ftruncate() works on file descriptors, not on System V shm ids
int shmid = shmget(key,sizeof(shared_data_t),0666|IPC_CREAT);
// shmat to attach to shared memory
shared_data_t* sdata = (shared_data_t*) shmat(shmid,(void*)0,0);
sdata->value = 0;
Reader:
// ftok to generate unique key
key_t key = ftok("shmfile",65);
// shmget returns an identifier in shmid; sized for the struct, as in the writer
int shmid = shmget(key,sizeof(shared_data_t),0666|IPC_CREAT);
// shmat to attach to shared memory
shared_data_t* sdata = (shared_data_t*) shmat(shmid,(void*)0,0);
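Neither setup path checks for failure; a silently failed ftok, shmget, or shmat would make the timing numbers meaningless. A minimal error-checked sketch of the same sequence:

#include <sys/ipc.h>
#include <sys/shm.h>
#include <stdio.h>
#include <stdlib.h>

key_t key = ftok("shmfile", 65);
if (key == -1) { perror("ftok"); exit(1); }

int shmid = shmget(key, sizeof(shared_data_t), 0666|IPC_CREAT);
if (shmid == -1) { perror("shmget"); exit(1); }

shared_data_t* sdata = (shared_data_t*) shmat(shmid, NULL, 0);
if (sdata == (void*) -1) { perror("shmat"); exit(1); }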
Initializing the synchronization tools in the writer:
pthread_mutexattr_t mutex_attr;
pthread_mutexattr_init(&mutex_attr);
pthread_mutexattr_setpshared(&mutex_attr, PTHREAD_PROCESS_SHARED);
pthread_mutex_init(&sdata->ipc_mutex, &mutex_attr);
sem_init(&sdata->ipc_sem, 1, 0);   // pshared = 1: the semaphore is shared between processes
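Since both processes run under SCHED_FIFO, it may also be worth enabling priority inheritance on the shared mutex; this is a standard POSIX option, sketched here as an optional addition:

// optional: priority inheritance on the process-shared mutex,
// set on the attribute object before pthread_mutex_init
pthread_mutexattr_setprotocol(&mutex_attr, PTHREAD_PRIO_INHERIT);
pthread_mutex_init(&sdata->ipc_mutex, &mutex_attr);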
Writer code:
// BILLION is assumed to be defined elsewhere as 1000000000.0
for (int i = 0; i < 20000000; ++i)
{
pthread_mutex_lock(&sdata->ipc_mutex);
sdata->value++;
clock_gettime(CLOCK_MONOTONIC, &sdata->ts);
sdata->start_time = (BILLION*sdata->ts.tv_sec) + sdata->ts.tv_nsec;
sdata->read_cond = 1;
pthread_mutex_unlock(&sdata->ipc_mutex);
sem_wait(&sdata->ipc_sem);   // block until the reader has consumed this value
}
fprintf(stderr, "done writing\n" );
pthread_mutex_lock(&sdata->ipc_mutex);
sdata->end_cond = 1;
pthread_mutex_unlock(&sdata->ipc_mutex);
Reader code:
double counter = 0;
double total_time = 0;
double max_time = 0;
double min_time = BILLION;
double max_thresh = 1000;
int above_max_counter = 0;
double last_val = 0;
while (1) {
pthread_mutex_lock(&sdata->ipc_mutex);
// spin: release and immediately retake the lock until the writer publishes
while (!sdata->read_cond && !sdata->end_cond) {
pthread_mutex_unlock(&sdata->ipc_mutex);
pthread_mutex_lock(&sdata->ipc_mutex);
}
clock_gettime(CLOCK_MONOTONIC, &sdata->ts);
double time_to_read = (BILLION*sdata->ts.tv_sec) + sdata->ts.tv_nsec - sdata->start_time;
if (sdata->end_cond) {
break;
}
if (sdata->value != last_val + 1) {
fprintf(stderr, "synchronization error: val: %g, last val: %g\n", sdata->value, last_val);
}
last_val = sdata->value;
if (time_to_read > max_time) {
max_time = time_to_read;
printf("max time: %lf, counter: %ld\n", max_time, (long int) counter);
}
if (time_to_read < min_time) min_time = time_to_read;
if (time_to_read > max_thresh) above_max_counter++;
total_time += time_to_read;
counter++;
sdata->read_cond = 0;
sem_post(&sdata->ipc_sem);
pthread_mutex_unlock(&sdata->ipc_mutex);
}
fprintf(stderr, "avg time to read: %g\n", total_time / counter);
fprintf(stderr, "max time to read: %g\n", max_time);
fprintf(stderr, "min time to read: %g\n", min_time);
fprintf(stderr, "count above max threshhold of %g ns: %d\n", max_thresh, above_max_counter);
Cleanup in the writer:
//detach from shared memory
shmdt(sdata);
Cleanup in the reader:
pthread_mutex_unlock(&sdata->ipc_mutex);   // the loop breaks while still holding the lock
pthread_mutex_destroy(&sdata->ipc_mutex);
//detach from shared memory
shmdt(sdata);
// destroy the shared memory
shmctl(shmid,IPC_RMID,NULL);
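As written, nothing ever destroys the semaphore, and the writer may still be attached when the reader removes the segment. A fuller reader-side cleanup sketch (IPC_RMID only takes effect once every process has detached):

pthread_mutex_unlock(&sdata->ipc_mutex);   // the loop exits holding the lock
pthread_mutex_destroy(&sdata->ipc_mutex);
sem_destroy(&sdata->ipc_sem);              // tear down the semaphore as well
// detach, then mark the segment for removal
shmdt(sdata);
shmctl(shmid, IPC_RMID, NULL);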
The goal is to minimize the time spent between these two operations. Ideally, I would like to be able to guarantee that the time from when the value is written to when it is read is under 1 microsecond. However, the output I get is:
max time: 5852.000000, counter: 0
max time: 18769.000000, counter: 30839
max time: 27416.000000, counter: 66632
max time: 28668.000000, counter: 1820109
max time: 121362.000000, counter: 1853346
done writing
avg time to read: 277.959
max time to read: 121362
min time to read: 60
count above max threshold of 1000 ns: 1871
This indicates that a fair number of the reads (~0.01%) take longer than 1 us, and can take as long as 121 us.
My questions are the following:
Given that I have set the priority to the highest and isolated the CPUs these programs run on, what could be causing these spikes?
I have learned from this post that I should not expect clock_gettime to have nanosecond precision. Are these spikes simply inaccuracies in clock_gettime? A resolution check is sketched below.
The other option I have considered is that these cores (6 and 7) are somehow being interrupted, despite having been set to the highest priority.
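To separate clock error from genuine latency, here is a minimal sketch that reports the advertised resolution and the typical cost of back-to-back clock_gettime calls, using the same CLOCK_MONOTONIC as above:

#include <stdio.h>
#include <time.h>

struct timespec res, t1, t2;
clock_getres(CLOCK_MONOTONIC, &res);
printf("resolution: %ld ns\n", res.tv_nsec);

clock_gettime(CLOCK_MONOTONIC, &t1);
clock_gettime(CLOCK_MONOTONIC, &t2);
printf("back-to-back delta: %ld ns\n",
       (t2.tv_sec - t1.tv_sec) * 1000000000L + (t2.tv_nsec - t1.tv_nsec));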
Any help would be greatly appreciated.
Edit
Per the comments below, here are the contents of my /proc/interrupts file:
CPU0 CPU1 CPU2 CPU3 CPU4 CPU5 CPU6 CPU7
0: 20 0 0 0 0 0 0 0 IO-APIC 2-edge timer
1: 2 0 0 0 0 0 0 0 IO-APIC 1-edge i8042
8: 1 0 0 0 0 0 0 0 IO-APIC 8-edge rtc0
9: 0 0 0 0 0 0 0 0 IO-APIC 9-fasteoi acpi
12: 2 0 0 0 1 1 0 0 IO-APIC 12-edge i8042
16: 0 0 0 0 0 0 0 0 IO-APIC 16-fasteoi i801_smbus, pcim_das1602_16
19: 2 0 0 0 8 10 6 2 IO-APIC 19-fasteoi
120: 0 0 0 0 0 0 0 0 PCI-MSI 16384-edge aerdrv
121: 99 406 0 0 14 5960 6 0 PCI-MSI 327680-edge xhci_hcd
122: 8726 133 47 28 4126 3910 22638 795 PCI-MSI 376832-edge ahci[0000:00:17.0]
123: 2 0 0 0 2 0 3 3663 PCI-MSI 520192-edge eno1
124: 3411 0 2 1 176 24498 77 11 PCI-MSI 32768-edge i915
125: 45 0 0 0 3 6 0 0 PCI-MSI 360448-edge mei_me
126: 432 0 0 0 144 913 28 1 PCI-MSI 514048-edge snd_hda_intel:card0
NMI: 1 1 1 1 1 1 1 1 Non-maskable interrupts
LOC: 12702 10338 10247 10515 9969 10386 16658 13568 Local timer interrupts
SPU: 0 0 0 0 0 0 0 0 Spurious interrupts
PMI: 1 1 1 1 1 1 1 1 Performance monitoring interrupts
IWI: 0 0 0 0 0 0 0 0 IRQ work interrupts
RTR: 7 0 0 0 0 0 0 0 APIC ICR read retries
RES: 4060 2253 1026 708 595 846 887 751 Rescheduling interrupts
CAL: 11906 10423 11418 9894 14562 11000 21479 11223 Function call interrupts
TLB: 10620 8996 10060 8674 13172 9622 20121 9838 TLB shootdowns
TRM: 0 0 0 0 0 0 0 0 Thermal event interrupts
THR: 0 0 0 0 0 0 0 0 Threshold APIC interrupts
DFR: 0 0 0 0 0 0 0 0 Deferred Error APIC interrupts
MCE: 0 0 0 0 0 0 0 0 Machine check exceptions
MCP: 2 2 2 2 2 2 2 2 Machine check polls
ERR: 0
MIS: 0
PIN: 0 0 0 0 0 0 0 0 Posted-interrupt notification event
PIW: 0 0 0 0 0 0 0 0 Posted-interrupt wakeup event
Per this post, I tried changing the smp_affinity of interrupts 122 and 123 to cores 0 and 1. This did not seem to help, and when I reset the computer those affinities were still set to cores 6 and 7.
Even without resetting, and just rerunning the programs, I see no change in the number of interrupts serviced by these CPU cores.
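For reference, the runtime change I attempted amounts to writing a CPU mask into /proc/irq/<n>/smp_affinity; a minimal sketch in C (mask 0x3 means cores 0 and 1, requires root, and does not survive a reboot without a boot-time script):

#include <stdio.h>

// route IRQ 122 to cores 0 and 1 (mask 0x3); requires root privileges
FILE* f = fopen("/proc/irq/122/smp_affinity", "w");
if (f) {
    fprintf(f, "3\n");
    fclose(f);
} else {
    perror("fopen smp_affinity");
}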