Question

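I have a network client and server application. The data flow is such that the client sends a message to the server, and the server responds with an acknowledgement. Only on receipt of the acknowledgement does the client send the next message.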
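
To make the message flow concrete, here is a minimal sketch of the same stop-and-wait exchange. It is written in Python rather than the C++ of the real client, uses a local `socket.socketpair()` instead of a real network connection, and the message size, the 1-byte ack and all function names are hypothetical, chosen only for illustration.

```python
import socket
import threading
import time

MSG_SIZE = 64  # hypothetical fixed message size, not taken from the real application


def recv_exact(sock, n):
    """Read exactly n bytes, or return b"" if the peer closed the connection."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            return b""
        buf += chunk
    return buf


def ack_server(sock):
    """Reply with a 1-byte acknowledgement for every message received."""
    while recv_exact(sock, MSG_SIZE):
        sock.sendall(b"A")


def run_client(sock, count):
    """Send `count` messages, waiting for each ack before sending the next."""
    payload = b"x" * MSG_SIZE
    start = time.perf_counter()
    for _ in range(count):
        sock.sendall(payload)
        if recv_exact(sock, 1) != b"A":
            raise RuntimeError("connection closed before ack")
    elapsed = time.perf_counter() - start
    return count / elapsed  # messages per second


def measure(count=1000):
    """Run client and server over a local socket pair and return the rate."""
    client_sock, server_sock = socket.socketpair()
    server = threading.Thread(target=ack_server, args=(server_sock,), daemon=True)
    server.start()
    try:
        return run_client(client_sock, count)
    finally:
        client_sock.close()  # the server loop sees EOF and exits
        server.join(timeout=1)
        server_sock.close()


if __name__ == "__main__":
    print(f"{measure():.0f} messages per second")
```

Because only one message is ever in flight, the rate this loop reports is bounded by how quickly each side is woken up after the other writes to the socket.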

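The client application, written in C++, has 3 threads: a network thread (responsible for sending messages via the socket), a main thread (responsible for creating request messages), and a timer thread (which fires every second).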

The server application has 2 threads: a main thread and a network thread.

I run RHEL 6.3 with the 2.6.32-279 kernel.

Configuration 1

tuned-adm profile latency-performance
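All of the client's threads on the same CPU core ID
All of the server's threads on the same CPU core ID, but a different core ID from the client's threads
Client and server running on the same machine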

Throughput: 4500 messages per second
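
For readers who want to reproduce this kind of pinning, a minimal Python sketch follows. It is an illustration only: the question does not say how the C++ threads were pinned, and `pin_to_core` is a hypothetical helper. `os.sched_setaffinity` is available on Linux; the underlying `sched_setaffinity(2)` call with pid 0 applies to the calling thread, and threads started afterwards inherit the mask.

```python
import os


def pin_to_core(core_id):
    """Restrict the caller to a single CPU core and return the resulting mask.

    Linux only: os.sched_setaffinity does not exist on other platforms.
    """
    os.sched_setaffinity(0, {core_id})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Pin to the lowest-numbered core this process is allowed to use.
    allowed = os.sched_getaffinity(0)
    print(pin_to_core(min(allowed)))
```

Running the client under one core ID and the server under another, as in this configuration, would mean calling the helper with two different values before the worker threads are started.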

Configuration 2

tuned-adm profile throughput-performance
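All of the client's threads on the same CPU core ID
All of the server's threads on the same CPU core ID, but a different core ID from the client's threads
Client and server running on the same machine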

Throughput: 9-15 messages per second

Configuration 3

tuned-adm profile throughput-performance
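All of the client's threads on different CPU core IDs
All of the server's threads on different CPU core IDs, and on different core IDs from the client's threads
Client and server running on the same machine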

Throughput: 1100 messages per second

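The load on the machine is negligible. Can someone explain the drop from 4k to 9 messages per second when the profile is switched from latency-performance to throughput-performance?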

Answer 1

Here's a basic rundown of the differences between the RHEL tuned-adm profiles:

latency-performance switches the I/O elevator to deadline and changes the CPU governor to the "performance" setting.
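
To check which elevator and governor are actually in effect after switching profiles, something like the following sketch may help. It reads the standard Linux sysfs files `/sys/block/*/queue/scheduler` and `/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor`; the function names are hypothetical, and on machines without those files (for example, no cpufreq driver loaded) it simply returns empty dictionaries.

```python
import glob
import re


def active_elevator(scheduler_line):
    """Extract the bracketed (active) elevator from a sysfs scheduler line."""
    match = re.search(r"\[([^\]]+)\]", scheduler_line)
    return match.group(1) if match else None


def read_first_line(path):
    """Return the first line of a file, or None if it cannot be read."""
    try:
        with open(path) as f:
            return f.readline().strip()
    except OSError:
        return None


def current_settings():
    """Collect per-device elevators and per-CPU governors from sysfs."""
    elevators = {}
    for path in glob.glob("/sys/block/*/queue/scheduler"):
        line = read_first_line(path)
        if line is not None:
            # /sys/block/<device>/queue/scheduler -> <device>
            elevators[path.split("/")[3]] = active_elevator(line)
    governors = {}
    pattern = "/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor"
    for path in glob.glob(pattern):
        line = read_first_line(path)
        if line is not None:
            # /sys/devices/system/cpu/<cpuN>/cpufreq/... -> <cpuN>
            governors[path.split("/")[5]] = line
    return elevators, governors


if __name__ == "__main__":
    elevators, governors = current_settings()
    print("elevators:", elevators)
    print("governors:", governors)
```

With latency-performance active, one would expect the elevator entries to read `deadline` and the governor entries to read `performance`.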

throughput-performance is optimized for network and disk performance. See the specifics below...

Your workload appears to be latency-sensitive.
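
A quick back-of-the-envelope calculation shows why. Since the client waits for each acknowledgement before sending the next message, only one message is in flight at a time, so the message rate is simply the inverse of the average round-trip time. The sketch below (the function name is hypothetical) converts the rates from the question into per-message round-trip times.

```python
def round_trip_ms(messages_per_second):
    """Average time per request/ack cycle when only one message is in flight."""
    return 1000.0 / messages_per_second


if __name__ == "__main__":
    # Rates reported in the question for the three configurations.
    for label, rate in [
        ("configuration 1", 4500),
        ("configuration 2 (best case)", 15),
        ("configuration 2 (worst case)", 9),
        ("configuration 3", 1100),
    ]:
        print(f"{label}: {round_trip_ms(rate):.3f} ms per round trip")
```

That works out to roughly 0.22 ms per round trip at 4500 messages per second, about 0.9 ms at 1100, and between about 67 ms and 111 ms at 9-15 messages per second. The drop is therefore a latency problem per exchange, not a bandwidth problem.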


Here are the settings from throughput-performance, with comments. latency-performance does not modify any of these.

# ktune sysctl settings for rhel6 servers, maximizing i/o throughput
#
# Minimal preemption granularity for CPU-bound tasks:
# (default: 1 msec * (1 + ilog(ncpus)), units: nanoseconds)
kernel.sched_min_granularity_ns = 10000000

# SCHED_OTHER wake-up granularity.
# (default: 1 msec * (1 + ilog(ncpus)), units: nanoseconds)
#
# This option delays the preemption effects of decoupled workloads
# and reduces their over-scheduling. Synchronous workloads will still
# have immediate wakeup/sleep latencies.
kernel.sched_wakeup_granularity_ns = 15000000

# If a workload mostly uses anonymous memory and it hits this limit, the entire
# working set is buffered for I/O, and any more write buffering would require
# swapping, so it's time to throttle writes until I/O can catch up.  Workloads
# that mostly use file mappings may be able to use even higher values.
#
vm.dirty_ratio = 40
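
To put the two scheduler values in perspective, the short sketch below parses the settings quoted above and converts the nanosecond values to milliseconds. The parser and its function names are hypothetical helpers for illustration, not part of tuned.

```python
SETTINGS = """
kernel.sched_min_granularity_ns = 10000000
kernel.sched_wakeup_granularity_ns = 15000000
vm.dirty_ratio = 40
"""


def parse_sysctl(text):
    """Parse 'key = value' lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = int(value.strip())
    return values


def scheduler_granularities_ms(values):
    """Convert every *_ns setting from nanoseconds to milliseconds."""
    return {k: v / 1e6 for k, v in values.items() if k.endswith("_ns")}


if __name__ == "__main__":
    for key, ms in scheduler_granularities_ms(parse_sysctl(SETTINGS)).items():
        print(f"{key}: {ms:g} ms")
```

The minimum and wakeup granularities come out at 10 ms and 15 ms. Both are tens of times longer than the roughly 0.22 ms round trip implied by 4500 messages per second, which is consistent with (though not proof of) a just-woken thread having to wait before it can preempt another thread sharing its core.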
