Web spider speed limited by MySQL configuration?

Date: 2015-09-06 22:06:38

Tags: mysql database relational-database

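After hitting the limits of my Comcast Business office network, I recently moved my web spider to paid cloud hosting. I am running into a strange limitation and hope someone can shed some light on it.

I have 10 small instances running, each with 1 GB of memory, 1 CPU, and SSD storage.

Each instance is configured to visit about 1 million websites per day, and each one easily achieves this when running on its own.

A separate instance runs a simple MySQL database to keep track of everything. It has 8 GB of memory, 4 cores, and a 90 GB SSD.

If I run 4 instances I reach 4 million visits per day. If I run 10 instances I still only reach 4 million per day, so something in this configuration is choking.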
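To put the plateau in per-second terms, the daily figures above can be converted with a little arithmetic. This is only a back-of-the-envelope sketch; the function and variable names are made up for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def visits_per_second(visits_per_day: float) -> float:
    """Convert a daily visit count into an average per-second rate."""
    return visits_per_day / SECONDS_PER_DAY


per_instance = visits_per_second(1_000_000)       # one spider running alone
observed_cap = visits_per_second(4_000_000)       # plateau with 4 or 10 spiders
expected_ten = visits_per_second(10 * 1_000_000)  # what 10 spiders should reach

print(f"per instance:        {per_instance:6.1f} visits/s")
print(f"observed ceiling:    {observed_cap:6.1f} visits/s")
print(f"expected, 10 nodes:  {expected_ten:6.1f} visits/s")
print(f"shortfall factor:    {expected_ten / observed_cap:.1f}x")
```

In other words, the database side stops scaling at roughly 46 visits per second, about 2.5 times short of what ten spiders should deliver.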

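MySQL runs at about 360% CPU (4 cores) most of the time, with about 70% memory use. Typical I/O is about 4 MB/s.

The database writes are spread over 10 tables, and not every visit writes to all of them: about 50% of visits result in writes to 2-10 tables. The only table that is always updated is the "visited" table, which holds the last visit date/time.

Here are some excerpts from my configuration: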

my.cnf:

    explicit_defaults_for_timestamp
    connect_timeout = 60
    sync_binlog = 0

    innodb_buffer_pool_size = 5G
    innodb_file_format=Barracuda
    innodb_log_file_size = 10G
    innodb_file_per_table=1
    innodb_log_buffer_size=4M
    innodb_flush_log_at_trx_commit=0
    innodb_thread_concurrency=10
    #transaction-isolation=READ-COMMITTED
    max_connections = 2500
    innodb_buffer_pool_instances = 5
    innodb_io_capacity = 30000
    innodb_read_io_threads = 10000
    innodb_write_io_threads = 10000

    innodb_doublewrite = 0
    innodb_open_files = 10000
    innodb_support_xa=0

    innodb_flush_method = O_DIRECT

    max_allowed_packet = 32M
    thread_stack = 1M
    sort_buffer_size = 256K

    table_open_cache = 2000
    thread_cache_size = 5000
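One thing worth checking in an excerpt like this is whether any value lies outside the range the server accepts. The sketch below does that for two settings only. The upper bound of 64 for the InnoDB I/O thread counts is what I recall from the MySQL 5.7 manual, so treat it as an assumption and confirm it for the version in use. The helper names and the shortened excerpt are made up for illustration.

```python
# Upper bounds as recalled from the MySQL 5.7 manual (an assumption; verify
# against the documentation for the server version actually deployed).
DOCUMENTED_MAX = {
    "innodb_read_io_threads": 64,
    "innodb_write_io_threads": 64,
}

# A few lines copied from the excerpt above.
MY_CNF_EXCERPT = """
innodb_thread_concurrency=10
innodb_io_capacity = 30000
innodb_read_io_threads = 10000
innodb_write_io_threads = 10000
"""


def parse_settings(text):
    """Parse 'name = value' lines, skipping blanks, comments and bare flags."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        settings[name.strip()] = value.strip()
    return settings


def out_of_range(settings, limits):
    """Return {name: (configured, limit)} for numeric values above the limit."""
    flagged = {}
    for name, limit in limits.items():
        value = settings.get(name)
        if value is not None and value.isdigit() and int(value) > limit:
            flagged[name] = (int(value), limit)
    return flagged


flagged = out_of_range(parse_settings(MY_CNF_EXCERPT), DOCUMENTED_MAX)
for name, (value, limit) in sorted(flagged.items()):
    print(f"{name} = {value} is above the assumed maximum of {limit}")
```

If the assumed bound is right, the server cannot actually be running 10000 I/O threads, so those two lines are unlikely to be doing what they appear to. Running SHOW VARIABLES LIKE 'innodb_%io_threads' would show the values really in effect.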

Here is a typical table, with its indexes and so on. I believe these are fairly well dialed in, since I can query, sort, etc. very quickly.

    CREATE TABLE `websites` (
      `wid` bigint(20) NOT NULL AUTO_INCREMENT,
      `host` varchar(100) NOT NULL,
      `status` int(3) NOT NULL DEFAULT '0',
      `total_time` int(15) NOT NULL DEFAULT '0',
      `total_data` int(15) NOT NULL DEFAULT '0',
      `hash` int(15) NOT NULL DEFAULT '0',
      `machine` int(2) NOT NULL DEFAULT '0',
      `ipv4` int(4) unsigned DEFAULT NULL,
      `ipv6` binary(16) DEFAULT NULL,
      PRIMARY KEY (`wid`),
      UNIQUE KEY `host` (`host`),
      KEY `status` (`status`),
      KEY `total_time` (`total_time`),
      KEY `total_data` (`total_data`),
      KEY `machine` (`machine`),
      KEY `ipv4` (`ipv4`,`ipv6`),
      KEY `hash` (`hash`) USING BTREE
    ) ENGINE=InnoDB AUTO_INCREMENT=3741662307 DEFAULT CHARSET=utf8 ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4

The SELECT queries all differ from one another.

SHOW GLOBAL STATUS;

Aborted_clients
6728
Aborted_connects
135
Binlog_cache_disk_use
0
Binlog_cache_use
0
Binlog_stmt_cache_disk_use
0
Binlog_stmt_cache_use
0
Bytes_received
42547122115
Bytes_sent
36810202741
Com_admin_commands
0
Com_assign_to_keycache
0
Com_alter_db
0
Com_alter_db_upgrade
0
Com_alter_event
0
Com_alter_function
0
Com_alter_procedure
0
Com_alter_server
0
Com_alter_table
0
Com_alter_tablespace
0
Com_alter_user
0
Com_analyze
0
Com_begin
0
Com_binlog
0
Com_call_procedure
0
Com_change_db
167
Com_change_master
0
Com_change_repl_filter
0
Com_check
0
Com_checksum
0
Com_commit
0
Com_create_db
0
Com_create_event
0
Com_create_function
0
Com_create_index
0
Com_create_procedure
0
Com_create_server
0
Com_create_table
0
Com_create_trigger
0
Com_create_udf
0
Com_create_user
0
Com_create_view
0
Com_dealloc_sql
0
Com_delete
0
Com_delete_multi
0
Com_do
0
Com_drop_db
0
Com_drop_event
0
Com_drop_function
0
Com_drop_index
0
Com_drop_procedure
0
Com_drop_server
0
Com_drop_table
0
Com_drop_trigger
0
Com_drop_user
0
Com_drop_view
0
Com_empty_query
0
Com_execute_sql
0
Com_explain_other
0
Com_flush
0
Com_get_diagnostics
0
Com_grant
0
Com_ha_close
0
Com_ha_open
0
Com_ha_read
0
Com_help
0
Com_insert
261004903
Com_insert_select
0
Com_install_plugin
0
Com_kill
0
Com_load
0
Com_lock_tables
0
Com_optimize
0
Com_preload_keys
0
Com_prepare_sql
0
Com_purge
0
Com_purge_before_date
0
Com_release_savepoint
0
Com_rename_table
0
Com_rename_user
0
Com_repair
0
Com_replace
14
Com_replace_select
0
Com_reset
0
Com_resignal
0
Com_revoke
0
Com_revoke_all
0
Com_rollback
0
Com_rollback_to_savepoint
0
Com_savepoint
0
Com_select
225216121
Com_set_option
12406
Com_signal
0
Com_show_binlog_events
0
Com_show_binlogs
12
Com_show_charsets
0
Com_show_collations
0
Com_show_create_db
0
Com_show_create_event
0
Com_show_create_func
0
Com_show_create_proc
0
Com_show_create_table
47
Variable_name
Value

Com_show_create_trigger
0
Com_show_databases
0
Com_show_engine_logs
0
Com_show_engine_mutex
0
Com_show_engine_status
0
Com_show_events
0
Com_show_errors
0
Com_show_fields
585
Com_show_function_code
0
Com_show_function_status
0
Com_show_grants
4
Com_show_keys
46
Com_show_master_status
9
Com_show_open_tables
0
Com_show_plugins
0
Com_show_privileges
0
Com_show_procedure_code
0
Com_show_procedure_status
0
Com_show_processlist
26
Com_show_profile
0
Com_show_profiles
0
Com_show_relaylog_events
0
Com_show_slave_hosts
0
Com_show_slave_status
9
Com_show_status
2
Com_show_storage_engines
0
Com_show_table_status
0
Com_show_tables
32
Com_show_triggers
0
Com_show_variables
6462
Com_show_warnings
0
Com_show_create_user
0
Com_slave_start
0
Com_slave_stop
0
Com_group_replication_start
0
Com_group_replication_stop
0
Com_stmt_execute
0
Com_stmt_close
0
Com_stmt_fetch
0
Com_stmt_prepare
0
Com_stmt_reset
0
Com_stmt_send_long_data
0
Com_truncate
0
Com_uninstall_plugin
0
Com_unlock_tables
0
Com_update
3768683
Com_update_multi
0
Com_xa_commit
0
Com_xa_end
0
Com_xa_prepare
0
Com_xa_recover
0
Com_xa_rollback
0
Com_xa_start
0
Com_stmt_reprepare
0
Connection_errors_accept
0
Connection_errors_internal
0
Connection_errors_max_connections
0
Connection_errors_peer_address
0
Connection_errors_select
0
Connection_errors_tcpwrap
0
Connections
3781613
Created_tmp_disk_tables
900
Created_tmp_files
153
Created_tmp_tables
1714
Delayed_errors
0
Delayed_insert_threads
0
Delayed_writes
0
Flush_commands
1
Handler_commit
489978593
Handler_delete
0
Handler_discover
0
Handler_external_lock
979991125
Handler_mrr_init
0
Handler_prepare
0
Handler_read_first
1022
Handler_read_key
340529954
Handler_read_last
14
Handler_read_next
1325647372
Handler_read_prev
240
Handler_read_rnd
111545828
Handler_read_rnd_next
3254347
Handler_rollback
9905
Handler_savepoint
0
Handler_savepoint_rollback
0
Handler_update
3768685
Handler_write
261011792
Innodb_buffer_pool_dump_status
not started
Innodb_buffer_pool_load_status
Buffer pool(s) load completed at 150906  5:57:38
Innodb_buffer_pool_resize_status
not started
Innodb_buffer_pool_pages_data
909831
Innodb_buffer_pool_bytes_data
5310021632
Innodb_buffer_pool_pages_dirty
137841
Innodb_buffer_pool_bytes_dirty
586342400
Innodb_buffer_pool_pages_flushed
156711220
Innodb_buffer_pool_pages_free
2828
Innodb_buffer_pool_pages_misc
18446744073708966597
Innodb_buffer_pool_pages_total
327640
Innodb_buffer_pool_read_ahead_rnd
0
Innodb_buffer_pool_read_ahead
115535
Innodb_buffer_pool_read_ahead_evicted
0
Innodb_buffer_pool_read_requests
17132988745
Innodb_buffer_pool_reads
43009838
Innodb_buffer_pool_wait_free
0
Innodb_buffer_pool_write_requests
3418426697
Innodb_data_fsyncs
748775
Innodb_data_pending_fsyncs
0
Innodb_data_pending_reads
0
Innodb_data_pending_writes
0
Innodb_data_read
182409891840
Innodb_data_reads
44358403
Innodb_data_writes
17867174
Innodb_data_written
224979945984
Innodb_dblwr_pages_written
0
Innodb_dblwr_writes
0
Innodb_log_waits
0
Innodb_log_write_requests
343630408
Innodb_log_writes
133630
Innodb_os_log_fsyncs
92883
Innodb_os_log_pending_fsyncs
0
Innodb_os_log_pending_writes
0
Innodb_os_log_written
149354686976
Innodb_page_size
16384
Innodb_pages_created
1520762
Innodb_pages_read
44358411
Innodb_pages_written
17719344
Innodb_row_lock_current_waits
0
Innodb_row_lock_time
125773
Innodb_row_lock_time_avg
41
Innodb_row_lock_time_max
51019
Innodb_row_lock_waits
3015
Innodb_rows_deleted
0
Innodb_rows_inserted
47941507
Innodb_rows_read
1442492244
Innodb_rows_updated
3768685
Innodb_num_open_files
42
Innodb_truncated_status_writes
0
Innodb_available_undo_logs
128
Key_blocks_not_flushed
0
Key_blocks_unused
6696
Key_blocks_used
2
Key_read_requests
376
Key_reads
6
Key_write_requests
0
Key_writes
0
Locked_connects
0
Max_execution_time_exceeded
0
Max_execution_time_set
0
Max_execution_time_set_failed
0
Max_used_connections
795
Max_used_connections_time
2015-09-06 21:13:06
Not_flushed_delayed_rows
0
Ongoing_anonymous_transaction_count
0
Open_files
32
Open_streams
0
Open_table_definitions
132
Open_tables
2000
Opened_files
384
Opened_table_definitions
132
Opened_tables
26470
Performance_schema_accounts_lost
0
Performance_schema_cond_classes_lost
0
Performance_schema_cond_instances_lost
0
Performance_schema_digest_lost
0
Performance_schema_file_classes_lost
0
Performance_schema_file_handles_lost
0
Performance_schema_file_instances_lost
0
Performance_schema_hosts_lost
0
Performance_schema_index_stat_lost
0
Performance_schema_locker_lost
0
Performance_schema_memory_classes_lost
0
Performance_schema_metadata_lock_lost
0
Performance_schema_mutex_classes_lost
0
Performance_schema_mutex_instances_lost
0
Performance_schema_nested_statement_lost
0
Performance_schema_prepared_statements_lost
0
Performance_schema_program_lost
0
Performance_schema_rwlock_classes_lost
0
Performance_schema_rwlock_instances_lost
0
Performance_schema_session_connect_attrs_lost
0
Performance_schema_socket_classes_lost
0
Performance_schema_socket_instances_lost
0
Performance_schema_stage_classes_lost
0
Performance_schema_statement_classes_lost
0
Performance_schema_table_handles_lost
0
Performance_schema_table_instances_lost
0
Performance_schema_table_lock_stat_lost
0
Performance_schema_thread_classes_lost
0
Performance_schema_thread_instances_lost
0
Performance_schema_users_lost
0
Prepared_stmt_count
0
Qcache_free_blocks
1
Qcache_free_memory
1031832
Qcache_hits
0
Qcache_inserts
0
Qcache_lowmem_prunes
0
Qcache_not_cached
225057348
Qcache_queries_in_cache
0
Qcache_total_blocks
1
Queries
493783842
Questions
493783908
Select_full_join
56
Select_full_range_join
0
Select_range
158
Select_range_check
0
Select_scan
8023
Slave_open_temp_tables
0
Slow_launch_threads
0
Slow_queries
0
Sort_merge_passes
3948
Sort_range
49
Sort_rows
490845
Sort_scan
129
Ssl_accept_renegotiates
0
Ssl_accepts
0
Ssl_callback_cache_hits
0
Ssl_cipher
Ssl_cipher_list
Ssl_client_connects
0
Ssl_connect_renegotiates
0
Ssl_ctx_verify_depth
0
Ssl_ctx_verify_mode
0
Ssl_default_timeout
0
Ssl_finished_accepts
0
Ssl_finished_connects
0
Ssl_server_not_after
Aug 29 23:50:49 2025 GMT
Ssl_server_not_before
Sep  1 23:50:49 2015 GMT
Ssl_session_cache_hits
0
Ssl_session_cache_misses
0
Ssl_session_cache_mode
Unknown
Ssl_session_cache_overflows
0
Ssl_session_cache_size
0
Ssl_session_cache_timeouts
0
Ssl_sessions_reused
0
Ssl_used_session_cache_entries
0
Ssl_verify_depth
0
Ssl_verify_mode
0
Ssl_version
Table_locks_immediate
6801
Table_locks_waited
0
Table_open_cache_hits
489970063
Table_open_cache_misses
26470
Table_open_cache_overflows
24463
Tc_log_max_pages_used
0
Tc_log_page_size
0
Tc_log_page_waits
0
Threads_cached
423
Threads_connected
370
Threads_created
795
Threads_running
310
Uptime
57104
Uptime_since_flush_status
57104
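Raw counters like these are easier to read as rates and ratios. The sketch below derives a few from values copied out of the status output above. The choice of ratios is mine, and the interpretation in the comments is a rough reading, not a definitive diagnosis.

```python
# Counters copied from the SHOW GLOBAL STATUS output above.
status = {
    "Uptime": 57104,
    "Com_insert": 261004903,
    "Com_select": 225216121,
    "Connections": 3781613,
    "Innodb_rows_inserted": 47941507,
    "Innodb_buffer_pool_read_requests": 17132988745,
    "Innodb_buffer_pool_reads": 43009838,
}


def per_second(counter: str) -> float:
    """Average rate of a cumulative counter over the server uptime."""
    return status[counter] / status["Uptime"]


inserts_per_s = per_second("Com_insert")
selects_per_s = per_second("Com_select")
connections_per_s = per_second("Connections")

# Rows actually inserted per INSERT statement. A value well below 1 suggests
# that most INSERT statements added no row (assuming the tables are InnoDB).
rows_per_insert = status["Innodb_rows_inserted"] / status["Com_insert"]

# Share of logical page reads that had to go to disk.
buffer_pool_miss_ratio = (
    status["Innodb_buffer_pool_reads"]
    / status["Innodb_buffer_pool_read_requests"]
)

print(f"INSERT statements/s:    {inserts_per_s:10.1f}")
print(f"SELECT statements/s:    {selects_per_s:10.1f}")
print(f"new connections/s:      {connections_per_s:10.1f}")
print(f"rows per INSERT:        {rows_per_insert:10.3f}")
print(f"buffer pool miss ratio: {buffer_pool_miss_ratio:10.4%}")
```

The server handles several thousand INSERT statements per second, yet fewer than one in five appears to add a row, while the buffer pool serves well over 99% of page reads from memory.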

iostat:

Linux 3.16.0-4-amd64 (vultr.guest)      09/06/2015      _x86_64_        (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          46.63    0.00    8.83    1.86    0.31   42.37

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
vda            1149.87      4598.30      3737.19  403966815  328316761

Running Debian Jessie and MySQL 5.7, on hosting from vultr.com.

1 Answer:

Answer 0 (score: 0)

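Well, after a night of tuning and fiddling, and of watching the wonderful tool innotop, I narrowed it down to the websites table, which was being updated by the INSERT IGNORE INTO websites query (host = somehost.com). That is a rather uncomplicated query and it runs very fast. In my case, however, there are 500 million domain names in the database, and some very common domains such as w3.org show up all the time and were being re-sent to the mothership unconditionally. My solution was to create a smaller, hosts-only database on each node in which every "seen" host is stored. Once a host has been seen, it is never sent to the mothership again.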

Now, normally you would keep a simple Perl hash to track this, but since my script uses Parallel::ForkManager to spawn children, they don't really talk to each other.
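The idea of a node-local "seen hosts" store shared by forked workers can be sketched as follows. This is not my actual Perl code. It is a minimal Python illustration that assumes SQLite as the local store, and the table name seen_hosts and all function names are hypothetical.

```python
import os
import sqlite3
import tempfile


def open_seen_db(path):
    """Open (or create) the node-local table of hosts already reported.

    Each worker process should call this itself after forking, so that no
    SQLite connection is shared across processes.
    """
    conn = sqlite3.connect(path, timeout=30)  # wait for locks held by siblings
    conn.execute("CREATE TABLE IF NOT EXISTS seen_hosts (host TEXT PRIMARY KEY)")
    conn.commit()
    return conn


def first_sighting(conn, host):
    """Record the host locally; return True only the first time it is seen."""
    cur = conn.execute(
        "INSERT OR IGNORE INTO seen_hosts (host) VALUES (?)", (host,)
    )
    conn.commit()
    return cur.rowcount == 1  # 0 means the row already existed


def hosts_to_report(conn, hosts):
    """Keep only the hosts that still need to go to the central database."""
    return [host for host in hosts if first_sighting(conn, host)]


# Example run against a throwaway database file.
db_path = os.path.join(tempfile.mkdtemp(), "seen_hosts.db")
conn = open_seen_db(db_path)
first_batch = hosts_to_report(conn, ["w3.org", "example.com", "w3.org"])
second_batch = hosts_to_report(conn, ["w3.org", "example.org"])
conn.close()

print(first_batch)
print(second_batch)
```

Only the hosts returned by hosts_to_report would then be sent on to the central MySQL server, so a domain that keeps turning up costs one local lookup instead of one remote INSERT IGNORE.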

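At the moment my throughput has doubled, and it will get faster still once the small hosts table on each node has been filled with the entire domain list. I could do the same for the other tables that use the same INSERT IGNORE query and speed things up further, but for now I am hitting my daily target and will settle for that...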