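Ever since I moved one of my websites to a Linode 1 GB VPS, something has felt off. First, DNSPod monitoring shows the site as reachable at some times and unreachable at others. Second, the WP Super Cache plugin's preload task frequently fails and restarts, sending emails such as "[http://www.szl724.com] Preload may have stalled. Preload has been restarted." Third, Linode occasionally sends alert emails about high disk IO usage.
Today I found time to check the server and noticed that the mysql process was restarting frequently for no apparent reason.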
150424 17:41:14 [note] event scheduler: loaded 0 events
150424 17:41:14 [note] /usr/local/mysql/bin/mysqld: ready for connections.
version: '5.5.37' socket: '/tmp/mysql.sock' port: 3306 source distribution
150424 19:27:16 mysqld_safe number of processes running now: 0
150424 19:27:16 mysqld_safe mysqld restarted
150424 19:27:24 [note] plugin 'innodb' is disabled.
150424 19:27:24 [note] server hostname (bind-address): '0.0.0.0'; port: 3306
150424 19:27:24 [note] - '0.0.0.0' resolves to '0.0.0.0';
150424 19:27:24 [note] server socket created on ip: '0.0.0.0'.
150424 19:27:24 [warning] 'user' entry 'root@li676-235' ignored in --skip-name-resolve mode.
150424 19:27:24 [warning] 'proxies_priv' entry '@ root@li676-235' ignored in --skip-name-resolve mode.
150424 19:27:25 [note] event scheduler: loaded 0 events
150424 19:27:25 [note] /usr/local/mysql/bin/mysqld: ready for connections.
version: '5.5.37' socket: '/tmp/mysql.sock' port: 3306 source distribution

In this log, mysqld_safe reports that no mysqld process is running and then restarts it. That suggests mysql was struggling to keep running, which points to the server being short of resources, so that is what I checked next.
The error log message "mysqld_safe number of processes running now: 0" indicates scarcity for resources to pursue the operations.
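To get a feel for how often these restarts happen, the restart lines in the MySQL error log can be counted per day. The following is a minimal sketch, assuming the MySQL 5.5 log format shown above (date in the first field); the function name `count_mysqld_restarts` and the log path in the usage comment are assumptions, not something taken from this server.

```shell
#!/usr/bin/env bash
# Count how often mysqld_safe restarted mysqld, grouped by day.
# Reads a MySQL 5.5-style error log on stdin; matching lines look like:
#   150424 19:27:16 mysqld_safe mysqld restarted
# The function name is hypothetical and only used for illustration.
count_mysqld_restarts() {
    awk '/mysqld_safe mysqld restarted/ { n[$1]++ }
         END { for (d in n) print d, n[d] }' | sort
}

# Usage (the path is an assumption; check the log-error setting in my.cnf):
#   count_mysqld_restarts < /path/to/mysql-error.log
```

Each output line is a date in YYMMDD form followed by the number of restarts logged that day, which makes it easy to see whether the problem is getting better or worse after a configuration change.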
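Running free -m showed that plenty of memory was still available: once buffers and cache are excluded, only about half was in use.

[root@li676-235 ~]# free -m
             total       used       free     shared    buffers     cached
mem:           990        903         87          0        114        351
-/+ buffers/cache:        436        554
swap:          255         53        202

To determine whether the server really had run out of resources, I checked the system log for signs that the OOM (out of memory) killer had run. Sure enough, the log contained many entries like the following.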
[root@li676-235 var]# egrep -i 'oom|kill|mysql' /var/log/messages | more
apr 23 13:36:16 li676-235 kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
apr 23 13:36:16 li676-235 kernel: mysqld cpuset=/ mems_allowed=0
apr 23 13:36:16 li676-235 kernel: cpu: 0 pid: 16020 comm: mysqld not tainted 3.18.5-x86_64-linode52 #1
apr 23 13:36:16 li676-235 kernel: [<ffffffff8112695f>] ? oom_kill_process+0x65/0x32f
apr 23 13:36:16 li676-235 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
apr 23 13:36:16 li676-235 kernel: [12128] 0 12128 26564 1 12 71 0 mysqld_safe
apr 23 13:36:16 li676-235 kernel: [12405] 501 12405 155926 2868 120 3676 0 mysqld
apr 23 13:36:16 li676-235 kernel: out of memory: kill process 9703 (php-fpm) score 41 or sacrifice child
apr 23 13:36:16 li676-235 kernel: killed process 9703 (php-fpm) total-vm:266976kb, anon-rss:38932kb, file-rss:0kb
apr 23 13:36:23 li676-235 kernel: mysqld invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
apr 23 13:36:24 li676-235 kernel: mysqld cpuset=/ mems_allowed=0
apr 23 13:36:24 li676-235 kernel: cpu: 0 pid: 12405 comm: mysqld not tainted 3.18.5-x86_64-linode52 #1
apr 23 13:36:24 li676-235 kernel: [<ffffffff8112695f>] ? oom_kill_process+0x65/0x32f
apr 23 13:36:24 li676-235 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
apr 23 13:36:24 li676-235 kernel: [12128] 0 12128 26564 1 12 71 0 mysqld_safe
apr 23 13:36:24 li676-235 kernel: [12405] 501 12405 156056 2873 120 3676 0 mysqld
apr 23 13:36:24 li676-235 kernel: out of memory: kill process 18168 (php-fpm) score 38 or sacrifice child
apr 23 13:36:24 li676-235 kernel: killed process 18168 (php-fpm) total-vm:263724kb, anon-rss:24872kb, file-rss:0kb

This log clearly shows the OOM killer running at "apr 23 13:36:16": a memory allocation by mysqld invoked it, and it picked a php-fpm process as the victim, so PHP was forcibly killed by the system at that moment. I had capped the number of php-fpm processes at 20. At roughly 30 MB per process (the two killed processes above had an anon-rss of about 38 MB and 24 MB), 20 * 30 = 600 MB, so at peak PHP alone could use 600 MB or more.
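The kill events in a kernel log like the one above can be tallied per process name, which shows at a glance which service the OOM killer keeps picking and how large the victims were. This is a minimal sketch that assumes the "killed process PID (name) ... anon-rss:NNNkb" line format seen in this log; the function name `oom_victims` is hypothetical, and matching is case-insensitive because kernels print these lines with mixed case.

```shell
#!/usr/bin/env bash
# Summarize OOM-killer victims from kernel log lines read on stdin.
# For each process name: number of kills and average anon-rss in MB.
# The function name is hypothetical and only used for illustration.
oom_victims() {
    awk '
    { line = tolower($0) }
    line ~ /killed process [0-9]+ \(/ {
        name = line
        sub(/.*killed process [0-9]+ \(/, "", name)   # strip up to the "("
        sub(/\).*/, "", name)                          # strip ")" and the rest
        rss = 0
        if (line ~ /anon-rss:[0-9]+kb/) {
            rss = line
            sub(/.*anon-rss:/, "", rss)                # keep text after the label
            sub(/kb.*/, "", rss)                       # keep only the number
        }
        kills[name]++
        total_kb[name] += rss
    }
    END {
        for (p in kills)
            printf "%s kills=%d avg_anon_rss_mb=%.1f\n",
                   p, kills[p], total_kb[p] / kills[p] / 1024
    }' | sort
}

# Usage:
#   oom_victims < /var/log/messages
```

Fed only the two kill events quoted above, this reports 2 kills for php-fpm with an average anon-rss of about 31 MB, which is in line with the 30 MB per process used in the estimate.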
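The nginx log around "apr 23 13:36:16" is shown in the figure below:
Together with the nginx log, this confirms that a large number of php processes had been spawned at that time, which left the system short of resources.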
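My fix was to lower the value of pm.max_children. A smaller value does sacrifice some site performance, but after analyzing the logs I found that the site normally does not see that much concurrency, so the impact should be small.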
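One way to choose a lower pm.max_children is to work backwards from a memory budget: divide the memory you are willing to give PHP by the typical size of one php-fpm worker. The sketch below illustrates that arithmetic; the function names and the 300 MB budget in the usage comment are assumptions for illustration, not the values used on this server. The second helper relies on the procps `ps` found on Linux, and RSS counts shared pages in every process, so it tends to overestimate.

```shell
#!/usr/bin/env bash
# Largest number of workers that fits a memory budget.
#   $1 = budget for PHP in MB, $2 = typical size of one worker in MB
# Function names are hypothetical and only used for illustration.
max_children_for() {
    if [ "$2" -le 0 ]; then
        echo "per-process size must be positive" >&2
        return 1
    fi
    echo $(( $1 / $2 ))   # integer division rounds down, which is the safe side
}

# Average resident size of the running php-fpm processes, in whole MB.
# ps reports RSS in KiB; prints 0 when no php-fpm process is found.
avg_php_fpm_rss_mb() {
    ps -C php-fpm -o rss= |
        awk '{ s += $1; n++ } END { if (n) printf "%d\n", s / n / 1024; else print 0 }'
}

# Usage (300 MB is an assumed budget, chosen only as an example):
#   max_children_for 300 30
#   max_children_for 300 "$(avg_php_fpm_rss_mb)"
```

With the 30 MB per process estimated earlier, a 600 MB budget gives back the original limit of 20, and a 300 MB budget would give 10. The budget itself has to leave room for mysqld, nginx, and the rest of the system on a 1 GB machine.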
That is as far as I will take the Linux VPS memory shortage for now; I will watch the server for a few days first.
Reference: http://www.supportsages.com/blog/tag/mysqld_safe-number-of-processes-running-now-0/