Monitoring Ceph

Probably the most frequently typed command in Ceph administration is ceph health, which prints the health status of the cluster.

$ ceph health
HEALTH_OK

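If the result is anything other than HEALTH_OK, pay attention: some PGs (placement groups) may not be in the active+clean state. To see which PGs have problems, run ceph health detail, which prints their details.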

$ ceph health detail
HEALTH_OK

My cluster has no problem PGs, so both commands return HEALTH_OK.
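For scripted checks, the first word of the ceph health output can be mapped to an exit code. The following is a minimal sketch, not part of Ceph itself: the function name health_exit_code and the Nagios-style 0/1/2/3 mapping are assumptions made for illustration, and the text is assumed to have been captured from the command beforehand.

```python
# Hypothetical helper: maps the status word printed by `ceph health`
# to a Nagios-style exit code. The numeric mapping is an assumption.
EXIT_CODES = {"HEALTH_OK": 0, "HEALTH_WARN": 1, "HEALTH_ERR": 2}


def health_exit_code(output):
    """Return 0/1/2 for OK/WARN/ERR, or 3 when the status is not recognized."""
    tokens = output.split()
    status = tokens[0] if tokens else ""
    return EXIT_CODES.get(status, 3)


if __name__ == "__main__":
    # Sample text taken from the transcript above.
    print(health_exit_code("HEALTH_OK"))
```

Only the first token is inspected, so any summary text that follows the status word on the same line is ignored.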

Run ceph -w to watch cluster events in real time, including INF (information), WRN (warning), and ERR (error) events.

$ ceph -w
cluster 963a6787-0043-48e2-8677-a70f1564be17
health HEALTH_OK
monmap e1: 1 mons at {ceph2=172.17.6.176:6789/0}, election epoch 1, quorum 0 ceph2
osdmap e64: 3 osds: 3 up, 3 in
pgmap v84557: 384 pgs, 3 pools, 4879 MB data, 1599 objects
120 GB used, 102 GB / 235 GB avail
384 active+clean

2015-12-03 10:20:14.713946 mon.0 [INF] pgmap v84557: 384 pgs: 384 active+clean; 4879 MB data, 120 GB used, 102 GB / 235 GB avail
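To filter events by level, each event line can be split into its parts. This sketch assumes the line layout shown in the sample above (timestamp, source, bracketed level, message); the name parse_event is hypothetical, and other Ceph versions may format these lines differently.

```python
import re

# Matches lines such as:
# 2015-12-03 10:20:14.713946 mon.0 [INF] pgmap v84557: ...
EVENT_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<source>\S+) \[(?P<level>[A-Z]+)\] (?P<message>.*)$"
)


def parse_event(line):
    """Return a dict of event fields, or None for non-event lines."""
    match = EVENT_RE.match(line.strip())
    return match.groupdict() if match else None
```

The status header that ceph -w prints first (cluster, health, monmap and so on) does not match the pattern, so parse_event returns None for those lines.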

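For cluster space usage statistics, run ceph df. It shows the total size, available space, raw used space, and the percentage of raw space used, followed by a per-pool breakdown.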

$ ceph df
GLOBAL:
SIZE AVAIL RAW USED %RAW USED
235G 102G 120G 51.34
POOLS:
NAME ID USED %USED MAX AVAIL OBJECTS
data 0 978 0 35073M 3
metadata 1 0 0 35073M 0
rbd 2 4879M 2.02 35073M 1596
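The GLOBAL figures can be pulled out of this text for alerting. The sketch below assumes the plain-text layout shown above, with the values row two lines below GLOBAL:. The function names and the 80% threshold are illustrative assumptions, not Ceph defaults.

```python
def parse_global_usage(df_output):
    """Extract the GLOBAL totals from plain-text `ceph df` output."""
    rows = [line.split() for line in df_output.splitlines()]
    for i, fields in enumerate(rows):
        # The values row sits two lines below "GLOBAL:" (after the header row).
        if fields == ["GLOBAL:"] and i + 2 < len(rows) and len(rows[i + 2]) == 4:
            size, avail, raw_used, pct = rows[i + 2]
            return {
                "size": size,
                "avail": avail,
                "raw_used": raw_used,
                "pct_raw_used": float(pct),
            }
    raise ValueError("no GLOBAL section found")


def over_threshold(usage, threshold_pct=80.0):
    """True when raw usage exceeds threshold_pct (80 is an arbitrary example)."""
    return usage["pct_raw_used"] > threshold_pct
```

Sizes are kept as the strings Ceph printed (for example 235G) rather than converted, since the unit suffix varies with cluster size.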

Run the following command to check the cluster status:

$ ceph status
cluster 963a6787-0043-48e2-8677-a70f1564be17
health HEALTH_OK
monmap e1: 1 mons at {ceph2=172.17.6.176:6789/0}, election epoch 1, quorum 0 ceph2
osdmap e64: 3 osds: 3 up, 3 in
pgmap v84561: 384 pgs, 3 pools, 4879 MB data, 1599 objects
120 GB used, 102 GB / 235 GB avail
384 active+clean

Its output is the same as that of ceph -s.

Run the following command to list the cluster's authentication keys:

$ ceph auth list
installed auth entries:

...

To check the MON status and the monitor map, run the following commands:

$ ceph mon stat
e1: 1 mons at {ceph2=172.17.6.176:6789/0}, election epoch 1, quorum 0 ceph2

$ ceph mon dump
dumped monmap epoch 1
epoch 1
fsid 963a6787-0043-48e2-8677-a70f1564be17
last_changed 0.000000
created 0.000000
0: 172.17.6.176:6789/0 mon.ceph2

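Run the following command to check the cluster's quorum status. The cluster needs a strict majority (more than 50%) of its MONs to be healthy at all times.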

$ ceph quorum_status|python -mjson.tool
{
"election_epoch": 1,
"monmap": {
"created": "0.000000",
"epoch": 1,
"fsid": "963a6787-0043-48e2-8677-a70f1564be17",
"modified": "0.000000",
"mons": [
{
"addr": "172.17.6.176:6789/0",
"name": "ceph2",
"rank": 0
}
]
},
"quorum": [
0
],
"quorum_leader_name": "ceph2",
"quorum_names": [
"ceph2"
]
}
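The majority rule can be checked against this JSON. The sketch below is an illustration only: quorum_margin is a hypothetical name, and it relies only on the monmap.mons and quorum fields visible in the output above. It reports how many more monitors could leave the quorum before the majority is lost.

```python
import json


def quorum_margin(quorum_status_json):
    """Number of quorum members that can still be lost while keeping a majority."""
    status = json.loads(quorum_status_json)
    total = len(status["monmap"]["mons"])
    in_quorum = len(status["quorum"])
    needed = total // 2 + 1  # strict majority of all known monitors
    return in_quorum - needed
```

For the single-monitor cluster shown here the margin is 0: losing the only MON loses the quorum. With three monitors all in quorum, the margin would be 1.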

Run ceph osd tree to view the OSD tree:

$ ceph osd tree
# id weight type name up/down reweight
-1 0.24 root default
-2 0.24 host ceph2
0 0.07999 osd.0 up 1
1 0.07999 osd.1 up 1
2 0.07999 osd.2 up 1

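It shows useful information about each OSD, such as its weight, its up/down status, and its in/out status (reflected in the reweight column).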
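To spot OSDs that are not up, the tree output can be scanned line by line. This is a rough sketch that assumes the five-column OSD rows shown above (id, weight, name, up/down, reweight); other Ceph versions print different columns, so treat the parsing and the name down_osds as assumptions.

```python
def down_osds(tree_output):
    """Return the names of OSDs whose status column is not "up"."""
    down = []
    for line in tree_output.splitlines():
        fields = line.split()
        # OSD rows look like: 0 0.07999 osd.0 up 1
        if len(fields) == 5 and fields[2].startswith("osd."):
            if fields[3] != "up":
                down.append(fields[2])
    return down
```

Rows for the root and host buckets have only four fields, and the header row has more than five, so both are skipped.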

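ceph osd dump is also a very useful command. It prints the OSD map epoch and pool details, including each pool's ID, name, type, CRUSH ruleset, and PG count. It also prints each OSD's ID, status, weight, and other information.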

$ ceph osd dump
epoch 64
fsid 963a6787-0043-48e2-8677-a70f1564be17
created 2015-10-28 13:52:53.131559
modified 2015-11-30 09:58:21.147863
flags
pool 0 'data' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 43 flags hashpspool crash_replay_interval 45 stripe_width 0
snap 1 'snapshot01' 2015-11-05 11:41:13.296489
pool 1 'metadata' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 41 flags hashpspool stripe_width 0
pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 object_hash rjenkins pg_num 128 pgp_num 128 last_change 45 flags hashpspool stripe_width 0
removed_snaps [1~1]
max_osd 3
osd.0 up in weight 1 up_from 52 up_thru 61 down_at 51 last_clean_interval [49,50) 172.17.6.176:6811/59955 172.17.6.176:6812/59955 172.17.6.176:6813/59955 172.17.6.176:6814/59955 exists,up 3711263c-0898-4eac-aae1-3c316e8c6287
osd.1 up in weight 1 up_from 55 up_thru 61 down_at 54 last_clean_interval [8,50) 172.17.6.176:6805/59895 172.17.6.176:6806/59895 172.17.6.176:6807/59895 172.17.6.176:6808/59895 exists,up 6ef617b4-9dd0-4155-b2be-44bafa02f3d6
osd.2 up in weight 1 up_from 54 up_thru 61 down_at 53 last_clean_interval [22,50) 172.17.6.176:6800/59830 172.17.6.176:6801/59830 172.17.6.176:6802/59830 172.17.6.176:6803/59830 exists,up 117105d1-e101-498e-a837-eeb1b568716c
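The pool lines in this dump follow a regular pattern, so the replication size and PG count per pool can be extracted with a regular expression. The sketch below assumes the line layout shown above; parse_pools is a hypothetical helper, and later Ceph releases may name or order these fields differently.

```python
import re

# Matches lines such as:
# pool 2 'rbd' replicated size 3 min_size 2 crush_ruleset 0 ... pg_num 128 pgp_num 128 ...
POOL_RE = re.compile(
    r"^pool (?P<id>\d+) '(?P<name>[^']+)' (?P<type>\w+) "
    r"size (?P<size>\d+) min_size (?P<min_size>\d+) "
    r".*?pg_num (?P<pg_num>\d+) pgp_num (?P<pgp_num>\d+)"
)
INT_FIELDS = ("id", "size", "min_size", "pg_num", "pgp_num")


def parse_pools(dump_output):
    """Return one dict per pool line found in `ceph osd dump` text."""
    pools = []
    for line in dump_output.splitlines():
        match = POOL_RE.match(line.strip())
        if match:
            pool = match.groupdict()
            for key in INT_FIELDS:
                pool[key] = int(pool[key])
            pools.append(pool)
    return pools
```

Summing pg_num over the three pools in this dump gives 384, which agrees with the 384 pgs reported by ceph status.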

Run ceph osd crush dump to inspect the CRUSH map:

$ ceph osd crush dump

The output is long and gives a complete view of the CRUSH map.

In a Ceph cluster with a large number of OSDs, it can be hard to tell where a given OSD sits in the CRUSH map. The following command helps:

$ ceph osd find 1|python -mjson.tool
{
"crush_location": {
"host": "ceph2",
"root": "default"
},
"ip": "172.17.6.176:6805/59895",
"osd": 1
}

The argument after find is the OSD ID.
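When looking up many OSDs, the JSON from ceph osd find can be reduced to the fields that matter for locating the daemon. This sketch assumes the JSON shape shown above; summarize_osd_location is a hypothetical name, and the ip field is assumed to have the form address:port followed by a slash and a number.

```python
import json


def summarize_osd_location(find_json):
    """Return the OSD id, its host, and its address and port."""
    info = json.loads(find_json)
    # Drop the part after the slash, then split the port off the address.
    address, port = info["ip"].split("/")[0].rsplit(":", 1)
    return {
        "osd": info["osd"],
        "host": info["crush_location"]["host"],
        "address": address,
        "port": int(port),
    }
```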

Besides command-line monitoring, there are also open-source web dashboard tools such as Kraken, ceph-dash, and Calamari, which are not covered here.
