...
Code block |
---|
|
# nvme list
Node Generic SN Model Namespace Usage Format FW Rev
------------- ----------- --------------- -------------------------- ---------- ----------------------- ---------------- --------
/dev/nvme0n1 /dev/ng0n1 S463NF0M905327F Samsung SSD 970 PRO 512GB 0x1 6.94 GB / 512.11 GB 512 B + 0 B 1B2QEXP7
/dev/nvme10n1 /dev/ng10n1 S4..........26 SAMSUNG MZQLB7T6HMLA-00007 0x1 30.86 GB / 7.68 TB 512 B + 0 B EDB5502Q
/dev/nvme11n1 /dev/ng11n1 S4..........19 SAMSUNG MZQLB7T6HMLA-00007 0x1 927.35 GB / 7.68 TB 512 B + 0 B EDB5502Q
/dev/nvme12n1 /dev/ng12n1 S4..........80 SAMSUNG MZQLB7T6HMLA-00007 0x1 30.90 GB / 7.68 TB 512 B + 0 B EDB5502Q
/dev/nvme13n1 /dev/ng13n1 S4..........79 SAMSUNG MZQLB7T6HMLA-00007 0x1 927.71 GB / 7.68 TB 512 B + 0 B EDB5502Q
/dev/nvme14n1 /dev/ng14n1 S4..........87 SAMSUNG MZQLB7T6HMLA-00007 0x1 38.29 GB / 7.68 TB 512 B + 0 B EDB5502Q
/dev/nvme15n1 /dev/ng15n1 S4..........83 SAMSUNG MZQLB7T6HMLA-00007 0x1 30.91 GB / 7.68 TB 512 B + 0 B EDB5502Q
/dev/nvme1n1 /dev/ng1n1 S4..........76 SAMSUNG MZQLB7T6HMLA-00007 0x1 1.07 MB / 7.68 TB 512 B + 0 B EDB5502Q
/dev/nvme2n1 /dev/ng2n1 S4..........73 SAMSUNG MZQLB7T6HMLA-00007 0x1 947.71 MB / 7.68 TB 512 B + 0 B EDB5502Q
/dev/nvme3n1 /dev/ng3n1 S4..........43 SAMSUNG MZQLB7T6HMLA-00007 0x1 26.84 GB / 7.68 TB 512 B + 0 B EDB5502Q
/dev/nvme4n1 /dev/ng4n1 S4..........90 SAMSUNG MZQLB7T6HMLA-00007 0x1 7.68 TB / 7.68 TB 512 B + 0 B EDB5502Q
/dev/nvme5n1 /dev/ng5n1 S4..........91 SAMSUNG MZQLB7T6HMLA-00007 0x1 61.12 GB / 7.68 TB 512 B + 0 B EDB5502Q
/dev/nvme6n1 /dev/ng6n1 S4..........92 SAMSUNG MZQLB7T6HMLA-00007 0x1 61.09 GB / 7.68 TB 512 B + 0 B EDB5502Q
/dev/nvme7n1 /dev/ng7n1 S4..........75 SAMSUNG MZQLB7T6HMLA-00007 0x1 908.11 GB / 7.68 TB 512 B + 0 B EDB5502Q
/dev/nvme8n1 /dev/ng8n1 S4..........82 SAMSUNG MZQLB7T6HMLA-00007 0x1 908.14 GB / 7.68 TB 512 B + 0 B EDB5502Q
/dev/nvme9n1 /dev/ng9n1 S4..........85 SAMSUNG MZQLB7T6HMLA-00007 0x1 7.68 TB / 7.68 TB 512 B + 0 B EDB5502Q |
Kernel error log
Code block |
---|
|
# cat /var/log/messages
Feb 24 11:57:41 stor1 kernel: md/raid:md125: device nvme14n1 operational as raid disk 0
Feb 24 11:57:41 stor1 kernel: md/raid:md125: device nvme4n1 operational as raid disk 2
...
Feb 24 11:57:49 stor1 kernel: nvme14n1: Read(0x2) @ LBA 3112952, 1024 blocks, Unrecovered Read Error (sct 0x2 / sc 0x81) MORE DNR
Feb 24 11:57:49 stor1 kernel: critical target error, dev nvme14n1, sector 3112952 op 0x0:(READ) flags 0x0 phys_seg 128 prio class 0
...
Feb 24 12:24:34 stor1 kernel: nvme4n1: Read(0x2) @ LBA 642804224, 1024 blocks, Unrecovered Read Error (sct 0x2 / sc 0x81) MORE DNR
Feb 24 12:24:34 stor1 kernel: critical target error, dev nvme4n1, sector 642804224 op 0x0:(READ) flags 0x0 phys_seg 128 prio class 0
... |
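To see at a glance which devices are throwing read errors, the `critical target error` lines above can be tallied per device. This is a minimal sketch that assumes the message format matches the two samples in this note; the name `count_read_errors` and the regex are illustrative and do not come from any existing tool.

```python
import re
from collections import Counter

# Matches the block-layer line shown above, e.g.
# "critical target error, dev nvme14n1, sector 3112952 op 0x0:(READ) ..."
# Assumption: the pattern is derived only from the samples in this note.
ERROR_RE = re.compile(r"critical target error, dev ([^,\s]+), sector (\d+)")


def count_read_errors(lines):
    """Return a Counter mapping device name -> number of critical target errors."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

It accepts any iterable of lines, so an open log file can be passed directly, for example `count_read_errors(open("/var/log/messages", errors="replace"))`.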
...
Code block |
---|
/dev/nvme{i}n1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
unsafe_shutdowns 28 10 10 10 16 16 16 17 17 15 17 17 14 18 16 10
num_err_log_entries 35 168 168 168 196 191 191 181 181 181 191 181 186 181 267 168
critical_warning 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
media_errors 0 0 0 0 5 0 0 0 0 0 0 0 0 0 76 0 |
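A summary like the one above can be produced by collecting the SMART log of every drive and printing one column per device. The sketch below is one way to do that, not necessarily how the table above was generated. `read_smart` and `render_smart_table` are hypothetical helper names, and the JSON key names are assumed to match the field names in the table, so check them against the output of your nvme-cli version.

```python
import json
import subprocess

# Rows of the summary table, in the order used in this note.
FIELDS = ["unsafe_shutdowns", "num_err_log_entries", "critical_warning", "media_errors"]


def read_smart(device):
    """Run `nvme smart-log <device> -o json` and return the parsed dict.

    Usually needs root. Assumes the JSON keys match FIELDS.
    """
    result = subprocess.run(
        ["nvme", "smart-log", device, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def render_smart_table(stats):
    """stats: {device_index: {field: value}} -> text table, one column per device."""
    indexes = sorted(stats)
    rows = [["/dev/nvme{i}n1"] + [str(i) for i in indexes]]
    for field in FIELDS:
        rows.append([field] + [str(stats[i][field]) for i in indexes])
    # Each column is as wide as its widest cell.
    widths = [max(len(row[c]) for row in rows) for c in range(len(rows[0]))]
    lines = []
    for row in rows:
        label = row[0].ljust(widths[0])
        values = [cell.rjust(w) for cell, w in zip(row[1:], widths[1:])]
        lines.append(" ".join([label] + values))
    return "\n".join(lines)
```

For example, `render_smart_table({i: read_smart(f"/dev/nvme{i}n1") for i in range(16)})` would cover the 16 drives listed in this note.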
https://santander.co.kr/122
Code block |
---|
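1. available_spare: at risk once it falls below available_spare_threshold, meaning the remaining spare capacity is nearly exhausted.
2. percentage_used: at risk once it goes past 100%. This is the vendor's own estimate of how much of the rated endurance (roughly, the warranted lifetime) has been consumed.
3. controller_busy_time: reported in minutes. It counts the time the controller spent busy with I/O (commands backed up in the queues), so it climbs whenever a lot of work is pending. That looks normal to me, though I'm not certain. I couldn't find a single server where it was 0.
4. unsafe_shutdowns: exactly what it says. Let's not force-power-off servers.
5. media_errors: as soon as this reaches 1, a bad sector has been detected, so the drive should be replaced.
What to monitor on NVMe drives:
1. available_spare < available_spare_threshold
2. percentage_used > 100
3. media_errors > 0 |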
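The three monitoring rules above can be turned into a small check. This is a minimal sketch: `check_nvme_health` is a hypothetical name, and the input is assumed to be a dict that uses the field names from this note (tools may spell the keys differently, so map them first).

```python
def check_nvme_health(smart):
    """Apply the three rules from this note; return alert strings (empty list = OK)."""
    alerts = []
    # Rule 1: spare capacity has dropped below the drive's own threshold.
    if smart["available_spare"] < smart["available_spare_threshold"]:
        alerts.append("available_spare below threshold")
    # Rule 2: the vendor's endurance estimate has been exceeded.
    if smart["percentage_used"] > 100:
        alerts.append("percentage_used above 100")
    # Rule 3: any media error at all.
    if smart["media_errors"] > 0:
        alerts.append("media_errors > 0")
    return alerts
```

With the counters from the summary above, nvme14n1 (media_errors 76) and nvme4n1 (media_errors 5) would both trip rule 3.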