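Solaris failed-disk replacement: test record
- Background
- Simulated initial environment
- Scenario 1: device name unchanged
  - Offline the disk to be replaced
  - Pull the disk
  - Test file system reads and writes
  - Insert the new disk
  - Online and replace
- Scenario 2: device name changed
  - Offline the disk to be replaced
  - Pull the disk
  - Add the new disk
  - Use detach to remove the faulty disk
  - Use attach to add the new disk to oraclepool and rebuild the mirror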
Background
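Recently, during a routine inspection of the systems we maintain, we found a Solaris host with a disk fault LED lit. Troubleshooting confirmed a failed disk that had to be replaced. Because of an earlier configuration choice, the zpool autoreplace property was set to off, so automatic replacement was not available and the disk had to be replaced manually with the relevant commands. This article records the test procedure.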
Simulated initial environment
Four local disks are simulated: c2t0d0 and c2t1d0 form rpool, and c2t2d0 and c2t3d0 form oraclepool.
AVAILABLE DISK SELECTIONS:
       0. c2t0d0 <VMware,-VMware Virtual S-1.0-50.00GB>
          /pci@0,0/pci15ad,1976@10/sd@0,0
       1. c2t1d0 <VMware,-VMware Virtual S-1.0-50.00GB>
          /pci@0,0/pci15ad,1976@10/sd@1,0
       2. c2t2d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci15ad,1976@10/sd@2,0
       3. c2t3d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci15ad,1976@10/sd@3,0
Specify disk (enter its number):
Creating the test environment
# 1. Create the test pool oraclepool
zpool create oraclepool mirror c2t2d0 c2t3d0
# 2. Create a test ZFS file system (used for the read/write tests)
zfs create -o mountpoint=/u01/app oraclepool/u01
# 3. Verify
root@solaris11:~# zpool status -v oraclepool
pool: oraclepool
state: ONLINE
scan: none requested
config:
        NAME          STATE     READ WRITE CKSUM
        oraclepool    ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c2t2d0    ONLINE       0     0     0
            c2t3d0    ONLINE       0     0     0
errors: No known data errors
root@solaris11:~#
root@solaris11:~#
root@solaris11:~# zfs list| grep oraclepool
oraclepool 879M 8.92G 31K /oraclepool
oraclepool/u01 879M 8.92G 879M /u01/app
root@solaris11:~#
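The verification above is done by reading the zpool status output by eye. As a minimal sketch, assuming a POSIX shell with awk (the function names pool_state and pool_is_online are hypothetical and not part of the original record), the same check can be scripted so that later steps can confirm the pool state:

```shell
# Hypothetical helper: print the value of the "state:" line from
# `zpool status` output read on stdin (e.g. ONLINE or DEGRADED).
pool_state() {
    awk '$1 == "state:" { print $2; exit }'
}

# Hypothetical helper: succeed only if the named pool reports ONLINE.
pool_is_online() {
    [ "$(zpool status "$1" | pool_state)" = "ONLINE" ]
}

# Usage on the host (run as root):
#   zpool status oraclepool | pool_state
#   pool_is_online oraclepool && echo "oraclepool is ONLINE"
```

Matching on the first field keeps the helper independent of the leading spaces zpool uses to right-align the header labels.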
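Scenario 1: device name unchanged
This scenario simulates pulling the disk and then inserting a new disk in the same position on the same controller, so the device name stays the same.
In this test environment, the slot and the disk to pull can be identified by matching the physical path shown by format (/pci@0,0/pci15ad,1976@10/sd@3,0) against the disk's advanced settings in VMware. In production, the failed disk's fault LED is already lit (if it is not, it can be lit with a command), and together with the disk status reported by the system this identifies the disk to pull.
In this test, the disk pulled is c2t3d0.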
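The physical path that format prints can also be read from the disk's /dev/dsk symbolic link, which points into /devices. A minimal sketch, assuming the link target has the usual ../../devices/<physical path>:<minor name> form; devlink_to_physpath is a hypothetical helper name:

```shell
# Hypothetical helper: turn a /dev/dsk (or /dev/rdsk) link target into the
# physical path shown by format, by dropping everything up to "/devices"
# and the trailing ":<minor name>" suffix (e.g. ":a" or ":a,raw").
devlink_to_physpath() {
    printf '%s\n' "$1" | sed -e 's|^.*/devices||' -e 's|:[^:/]*$||'
}

# Usage on the host (assumption: the s0 slice link exists for this disk;
# use whichever /dev/dsk link is present):
#   target=$(ls -l /dev/dsk/c2t3d0s0 | awk '{ print $NF }')
#   devlink_to_physpath "$target"
```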
Offline the disk to be replaced
root@solaris11:~# zpool offline oraclepool c2t3d0
root@solaris11:~# zpool status -v oraclepool
pool: oraclepool
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: none requested
config:
        NAME          STATE     READ WRITE CKSUM
        oraclepool    DEGRADED     0     0     0
          mirror-0    DEGRADED     0     0     0
            c2t2d0    ONLINE       0     0     0
            c2t3d0    OFFLINE      0     0     0
device details:
errors: No known data errors
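After the offline, the affected disk can also be picked out of the status output by a script. A minimal sketch, assuming the config layout shown above and leaf disks named cXtYdZ (the helper not_online_disks is hypothetical); it stops reading at "device details:" so a disk listed there is not reported twice:

```shell
# Hypothetical helper: from `zpool status` output on stdin, print
# "<disk> <state>" for every leaf disk in the config table whose STATE is
# not ONLINE. Pool and vdev rows (oraclepool, mirror-0) are skipped because
# they do not match the cXtYdZ name pattern.
not_online_disks() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_cfg = 1; next }
        $1 == "errors:" || ($1 == "device" && $2 == "details:") { in_cfg = 0 }
        in_cfg && $1 ~ /^c[0-9]+t[0-9A-Fa-f]+d[0-9]+$/ && $2 != "ONLINE" {
            print $1, $2
        }
    '
}

# Usage on the host; for the status output above it prints "c2t3d0 OFFLINE":
#   zpool status oraclepool | not_online_disks
```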
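Pull the disk
Physical path of the disk: /pci@0,0/pci15ad,1976@10/sd@3,0
After the disk is pulled, selecting it in format fails with an error, which confirms that it has been removed: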
root@solaris11:~# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
       0. c2t0d0 <VMware,-VMware Virtual S-1.0-50.00GB>
          /pci@0,0/pci15ad,1976@10/sd@0,0
       1. c2t1d0 <VMware,-VMware Virtual S-1.0-50.00GB>
          /pci@0,0/pci15ad,1976@10/sd@1,0
       2. c2t2d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci15ad,1976@10/sd@2,0
       3. c2t3d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci15ad,1976@10/sd@3,0
Specify disk (enter its number): 3
selecting c2t3d0 <(null) cyl 0 alt 0 hd 0 sec 0>
[disk unformatted]
Error: can't open disk '/dev/rdsk/c2t3d0p0'.
AVAILABLE DRIVE TYPES:
0. Auto configure
1. VMware,-VMware Virtual S-1.0
2. other
Specify disk type (enter its number):
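format is interactive, so checking from a script whether the pulled disk is still reachable needs another approach. One option, sketched here as an assumption rather than the author's method, is to read a single block from the raw device named in the format error above; raw_dev_readable is a hypothetical helper:

```shell
# Hypothetical helper: try to read one 512-byte block from a raw device.
# Returns 0 if the device can be opened and read, non-zero otherwise
# (for a pulled disk the open fails, as in the format error above).
raw_dev_readable() {
    dd if="$1" of=/dev/null bs=512 count=1 >/dev/null 2>&1
}

# Usage on the host (assumption: x86 Solaris, where the p0 node addresses
# the whole disk, matching '/dev/rdsk/c2t3d0p0' in the error above):
#   if raw_dev_readable /dev/rdsk/c2t3d0p0; then echo present; else echo gone; fi
```

The check only reads, so it is safe to run against a disk that is still a pool member.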
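Test file system reads and writes
Reads and writes on the test file system still work normally with one side of the mirror pulled.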
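The record only states that reads and writes worked. A minimal sketch of such a check, assuming write access to the mount point; rw_test and the .rw_test.<pid> file name are hypothetical. The read-back may be served from cache, so this confirms that the file system accepts writes and returns the same data, not that a particular disk is healthy:

```shell
# Hypothetical read/write smoke test: write a known line into DIR, read it
# back, remove the file, and succeed only if the content matches.
rw_test() {
    f="$1/.rw_test.$$"
    content="rw test $$"
    printf '%s\n' "$content" > "$f" || return 1
    readback=$(cat "$f")
    rm -f "$f"
    [ "$readback" = "$content" ]
}

# Usage on the host:
#   rw_test /u01/app && echo "read/write OK"
```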
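Insert the new disk
In this test environment, rescanning did not detect the new disk and a reboot was required, most likely because of the virtual machine environment. The rescan command is devfsadm -Cc disk, which rebuilds the disk device links and removes stale ones. After the new disk is inserted, selecting it in format no longer returns an error, which confirms that the new disk is in place: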
root@solaris11:~# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
       0. c2t0d0 <VMware,-VMware Virtual S-1.0-50.00GB>
          /pci@0,0/pci15ad,1976@10/sd@0,0
       1. c2t1d0 <VMware,-VMware Virtual S-1.0-50.00GB>
          /pci@0,0/pci15ad,1976@10/sd@1,0
       2. c2t2d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci15ad,1976@10/sd@2,0
       3. c2t3d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci15ad,1976@10/sd@3,0
Specify disk (enter its number): 3
selecting c2t3d0 <VMware,-VMware Virtual S-1.0 cyl 1669 alt 2 hd 224 sec 56>
[disk formatted]
No Solaris fdisk partition found.
FORMAT MENU:
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
fdisk - run the fdisk program
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
inquiry - show disk ID
volname - set 8-character volume name
!<cmd> - execute <cmd>, then return
quit
format>
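The rescan can be combined with a wait for the new disk to become readable. A minimal sketch, assuming the same single-block read check on the raw device and a caller-chosen timeout in seconds (wait_for_disk is a hypothetical name). In the author's VM the rescan alone was not enough and a reboot was needed, so hitting the timeout is a realistic outcome:

```shell
# Hypothetical helper: rebuild the disk links with devfsadm, then poll every
# 5 seconds until one 512-byte block can be read from RAW_DEVICE, giving up
# after TIMEOUT seconds (default 60).
wait_for_disk() {
    rdev=$1
    timeout=${2:-60}
    devfsadm -Cc disk || return 1
    waited=0
    while ! dd if="$rdev" of=/dev/null bs=512 count=1 >/dev/null 2>&1; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 5
        waited=$((waited + 5))
    done
    return 0
}

# Usage on the host (run as root; p0 is the whole-disk node on x86):
#   wait_for_disk /dev/rdsk/c2t3d0p0 120 || echo "not detected; a reboot may be needed"
```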
Online and replace
root@solaris11:~# zpool status -v oraclepool
pool: oraclepool
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: none requested
config:
        NAME          STATE     READ WRITE CKSUM
        oraclepool    DEGRADED     0     0     0
          mirror-0    DEGRADED     0     0     0
            c2t2d0    ONLINE       0     0     0
            c2t3d0    OFFLINE      0     0     0
device details:
errors: No known data errors
root@solaris11:~# zpool online oraclepool c2t3d0
warning: device 'c2t3d0' onlined, but remains in faulted state
use 'zpool clear' to restore a faulted device
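Right after the online, the system suggests restoring the faulted device with zpool clear; in testing, that command did not change the state of the disk in the pool.

The zpool status -v output also suggests replacing the device with zpool replace: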
root@solaris11:~# zpool status -v oraclepool
pool: oraclepool
state: DEGRADED
status: One or more devices are unavailable in response to persistent errors.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or 'fmadm repaired', or replace the device
with 'zpool replace'.
scan: none requested
config:
        NAME          STATE     READ WRITE CKSUM
        oraclepool    DEGRADED     0     0     0
          mirror-0    DEGRADED     0     0     0
            c2t2d0    ONLINE       0     0     0
            c2t3d0    UNAVAIL      0     0     0
device details:
        c2t3d0    UNAVAIL          cannot open
        status: ZFS detected errors on this device.
                The device was missing.
           see: http://support.oracle.com/msg/ZFS-8000-LR for recovery
errors: No known data errors
Because this operation swaps in a brand-new disk, I judged zpool replace to be the more appropriate choice:
root@solaris11:~# zpool replace oraclepool c2t3d0
root@solaris11:~#
root@solaris11:~#
root@solaris11:~# zpool status -v oraclepool
pool: oraclepool
state: ONLINE
scan: resilvered 879M in 2s with 0 errors on Sun Feb 4 19:27:01 2024
config:
        NAME          STATE     READ WRITE CKSUM
        oraclepool    ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c2t2d0    ONLINE       0     0     0
            c2t3d0    ONLINE       0     0     0
errors: No known data errors
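The pool passes through a resilvering (synchronization) state; once resilvering finishes, oraclepool returns to ONLINE, and reads and writes on the test file system oraclepool/u01 work normally.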
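The replace-and-wait sequence can be wrapped in a script. A minimal sketch, assuming that zpool status shows the text "resilver in progress" while resilvering, as in the status output captured later in this record; replace_same_slot and wait_resilver are hypothetical names:

```shell
# Hypothetical helper: poll `zpool status POOL` every INTERVAL seconds
# (default 10) until "resilver in progress" no longer appears.
wait_resilver() {
    pool=$1
    interval=${2:-10}
    while zpool status "$pool" | grep 'resilver in progress' >/dev/null; do
        sleep "$interval"
    done
}

# Hypothetical wrapper for Scenario 1: the new disk sits in the same slot,
# so `zpool replace` is given only the unchanged device name.
replace_same_slot() {
    pool=$1
    disk=$2
    zpool replace "$pool" "$disk" || return 1
    wait_resilver "$pool" "${3:-10}"
    # -x prints a one-line summary for a healthy pool, details otherwise.
    zpool status -x "$pool"
}

# Usage on the host (run as root, after the new disk is visible in format):
#   replace_same_slot oraclepool c2t3d0
```

grep output is discarded with a redirect instead of -q, since older Solaris /usr/bin/grep builds may not accept -q.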
Scenario 2: device name changed
The device name change is simulated by putting the new disk in a different slot in VMware.
Offline the disk to be replaced
root@solaris11:~# zpool offline oraclepool c2t3d0
root@solaris11:~# zpool status -v oraclepool
pool: oraclepool
state: DEGRADED
status: One or more devices has been taken offline by the administrator.
Sufficient replicas exist for the pool to continue functioning in a
degraded state.
action: Online the device using 'zpool online' or replace the device with
'zpool replace'.
scan: resilvered 879M in 2s with 0 errors on Sun Feb 4 19:27:01 2024
config:
        NAME          STATE     READ WRITE CKSUM
        oraclepool    DEGRADED     0     0     0
          mirror-0    DEGRADED     0     0     0
            c2t2d0    ONLINE       0     0     0
            c2t3d0    OFFLINE      0     0     0
device details:
errors: No known data errors
Pull the disk
# After the faulty disk is pulled, selecting it in format returns an error
root@solaris11:~# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
       0. c2t0d0 <VMware,-VMware Virtual S-1.0-50.00GB>
          /pci@0,0/pci15ad,1976@10/sd@0,0
       1. c2t1d0 <VMware,-VMware Virtual S-1.0-50.00GB>
          /pci@0,0/pci15ad,1976@10/sd@1,0
       2. c2t2d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci15ad,1976@10/sd@2,0
       3. c2t3d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci15ad,1976@10/sd@3,0
Specify disk (enter its number): 3
selecting c2t3d0 <(null) cyl 0 alt 0 hd 0 sec 0>
[disk unformatted]
Error: can't open disk '/dev/rdsk/c2t3d0p0'.
AVAILABLE DRIVE TYPES:
0. Auto configure
1. VMware,-VMware Virtual S-1.0
2. other
Specify disk type (enter its number):
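Add the new disk
After the disk is moved to a different target on the controller (simulating a different slot) and the disks are rescanned, the device name has changed: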
root@solaris11:~# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
       0. c2t0d0 <VMware,-VMware Virtual S-1.0-50.00GB>
          /pci@0,0/pci15ad,1976@10/sd@0,0
       1. c2t1d0 <VMware,-VMware Virtual S-1.0-50.00GB>
          /pci@0,0/pci15ad,1976@10/sd@1,0
       2. c2t2d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci15ad,1976@10/sd@2,0
       3. c2t4d0 <VMware,-VMware Virtual S-1.0-10.00GB>
          /pci@0,0/pci15ad,1976@10/sd@4,0
Specify disk (enter its number):
Entry 3 is now c2t4d0 <VMware,-VMware Virtual S-1.0-10.00GB>, at physical path /pci@0,0/pci15ad,1976@10/sd@4,0.
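When the device name changes, the new name can be found by comparing the format disk list before and after inserting the disk. A minimal sketch, assuming both listings are saved to files first (piping an empty line into format, as in `echo | format > file`, is a commonly used way to print the list without entering the menu; treat it as an assumption for your release); format_disks and new_disks are hypothetical names:

```shell
# Hypothetical helper: extract the sorted disk names (cXtYdZ) from the
# AVAILABLE DISK SELECTIONS listing printed by format, read on stdin.
format_disks() {
    awk '$1 ~ /^[0-9]+\.$/ && $2 ~ /^c[0-9]+t[0-9A-Fa-f]+d[0-9]+$/ { print $2 }' | sort
}

# Hypothetical helper: print disks that appear in the AFTER listing file
# but not in the BEFORE listing file.
new_disks() {
    before=$(mktemp)
    after=$(mktemp)
    format_disks < "$1" > "$before"
    format_disks < "$2" > "$after"
    comm -13 "$before" "$after"
    rm -f "$before" "$after"
}

# Usage on the host:
#   echo | format > /tmp/disks.before
#   (insert the new disk and rescan)
#   echo | format > /tmp/disks.after
#   new_disks /tmp/disks.before /tmp/disks.after
```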
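Use detach to remove the faulty disk
Because the device name has changed, the original faulty disk (c2t3d0) has to be detached from oraclepool. After the detach, oraclepool consists of c2t2d0 alone and has no redundancy until the new disk is attached: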
root@solaris11:~# zpool detach oraclepool c2t3d0
root@solaris11:~#
root@solaris11:~#
root@solaris11:~# zpool status -v oraclepool
pool: oraclepool
state: ONLINE
scan: resilvered 879M in 2s with 0 errors on Sun Feb 4 19:27:01 2024
config:
        NAME          STATE     READ WRITE CKSUM
        oraclepool    ONLINE       0     0     0
          c2t2d0      ONLINE       0     0     0
errors: No known data errors
Use attach to add the new disk to oraclepool and rebuild the mirror

root@solaris11:~# zpool attach -f oraclepool c2t2d0 c2t4d0
root@solaris11:~# zpool status -v oraclepool
pool: oraclepool
state: DEGRADED
status: One or more devices are currently being resilvered. The pool will
continue to function in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sun Feb 4 19:49:28 2024
879M scanned
710M resilvered at 774M/s, 0.00% done, 1s to go
config:
        NAME          STATE     READ WRITE CKSUM
        oraclepool    DEGRADED     0     0     0
          mirror-0    DEGRADED     0     0     0
            c2t2d0    ONLINE       0     0     0
            c2t4d0    DEGRADED     0     0     0  (resilvering)
device details:
        c2t4d0    DEGRADED         scrub/resilver needed
        status: ZFS detected errors on this device.
                The device is missing some data that is recoverable.
errors: No known data errors
root@solaris11:~# zpool status -v oraclepool
pool: oraclepool
state: ONLINE
scan: resilvered 879M in 2s with 0 errors on Sun Feb 4 19:49:30 2024
config:
        NAME          STATE     READ WRITE CKSUM
        oraclepool    ONLINE       0     0     0
          mirror-0    ONLINE       0     0     0
            c2t2d0    ONLINE       0     0     0
            c2t4d0    ONLINE       0     0     0
errors: No known data errors
The new disk passes through a resilvering (synchronization) state and comes ONLINE automatically once resilvering finishes.
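Scenario 2 can be scripted the same way. A minimal sketch, assuming a two-way mirror in which SURVIVOR is the healthy side (c2t2d0 here); replace_new_slot is a hypothetical name, and -f is passed only because the original record used it. As the status output above shows, the pool has no redundancy between the detach and the attach:

```shell
# Hypothetical wrapper for Scenario 2: detach the old device name, attach
# the new disk to the surviving side of the mirror, poll every INTERVAL
# seconds (default 10) until the resilver is no longer in progress, then
# print the pool summary.
replace_new_slot() {
    pool=$1
    old=$2
    survivor=$3
    new=$4
    interval=${5:-10}
    zpool detach "$pool" "$old" || return 1
    zpool attach -f "$pool" "$survivor" "$new" || return 1
    while zpool status "$pool" | grep 'resilver in progress' >/dev/null; do
        sleep "$interval"
    done
    zpool status -x "$pool"
}

# Usage on the host (run as root):
#   replace_new_slot oraclepool c2t3d0 c2t2d0 c2t4d0
```

zpool replace also accepts an old and a new device (zpool replace oraclepool c2t3d0 c2t4d0), which attaches the new device and detaches the old one once the resilver completes, as a single operation; that variant was not tested in this record.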
Originally published on the WeChat official account 运维小九九 as "Oracle Solaris 更换故障硬盘的测试记录" (Oracle Solaris failed-disk replacement test record).
Compiled by 极客之音. Article link: https://www.bmabk.com/index.php/post/218408.html