Android Verified Boot Problem Analysis

I. Troubleshooting Approach

1.1 dm-verity troubleshooting flow

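(1) Read back the entire system partition and compare it with a known-good image to determine whether the data in system.img is corrupted. If it is, go to step (2) or step (3). If it is not, go to step (4).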

(2) Send the mmc/ufs part to the vendor to check whether its physical blocks are damaged.

(3) Erase the partition and reflash the software.

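(4) Run more stress tests to validate system.img, for example by copying data from system to the userdata partition, which keeps the copy fast. If the problem reproduces, erase the corrupted physical block and go to step (3). If it does not reproduce, repeat step (4) or keep observing the device. If the problem reproduces and steps (3) and (4) have been repeated more than 3 times with the same physical block failing every time, return to step (2). If the failing physical block is not always the same, go to step (5).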

(5) Check the dmesg log for any mmc or UFS driver failures. If there are any, file a case with the mmc/ufs team for analysis. Otherwise go to step (6).

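(6) At this point the mmc/ufs hardware and driver are largely ruled out, so check from the DDR side. Run the Qblizzard stress test on the device. If the test passes, go to step (7). If it fails, file a case with the DDR team for analysis.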

(7) Make the device crash the first time corruption is observed, and collect a ramdump. Add the following debug change:

 static int verity_handle_err(/* ... */)
 {
     /* ... */
     if (v->mode == DM_VERITY_MODE_LOGGING)
         return 0;
     if (v->mode == DM_VERITY_MODE_RESTART)
-        kernel_restart("dm-verity device corrupted");
+        BUG_ON("dm-verity device corrupted"); /* non-NULL literal, so this always fires */
     /* ... */
 }

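With the ramdump, run integrity checks from a stability perspective, such as walking the task list, comparing read-only regions against vmlinux, comparing cache against DDR, and comparing the VMA list against the rbtree. If a problem is found and step (6) has been completed, go to step (9). If nothing wrong is found, go to step (8).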

(8) Run swap tests with the hardware team.

(9) Try disabling CPR, raising the APC voltage, and raising vdd-mx to make sure the AP cache runs stably.

(10) If the boot failure below appears after flashing a GSI, flash a vbmeta image with AVB verification disabled.

avb_slot_verify.c:432: DEBUG: Loading vbmeta struct from partition 'system'.
avb_footer.c:41: ERROR: Footer magic is incorrect.
avb_slot_verify.c:464: ERROR: system: Error validating footer.
Non Multi-slot: Unbootable entering fastboot mode
VB2: boot state: red(3)

Reference command:

fastboot --disable-verification flash vbmeta[_a/b] vbmeta.img
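
Steps (1) and (4) above depend on knowing which blocks differ between the read-back dump and the known-good image, and whether the same block fails on every run. The following is a minimal sketch of that comparison, assuming both images have already been loaded into memory. The function names and the 4096-byte block size used in the usage note are illustrative assumptions, not part of any vendor tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Compare a read-back partition dump against the known-good image, one
 * block at a time, and record the indices of the blocks that differ.
 * Returns the total number of differing blocks; at most max_out indices
 * are stored in out[]. A trailing partial block is compared as well.
 */
size_t find_bad_blocks(const uint8_t *dump, const uint8_t *golden,
                       size_t len, size_t block_size,
                       size_t *out, size_t max_out)
{
    size_t bad = 0;

    assert(block_size > 0);
    for (size_t off = 0; off < len; off += block_size) {
        size_t n = (len - off < block_size) ? len - off : block_size;

        if (memcmp(dump + off, golden + off, n) != 0) {
            if (bad < max_out)
                out[bad] = off / block_size;
            bad++;
        }
    }
    return bad;
}

/*
 * Step (4) asks whether the failing block is always the same. Given the
 * first bad block index from each run, return 1 if all runs agree.
 */
int same_block_every_run(const size_t *first_bad, size_t runs)
{
    for (size_t i = 1; i < runs; i++)
        if (first_bad[i] != first_bad[0])
            return 0;
    return 1;
}
```

Calling find_bad_blocks() with a block size of 4096 after each reproduction and feeding the first index of each run to same_block_every_run() gives the yes/no answer that decides between returning to step (2) and moving on to step (5).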

1.2 Handling dm-verity errors

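By design, hashtree verification errors are detected by the HLOS and not by the bootloader. AVB provides a way to specify how such an error should be handled, through the hashtree_error_mode parameter of the avb_slot_verify() function. Possible values include: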

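AVB_HASHTREE_ERROR_MODE_RESTART_AND_INVALIDATE means that the HLOS will invalidate the current slot and restart. On devices with A/B this leads to an attempt to boot the other slot (if it is marked bootable), or to a mode in which no OS can be booted (for example some form of repair mode). On Linux this requires a kernel built with CONFIG_DM_VERITY_AVB.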

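AVB_HASHTREE_ERROR_MODE_RESTART means that the OS will restart without invalidating the current slot. Use this mode with care, because it can cause a boot loop if the same hashtree verification error is hit on every boot.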

AVB_HASHTREE_ERROR_MODE_EIO means that an EIO error will be returned to the application.

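AVB_HASHTREE_ERROR_MODE_MANAGED_RESTART_AND_EIO means that either the RESTART or the EIO mode is used, depending on state. This mode implements a state machine: RESTART is used by default, and when AVB_SLOT_VERIFY_FLAGS_RESTART_CAUSED_BY_HASHTREE_CORRUPTION is passed to avb_slot_verify() the mode transitions to EIO. When a new OS is detected, the device transitions back to RESTART mode.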

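This requires persistent storage. Specifically, the AvbOps passed in must implement the read_persistent_value() and write_persistent_value() operations. The persistent value used is named avb.managed_verity_mode and needs 32 bytes of storage.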

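AVB_HASHTREE_ERROR_MODE_LOGGING means that errors will be logged and corrupted data may be returned to applications. This mode should be used only for diagnostics and debugging. It cannot be used unless verification errors are allowed.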
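
The managed mode described above can be pictured as a two-state machine. The sketch below is a simplification for illustration only: the type and function names are hypothetical, the persistent value is reduced to a single remembered state, and the order in which the two inputs are evaluated is an assumption rather than something the text specifies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the two states; these are not libavb types. */
typedef enum { VERITY_RESTART, VERITY_EIO } verity_state_t;

/*
 * stored:                       state remembered from earlier boots
 * restart_caused_by_corruption: models the flag
 *     AVB_SLOT_VERIFY_FLAGS_RESTART_CAUSED_BY_HASHTREE_CORRUPTION
 * new_os_detected:              the OS changed since the state was stored
 */
verity_state_t next_verity_state(verity_state_t stored,
                                 bool restart_caused_by_corruption,
                                 bool new_os_detected)
{
    assert(stored == VERITY_RESTART || stored == VERITY_EIO);

    /* EIO sticks for as long as the same OS is still installed. */
    if (stored == VERITY_EIO && !new_os_detected)
        return VERITY_EIO;

    /* A restart caused by hashtree corruption moves the device to EIO. */
    if (restart_caused_by_corruption)
        return VERITY_EIO;

    /* Default state, also reached again once a new OS is detected. */
    return VERITY_RESTART;
}
```

The point of the state machine is to avoid the boot loop that plain RESTART can cause: the first corruption restarts the device, and later boots of the same OS return EIO instead of restarting again.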

The value passed in hashtree_error_mode is essentially passed on to the HLOS through the androidboot.veritymode, androidboot.veritymode.managed, and androidboot.vbmeta.invalidate_on_error kernel command-line parameters, as follows:

hashtree_error_mode                              androidboot.veritymode  androidboot.veritymode.managed  androidboot.vbmeta.invalidate_on_error
AVB_HASHTREE_ERROR_MODE_RESTART_AND_INVALIDATE   enforcing               (unset)                         yes
AVB_HASHTREE_ERROR_MODE_RESTART                  enforcing               (unset)                         (unset)
AVB_HASHTREE_ERROR_MODE_EIO                      eio                     (unset)                         (unset)
AVB_HASHTREE_ERROR_MODE_MANAGED_RESTART_AND_EIO  eio or enforcing        yes                             (unset)
AVB_HASHTREE_ERROR_MODE_LOGGING                  ignore_corruption       (unset)                         (unset)

The only exception to this table is that if the AVB_VBMETA_IMAGE_FLAGS_HASHTREE_DISABLED flag is set in the top-level vbmeta, androidboot.veritymode is set to disabled, and androidboot.veritymode.managed and androidboot.vbmeta.invalidate_on_error are left unset.
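
The mapping in the table, together with the exception for the hashtree-disabled flag, can be written as a small lookup. This is a sketch under stated assumptions: the enum and struct below are local mirrors of the names in the text rather than the libavb headers, NULL stands for "(unset)", and the managed_is_eio parameter is a hypothetical input that selects between eio and enforcing for the managed mode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local mirror of the mode names from the text; not the libavb header. */
typedef enum {
    MODE_RESTART_AND_INVALIDATE,
    MODE_RESTART,
    MODE_EIO,
    MODE_MANAGED_RESTART_AND_EIO,
    MODE_LOGGING
} error_mode_t;

/* One field per kernel command-line parameter; NULL means "(unset)". */
typedef struct {
    const char *veritymode;          /* androidboot.veritymode */
    const char *managed;             /* androidboot.veritymode.managed */
    const char *invalidate_on_error; /* androidboot.vbmeta.invalidate_on_error */
} verity_cmdline_t;

verity_cmdline_t verity_cmdline(error_mode_t mode, bool managed_is_eio,
                                bool hashtree_disabled)
{
    verity_cmdline_t c = { NULL, NULL, NULL };

    /* Exception to the table: the flag in the top-level vbmeta wins. */
    if (hashtree_disabled) {
        c.veritymode = "disabled";
        return c;
    }

    switch (mode) {
    case MODE_RESTART_AND_INVALIDATE:
        c.veritymode = "enforcing";
        c.invalidate_on_error = "yes";
        break;
    case MODE_RESTART:
        c.veritymode = "enforcing";
        break;
    case MODE_EIO:
        c.veritymode = "eio";
        break;
    case MODE_MANAGED_RESTART_AND_EIO:
        c.veritymode = managed_is_eio ? "eio" : "enforcing";
        c.managed = "yes";
        break;
    case MODE_LOGGING:
        c.veritymode = "ignore_corruption";
        break;
    }
    return c;
}
```

When debugging, comparing the three parameters in /proc/cmdline against this mapping shows which mode the bootloader actually requested.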

II. Known Issues

2.1 Verified error caused by reading from the wrong partition location

(1) Abnormal UART log:

[3830] read from system_a, 0x40 bytes at Offset 0x12b7fffc0, partition size 1696096256
[3840] StartBlock 0x12b7ff, ReadOffset 0xfc0, read_size 0x40
[3850] Data segment length: 14
[3860] SCSI Request failed and we have sense data
[3880] =================== Dumping SCSI COMMAND ==========================
[3890] req->lun (4)
[3890] data_buffer_Addr (0x91e36c00)
[3890] data_length (4096)
[3890] scsi_upiu_flags (64)
[3900] upiu_dd_type (2)
[3900] =============================================
[3900] ucs_do_scsi_cmd failed status = 2
[3910] ucs_do_scsi_read: failed
[3910] UFS read failed.
[3930] Error: UFS read failed writing to block: 6720868352
[3940] ReadBlocks failed -1
[3950] avb_slot_verify.c[3950] :[3950] 465[3950] : FATAL: [3950] assert fail: footer_num_read == AVB_FOOTER_SIZE
[3960] avb_abort![3960] panic (frame 0x91e10ba0):
[3970] HALT: reboot into dload mode...

The contents of partition.xml are as follows:

<configuration>
  <!-- This is LUN 0 - HLOS LUN -->
  <physical_partition>
    <partition label="system_a" size_in_kb="3809280" .../>
    <partition label="system_b" size_in_kb="3809280" .../>
  </physical_partition>
</configuration>

(2) Solution:

Before verifying a partition, look up its location in the partition table and reset the index.

kernel/lk/platform/msm_shared/avb/libavb/avb_ops.c
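
In the log above, the 0x40-byte footer read targets offset 0x12b7fffc0, which lies beyond the reported partition size of 1696096256 bytes, so the offset and the size appear to come from different partition entries. The sketch below illustrates the idea behind the fix: resolve the partition by label each time and check that the read fits. The structure and function names are hypothetical and are not the LK partition API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory partition table entry; not the LK structure. */
typedef struct {
    const char *label;
    uint64_t    start; /* byte offset of the partition on its LUN */
    uint64_t    size;  /* partition size in bytes */
} part_entry_t;

/* Resolve the index from the label on every call, never from a stale cache. */
int find_partition(const part_entry_t *table, size_t count, const char *label)
{
    assert(table != NULL && label != NULL);
    for (size_t i = 0; i < count; i++)
        if (strcmp(table[i].label, label) == 0)
            return (int)i;
    return -1;
}

/* True when [offset, offset + num_bytes) lies inside the partition. */
bool read_in_range(uint64_t offset, uint64_t num_bytes,
                   uint64_t partition_size)
{
    /* Written this way so that offset + num_bytes cannot overflow. */
    return num_bytes <= partition_size &&
           offset <= partition_size - num_bytes;
}
```

With the values from the log, read_in_range(0x12b7fffc0, 0x40, 1696096256) is false, which is the kind of mismatch a check like this would catch before the UFS read is issued.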

2.2 Out-of-range access caused by an oversized system partition on a 32-bit system

(1) Abnormal UART log:

[    6.561417] device-mapper: uevent: version 1.0.3
[ 6.561765] device-mapper: ioctl: 4.35.0-ioctl (2016-06-23) initialised: dm-devel@redhat.com
[ 6.562038] device-mapper: req-crypt: dm-req-crypt successfully initalized.
[ 10.358500] device-mapper: init: attempting early device configuration.
[ 10.360507] device-mapper: init: adding target '0 9288120 verity 1 PARTUUID=a4c4c338-1495-cf9b-7e73-c0ddf7c9bc48 PARTUUID=a4c4c338-1495-cf9b-7e73-c0ddf7c9bc48 4096 4096 1161015 1161015 sha1 b85c7febdfcd40528289d73d6fba83ef622fe25f 863e5389bbfb951324f892a712a696fe166ca70a 10 restart_on_corruption ignore_zero_blocks use_fec_from_device PARTUUID=a4c4c338-1495-cf9b-7e73-c0ddf7c9bc48 fec_roots 2 fec_blocks 1170158 fec_start 1170158'
[ 10.405240] device-mapper: init: dm-0 is ready
[ 10.589628] init: [libfs_mgr]fs_mgr_read_fstab_default(): failed to find device default fstab
[ 10.774647] system_a : Error verifying vbmeta image: invalid vbmeta header
[ 10.774738] init: [libfs_mgr]avb_slot_verify failed, result: 6
[ 10.780513] init: Failed to open FsManagerAvbHandle: No such file or directory
[ 10.786246] init: Failed to setup verity for '/vendor': No such file or directory
[ 10.793601] init: Failed to mount required partitions early ...
[ 10.801039] init: Reboot start, reason: reboot, rebootTarget: bootloader
[ 10.806753] pgd = e77ec000
[ 10.813678] [00000014] *pgd=f1dcc835

(2) Solution:

Change the type of vbmeta_offset from size_t to uint64_t.

external/avb/libavb/avb_slot_verify.c
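
The dm-verity table in the log lists 1161015 data blocks of 4096 bytes, about 4.43 GiB, so any offset at or beyond the end of the data area no longer fits in 32 bits. The sketch below shows the truncation that a 32-bit size_t introduces. It uses uint32_t as a stand-in for size_t on a 32-bit build, and the helper names are illustrative.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for size_t on a 32-bit build, where it is 32 bits wide. */
typedef uint32_t size32_t;

/* Storing a 64-bit offset in a 32-bit variable silently drops the high bits. */
size32_t offset_as_size32(uint64_t offset)
{
    return (size32_t)offset;
}

/* Returns 1 when the offset does not survive a round trip through 32 bits. */
int offset_is_truncated(uint64_t offset)
{
    return (uint64_t)offset_as_size32(offset) != offset;
}

/* End of the data area for a given block count and block size, in bytes. */
uint64_t data_area_end(uint64_t num_blocks, uint64_t block_size)
{
    assert(block_size > 0);
    return num_blocks * block_size;
}
```

For the logged values, data_area_end(1161015, 4096) is 4755517440, and storing that in 32 bits leaves 460550144, an offset that points into the middle of the file system data. Reading vbmeta from there would be consistent with the "invalid vbmeta header" error in the log.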

2.3 dm-verity rejects its arguments because the FEC config option is not enabled

(1) Abnormal UART log:

[    0.000000] device-mapper: init: will configure 1 devices
[ 21.089662] device-mapper: uevent: version 1.0.3
[ 21.091550] device-mapper: ioctl: 4.34.0-ioctl (2015-10-28) initialised: dm-devel@redhat.com
[ 21.093822] device-mapper: req-crypt: dm-req-crypt successfully initalized.
[ 23.798681] device-mapper: init: attempting early device configuration.
[ 23.803076] device-mapper: init: adding target '0 9659008 verity 1 PARTUUID=bbfd550b-e7cb-04b5-a67f-85ae9774d6a7 PARTUUID=bbfd550b-e7cb-04b5-a67f-85ae9774d6a7 4096 4096 1207376 1207376 sha1 86951e16e496e7d1453ad9e2915ab9cd1e18e19a 05f694baa36f322ce668ce91ed3140a972e378a4 10 restart_on_corruption ignore_zero_blocks use_fec_from_device PARTUUID=bbfd550b-e7cb-04b5-a67f-85ae9774d6a7 fec_roots 2 fec_blocks 1216884 fec_start 1216884'
[ 23.821562] device-mapper: table: 253:0: verity: Invalid number of feature args
[ 23.841996] device-mapper: init: starting dm-0 (vroot) failed
[ 23.850903] Unable to handle kernel NULL pointer dereference at virtual address 00000000
[ 23.855158] pgd = ffffff800a13e000
[ 23.863585] [00000000] *pgd=000000017e3be003, *pud=000000017e3be003, *pmd=0000000000000000

(2) Solution:

Enable CONFIG_DM_VERITY_FEC in the kernel config.

kernel/msm-4.4/arch/arm64/configs/project_defconfig
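
The table line in the log passes 10 optional feature arguments, most of them FEC-related (use_fec_from_device, fec_roots, fec_blocks, fec_start and their values). The sketch below models why that count is rejected without FEC support. The two limits are assumptions based on my recollection of dm-verity in kernels of this era (2 optional arguments without FEC, 8 more with it) and should be checked against the kernel tree in use.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed limits; verify against dm-verity in your own kernel tree. */
#define VERITY_BASE_OPTS 2u /* optional args accepted without FEC */
#define VERITY_FEC_OPTS  8u /* extra optional args accepted with FEC built in */

/* Maximum number of optional feature args the target would accept. */
unsigned verity_opts_max(bool fec_enabled)
{
    return VERITY_BASE_OPTS + (fec_enabled ? VERITY_FEC_OPTS : 0u);
}

/* Models the check behind "Invalid number of feature args". */
bool feature_args_accepted(unsigned argc, bool fec_enabled)
{
    return argc <= verity_opts_max(fec_enabled);
}
```

Under these assumptions, the 10 arguments from the log are accepted only when FEC is built in, which matches the failure seen before CONFIG_DM_VERITY_FEC was enabled.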

2.4 boot partition fails to load because it exceeds the heap size

(1) Abnormal UART log:

[partition_get_index]find boot_b index 38
[AVB20]malloc: heap size not enough
avbutil.c[2701] 199[2701]: ERROR: [2701] Failed to allocate memory.

(2) Solution:

Whenever the boot partition size is changed, update AVB_HEAP_SZ to match.

bootable/bootloader/lk/platform/common/boot/avb20/load_vfy_boot_ab.c
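
A cheap way to keep the two values in step is to check the relationship explicitly. The helper below is a hypothetical sketch, not code from the bootloader; the overhead parameter stands for whatever else the AVB heap must hold besides the boot image, which the text does not quantify.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * True when a heap of heap_size bytes can hold an image of image_size
 * bytes plus overhead bytes of other allocations.
 */
bool heap_can_hold(uint64_t heap_size, uint64_t image_size, uint64_t overhead)
{
    /* Written this way so that image_size + overhead cannot overflow. */
    return image_size <= heap_size && overhead <= heap_size - image_size;
}
```

If the sizes are compile-time constants in the tree, the same condition could be placed in a static assertion so that resizing the boot partition without resizing the heap fails the build instead of failing at boot.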

2.5 Partition fails to load because the slot suffix is not stripped

(1) Abnormal UART log:

[3590] Loading image system_a
[3610] read from system_a, 0x740 bytes at Offset 0xfff2a000, partition size 3011510272
[3630] LastBlock 0x7ff953, ReadOffset 0x0, read_size 0x140
[3630] FullBlock 0x7ff950, ReadOffset 0x0, read_size 0x600 outside range. StartPageReadSize 0x0 PageSize 512 ptn 0xb3800000 Buffer 0x8f7aedc0
[3670] ReadRollbackIndex Location 2, RollbackIndex 0
[3690] avb_slot_verify.c: dtbo_a : Loading entire partition.
[3690] avb_slot_verify.c: load_and_verify_hash_partition.
[3720] Loaded image [boot|22044672]
[3720] Loaded image [dtbo|8388608]
[3720] Loaded image [vbmeta|65536]
[3730] Loading image dtbo_a
[3750] read from dtbo_a, 0x800000 bytes at Offset 0x0, partition size 260046848
[3770] FullBlock 0x0, ReadOffset 0x0, read_size 0x800000 outside range. StartPageReadSize 0x0 PageSize 512 ptn 0xf800000 Buffer 0x8f77b428
[3780] Error: ADMA error
[3780] MMC card is not in TRAN state
[3790] Failed Reading block @ 7c000
[3790] ReadBlocks failed 1
[3790] data abort, halting

(2) Solution:

Strip the slot suffix when verifying a partition, so that both A/B and non-A/B systems work.

kernel/lk/platform/msm_shared/avb/libavb/avb_slot_verify.c
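
Stripping the suffix amounts to dropping a trailing "_a" or "_b" from the partition name before it is compared or looked up. The helper below is a minimal sketch with a hypothetical name, not the actual change made in avb_slot_verify.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy name into out without a trailing "_a" or "_b" slot suffix.
 * Names without a suffix are copied unchanged, which keeps non-A/B
 * layouts working. Returns 0 on success, -1 if out is too small.
 */
int strip_slot_suffix(const char *name, char *out, size_t out_size)
{
    size_t len;

    assert(name != NULL && out != NULL);
    len = strlen(name);

    if (len > 2 && name[len - 2] == '_' &&
        (name[len - 1] == 'a' || name[len - 1] == 'b'))
        len -= 2;

    if (len + 1 > out_size)
        return -1;

    memcpy(out, name, len);
    out[len] = '\0';
    return 0;
}
```

One caveat of this approach: a non-A/B partition whose real name happens to end in "_a" or "_b" would also be shortened, so a production version would want to consult the slot configuration first.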