Unable to initiate the VCF upgrade

Mohammed Bilal

Sep 4, 2024 · 3 min read

With the release of VCF 5.2 a month ago, I decided to upgrade my lab (5.1) to version 5.2 to explore the new features introduced in this version.

To start, I downloaded all of the required bundles listed below.

VCF 5.2 upgrade bundle
VCF 5.2 (SDDC Manager) config drift bundle
NSX-T upgrade bundle
vCenter upgrade bundle
ESXi upgrade bundle

Next, I selected my management domain for the upgrade, but when I clicked on Updates I got an error and could not initiate the upgrade.

In the UI, the management domain and its components, the hosts, and the cluster all showed a status of ACTIVE.

I checked operationsmanager.log and found the following entries:

2024-09-03T16:11:31.184+0000 INFO  [vcf_om,b725a8dd5d8c8c6c,ce18] [c.v.v.c.r.a.c.ConfigReconcilerInternalControllerImpl,http-nio-127.0.0.1-7300-exec-5] Internal Config API: GET /config-drifts API called with query parameters: <resourceId> null, <resourceType> null, <configId> null, <driftType> null, <applicable> null
2024-09-03T16:11:31.186+0000 INFO  [vcf_om,b725a8dd5d8c8c6c,ce18] [c.v.v.c.r.a.c.ConfigReconcilerInternalControllerImpl,http-nio-127.0.0.1-7300-exec-5] Using locale: en-US, in internal GET /config-drifts API

The domainmanager.log showed the following entries:

2024-09-03T08:12:00.252+0000 INFO  [vcf_dm,b5ad0f96e0c44931,13ce] [c.v.v.c.o.c.v.ConfigreconcilerOrchController,http-nio-127.0.0.1-7200-exec-6]  GET /v1/config-drifts/ API called with query parameters: resourceId: 00f8056a-57a8-4ac6-b977-5e74ed5e252c, resourceType: null, configId: null, driftType: null
2024-09-03T08:12:00.253+0000 INFO  [vcf_dm,b5ad0f96e0c44931,13ce] [c.v.v.c.o.c.v.ConfigreconcilerOrchController,http-nio-127.0.0.1-7200-exec-6]  Using locale : en-US
2024-09-03T08:12:00.255+0000 INFO  [vcf_dm,b5ad0f96e0c44931,13ce] [c.v.v.c.o.s.ConfigDriftAggregator,http-nio-127.0.0.1-7200-exec-6]  Fetching configs with param resourceType:null, resourceId: 00f8056a-57a8-4ac6-b977-5e74ed5e252c, driftType: null, driftId: null, applicable: true
2024-09-03T08:12:00.256+0000 INFO  [vcf_dm,b5ad0f96e0c44931,7563] [c.v.v.c.o.a.ConfigDriftApiClient,dm-exec-7]  Passing orch locale en-US to internal service API call
2024-09-03T08:12:00.256+0000 INFO  [vcf_dm,b5ad0f96e0c44931,41fa] [c.v.v.c.o.a.ConfigDriftApiClient,dm-exec-11]  Passing orch locale en-US to internal service API call
2024-09-03T08:12:00.257+0000 INFO  [vcf_dm,d6d32dcc3ae2c642,07bf] [c.v.v.c.r.a.c.ConfigReconcilerInternalControllerImpl,http-nio-127.0.0.1-7200-exec-4]  Internal Config API: GET /config-drifts API called with query parameters: <resourceId> 00f8056a-57a8-4ac6-b977-5e74ed5e252c, <resourceType> null, <configId> null, <driftType> null, <applicable> true
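All of the entries above are INFO level, so one quick check is to filter both logs for anything logged at WARN or ERROR. The following is only a minimal sketch: the function name vcf_log_problems is hypothetical, and the log paths are the locations I would expect on the SDDC Manager appliance, so adjust them if yours differ.

```shell
#!/bin/sh
# Hypothetical helper: print only WARN/ERROR lines from a VCF service log.
# The level is the second whitespace-separated field, as in the snippets
# above ("<timestamp> INFO  [vcf_om,...] ...").
vcf_log_problems() {
    log_file="$1"
    awk '$2 == "WARN" || $2 == "ERROR"' "$log_file"
}

# Assumed log locations on the SDDC Manager appliance.
for f in /var/log/vmware/vcf/operationsmanager/operationsmanager.log \
         /var/log/vmware/vcf/domainmanager/domainmanager.log; do
    if [ -r "$f" ]; then
        echo "== $f"
        # Show only the most recent matches to keep the output short.
        vcf_log_problems "$f" | tail -n 20
    fi
done
```

If this prints nothing useful, as was the case here, the logs are probably not going to explain the failure and the database is the next place to look.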

Since the logs did not give much helpful information about the cause, I decided to look into the db to check whether any of the components had an issue.

Both the vCenter and the management domain records show a status of ACTIVE in the db:

platform=# select * from vcenter where id='86b1306b-bb28-4ec3-ab67-2e3173b03559';
-[ RECORD 1 ]------------+-------------------------------------
id                       | 86b1306b-bb28-4ec3-ab67-2e3173b03559
creation_time            | 1701953558288
modification_time        | 1724954317241
bundle_repo_datastore    | lcm-bundle-repo
datastore_name           | sfo-m01-cl01-ds-vsan01
ssh_host_key             |
ssh_host_key_type        |
status                   | ACTIVE
type                     | MANAGEMENT
version                  | 8.0.2.00100-22617221
vm_hostname              | vc-l-01a
vm_management_ip_address | 192.168.xx.a
vm_name                  | vc-l-01a
join_sso_status          | JOINED

platform=# select * from domain;
-[ RECORD 1 ]------------+-------------------------------------
id                       | 00f8056a-57a8-4ac6-b977-5e74ed5e252c
creation_time            | 1701953557851
modification_time        | 1710428973323
name                     | sddc-m1
organization             | COM
status                   | ACTIVE
type                     | MANAGEMENT
vra_integration_status   | ENABLED
vrops_integration_status | FAILED
vrli_integration_status  | FAILED
sso_id                   | aa60a3be-7906-4d34-b9e4-e531e9f73f0e
sso_name                 | vsphere.local
is_management_sso_domain | t

I then went through every table in the db and found the PSC status marked as ERROR in the platform db:

platform=# select * from psc;
                  id                  | creation_time | modification_time | bundle_repo_datastore |     datastore_name     | is_replica | port | ssh_host_key | ssh_host_key_type |  sso_domain   | status | sub_domain |       version        |     vm_hostname     | vm_management_ip_address | vm_name
--------------------------------------+---------------+-------------------+-----------------------+------------------------+------------+------+--------------+-------------------+---------------+--------+------------+----------------------+---------------------+--------------------------+----------
 aa60a3be-7906-4d34-b9e4-e531e9f73f0e | 1701953558168 |     1716981515538 | lcm-bundle-repo       | sfo-m01-cl01-ds-vsan01 | f          |  443 |              |                   | vsphere.local | ERROR  |            | 8.0.2.00100-22617221 | vc-l-01a            | 192.168.xx.xx            | vc-l-01a
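Checking tables one by one is tedious, so the search for a non-ACTIVE record can be scripted. The sketch below is an assumption-heavy illustration rather than a supported procedure: it assumes psql can reach the platform db locally as the postgres user, that the tables live in the public schema, and that the status columns are text. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read table names from stdin and emit one SQL statement
# per table that counts the rows whose status is not ACTIVE.
build_status_checks() {
    while IFS= read -r tbl; do
        [ -n "$tbl" ] || continue
        printf "SELECT '%s' AS table_name, status, count(*) FROM \"%s\" WHERE status <> 'ACTIVE' GROUP BY status;\n" "$tbl" "$tbl"
    done
}

# Hypothetical wrapper: list every table with a text "status" column and run
# the generated checks against it.
# Assumption: local access to the "platform" db as the postgres user.
scan_platform_statuses() {
    PSQL="psql -h localhost -U postgres -d platform"
    $PSQL -At -c "SELECT table_name FROM information_schema.columns WHERE table_schema = 'public' AND column_name = 'status' AND data_type IN ('character varying', 'text') ORDER BY table_name;" \
        | build_status_checks \
        | $PSQL
}
```

Running scan_platform_statuses on the SDDC Manager would be expected to return a row for the psc table with status ERROR in a case like this one. The queries are read-only, so they do not change anything in the db.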

That showed me where the underlying issue was. I looked into the VC/PSC logs and couldn't see any issues or errors on the VC services either. So I took a snapshot of the SDDC Manager, changed the status of the PSC to ACTIVE, and then restarted the VCF services.

platform=# update psc set status='ACTIVE' where id='aa60a3be-7906-4d34-b9e4-e531e9f73f0e';
UPDATE 1

platform=# select * from psc;
                  id                  | creation_time | modification_time | bundle_repo_datastore |     datastore_name     | is_replica | port | ssh_host_key | ssh_host_key_type |  sso_domain   | status | sub_domain |       version        |     vm_hostname     | vm_management_ip_address | vm_name
--------------------------------------+---------------+-------------------+-----------------------+------------------------+------------+------+--------------+-------------------+---------------+--------+------------+----------------------+---------------------+--------------------------+----------
 aa60a3be-7906-4d34-b9e4-e531e9f73f0e | 1701953558168 |     1716981515538 | lcm-bundle-repo       | sfo-m01-cl01-ds-vsan01 | f          |  443 |              |                   | vsphere.local | ACTIVE | corp.local | 8.0.2.00100-22617221 | vc-l-01a.corp.local | 192.168.40.1             | vc-l-01a
(1 row)
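The same change can be wrapped in a transaction with a guard, so that the row is only touched if it is actually in the ERROR state. This is a minimal sketch under stated assumptions, not an official procedure: the function names are hypothetical, and the connection parameters (local psql as the postgres user against the platform db) are assumed. Take the SDDC Manager snapshot first, as described above.

```shell
#!/bin/sh
# Hypothetical helper: print a transaction that flips one PSC row from ERROR
# to ACTIVE. The "AND status = 'ERROR'" guard makes the UPDATE a no-op if the
# row is in any other state.
build_psc_fix() {
    psc_id="$1"
    # Accept only hex digits and dashes, so the id cannot carry extra SQL.
    case "$psc_id" in
        ''|*[!0-9a-fA-F-]*) echo "invalid id: $psc_id" >&2; return 1 ;;
    esac
    cat <<EOF
BEGIN;
UPDATE psc SET status = 'ACTIVE' WHERE id = '$psc_id' AND status = 'ERROR';
SELECT id, status FROM psc WHERE id = '$psc_id';
COMMIT;
EOF
}

# Hypothetical wrapper: run the transaction against the platform db.
# Assumed connection parameters; ON_ERROR_STOP aborts on the first SQL error.
apply_psc_fix() {
    sql=$(build_psc_fix "$1") || return 1
    printf '%s\n' "$sql" | psql -h localhost -U postgres -d platform -v ON_ERROR_STOP=1
}
```

For the record above, the call would be apply_psc_fix aa60a3be-7906-4d34-b9e4-e531e9f73f0e, followed by a restart of the VCF services. As far as I know, the restart script on SDDC Manager is /opt/vmware/vcf/operationsmanager/scripts/cli/sddcmanager_restart_services.sh, but verify that path on your release before relying on it.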

After that, I no longer saw any issues or errors on the update screen. Strangely, the PSC had been marked as ERROR, and I'm not sure why, but the workaround above got the upgrade working.
