Workaround for Log Insight upgrade failure from 8.2 GA to 8.2 HF3 with error "Upgrade Unconfirmed"

When vRealize Log Insight (vRLI, or LI) 8.2 nodes use custom CA-signed certificates, the upgrade of the LI nodes may not succeed.

I faced this issue while applying the Log4j vulnerability hotfix to my standalone vRLI node, following the steps in this VMware KB (https://kb.vmware.com/s/article/87320).

Procedure to download the patch file:

You can download the vRealize Log Insight 8.2 Hot Fix PAK file from the VMware Patch Portal.
Select vRealize Log Insight as the Product and 8.2 as the Version, then click Search.
Select the file VMware-vRealize-Log-Insight-8.2-19283343.pak
Click Download

Steps to upgrade/patch the LI node:

Log in to the vRLI GUI
Click Administration from the main page
Click Cluster under Management
Click Upgrade Cluster and select the .pak file downloaded in the procedure above
Once the .pak file has finished uploading, you will be prompted to accept the EULA
Accept the new EULA to start the upgrade

The upgrade kept running, and after 1 hour I received the warning message "Upgrade Unconfirmed".

You can tail upgrade.log, which is stored in /storage/var/loginsight, to monitor the upgrade. In my case, upgrade.log stopped being updated after the reboot that is part of the upgrade procedure.
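If you would rather not watch the tail output by eye, a small script can tell you how long it has been since upgrade.log was last written. This is only a sketch: the path comes from this post, while the 900-second threshold and the function names are my own assumptions, not anything vRLI defines.

```python
import os
import time

# Path taken from this post; adjust if your appliance differs.
UPGRADE_LOG = "/storage/var/loginsight/upgrade.log"


def seconds_since_update(path, now=None):
    """Return how many seconds ago the file at `path` was last modified."""
    if now is None:
        now = time.time()
    return now - os.path.getmtime(path)


def is_stalled(path, threshold_seconds=900, now=None):
    """True if the file has not been modified for longer than the threshold.

    The 900-second default is an arbitrary assumption for illustration.
    """
    return seconds_since_update(path, now) > threshold_seconds


if __name__ == "__main__":
    if os.path.exists(UPGRADE_LOG):
        age = seconds_since_update(UPGRADE_LOG)
        print(f"{UPGRADE_LOG} was last updated {age:.0f} seconds ago")
        print("looks stalled" if is_stalled(UPGRADE_LOG) else "still being written")
    else:
        print(f"{UPGRADE_LOG} not found")
```

A log that stays silent well past the reboot, as it did in my case, is a hint that the upgrade is not progressing.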

upgrade.log

2022-03-22 12:07:56,075 loginsight-upgrade INFO Signature of the manifest validated: Verified OK
2022-03-22 12:07:56,225 loginsight-upgrade INFO Current version is 8.2.0-16957702 and upgrade version is 8.2.0-19283343. Version Check successful!
2022-03-22 12:07:56,225 loginsight-upgrade INFO Available Disk Space at /tmp: 16044150784
2022-03-22 12:07:56,225 loginsight-upgrade INFO Disk Space Check successful!
2022-03-22 12:07:56,225 loginsight-upgrade INFO Available Disk Space at /storage/core: 340466085888
2022-03-22 12:07:56,225 loginsight-upgrade INFO Disk Space Check successful!
2022-03-22 12:07:56,225 loginsight-upgrade INFO Available Disk Space at /storage/var: 16986517504
2022-03-22 12:07:56,225 loginsight-upgrade INFO Disk Space Check successful!
2022-03-22 12:07:58,145 loginsight-upgrade INFO Checksum validation successful!
2022-03-22 12:07:58,145 loginsight-upgrade INFO Attempting to upgrade to version 8.2.0-19283343

/storage/core/upgrade/kexec-li script run took 2 seconds
Partition sda5 , which is lazy partition, will be formatted and will become root partition
Photon to Photon upgrade flow will be called, where base OS was Photon
... Starting to run photon2photon script ...
Root partition copy took 3 seconds
clean up upgrade-image.rpm
Removing lock file
/storage/core/upgrade/photon2photon-base-photon.sh script run took 4 seconds
Rebooting..
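The pre-check lines in the excerpt above follow a regular pattern, so they can be pulled out with a couple of regular expressions. The sketch below assumes the message wording is exactly as shown in my log (it may differ in other builds); the function and variable names are made up for illustration.

```python
import re

# Patterns are based only on the log lines shown in this post.
VERSION_RE = re.compile(
    r"Current version is (?P<current>[\d.]+-\d+) "
    r"and upgrade version is (?P<target>[\d.]+-\d+)"
)
DISK_RE = re.compile(r"Available Disk Space at (?P<path>[^:\s]+): (?P<bytes>\d+)")


def summarize_prechecks(log_text):
    """Collect the version pair and free space per path from upgrade.log text."""
    summary = {"current": None, "target": None, "free_bytes": {}}
    for line in log_text.splitlines():
        version = VERSION_RE.search(line)
        if version:
            summary["current"] = version.group("current")
            summary["target"] = version.group("target")
        disk = DISK_RE.search(line)
        if disk:
            summary["free_bytes"][disk.group("path")] = int(disk.group("bytes"))
    return summary


SAMPLE = """\
2022-03-22 12:07:56,225 loginsight-upgrade INFO Current version is 8.2.0-16957702 and upgrade version is 8.2.0-19283343. Version Check successful!
2022-03-22 12:07:56,225 loginsight-upgrade INFO Available Disk Space at /tmp: 16044150784
2022-03-22 12:07:56,225 loginsight-upgrade INFO Available Disk Space at /storage/core: 340466085888
"""

if __name__ == "__main__":
    print(summarize_prechecks(SAMPLE))
```

This makes it easy to confirm that the upgrade was attempted from the build you expected and that all pre-checks were logged before the reboot.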

In runtime.log I could see the following exceptions:

[2022-03-22 12:19:17.119+0000] ["main"/10.123.236.154 INFO] [com.vmware.loginsight.repository.RepositoryReader] [Repository reader closed]
[2022-03-22 12:19:17.119+0000] ["main"/10.123.236.154 INFO] [com.vmware.loginsight.ingestion.StorageIntegrityChecker] [Checked repository of BucketId(uuid:60b68cf4-5245-4527-bc49-2fd696dd7456, createTime:1647936728113): CLEAN]
[2022-03-22 12:19:17.153+0000] ["Thread-55"/10.123.236.154 INFO] [com.vmware.loginsight.commons.executor.ProcessExecutor] [Finished executing /usr/lib/loginsight/application/3rd_party/apache-tomcat-8.5.57/bin/catalina.sh run, ran for 1276 ms]
[2022-03-22 12:19:17.153+0000] ["Thread-55"/10.123.236.154 FATAL] [com.vmware.loginsight.daemon.StrataServiceFailureHandler] [Service died: '/usr/lib/loginsight/application/3rd_party/apache-tomcat-8.5.57/bin/catalina.sh run' with exit code 1 (Unknown error)]
[2022-03-22 12:19:17.160+0000] ["ActiveMQ ShutdownHook"/10.123.236.154 INFO] [org.apache.activemq.broker.BrokerService] [Apache ActiveMQ 5.15.12 (notifications, ID:vm-tec-vrlif-201.vm.co.mz-42911-1647951532138-1:1) is shutting down]
[2022-03-22 12:19:17.161+0000] ["CassandraClientParallelExecutor-thread-4"/10.123.236.154 WARN] [com.vmware.loginsight.cassandra.CassandraClient] [Exception during asynchronous execution (1601984856966449447): Statement:(select guid, first_time, last_time, features, frozen_state from spock_clusters where bucket = ? limit 1000000) java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.TransportException: [/10.123.236.154:9042] Connection has been closed]
java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.TransportException: [/10.123.236.154:9042] Connection has been closed
        at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:502) ~[guava-21.0.jar:?]
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:481) ~[guava-21.0.jar:?]
        at com.vmware.loginsight.cassandra.CassandraClient$4.run(CassandraClient.java:887) [database-lib-li.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_262]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_262]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
Caused by: com.datastax.driver.core.exceptions.TransportException: [/10.123.236.154:9042] Connection has been closed

[2022-03-22 12:19:17.221+0000] ["CassandraClientParallelExecutor-thread-4"/10.123.236.154 WARN] [com.vmware.loginsight.cassandra.CassandraClient] [Exception during asynchronous execution (1601984856966449447): Statement:(select guid, first_time, last_time, features, frozen_state from spock_clusters where bucket = ? limit 1000000) java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.123.236.154:9042 (com.datastax.driver.core.exceptions.ConnectionException: [/10.123.236.154:9042] Pool is shutdown))]
java.util.concurrent.ExecutionException: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.123.236.154:9042 (com.datastax.driver.core.exceptions.ConnectionException: [/10.123.236.154:9042] Pool is shutdown))
        at com.google.common.util.concurrent.AbstractFuture.getDoneValue(AbstractFuture.java:502) ~[guava-21.0.jar:?]
        at com.google.common.util.concurrent.AbstractFuture.get(AbstractFuture.java:461) ~[guava-21.0.jar:?]
        at com.vmware.loginsight.cassandra.CassandraClient$4.run(CassandraClient.java:887) [database-lib-li.jar:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_262]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_262]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.123.236.154:9042 (com.datastax.driver.core.exceptions.ConnectionException: [/10.123.236.154:9042] Pool is shutdown))
        at com.datastax.driver.core.RequestHandler.reportNoMoreHosts(RequestHandler.java:283) ~[cassandra-driver-core-3.9.0.jar:?]
        at com.datastax.driver.core.RequestHandler.access$1200(RequestHandler.java:61) ~[cassandra-driver-core-3.9.0.jar:?]
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution.findNextHostAndQuery(RequestHandler.java:375) ~[cassandra-driver-core-3.9.0.jar:?]
        at com.datastax.driver.core.RequestHandler$SpeculativeExecution$1.onFailure(RequestHandler.java:450) ~[cassandra-driver-core-3.9.0.jar:?]

Note: The date, time and errors could be different in your environment.
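Since runtime.log is noisy, it can help to filter it down to the FATAL entries, such as the "Service died" line for Tomcat above. The following is a rough sketch that assumes the bracketed line layout seen in my excerpt; the regular expression and the function name are my own and are not part of any vRLI tooling.

```python
import re

# Layout assumed from the excerpt:
# [timestamp] ["thread"/host LEVEL] [logger] [message]
ENTRY_RE = re.compile(
    r'^\[(?P<timestamp>[^\]]+)\] '
    r'\["(?P<thread>[^"]+)"/(?P<host>\S+) (?P<level>[A-Z]+)\] '
    r'\[(?P<logger>[^\]]+)\] '
    r'\[(?P<message>.*)\]$'
)


def find_entries(log_text, levels=("FATAL",)):
    """Return parsed entries whose level is in `levels`.

    Stack-trace lines and anything else that does not match the
    layout are skipped.
    """
    entries = []
    for line in log_text.splitlines():
        match = ENTRY_RE.match(line)
        if match and match.group("level") in levels:
            entries.append(match.groupdict())
    return entries


SAMPLE = """\
[2022-03-22 12:19:17.119+0000] ["main"/10.123.236.154 INFO] [com.vmware.loginsight.repository.RepositoryReader] [Repository reader closed]
[2022-03-22 12:19:17.153+0000] ["Thread-55"/10.123.236.154 FATAL] [com.vmware.loginsight.daemon.StrataServiceFailureHandler] [Service died: '/usr/lib/loginsight/application/3rd_party/apache-tomcat-8.5.57/bin/catalina.sh run' with exit code 1 (Unknown error)]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_262]
"""

if __name__ == "__main__":
    for entry in find_entries(SAMPLE):
        print(entry["timestamp"], entry["logger"], entry["message"])
```

In my log, the FATAL entry showed the Tomcat service exiting with code 1; the Cassandra warnings appear right after it as the node shuts down.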

Workaround:

Create a snapshot of your vRealize Log Insight node(s)
Log in to the vRLI GUI
Click Administration from the main page
Click SSL under Configuration
Click RESET TO DEFAULTS to revert from the custom CA certs to the default SSL cert
Then click VIEW DETAILS to verify that the certs have been reverted to the default SSL cert
Now perform the LI upgrade by following the procedure described above

If you run a vRLI cluster, you need to revert from the custom CA certs to the default SSL cert on every node, one at a time: log in directly to each node's FQDN/IP and follow the steps above.
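With several nodes, one way to double-check the reset is to record the SHA-256 fingerprint of the certificate each node serves before and after clicking RESET TO DEFAULTS; if the fingerprint changed, the node is no longer presenting the custom cert. This is a sketch under assumptions: that the web UI answers on port 443, and that you pass your own node FQDNs/IPs on the command line. It only shows that the certificate changed, so still confirm the details in the GUI.

```python
import hashlib
import ssl
import sys


def fingerprint_from_pem(pem_cert):
    """Return the SHA-256 fingerprint (hex) of a PEM-encoded certificate."""
    der = ssl.PEM_cert_to_DER_cert(pem_cert)
    return hashlib.sha256(der).hexdigest()


def node_fingerprint(host, port=443):
    """Fetch the certificate a node serves and return its fingerprint.

    No chain validation is done here; the certificate is only read.
    Port 443 is an assumption about where the UI is listening.
    """
    pem = ssl.get_server_certificate((host, port))
    return fingerprint_from_pem(pem)


if __name__ == "__main__":
    # Usage: python cert_fingerprint.py <node1> <node2> ...
    for host in sys.argv[1:]:
        print(host, node_fingerprint(host))
```

Run it once before the reset and once after, and compare the output per node.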

Note:

In a Log Insight cluster, you only need to trigger the upgrade on the master node.
After the master node upgrade is complete, wait for the remaining worker nodes to finish upgrading, which happens automatically.

Once the upgrade has completed successfully on all the Log Insight nodes, you can re-apply the custom CA certificates. Refer to this VMware article for the procedure.
