Is your HADR-TSA configuration corrupted or hung? Are none of the TSA commands executing? Take a deep breath and start thinking from the TSA perspective rather than the DB2 perspective.
You need to make sure the TSA resources are alive and running well... but hold on. Has this also been preventing your DB2 instances from starting?
In this scenario, the reason the DB2 instance is stuck is that the cluster manager is still set to TSA. Please verify it:
db2 get dbm cfg | grep -i clus
Cluster manager = TSA
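If you want to script this check, here is a minimal sketch. The function name cluster_manager_from_cfg is my own (hypothetical), and it assumes the value sits to the right of "=" on the "Cluster manager" line, as in the output above.

```shell
# Hypothetical helper: reads "db2 get dbm cfg" output on stdin and prints
# the value to the right of "=" on the "Cluster manager" line.
# Prints an empty line if the parameter has no value.
cluster_manager_from_cfg() {
    awk -F'=' 'tolower($0) ~ /cluster manager/ {
        val = $2
        gsub(/^[ \t]+|[ \t]+$/, "", val)   # trim surrounding whitespace
        print val
        exit
    }'
}
```

Usage on the DB2 server would be "db2 get dbm cfg | cluster_manager_from_cfg". If it prints TSA, the instance is still tied to the cluster manager.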
How do you get rid of this? Normally you would run "db2haicu -disable" or "db2haicu -delete", but those commands are hanging for you too. So, let's do the cleanup by hand.
Step1 : Find out the TSA domain
lsrpdomain
Step2 : Remove the domain
rmrpdomain -f <domain-name>
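The two steps above can be sketched as a dry run that only prints the removal commands, so you can review them before running anything destructive. The function name print_domain_cleanup is hypothetical, and the sketch assumes that lsrpdomain prints one header line followed by one line per domain, with the domain name in the first column.

```shell
# Hypothetical helper: reads "lsrpdomain" output on stdin and prints one
# "rmrpdomain -f <name>" command per domain. It does NOT execute anything.
# Assumption: line 1 is a header and column 1 holds the domain name.
print_domain_cleanup() {
    awk 'NR > 1 && NF > 0 { print "rmrpdomain -f " $1 }'
}
```

Usage on a cluster node would be "lsrpdomain | print_domain_cleanup". Copy and run the printed command only after confirming the domain name is the one you intend to remove.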
Now check the dbm config for the cluster manager again. If it is still set to TSA, run "db2haicu -delete"; it should work fine this time. You can now start the DB2 instance safely, start the HADR processes, and hand the environment over to the business if they are waiting for it, because you are going to rebuild TSA online anyway while DB2 and HADR are up and running.
Now, let's look at the background processes / resources which are used by TSA.
Step1 : Check whether the RSCT daemons are active
lssrc -g rsct
Subsystem         Group        PID          Status
ctcas             rsct         11403338     active
ctrmc             rsct         20512964     active
You may want to recycle them using "stopsrc -g rsct" and "startsrc -g rsct".
Step2 : Check the status of the resource managers
lssrc -g rsct_rm
Subsystem         Group        PID          Status
IBM.HostRM        rsct_rm      11665650     active
IBM.ServiceRM     rsct_rm      9437272      active
IBM.DRM           rsct_rm      15990970     active
IBM.ConfigRM      rsct_rm      20250724     active
IBM.MgmtDomainRM  rsct_rm      19792028     active
IBM.StorageRM     rsct_rm      16646386     active
IBM.TestRM        rsct_rm      19923128     active
IBM.RecoveryRM    rsct_rm      16384102     active
IBM.GblResRM      rsct_rm      11272240     active
IBM.ERRM          rsct_rm                   inoperative
IBM.LPRM          rsct_rm                   inoperative
IBM.AuditRM       rsct_rm                   inoperative
When you have a problem, most of them will be inactive. You can recycle these processes too, using "stopsrc -g rsct_rm" and "startsrc -g rsct_rm". Even then, not all of the processes will become active until you have created the domain. So my suggestion is to start creating the domain with the "db2haicu" command-line utility and keep checking the resource manager status with "lssrc -g rsct_rm".
In my particular scenario, the IBM.StorageRM resource manager was always inactive, yet TSA was failing at IBM.RecoveryRM even though that one showed as active. I learned something else along the way: IBM.RecoveryRM being active does not mean it is really working. You can verify its state like this:
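While you keep re-checking, it helps to list only the subsystems that are not active. This is a minimal sketch. The function name inactive_subsystems is hypothetical, and it assumes the lssrc layout shown above: one header line, the subsystem name in the first column, and the status in the last column.

```shell
# Hypothetical helper: reads "lssrc -g <group>" output on stdin and prints
# the name of every subsystem whose status (last column) is not "active".
# Using the last column copes with rows that have no PID.
inactive_subsystems() {
    awk 'NR > 1 && NF > 0 && $NF != "active" { print $1 }'
}
```

Usage would be "lssrc -g rsct_rm | inactive_subsystems". With the sample output above it would list IBM.ERRM, IBM.LPRM and IBM.AuditRM.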
lssrc -ls IBM.RecoveryRM | grep -i "In Config State"
=> True means complete
=> False means still initializing, so not ready to service commands like lssam
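Here is a small sketch of the same check as a yes/no test that can be used in a wait loop. The function name recoveryrm_ready is hypothetical. It relies only on the "In Config State" line containing True or False, as described above.

```shell
# Hypothetical helper: reads "lssrc -ls IBM.RecoveryRM" output on stdin.
# Exit status 0 if the "In Config State" line says True, non-zero otherwise
# (including when the line is missing).
recoveryrm_ready() {
    grep -i "In Config State" | grep -qi "true"
}

# Example wait loop, to be run on a cluster node (the 10-second interval
# is an arbitrary choice):
# until lssrc -ls IBM.RecoveryRM | recoveryrm_ready; do sleep 10; done
```

If the loop never finishes, IBM.RecoveryRM is stuck initializing, and the real cause may be in another resource manager.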
In my case it was always False, but IBM.RecoveryRM was only a victim of the IBM.StorageRM issue. We opened a PMR with the RSCT team at the IBM lab, and they pointed out that the storage components had been deinstalled during OS patching work. Again, we had not followed the best practices for OS patching in a TSA environment. Once we reinstalled them, our resources were back to work.
I think I have covered how to navigate troubleshooting a hung TSA. You will not always be able to resolve it by yourself, but working through these checks first will save you a lot of time.
References :
http://www-01.ibm.com/support/docview.wss?uid=swg21385581
http://www-01.ibm.com/support/docview.wss?uid=swg21236233
http://www-01.ibm.com/support/docview.wss?uid=swg21293701
Note: This information is shared based on my own knowledge and experience.