I ran into a situation where Puppet changed the UID/GID of files and directories.  As a result, the clusterware on one node would no longer come up.  Two weeks of working with Oracle Support got me nowhere.  Unbelievable…  Ultimately they suggested removing the nodes from the cluster and adding them back.  That is involved enough that I may as well re-install the stack from scratch.  This step-by-step guide captures my experience of re-installing the entire ODA Release 18.3 stack remotely through the ILOM.

Why Do This?

For the initial deployment, Oracle will come to your data center, physically put everything together, and install the entire stack.  It is a turn-key solution.  But you may run into a situation like mine that requires you to redo the installation yourself.  Once you have turned this into a repeatable process, you can keep testing until you’re happy with the results.

Specs

Let’s first cover some specs.

Baremetal/Virtualized: Baremetal
Hardware: ODA X7-2HA
Storage: Single shelf
Clusterware version: 18.0.0.0 (fixed with the 18.3 stack)
Database version: 12.1.0.2 (12.1.0.2, 12.2.0.1, 18.0.0.0 available)

Download Software

Download all the needed software and get it all onto the ODA hosts in a directory structure.

Step: Screenshot/notes
Download Server patch: 28864490
Download Clusterware patch: 27604593
Download Database patch: 23494992
Put the patch files on the servers: scp these files to each host under /opt and organize them into this structure:
/opt/patches/server
/opt/patches/gi
/opt/patches/db
Unzip the zip files and delete the zip files to save space.
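
The staging step above can be scripted. What follows is a minimal sketch, assuming the downloaded archives have already been copied into the matching subdirectory and follow the usual My Oracle Support naming of p{patch number}_*.zip. The function name stage_patches and its optional base-directory argument are hypothetical, not part of any Oracle tooling.

```shell
# Hypothetical helper: create the patch directory layout and unpack the
# downloaded archives in place. The base directory defaults to /opt/patches.
stage_patches() {
    base="${1:-/opt/patches}"
    for sub in server gi db; do
        mkdir -p "$base/$sub" || return 1
        # Assumption: downloads are named p<number>_*.zip. The files they
        # contain are themselves .zip archives and must stay zipped, so
        # only the outer archives are matched here.
        for zip in "$base/$sub"/p[0-9]*.zip; do
            # Skip the unexpanded pattern when nothing matches.
            [ -e "$zip" ] || continue
            # Delete the archive only if it unpacked cleanly.
            unzip -o -q "$zip" -d "$base/$sub" && rm -f "$zip"
        done
    done
}
```

Run it as root on each host with no argument to work on /opt/patches, or pass another directory to try it out first.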

This is the page showing all the software in case you want to see other available downloads: link




Reimaging

Next we will re-image the systems, giving them the base software stack for the server and storage components.

Step: Screenshot/notes
Download the zipped disk images: link
Extract the zip on your local drive.
Browse to BOTH ILOMs and log in as root. The password should be welcome1.
Launch Remote Console.
Mount the ISO from your local disk onto the remote console. On each console:
KVMS / Storage
Click Add to add an ISO image stored locally on your workstation.
Uncheck SSL Enabled
Connect
OK
Specify the boot device. For each web console, specify CDROM as the boot device. This takes effect only once, on the next reboot.
Host Management (left) / Host Control
Next boot device: CDROM
Save
Power control
Power cycle
Save
Ok (This will reboot the host.)
Upon reboot, the hosts will reimage themselves.

Patching

In the patching step, we’ll update the repository with the server patches (server + storage), clusterware patches, and database patches.

Step: Screenshot/notes
ssh to both hosts: root/welcome1
Configure network: Run the following as root:
configure-firstnet
Sanity checks:
/opt/oracle/dcs/bin/odacli validate-storagetopology
ethtool p1p1
ethtool p1p2
Server
Update repository: Update the repository from the first node only.
Do not put spaces around the commas in the -f list.

/opt/oracle/dcs/bin/odacli update-repository -f /opt/patches/server/oda-sm-18.3.0.0.0-181205-server1of3.zip,/opt/patches/server/oda-sm-18.3.0.0.0-181205-server2of3.zip,/opt/patches/server/oda-sm-18.3.0.0.0-181205-server3of3.zip

/opt/oracle/dcs/bin/odacli describe-job -i {job id from above}
Update Server: Fire the update on the first host. This command does it for both hosts.
The --node or -n parameter can be used if you want to do it per server.

/opt/oracle/dcs/bin/odacli update-server -v 18.3.0.0.0
/opt/oracle/dcs/bin/odacli describe-job -i {job id from above}
Sanity check: A quick check on the status of the stack:
/opt/oracle/dcs/bin/odacli describe-latestpatch
Storage
Update Storage: The storage patches are included in the server zip files loaded in the Update repository step above, so you just need to fire the update for the storage.
If you're updating an ODA in production where a critical service is running, you would want to use the --rolling parameter.

/opt/oracle/dcs/bin/odacli update-storage -v 18.3.0.0.0 --rolling
Clusterware/DB
Update Repository: For the clusterware and database software, you just need to update the repository, because the Create Appliance step later is what installs the software and creates the database.

/opt/oracle/dcs/bin/odacli update-repository -f /opt/patches/gi/odacli-dcs-18.3.0.0.0-181205-GI-18.3.0.0.zip,/opt/patches/db/odacli-dcs-18.3.0.0.0-180905-DB-12.1.0.2.zip
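
The repository update and job check steps above could be wrapped in a few shell helpers. This is only a sketch under stated assumptions: the function names are hypothetical, and the parsing assumes that describe-job prints a "Status:" line whose value ends up as Success or Failure. Check what your release actually prints before relying on it.

```shell
ODACLI=/opt/oracle/dcs/bin/odacli

# Hypothetical helper: join all arguments with commas and no spaces,
# which is the form the -f list needs.
join_by_comma() {
    (IFS=,; printf '%s\n' "$*")
}

# Read describe-job output on stdin and print the value of the first
# "Status:" line. The layout of that output is an assumption.
job_status() {
    sed -n 's/^[[:space:]]*Status:[[:space:]]*\([^[:space:]]*\).*/\1/p' | head -n 1
}

# Poll a job every 30 seconds until it finishes. The status strings
# "Success" and "Failure" are assumptions; adjust them as needed.
wait_for_job() {
    while :; do
        status=$("$ODACLI" describe-job -i "$1" 2>/dev/null | job_status)
        case "$status" in
            Success) echo "job $1 succeeded"; return 0 ;;
            Failure) echo "job $1 failed"; return 1 ;;
            "")      echo "could not read status of job $1" >&2; return 2 ;;
        esac
        sleep 30
    done
}

# Example (not run here):
#   files=$(join_by_comma /opt/patches/server/oda-sm-18.3.0.0.0-181205-server*.zip)
#   "$ODACLI" update-repository -f "$files"
#   wait_for_job {job id from above}
```

Building the list from separate arguments removes the chance of a stray space after a comma, and the shell expands the wildcard in sorted order, so the three server files come out as 1of3, 2of3, 3of3.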



Creating the Appliance

Make sure ports 7070 and 7093 are both open between your workstation and the ODA, so that you can actually reach the web console through a browser.
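
A quick way to test this from a Linux or macOS workstation is sketched below. It assumes bash (for its /dev/tcp redirection) and the coreutils timeout command are available; the function names port_open and check_console_ports are hypothetical.

```shell
# Hypothetical helper: succeed if a TCP connection to host:port can be
# opened within 5 seconds. Uses bash's /dev/tcp and coreutils timeout.
port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Check both web console ports on a host and report each result.
check_console_ports() {
    rc=0
    for port in 7070 7093; do
        if port_open "$1" "$port"; then
            echo "$1:$port reachable"
        else
            echo "$1:$port NOT reachable"
            rc=1
        fi
    done
    return $rc
}
```

Call check_console_ports with the public IP address of the first host. A "NOT reachable" result usually points at a firewall between you and the ODA, although it will also show up if the service on that port is not running yet.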

Step: Screenshot/notes
Cleanup: If you're redoing this Create Appliance step, run the following as root:
/opt/oracle/oak/onecmd/cleanup.pl
Reconfigure network: Run the following as root:
configure-firstnet
Configure DNS: vi /etc/resolv.conf and put in the appropriate DNS configuration entries.
Launch Web Console: https://{IP}:7093/mgmt/index.html
where IP is the public IP address of the first host.
Change password for oda-admin
Log back in to the Web Console: https://{IP}:7093/mgmt/index.html
where IP is the public IP address of the first host.
Create Appliance: Click on Create Appliance and enter all the necessary info.
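
For the Configure DNS step above, the entries in /etc/resolv.conf are a search line and one or more nameserver lines. Here is a small sketch that prints them. The function name render_resolv_conf is hypothetical, and example.com and the 192.0.2.x addresses are placeholders, not values from my setup.

```shell
# Hypothetical helper: print resolv.conf content for a search domain and
# one or more name servers. Rejects an empty domain or a leading dot.
render_resolv_conf() {
    domain="$1"
    shift
    case "$domain" in
        ""|.*) echo "invalid domain: '$domain'" >&2; return 1 ;;
    esac
    echo "search $domain"
    for ns in "$@"; do
        echo "nameserver $ns"
    done
}

# Example (not run here), as root:
#   render_resolv_conf example.com 192.0.2.10 192.0.2.11 > /etc/resolv.conf
```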


Some notes:
System Information
Domain name: Do not enter a leading dot, like ".bwong64.com". That would get you in so much trouble.
Data Storage Percentage: Be careful with this setting. Once the appliance is created, it is very hard to change.
User and Group Selection
If you use Puppet and it enforces specific UIDs and GIDs, enter those same values here so that Puppet won't change them later on.
ASR Information
Select INTERNAL for ASR Type, since the ODA has its own ASR manager service.
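
Given how this whole exercise started, it may be worth checking the IDs on each host after the appliance is created. The sketch below compares a user's UID and primary GID with the values you expect. The function name ids_match is hypothetical, and the user name and numbers in the example are made up; use whatever Puppet enforces in your environment.

```shell
# Hypothetical helper: succeed only if the user's UID and primary GID
# match the expected values (for example, the ones Puppet enforces).
ids_match() {
    user="$1" want_uid="$2" want_gid="$3"
    have_uid=$(id -u "$user") || return 2
    have_gid=$(id -g "$user") || return 2
    if [ "$have_uid" = "$want_uid" ] && [ "$have_gid" = "$want_gid" ]; then
        echo "$user: uid=$have_uid gid=$have_gid OK"
    else
        echo "$user: uid=$have_uid gid=$have_gid, expected uid=$want_uid gid=$want_gid"
        return 1
    fi
}

# Example (not run here; the user name and values are placeholders):
#   ids_match oracle 54321 54321
```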

Issues

1) After I completed all the steps above and rebooted the hosts, the Oracle instances had problems starting up.  The alert logs showed the following entries, indicating a problem with the memory configuration.

Tue May 14 00:06:17 2019
Dump of system resources acquired for SHARED GLOBAL AREA (SGA)
Tue May 14 00:06:17 2019
Per process system memlock (soft) limit = 64K
Tue May 14 00:06:17 2019
Expected per process system memlock (soft) limit to lock
SHARED GLOBAL AREA (SGA) into memory: 96G
Tue May 14 00:06:17 2019
Available system pagesizes:
4K, 2048K
Tue May 14 00:06:17 2019
Supported system pagesize(s):
Tue May 14 00:06:17 2019
PAGESIZE AVAILABLE_PAGES EXPECTED_PAGES ALLOCATED_PAGES ERROR(s)
Tue May 14 00:06:17 2019
2048K 101430 49154 0 ORA-27125
Tue May 14 00:06:17 2019
Reason for not supporting certain system pagesizes:
Tue May 14 00:06:17 2019
4K - Large pagesizes only
Tue May 14 00:06:17 2019
RECOMMENDATION:
Tue May 14 00:06:17 2019
Increase per process memlock (soft) limit to at least 96GB to lock 100% of SHARED GLOBAL AREA (SGA) pages into physical memory
Tue May 14 00:06:17 2019

The solution is to edit /etc/security/limits.conf and raise the memlock limit so that its value, given in KB, is at least the size of the SGA:

soft memlock {size in KB}
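
The value is in KB, so it helps to work the number out rather than guess. Here is a small sketch using the figures from the alert log above: a 96G SGA, or 49154 expected pages of 2048K each. The function names are hypothetical, and the "oracle" domain in the commented limits.conf line is an assumption about which user the limit applies to.

```shell
# Convert an SGA size in GB to the KB value that limits.conf expects.
gb_to_kb() {
    echo $(( $1 * 1024 * 1024 ))
}

# KB needed to lock a given number of huge pages of a given size in KB.
pages_to_kb() {
    echo $(( $1 * $2 ))
}

# gb_to_kb 96            prints 100663296
# pages_to_kb 49154 2048 prints 100667392
#
# A full limits.conf entry has four fields: domain, type, item, value.
# Assuming the database runs as user "oracle", it could look like:
#   oracle soft memlock 100667392
```

The page-based figure is slightly larger than a flat 96 GB, so it is the safer of the two to use.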

Last Words

Thanks for reading this far.  If you found this useful, please leave a comment and subscribe to my email list.  Much appreciated!

Brian Wong
