How to upgrade revision¶
This guide shows how to perform a minor upgrade of a Charmed OpenSearch deployment and, if needed, how to roll back to the previous revision.
Perform a minor upgrade¶
A minor upgrade is an upgrade from one minor version to another:
OpenSearch X.Y -> OpenSearch X.Y+1.
For example, from OpenSearch 2.14 to OpenSearch 2.15.
This guide walks you through the full procedure: pre-upgrade checks, preparing the application for the in-place upgrade, initiating the upgrade, resuming it, and checking the cluster’s health.
Caution
In large deployments, upgrades should follow a specific role-dependent order.
Upgrade all applications without the cluster_manager role first, then upgrade applications
with the cluster_manager role.
The steps below describe upgrading a single application.
In large deployments, repeat these steps for each application, following this order.
Pre-upgrade checks¶
Before upgrading your OpenSearch cluster, ensure that you have completed the following steps:
Back up your data: Before upgrading, back up your data to prevent data loss in case of failure. For more information, see How to create a backup.
Avoid extraordinary operations: Do not perform any concurrent operations on the cluster during the upgrade process, as they can leave the cluster in an inconsistent state. This includes:
Adding or removing units
Creating or destroying relations
Changes in workload configuration
Upgrading other connected/related/integrated applications simultaneously
Backing up or restoring snapshots
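Before starting, it can help to script a simple gate that refuses to proceed unless the cluster reports green health. The following is a minimal sketch; in a live model the health JSON would come from a `curl` call against a unit, while here a captured response stands in so the check itself is runnable:

```shell
# Pre-upgrade gate: only proceed when the cluster reports green health.
# In a live model, fetch the JSON with something like:
#   health=$(curl -s "<unit-ip>:9200/_cluster/health")
# A captured response stands in here so the check is self-contained.
health='{"cluster_name":"opensearch-wvmy","status":"green","relocating_shards":0}'

if echo "$health" | grep -q '"status" *: *"green"'; then
  echo "cluster is green: safe to start the upgrade"
else
  echo "cluster is NOT green: resolve issues before upgrading" >&2
  exit 1
fi
```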
Upgrade the OpenSearch cluster¶
To upgrade your OpenSearch cluster, follow these steps:
Collect all necessary pre-upgrade information. It will be required for a rollback, if needed. Do NOT skip this step.
(optional) Scale-up: The new, sacrificial unit will be the first to be updated, and it will simplify the rollback procedure in case of an upgrade failure.
Prepare the “Charmed OpenSearch” Juju application for the in-place upgrade. See the step description below for all the technical details the charm executes.
Upgrade: Once started, only one unit of the application will be upgraded. In case of failure, roll back with juju refresh.
Resume upgrade: If the upgrade of the first unit is successful, the upgrade can be resumed. All remaining units in the application will be upgraded sequentially, from the highest to the lowest unit number.
(optional) Scale back: Remove the units created in step 2 (if any), which are no longer necessary.
Post-upgrade check: Ensure all units are in the proper state and the cluster is healthy.
Collect all necessary pre-upgrade information¶
The first step is to record the revision of the running application,
as a safety measure for a rollback action.
To accomplish this, run the juju status command and look for the deployed
Charmed OpenSearch revision in the command output, e.g.:
Model Controller Cloud/Region Version SLA Timestamp
dev development localhost/localhost 3.5.3 unsupported 10:16:46Z
App Version Status Scale Charm Channel Rev Exposed Message
opensearch active 3 opensearch 2/edge 144 no
self-signed-certificates active 1 self-signed-certificates latest/stable 155 no
Unit Workload Agent Machine Public address Ports Message
opensearch/0 active idle 0 10.214.176.180 9200/tcp
opensearch/1 active idle 1 10.214.176.220 9200/tcp
opensearch/2* active idle 2 10.214.176.175 9200/tcp
self-signed-certificates/0* active idle 3 10.214.176.31
Machine State Address Inst id Base AZ Message
0 started 10.214.176.180 juju-0c35d2-0 ubuntu@22.04 Running
1 started 10.214.176.220 juju-0c35d2-1 ubuntu@22.04 Running
2 started 10.214.176.175 juju-0c35d2-2 ubuntu@22.04 Running
3 started 10.214.176.31 juju-0c35d2-3 ubuntu@22.04 Running
For this example, the current revision is 144 for OpenSearch.
Note
Make sure to store the revision number in case of rollback.
If the deployment is of a local charm, save a copy of the current .charm file.
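Recording the revision can also be scripted. The sketch below pulls the Rev column out of the application line of juju status; the field position assumes the layout above, where the Version column is empty, so verify it against your own output. A captured line stands in for the live command:

```shell
# Record the charm revision for a possible rollback.
# In a live model:  app_line=$(juju status opensearch | grep '^opensearch ')
# The line below is captured from the example output so the parsing is runnable.
app_line='opensearch           active      3  opensearch                2/edge         144  no'
rev=$(echo "$app_line" | awk '{print $6}')
echo "recorded revision: $rev"
```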
Scale-up (optional)¶
It is recommended, though optional, to scale the application up by one unit before upgrading.
The new unit will be the first one to be updated, and it will assert that the upgrade is possible. In the event of a failure, an extra unit simplifies manual recovery without disrupting service.
juju add-unit opensearch
Wait for the new unit to be up and ready.
Prepare the application for the in-place upgrade¶
IMPORTANT: Create a backup of your cluster
Refer to How to create a backup.
Perform the pre-upgrade-check action
After the application has settled, it’s necessary to run the pre-upgrade-check action against the leader unit:
juju run opensearch/leader pre-upgrade-check
The output should be the following:
Running operation 1 with 1 task
- task 2 on unit-opensearch-2
Waiting for task 2...
result: Charm is ready for upgrade
The action checks the health of OpenSearch and determines whether the charm is ready to start the upgrade procedure.
Initiate the upgrade¶
Caution
Charmed OpenSearch supports performance profiles, and RAM consumption differs according to the chosen profile:
production: consumes 50% of the available RAM, up to 32G
staging: consumes 25% of the available RAM, up to 32G
testing: consumes 1G of RAM
If your charm is running a revision prior to 185, the testing profile is the default. Ensure the profile is set explicitly at upgrade time; afterwards, feel free to switch to a profile better suited to your use case.
Use the juju refresh command to trigger the charm upgrade process.
You have control over what upgrade you want to apply:
You can upgrade the charm to the latest revision available in the charm store for a specific channel, in this case, the edge channel:
# If your charm is running a revision prior to 185, set the profile explicitly:
juju refresh opensearch --channel 2/edge --config profile="testing"
# Otherwise, just refresh:
juju refresh opensearch --channel 2/edge
You can also upgrade the charm to a specific revision:
juju refresh opensearch --revision 145
Or you can upgrade the charm using a local charm file:
juju refresh opensearch --path /path/to/your/charm/file.charm
The OpenSearch upgrade will execute only on the unit with the highest ordinal number. For the running OpenSearch example, the juju status output will look as follows:
Model Controller Cloud/Region Version SLA Timestamp
dev development localhost/localhost 3.5.3 unsupported 10:29:07Z
App Version Status Scale Charm Channel Rev Exposed Message
opensearch blocked 4 opensearch 2/edge 145 no Upgrading. Verify highest unit is healthy & run `resume-upgrade` action. To rollback, `juju refresh` to last revision
self-signed-certificates active 1 self-signed-certificates latest/stable 155 no
Unit Workload Agent Machine Public address Ports Message
opensearch/0 active idle 0 10.214.176.180 9200/tcp OpenSearch 2.15.0 running; Snap rev 56 (outdated); Charmed operator 1+e686854
opensearch/1 active idle 1 10.214.176.220 9200/tcp OpenSearch 2.15.0 running; Snap rev 56 (outdated); Charmed operator 1+e686854
opensearch/2* active idle 2 10.214.176.175 9200/tcp OpenSearch 2.15.0 running; Snap rev 56 (outdated); Charmed operator 1+e686854
opensearch/3 active idle 4 10.214.176.7 9200/tcp OpenSearch 2.16.0 running; Snap rev 57; Charmed operator 1+e686854
self-signed-certificates/0* active idle 3 10.214.176.31
Machine State Address Inst id Base AZ Message
0 started 10.214.176.180 juju-0c35d2-0 ubuntu@22.04 Running
1 started 10.214.176.220 juju-0c35d2-1 ubuntu@22.04 Running
2 started 10.214.176.175 juju-0c35d2-2 ubuntu@22.04 Running
3 started 10.214.176.31 juju-0c35d2-3 ubuntu@22.04 Running
4 started 10.214.176.7 juju-0c35d2-4 ubuntu@22.04 Running
Note
The unit should recover shortly after, but the time can vary depending on the amount of data written to the cluster while the unit was not part of it. Please be patient with large installations.
Resume the upgrade¶
After the first unit is upgraded, the charm will set that unit's upgrade state to completed. If necessary, you can further verify the success of the upgrade at this point. If the unit is healthy within the cluster, the next step is to resume the upgrade process by running:
juju run opensearch/leader resume-upgrade
The resume-upgrade action will roll out the OpenSearch upgrade to the remaining units in the application, sequentially from the highest unit number to the lowest.
As each unit is upgraded, its status will be set to active/idle and its message will indicate
the new version of OpenSearch running on the unit. The juju status output will look as follows:
Model Controller Cloud/Region Version SLA Timestamp
dev development localhost/localhost 3.5.3 unsupported 10:39:06Z
App Version Status Scale Charm Channel Rev Exposed Message
opensearch maintenance 4 opensearch 2/edge 145 no Upgrading. To rollback, `juju refresh` to the previous revision
self-signed-certificates active 1 self-signed-certificates latest/stable 155 no
Unit Workload Agent Machine Public address Ports Message
opensearch/0 active idle 0 10.214.176.180 9200/tcp OpenSearch 2.15.0 running; Snap rev 56 (outdated); Charmed operator 1+e686854
opensearch/1 waiting executing 1 10.214.176.220 9200/tcp Waiting for OpenSearch to start...
opensearch/2* active idle 2 10.214.176.175 9200/tcp OpenSearch 2.16.0 running; Snap rev 57; Charmed operator 1+e686854
opensearch/3 active idle 4 10.214.176.7 9200/tcp OpenSearch 2.16.0 running; Snap rev 57; Charmed operator 1+e686854
self-signed-certificates/0* active idle 3 10.214.176.31
Machine State Address Inst id Base AZ Message
0 started 10.214.176.180 juju-0c35d2-0 ubuntu@22.04 Running
1 started 10.214.176.220 juju-0c35d2-1 ubuntu@22.04 Running
2 started 10.214.176.175 juju-0c35d2-2 ubuntu@22.04 Running
3 started 10.214.176.31 juju-0c35d2-3 ubuntu@22.04 Running
4 started 10.214.176.7 juju-0c35d2-4 ubuntu@22.04 Running
Once all units are upgraded, the application status will be set to active
and the message indicating the new version of OpenSearch running on the units will disappear.
Model Controller Cloud/Region Version SLA Timestamp
dev development localhost/localhost 3.5.3 unsupported 10:43:41Z
App Version Status Scale Charm Channel Rev Exposed Message
opensearch active 4 opensearch 2/edge 145 no
self-signed-certificates active 1 self-signed-certificates latest/stable 155 no
Unit Workload Agent Machine Public address Ports Message
opensearch/0 active idle 0 10.214.176.180 9200/tcp
opensearch/1 active idle 1 10.214.176.220 9200/tcp
opensearch/2* active idle 2 10.214.176.175 9200/tcp
opensearch/3 active idle 4 10.214.176.7 9200/tcp
self-signed-certificates/0* active idle 3 10.214.176.31
Machine State Address Inst id Base AZ Message
0 started 10.214.176.180 juju-0c35d2-0 ubuntu@22.04 Running
1 started 10.214.176.220 juju-0c35d2-1 ubuntu@22.04 Running
2 started 10.214.176.175 juju-0c35d2-2 ubuntu@22.04 Running
3 started 10.214.176.31 juju-0c35d2-3 ubuntu@22.04 Running
4 started 10.214.176.7 juju-0c35d2-4 ubuntu@22.04 Running
Notice the Rev column in the juju status output.
The revision number should reflect the new revision of the application.
Rollback (optional)¶
In case of a failed upgrade, you may be able to roll back to the previous revision. To do so, follow the Perform a minor rollback section below.
Scale-back (optional)¶
If you scaled up the application in step 2, you can now scale it back down to the original number of units:
juju remove-unit opensearch/<highest unit number>
Check the cluster health¶
First, check that the units have settled into the active/idle state in juju status,
with the new revision number:
Model Controller Cloud/Region Version SLA Timestamp
dev development localhost/localhost 3.5.3 unsupported 10:45:39Z
App Version Status Scale Charm Channel Rev Exposed Message
opensearch active 3 opensearch 2/edge 145 no
self-signed-certificates active 1 self-signed-certificates latest/stable 155 no
Unit Workload Agent Machine Public address Ports Message
opensearch/0 active idle 0 10.214.176.180 9200/tcp
opensearch/1 active idle 1 10.214.176.220 9200/tcp
opensearch/2* active idle 2 10.214.176.175 9200/tcp
self-signed-certificates/0* active idle 3 10.214.176.31
Machine State Address Inst id Base AZ Message
0 started 10.214.176.180 juju-0c35d2-0 ubuntu@22.04 Running
1 started 10.214.176.220 juju-0c35d2-1 ubuntu@22.04 Running
2 started 10.214.176.175 juju-0c35d2-2 ubuntu@22.04 Running
3 started 10.214.176.31 juju-0c35d2-3 ubuntu@22.04 Running
Check that the cluster is healthy. OpenSearch’s upstream documentation suggests the following check:
GET "/_cluster/health?pretty"
The response should look similar to the following example:
{
"cluster_name" : "opensearch-wvmy",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"discovered_master" : true,
"discovered_cluster_manager" : true,
"active_primary_shards" : 5,
"active_shards" : 15,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
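For scripting this check, the status field can be extracted from the response. The sketch below runs sed against a captured fragment of the pretty-printed output above; in a live deployment you would pipe the `curl` response instead:

```shell
# Extract the "status" field from a cluster health response.
# In a live model: response=$(curl -s "<unit-ip>:9200/_cluster/health?pretty")
response='{ "cluster_name" : "opensearch-wvmy", "status" : "green", "timed_out" : false }'
status=$(echo "$response" | sed -n 's/.*"status" *: *"\([a-z]*\)".*/\1/p')
echo "cluster status: $status"
```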
Perform a minor rollback¶
Caution
OpenSearch does not support downgrading. For more information, please refer to the upstream OpenSearch documentation about rolling upgrades.
While rolling back to a charm revision that does not change the underlying OpenSearch version is a safe operation, note that a rollback in Charmed OpenSearch is a best-effort process to restore the cluster to a previous revision. If the OpenSearch workload version differs, there is no guarantee that the cluster will be rolled back to the previous version.
After a juju refresh, if there are any version incompatibilities in charm revisions,
their dependencies, or any other unexpected failure in the upgrade process,
the process will be halted and enter a failure state.
Even if the underlying OpenSearch cluster continues to work, it’s important to roll back the charm to a previous revision so that an update can be attempted after further inspection of the failure.
Pre-rollback checks¶
A rollback follows the same procedure as an upgrade; the difference is the charm revision you refresh to. For an example, follow the minor upgrade guide above.
It is important to run the pre-upgrade-check action to ensure the cluster is in a healthy state
before the rollback. This action will check the cluster health and the status of the upgrade.
juju run opensearch/leader pre-upgrade-check
Once the pre-upgrade checks are complete, and you get the Charm is ready for upgrade message,
you can proceed with the rollback.
For example, here is the status of the OpenSearch cluster after upgrading one unit to revision 145:
Model Controller Cloud/Region Version SLA Timestamp
dev development localhost/localhost 3.5.3 unsupported 12:24:17Z
App Version Status Scale Charm Channel Rev Exposed Message
opensearch blocked 3 opensearch 2/edge 145 no Upgrading. Verify highest unit is healthy & run `resume-upgrade` action. To rollback, `juju refresh` to last revision
self-signed-certificates active 1 self-signed-certificates latest/stable 155 no
Unit Workload Agent Machine Public address Ports Message
opensearch/0* active idle 0 10.214.176.187 9200/tcp OpenSearch 2.15.0 running; Snap rev 56 (outdated); Charmed operator 1+e686854
opensearch/1 active idle 1 10.214.176.197 9200/tcp OpenSearch 2.15.0 running; Snap rev 56 (outdated); Charmed operator 1+e686854
opensearch/2 active idle 2 10.214.176.222 9200/tcp OpenSearch 2.16.0 running; Snap rev 57; Charmed operator 1+e686854
self-signed-certificates/0* active idle 3 10.214.176.93
Machine State Address Inst id Base AZ Message
0 started 10.214.176.187 juju-dd97d9-0 ubuntu@22.04 Running
1 started 10.214.176.197 juju-dd97d9-1 ubuntu@22.04 Running
2 started 10.214.176.222 juju-dd97d9-2 ubuntu@22.04 Running
3 started 10.214.176.93 juju-dd97d9-3 ubuntu@22.04 Running
Notice that the OpenSearch charm is at revision 145.
Rollback the charm¶
Caution
Do not trigger a rollback while an upgrade action is running. It may leave OpenSearch in an unpredictable state.
Caution
Rollbacks in Charmed OpenSearch are a best-effort process. It is recommended to perform a backup and restore to a new deployment with the desired OpenSearch version instead of performing a rollback. Rollbacks carry a risk of data loss and downtime.
Rollback a charm revision with the same workload version¶
You can initiate the rollback by running the refresh command with the revision of
the charm you want to roll back to. For example, to roll back to revision 144, run:
juju refresh opensearch --revision=144
When deploying from a local charm file, you must have the previous revision’s .charm file.
Then, run:
juju refresh opensearch --path=<path_to_charm_file>
After the refresh command, the Juju controller revision for the application will be back in sync with the running OpenSearch revision.
Model Controller Cloud/Region Version SLA Timestamp
dev development localhost/localhost 3.5.3 unsupported 12:27:02Z
App Version Status Scale Charm Channel Rev Exposed Message
opensearch active 3 opensearch 2/edge 144 no
self-signed-certificates active 1 self-signed-certificates latest/stable 155 no
Unit Workload Agent Machine Public address Ports Message
opensearch/0* active idle 0 10.214.176.187 9200/tcp
opensearch/1 active idle 1 10.214.176.197 9200/tcp
opensearch/2 active idle 2 10.214.176.222 9200/tcp
self-signed-certificates/0* active idle 3 10.214.176.93
Machine State Address Inst id Base AZ Message
0 started 10.214.176.187 juju-dd97d9-0 ubuntu@22.04 Running
1 started 10.214.176.197 juju-dd97d9-1 ubuntu@22.04 Running
2 started 10.214.176.222 juju-dd97d9-2 ubuntu@22.04 Running
3 started 10.214.176.93 juju-dd97d9-3 ubuntu@22.04 Running
Notice that the OpenSearch charm is now at revision 144.
Rollback a charm revision with a different workload version¶
If you roll back to a charm revision with a different workload version, the process will roll back the charm code and then make a best-effort attempt to roll back the workload, since OpenSearch does not support downgrades.
If the rollback between the versions is possible¶
In this case, both the charm code and the workload will be rolled back to the previous version. However, because rollback is a risky operation, rolling back the workload requires manual intervention. The charm will enter a blocked state and display a message instructing you to run the force-refresh-start action with check-compatibility=false to continue the best-effort workload rollback.
Model Controller Cloud/Region Version SLA Timestamp
testing lxd localhost/localhost 3.6.14 unsupported 08:36:09Z
App Version Status Scale Charm Channel Rev Exposed Message
opensearch blocked 3 opensearch 2 no Upgrading. Verify highest unit is healthy & run `resume-upgrade` action.
self-signed-certificates active 1 self-signed-certificates 1/stable 586 no
Unit Workload Agent Machine Public address Ports Message
opensearch/0 active idle 1 10.149.40.7 9200/tcp OpenSearch 2.18.0 running; Snap rev 66; Charmed operator 1+530fe10bb-dirty+530fe10bb-dirty+530fe10bb-dirty+530fe10bb-...
opensearch/1 active idle 2 10.149.40.93 9200/tcp OpenSearch 2.18.0 running; Snap rev 66; Charmed operator 1+530fe10bb-dirty+530fe10bb-dirty+530fe10bb-dirty+530fe10bb-...
opensearch/2* blocked idle 3 10.149.40.126 9200/tcp Rollback incompatible. Run 'juju run <unit> force-refresh-start' with `check-compatibility` set to false to override ...
self-signed-certificates/0* active idle 0 10.149.40.252
Machine State Address Inst id Base AZ Message
0 started 10.149.40.252 juju-f44a9a-0 ubuntu@24.04 xof Running
1 started 10.149.40.7 juju-f44a9a-1 ubuntu@24.04 xof Running
2 started 10.149.40.93 juju-f44a9a-2 ubuntu@24.04 xof Running
3 started 10.149.40.126 juju-f44a9a-3 ubuntu@24.04 xof Running
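Based on the blocked unit shown above, the override command can be assembled as follows. This is a sketch: the unit name comes from this example, and the action name and parameter are taken from the charm's own status message, so substitute the blocked unit from your own deployment.

```shell
# Build the force-refresh-start invocation for the blocked unit.
# The unit name below matches the example status output; substitute your own.
blocked_unit="opensearch/2"
cmd="juju run $blocked_unit force-refresh-start check-compatibility=false"
echo "$cmd"
```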
If the rollback between the versions is not possible¶
In this case, the charm code will be rolled back, but the OpenSearch workload will remain on the newer version. The charm will enter a blocked state and display a message instructing you to either refresh to a charm revision with the same workload version or perform a backup and restore to a new deployment.
Model Controller Cloud/Region Version SLA Timestamp
testing lxd localhost/localhost 3.6.14 unsupported 08:03:52Z
App Version Status Scale Charm Channel Rev Exposed Message
opensearch blocked 3 opensearch 17 no Upgrading. Verify highest unit is healthy & run `resume-upgrade` action.
self-signed-certificates active 1 self-signed-certificates 1/stable 586 no
Unit Workload Agent Machine Public address Ports Message
opensearch/6* active idle 7 10.149.40.239 9200/tcp OpenSearch 2.17.0 running; Snap rev 58; Charmed operator 1+530fe10bb-dirty+530fe10bb-dirty+530fe10bb-dirty+530fe10bb-...
opensearch/7 active idle 8 10.149.40.64 9200/tcp OpenSearch 2.17.0 running; Snap rev 58; Charmed operator 1+530fe10bb-dirty+530fe10bb-dirty+530fe10bb-dirty+530fe10bb-...
opensearch/8 blocked idle 9 10.149.40.31 9200/tcp Rollback unsupported. Refresh to a newer revision or consult the recovery documentation
self-signed-certificates/0* active idle 0 10.149.40.55
Machine State Address Inst id Base AZ Message
0 started 10.149.40.55 juju-0bfd52-0 ubuntu@24.04 xof Running
7 started 10.149.40.239 juju-0bfd52-7 ubuntu@24.04 xof Running
8 started 10.149.40.64 juju-0bfd52-8 ubuntu@24.04 xof Running
9 started 10.149.40.31 juju-0bfd52-9 ubuntu@24.04 xof Running
Check the cluster’s health¶
Once the charm is rolled back, check the cluster’s health to confirm it has recovered. OpenSearch’s upstream documentation suggests the following check:
GET "/_cluster/health?pretty"
The response should look similar to the following example:
{
"cluster_name" : "opensearch-7ngj",
"status" : "green",
"timed_out" : false,
"number_of_nodes" : 3,
"number_of_data_nodes" : 3,
"discovered_master" : true,
"discovered_cluster_manager" : true,
"active_primary_shards" : 5,
"active_shards" : 15,
"relocating_shards" : 0,
"initializing_shards" : 0,
"unassigned_shards" : 0,
"delayed_unassigned_shards" : 0,
"number_of_pending_tasks" : 0,
"number_of_in_flight_fetch" : 0,
"task_max_waiting_in_queue_millis" : 0,
"active_shards_percent_as_number" : 100.0
}
Recovering from a rollback¶
OpenSearch does not support downgrades.
Running juju refresh to a previous revision may cause OpenSearch to fail to start.
In that case, manual recovery is required.
Follow the steps in this section to restore the cluster to a healthy state.
For more information, please refer to the upstream OpenSearch documentation about rolling upgrades.
Check Juju status¶
First, check Juju model status:
juju status
The rolled-back unit may appear stuck, displaying the status Waiting for OpenSearch to start...:
App Version Status Scale Charm Channel Rev Exposed Message
opensearch active 3 opensearch 2/stable 168 no
self-signed-certificates active 1 self-signed-certificates latest/stable 264 no
Unit Workload Agent Machine Public address Ports Message
opensearch/0* active idle 0 10.45.114.156 9200/tcp
opensearch/1 active idle 1 10.45.114.208 9200/tcp
opensearch/2 waiting executing 2 10.45.114.147 9200/tcp Waiting for OpenSearch to start...
self-signed-certificates/0* active idle 3 10.45.114.124
Machine State Address Inst id Base AZ Message
0 started 10.45.114.156 juju-1fafd0-0 ubuntu@22.04 Running
1 started 10.45.114.208 juju-1fafd0-1 ubuntu@22.04 Running
2 started 10.45.114.147 juju-1fafd0-2 ubuntu@22.04 Running
3 started 10.45.114.124 juju-1fafd0-3 ubuntu@22.04 Running
Note the stuck unit; in this example, it is opensearch/2.
This unit will not recover automatically, and additional steps are required to replace it.
Check cluster health¶
Retrieve the cluster health:
curl -X GET "10.45.114.156:9200/_cluster/health?pretty"
If the cluster health is red, one or more primary shards cannot be allocated. Allocation explanations will identify any indices that exist only on the rolled-back unit, which has left the cluster. As the departed unit will not rejoin, these indices cannot be recovered and must be removed.
Identify the problematic index from the output of:
curl -X GET "10.45.114.156:9200/_cluster/allocation/explain?pretty"
For example, in the following output, index1 cannot be recovered as its current state is
unassigned with the reason NODE_LEFT:
{
"index": "index1",
"shard": 0,
"primary": true,
"current_state": "unassigned",
"unassigned_info": {
"reason": "NODE_LEFT",
"at": "2025-11-27T08:40:43.653Z",
"details": "node_left [NKDiDmZ7TOShHAW32rcleg]",
"last_allocation_status": "no_valid_shard_copy"
},
"can_allocate": "no_valid_shard_copy",
"allocate_explanation": "cannot allocate because a previous copy of the primary shard existed but can no longer be found on the nodes in the cluster",
"node_allocation_decisions": [
{
"node_id": "WxxsBtxITtab58q078TdEg",
"node_name": "opensearch-1.4c1",
"transport_address": "10.45.114.208:9300",
"node_attributes": {
"app_id": "39b6cdac-c195-466d-8537-e4a1f41fafd0/opensearch",
"shard_indexing_pressure_enabled": "true"
},
"node_decision": "no",
"store": {
"found": false
}
},
{
"node_id": "XnZt4LqwSTu79M7neGxkoQ",
"node_name": "opensearch-0.4c1",
"transport_address": "10.45.114.156:9300",
"node_attributes": {
"shard_indexing_pressure_enabled": "true",
"app_id": "39b6cdac-c195-466d-8537-e4a1f41fafd0/opensearch"
},
"node_decision": "no",
"store": {
"found": false
}
}
]
}
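The affected index and the reason can also be pulled out of the allocation explanation programmatically. The sketch below runs sed against a captured fragment of the response above; in a live deployment you would pipe the `curl` response instead:

```shell
# Identify the unassigned index and the reason from an allocation explanation.
# In a live model: explain=$(curl -s "<unit-ip>:9200/_cluster/allocation/explain")
explain='{ "index": "index1", "shard": 0, "primary": true, "current_state": "unassigned", "unassigned_info": { "reason": "NODE_LEFT" } }'
idx=$(echo "$explain" | sed -n 's/.*"index" *: *"\([^"]*\)".*/\1/p')
reason=$(echo "$explain" | sed -n 's/.*"reason" *: *"\([^"]*\)".*/\1/p')
echo "index $idx is unassigned because of $reason"
```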
Delete the problematic index identified in the previous step:
Warning
If you do not have a snapshot containing this index, the data will be lost!
curl -X DELETE "10.45.114.156:9200/index1"
After deleting any orphaned indices, verify that the cluster returns to green or yellow health:
curl -X GET "10.45.114.156:9200/_cluster/health?pretty"
Set allocation settings¶
During the upgrade process, the routing allocation setting may be restricted to primaries.
Restore normal allocation by enabling all routing:
curl -X PUT "10.45.114.156:9200/_cluster/settings" -H 'Content-Type: application/json' -d'
{
"persistent": {
"cluster.routing.allocation.enable": "all"
}
}
'
Add a new unit¶
While optional, it is highly advisable to add a replacement unit to restore the application to its original scale:
juju add-unit opensearch -n 1
Remove rolled back unit¶
Remove the rolled back unit:
juju remove-unit opensearch/2
Where opensearch/2 is the name of the unit that was rolled back and blocked earlier.
Remove lock¶
If the replacement unit appears stuck displaying the status message
Requesting lock on operation: start, check whether the departed unit still holds the lock:
GET /.charm_node_lock/_doc/0
Example response:
{
"_index": ".charm_node_lock",
"_id": "0",
"_version": 3,
"_seq_no": 28,
"_primary_term": 1,
"found": true,
"_source": {
"unit-name": "opensearch-2.4c1"
}
}
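Checking the lock holder can be scripted by comparing the unit-name field in the lock document against the departed unit. A sketch against the captured response above (the hostname opensearch-2.4c1 follows this example's naming; in a live deployment, pipe the `curl` response instead):

```shell
# Decide whether the departed unit still holds the node lock.
# In a live model: lock_doc=$(curl -s "<unit-ip>:9200/.charm_node_lock/_doc/0")
lock_doc='{ "_id": "0", "found": true, "_source": { "unit-name": "opensearch-2.4c1" } }'
holder=$(echo "$lock_doc" | sed -n 's/.*"unit-name" *: *"\([^"]*\)".*/\1/p')
departed="opensearch-2.4c1"

if [ "$holder" = "$departed" ]; then
  echo "departed unit holds the lock: delete /.charm_node_lock/_doc/0"
fi
```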
If the departed unit holds the lock, delete the lock document:
curl -X DELETE "10.45.114.156:9200/.charm_node_lock/_doc/0?refresh=true"
Wait for the replacement unit to start and join the cluster.
App Version Status Scale Charm Channel Rev Exposed Message
opensearch active 3 opensearch 2/stable 168 no
self-signed-certificates active 1 self-signed-certificates latest/stable 264 no
Unit Workload Agent Machine Public address Ports Message
opensearch/0* active idle 0 10.45.114.156 9200/tcp
opensearch/1 active idle 1 10.45.114.208 9200/tcp
opensearch/3 active idle 4 10.45.114.228 9200/tcp
self-signed-certificates/0* active idle 3 10.45.114.124
Machine State Address Inst id Base AZ Message
0 started 10.45.114.156 juju-1fafd0-0 ubuntu@22.04 Running
1 started 10.45.114.208 juju-1fafd0-1 ubuntu@22.04 Running
3 started 10.45.114.124 juju-1fafd0-3 ubuntu@22.04 Running
4 started 10.45.114.228 juju-1fafd0-4 ubuntu@22.04 Running
Verify the new unit has joined the cluster¶
List the nodes in the current cluster:
curl -X GET "10.45.114.156:9200/_cat/nodes"
Confirm that the new node is present in the output, which will look similar to the following:
10.45.114.228 35 86 6 0.43 0.69 0.95 dim cluster_manager,data,ingest,ml - opensearch-3.4c1
10.45.114.156 32 86 11 0.43 0.69 0.95 dim cluster_manager,data,ingest,ml * opensearch-0.4c1
10.45.114.208 45 86 11 0.43 0.69 0.95 dim cluster_manager,data,ingest,ml - opensearch-1.4c1