Symptoms:

Backups fail with the following error message in the backup's message log:

"bsock.c:184 Unable to connect to Storage daemon on 192.168.1.15:9103. ERR=Connection refused"

and/or

You see the following notices on your Revinetix appliance:

"primary database is inaccessible"

"insufficient storage space on /raid"

"storage space critically low /raid"

"storage space critically low /root full"

"storage space critically low /var/log/full"


The RAID Usage bar in the bottom left of the appliance interface shows little or no free space (few or no GB free).

Note: If you see "storage space critically low /raid/:catalogs:", OR if you see the "primary database is inaccessible" message while the RAID Usage bar shows plenty of free space, the problem involves space on the database rather than on the RAID and requires additional assistance. Please contact Revinetix Support.



Resolution:

These messages typically indicate that the appliance's RAID is full. When the appliance runs out of space, it does not overwrite the oldest jobs; instead, the newest jobs fail.


Below are the necessary steps to clear space on the RAID in the order they should be attempted:


Delete Erred and Cancelled Jobs

Erred jobs may take up hidden space even if they show 0 bytes saved.

It is always recommended to delete erred jobs unless you are troubleshooting an issue and need them for diagnostics.  You can have the appliance automatically delete erred jobs by going to Jobs > Settings > Enable Automatic Deletion of Erred Jobs.  Note: You will still be able to see the erred jobs for the first 24-48 hours by going to Server > Recent Jobs.

How to Delete Erred Jobs:

Go to Jobs > History

Select Filters (between Actions and Refresh at the top of the screen) > Show Filters

A new "Filters" toolbar should appear below the Filters option and above the job list

Select the Status dropdown > Uncheck Successful and Warnings, leaving only Erred and Cancelled

Select the upper-left checkbox to select all jobs

Go to Actions > Delete > Confirm you wish to delete the erred and cancelled jobs.
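The Status filter above keeps only Erred and Cancelled jobs before the bulk delete. As a minimal sketch of that selection logic, assuming hypothetical job records (the `Job` class and `jobs_to_delete` function are illustrative only, not an appliance API), it amounts to:

```python
from dataclasses import dataclass


@dataclass
class Job:
    """Hypothetical stand-in for one row in Jobs > History."""
    job_id: int
    client: str
    status: str  # "Successful", "Warnings", "Erred", or "Cancelled"


# Statuses left checked in the Status dropdown.
DELETE_STATUSES = {"Erred", "Cancelled"}


def jobs_to_delete(history):
    """Return the IDs the filter would leave visible (and select all would pick)."""
    return [job.job_id for job in history if job.status in DELETE_STATUSES]
```

Successful and Warnings jobs are never selected, which is why unchecking them before using "Select All" is safe.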

Delete Unreferenced Data

The Unreferenced Data sub-tab lists files that should have been removed when a job was deleted but were left behind. These unreferenced data files take up disk space and should be deleted, especially if you suspect that you are running out of disk space. Alternatively, you can try to re-import unreferenced data files that may have been misfiled or lost in the system.

How to Delete Unreferenced Data:

Go to Jobs > Unreferenced Data

Scan for Unreferenced Data

Select all

Delete Selected Volumes > Confirm
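Conceptually, "unreferenced" data is whatever is stored on the RAID that no remaining job points to. A minimal sketch of that idea, assuming a hypothetical mapping of job IDs to volume names (this is not how the appliance's scan is actually implemented, just an illustration of the set difference):

```python
def unreferenced_volumes(volumes_on_raid, jobs):
    """Return volumes stored on disk that no remaining job references.

    volumes_on_raid: iterable of volume names found on the RAID (hypothetical).
    jobs: dict mapping job ID -> list of volume names that job uses (hypothetical).
    """
    referenced = set()
    for volumes in jobs.values():
        referenced.update(volumes)
    # Anything left over was orphaned, e.g. by an incomplete job deletion.
    return sorted(set(volumes_on_raid) - referenced)
```

For example, if the RAID holds `vol-001`, `vol-002`, and `vol-003`, but the only remaining jobs use `vol-001` and `vol-003`, then `vol-002` is unreferenced and is a candidate for deletion (or re-import).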

Manually Delete Jobs


If Erred/Cancelled Jobs and Unreferenced Data have been cleared but you need additional space cleared on the RAID, you will need to delete older or less critical jobs.

You can use the same filter options described above for erred and cancelled jobs. Filter by client or set a date range to make selecting jobs easier.

Select the check box to the left of each job you want to delete. Once you have selected all of them, either right-click one of the selected jobs and choose 'Delete', or go to the Actions menu above the tabs (far upper-left corner) and choose 'Delete'.

Alternatively, you can delete jobs one at a time:

Go to Jobs > History and right-click on a job.  A menu should come up with 'Delete' as an option.

To avoid data loss, always keep at least your most recent Full backup and every backup taken after it on the RAID.
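Before a large manual deletion, it can help to check your selection against that rule. A minimal sketch, assuming hypothetical job records with a level and a finish time (`Job`, `protected_jobs`, and `unsafe_deletions` are illustrative names, not part of the appliance):

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Job:
    """Hypothetical job record."""
    job_id: int
    level: str  # "Full", "Differential", or "Incremental"
    finished: datetime


def protected_jobs(jobs):
    """IDs of the most recent Full and every job finished after it."""
    fulls = [job for job in jobs if job.level == "Full"]
    if not fulls:
        # No Full on the RAID: treat everything as protected rather than guess.
        return {job.job_id for job in jobs}
    latest_full = max(fulls, key=lambda job: job.finished)
    return {job.job_id for job in jobs if job.finished >= latest_full.finished}


def unsafe_deletions(jobs, selected_ids):
    """Selected job IDs that would break the most recent restorable backup set."""
    return sorted(protected_jobs(jobs) & set(selected_ids))
```

An empty result from `unsafe_deletions` means the selection leaves the most recent Full and its later backups intact.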

Run Garbage Collection


After trying the options above, it is always a good idea to run Garbage Collection. This ensures that the space used by deleted data is actually reclaimed.

To manually run Garbage Collection: Go to Server > Storage > Start Garbage Collection

Garbage Collection can take a long time to run, especially if you have deleted a lot of data.

For clients using deduplication, the UCAR system runs a garbage collection process every day to find and purge any garbage. Data can become unreferenced garbage when, for example, clients are deleted without their jobs being purged, or old jobs are not removed completely. It is recommended to run garbage collection after deleting jobs to be sure the data is cleared completely.

Garbage Collection normally runs automatically at a scheduled time.

Starting with Rev OS 3.1, garbage collection will not run while jobs are being deduplicated. In this case, the progress bar for the Garbage Collection process will show "deferred".

Garbage collection will be deferred for up to 12 hours before it gives up; it is then retried at its next regular scheduled time. The one exception is when the system is running low on space, in which case Garbage Collection proceeds whether or not jobs are being deduplicated.
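The deferral rule described above can be summarized as a small decision function. This is a hypothetical model written from the description in this article, not Revinetix code; `gc_decision` and its parameters are illustrative, and the 12-hour window is measured here from the regular scheduled start time (an assumption).

```python
from datetime import datetime, timedelta

# Assumption: the 12-hour deferral window described in this article.
MAX_DEFERRAL = timedelta(hours=12)


def gc_decision(scheduled_at: datetime, now: datetime,
                jobs_deduping: bool, low_on_space: bool) -> str:
    """Return "run", "defer", or "give_up" for a scheduled garbage collection.

    "give_up" means: skip this run and retry at the next regular time.
    """
    if low_on_space:
        # Low space overrides deferral: GC proceeds even while jobs dedupe.
        return "run"
    if not jobs_deduping:
        return "run"
    if now - scheduled_at >= MAX_DEFERRAL:
        # Deferred for the full window: give up until the next scheduled run.
        return "give_up"
    # Jobs are still deduplicating: the progress bar shows "deferred".
    return "defer"
```

For example, three hours after the scheduled start with jobs still deduplicating, the model returns "defer"; with low space it returns "run" regardless.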

If block-level deduplication is enabled, the block store may hold space that is no longer in use. Block deduplication has its own garbage collection process. To run it, go to Server > Storage and scroll down to the block deduplication section, where there should be a button labeled "Reclaim Storage".

Automatically Recycle Jobs

For long-term management of RAID space, it is a good idea to set recycling schedules. This allows the appliance to automatically delete jobs once they reach their expiration date.

Note: If you use byte-level replication, your schedule and retention policy must leave enough time for a second full backup to replicate completely to the secondary before the oldest (original) full backup is purged. It is recommended to keep a minimum of two full backups on the appliance at any given time; three is preferred.
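When planning a purge on a replicated appliance, you can sanity-check it against the two-minimum, three-preferred guideline. A minimal sketch using hypothetical lists of Full backup IDs (the function names are illustrative; this does not check whether replication of the newer Full has actually finished, which you still need to confirm on the appliance):

```python
def fulls_remaining(full_ids, purge_ids):
    """Count the Full backups left on the appliance after a planned purge."""
    return len(set(full_ids) - set(purge_ids))


def replication_purge_check(full_ids, purge_ids):
    """Classify a planned purge against the two-minimum / three-preferred guideline."""
    remaining = fulls_remaining(full_ids, purge_ids)
    if remaining >= 3:
        return "preferred"
    if remaining == 2:
        return "minimum"
    return "too few"
```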

Best practices for schedules and recycling depend on many factors, such as the size of the backups and your company's data backup policies. A good starting point is a weekly schedule; watch how the backups run from there. If you find that you are running out of space quickly, you may need to move to a monthly schedule or a custom schedule that you create. The exception is Exchange: if you are backing up Exchange, you will need to stay on a weekly schedule.

Recycling can be set to mirror the backup schedules:

Select a client from the list on the left, then go to Clients > Edit to change various aspects of its backup.

Once you have selected a schedule, set up recycling in the 'Job Recycling' section. Make sure the recycling settings fit your chosen schedule's details (schedules can be viewed or created in Clients > Schedules).

If you chose the weekly schedule, set all job recycling (Fulls, Incrementals, Differentials) to recycle weekly if you want jobs cleared off the RAID as soon as possible, or set them to recycle after a longer period if you need them to stay around.

If you chose the monthly schedule and want space cleared as soon as possible, you could have Fulls recycle monthly, with Incrementals and Differentials recycling weekly.

Go to Jobs > Settings to make sure that 'Enable Automatic Job Management' is checked

You can also choose to check 'Enable Automatic Deletion of Erred Jobs' (as mentioned in more detail above)

Preserve Single Job Set:

It is highly recommended to enable Preserve Single Job Set. Go to Jobs > Settings and set 'Job Retention Policy' to 'Preserve Single Job Set'. This tells the appliance not to recycle a job until there is a viable replacement, so the most recent full and any subsequent differentials/incrementals remain on the appliance and are not automatically deleted.

Keep in mind, however, that this setting can keep jobs on the appliance past their recycling/retention settings until a new full backup is available.

For example, suppose you have a client with the following settings:

Backup Schedule: Monthly (Fulls on the first Sunday, Differentials on the second/fifth Sunday, Incrementals every Monday-Saturday)

Recycling/Retention:

Full 5 weeks

Diff 1 week

Inc 5 days

Preserve Single Job Set ENABLED

This is what you will see:

The full and incrementals will run for the first week.  

Expired incrementals will NOT be deleted because of the 'preserve single job set' setting.

The differential and incrementals will run for the second week.

Expired incrementals from the FIRST week will be deleted, because the differential contains all changes made since the full backup was taken.

Expired incrementals from the second week will NOT be deleted because of the 'preserve single job set' setting.

Incrementals will run for weeks three and four.  

Both the expired differential and the expired incrementals from weeks two, three and four will NOT be deleted because of the 'preserve single job set' setting.

The full will run on the first Sunday of the following month.

At this point all expired differentials and incrementals from the previous month will be deleted (once they hit their retention/recycling setting).
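The walkthrough above can be reproduced with a small model of the retention rule. This is a hypothetical sketch inferred from the example, not the appliance's actual algorithm: it assumes the protected "single job set" is the newest Full, the newest Differential after it (if any), and every Incremental after the newer of those two; an expired job is deleted only if it is outside that set. `Job`, `current_job_set`, and `recyclable` are illustrative names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Job:
    """Hypothetical job record with its recycling (expiration) date."""
    name: str
    level: str  # "Full", "Differential", or "Incremental"
    ran: date
    expires: date


def current_job_set(jobs):
    """Names of the jobs needed to restore the latest state (assumed rule)."""
    fulls = [j for j in jobs if j.level == "Full"]
    if not fulls:
        return {j.name for j in jobs}
    full = max(fulls, key=lambda j: j.ran)
    diffs = [j for j in jobs if j.level == "Differential" and j.ran > full.ran]
    # A newer differential supersedes incrementals taken before it.
    base = max(diffs, key=lambda j: j.ran) if diffs else full
    keep = {full.name, base.name}
    keep |= {j.name for j in jobs if j.level == "Incremental" and j.ran > base.ran}
    return keep


def recyclable(jobs, today):
    """Expired jobs that Preserve Single Job Set would allow to be deleted."""
    protected = current_job_set(jobs)
    return sorted(j.name for j in jobs if j.expires <= today and j.name not in protected)
```

With the example's retention (Full 5 weeks, Diff 1 week, Inc 5 days): an expired first-week incremental stays until the differential runs, then becomes recyclable; the differential and later incrementals stay until the next month's full runs.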

The more time between full backups on a client, the greater the impact of the 'preserve single job set' feature.  

If you are running out of space because differentials and incrementals are not being deleted when their retention settings say they should be, check whether 'preserve single job set' is enabled. If it is, the appliance is most likely working as designed, and the backup schedules and retention settings need to be modified. Try a weekly schedule rather than a monthly one and set the retention settings accordingly. If you need one month's worth of backups available for restoration but don't have the capacity on the RAID, remember that you can also use the archiving feature to keep a month of backup jobs off the RAID.

Other Space Considerations:

Delayed calculation

The storage space calculations used for the RAID Usage bar at the bottom of the screen are performed automatically every 12 hours and cannot be forced.

If space still does not add up:

The database is used extensively for the space accounting calculations. If the database has errors, the accounting will likely be incorrect. Please contact Revinetix Support for further assistance.

If replication has been suspended due to lack of space on the RAID and does not start up again automatically, go to Replication > Status on the primary, click Actions, and choose "Reconcile Secondary".

Note: Space usage is difficult to pin down while other processes are running in the background (deduplication, importing, or new jobs, for instance). Even if all jobs are deleted from the RAID, the appliance then has to check for unreferenced data, recalculate space, and run the garbage collector before it can report an accurate reading, and all of this has to happen before the next set of jobs runs. Several situations can throw the numbers off:

If any jobs were being deduplicated when they were deleted, they can hold up the recalculation. (While in progress, deduplication of a backup takes almost double the space of the backup, although it saves space in the long run once it is done.)

If garbage collection runs while jobs are still importing or deduplicating, those jobs are not part of the space recalculation and can offset the numbers.

If a large amount of data is deleted at once, the appliance has to check itself to make sure no bits and pieces of those jobs were left behind. This is similar to, but not exactly like, defragmenting any other computer system, except that the appliance is doing it while simultaneously receiving new data.

Finally, the only part of the space usage that is recalculated more often than once per day is the free space, and any of the scenarios above can throw that figure off as well.
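Because deduplication in progress can take almost double the space of a backup, a rough headroom check helps explain why free space can vanish temporarily. A minimal sketch, assuming a conservative peak factor of 2.0 per job being deduplicated (the factor and the function names are assumptions for illustration, not appliance settings):

```python
GIB = 1024 ** 3

# Assumption: "almost double" rounded up to a conservative 2x peak footprint.
DEDUP_PEAK_FACTOR = 2.0


def peak_dedup_usage(backup_sizes, factor=DEDUP_PEAK_FACTOR):
    """Estimated peak space (bytes) for backups being deduplicated concurrently."""
    return factor * sum(backup_sizes)


def has_dedup_headroom(free_bytes, backup_sizes, factor=DEDUP_PEAK_FACTOR):
    """True if free space covers the estimated peak during deduplication."""
    return free_bytes >= peak_dedup_usage(backup_sizes, factor)
```

For example, deduplicating a 100 GiB and a 50 GiB backup at the same time could need roughly 300 GiB at peak under this assumption, so 250 GiB of free space would not be enough.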