Every day, businesses create and aggregate data that they need to protect from loss. How important the data is helps determine whether it should be backed up.
Important data may be deleted accidentally, become corrupted, or be put at risk by a virus or even a natural disaster. That is why it is important to have a backup and disaster recovery plan; without one, there is nothing to fall back on.
One of the projects I work on generates TSV files on a day-to-day basis. These files are used to dispatch bulk emails of recommended products as part of a cross-channel marketing strategy, and they are very important in determining the ROI. These files are therefore backed up to Amazon S3 buckets located in different regions of Amazon Web Services (AWS).
1. Overview
A Spring Batch job generates the files and saves them on the root volume of an Ubuntu EC2 instance, backed by EBS. These files are then backed up to S3 using a cron job.
The tool used for interacting with S3 is s3cmd. This command-line utility synchronizes data on the EBS volume with S3.
2. Install s3cmd
Use the apt-get package manager to install s3cmd on the Ubuntu server.
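On Ubuntu, s3cmd is available in the standard repositories, so the installation is straightforward:

```shell
# Refresh the package index and install s3cmd from the Ubuntu repositories
sudo apt-get update
sudo apt-get install -y s3cmd

# Confirm the installation succeeded
s3cmd --version
```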
3. Configure s3cmd for managing data on S3
Get the access key and secret key from your AWS account prior to this step.
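s3cmd ships with an interactive configuration wizard that prompts for these credentials and stores them in ~/.s3cfg:

```shell
# Launch the interactive wizard; it asks for the AWS access key,
# secret key, an optional encryption password, and whether to use HTTPS,
# then writes the settings to ~/.s3cfg.
s3cmd --configure
```

At the end of the wizard, s3cmd offers to test the connection with the supplied credentials before saving them.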
4. Create an S3 bucket
An S3 bucket can also be created from the web console or with an AWS SDK.
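With s3cmd configured, the bucket can be created straight from the command line. The bucket name and region below are placeholders; bucket names are globally unique, so pick your own:

```shell
# Create a bucket in a specific AWS region
s3cmd mb s3://my-tsv-backup-bucket --bucket-location=us-west-2

# List all buckets to verify it was created
s3cmd ls
```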
5. Synchronize files
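The sync command compares the local directory with the bucket and transfers only files that are new or have changed, which keeps repeated backups fast. The paths below are assumptions; substitute the directory where the batch job writes its TSV files:

```shell
# Sync the local TSV output directory to the S3 bucket.
# On subsequent runs, only new or modified files are uploaded.
s3cmd sync /opt/batch/output/tsv/ s3://my-tsv-backup-bucket/tsv/
```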
6. Write a shell script to automate the synchronization process
backup_sync_ec2_s3.sh
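A minimal sketch of what backup_sync_ec2_s3.sh might contain; the source directory, bucket name, and log file path are assumptions to adapt to your setup:

```shell
#!/bin/bash
# backup_sync_ec2_s3.sh
# Sync the locally generated TSV files to the S3 bucket and log the result.
# SOURCE_DIR, BUCKET, and LOG_FILE are placeholders -- adjust as needed.

SOURCE_DIR="/opt/batch/output/tsv/"
BUCKET="s3://my-tsv-backup-bucket/tsv/"
LOG_FILE="/var/log/s3_backup.log"

echo "$(date '+%Y-%m-%d %H:%M:%S') starting sync" >> "$LOG_FILE"
s3cmd sync "$SOURCE_DIR" "$BUCKET" >> "$LOG_FILE" 2>&1
echo "$(date '+%Y-%m-%d %H:%M:%S') sync finished" >> "$LOG_FILE"
```

Make the script executable with chmod +x backup_sync_ec2_s3.sh so cron can run it.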
7. Edit crontab file
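The crontab for the current user is edited with:

```shell
# Open the current user's crontab in the default editor
crontab -e
```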
8. Schedule the script
Add a cron job to trigger the automated backup every day at 23:59 hrs.
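A cron entry for 23:59 daily might look like this (the script path and log file are assumptions); the five fields are minute, hour, day of month, month, and day of week:

```shell
# m  h  dom mon dow  command
59 23  *   *   *    /home/ubuntu/backup_sync_ec2_s3.sh >> /var/log/s3_backup_cron.log 2>&1
```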
9. Conclusion
Mission-critical data should be backed up. Off-site backup of files from EC2 to S3 can be done using a command-line tool named s3cmd, and automating that process with a cron job makes sure the data is synchronized on a daily basis.
Check the s3cmd usage link to learn more about the available options and commands.