RPM update

February 17, 2018
bash rpm deploy

Introduction

For a while my team at 3M had been completing rebirth deploys of a 200 VM distributed java system in AWS via a script that would spin up instances via the ec2 boto3 api and configure them via puppet. This particular dev team was on one-week sprints thus we deployed to prod every week. Deploying and configuring 200 VMs consecutively is a time-consuming process and this job was mine when I was the team newb. I got so tired of waiting around for ec2 completion, puppet registration and configuration that I decided to see if simply yum updateing all 200 VMs was scalable. Spoiler: with only one yum server in prod it wasn’t as scalable as I had hoped, but it did reduce our prod deployments by 4 hours every week and that of course translates to time and money saved!

RPM update

First global and non-global variables are set.

j=0                     # Set a counter to zero
ENV=$1                  # Which environment we are work on:
                        #    for di, qa, ct we do all at once
                        #    for pr, we only do 20 nodes at a time
LOGS=/var/tmp/example   # Directory to store log files in
SFX="inplace.log"       # Log file extension
ssh_user=`whoami`       # Used for pointing to proper creds file
prod=20                 # The number of 'pr' nodes we change at a time

Next a simple case statement chooses which environment to invoke the aws_dict function for.

  
env_check() {
case "$ENV" in
    di)
        aws_dict "some dev regex"
        ;;
    qa)
        aws_dict "some qa regex"
        ;;
    ct)
        aws_dict "some client test regex"
        ;;
    pr)
        aws_dict "some prod regex"
        ;;
    *)
        echo "Must specify di|qa|ct|pr"
        ;;
esac

The _awsdict function returns ec2 hostnames and ips based on a regex. The hostnames and ips get stored in arrays and these arrays are how the script knows which instances to yum update. The most important part of the function is the bash process substitution because it redirects our hostnames and ips into their corresponding arrays. Something interesting to note is that the read built-in can take more than one argument.


aws_dict() {
_regex="$1"
while read -r awk_column1 awk_column2; \
do hostnames_array+=( "$awk_column1" ) ips_array+=( "$awk_column2" ); \
done < <(export AWS_SHARED_CREDENTIALS_FILE=~${ssh_user}/.aws/credentials; \
aws --profile "$ENV" ec2 describe-instances --output text --query \
"Reservations[].Instances[].[Tags[?Key=='Name'].Value|[0],PrivateIpAddress]" \
| egrep "$_regex" | awk '{ print $1,$2 }')
}

The in_place_deploy function is the actual deployment step in the script. yum updateing against a single repo requires a yum hack: first we disable all repos, then turn a specified one on and the yum update. Whatever is the most recent rpm in that repo gets deployed. As the number of VMs to update increases it’s important to be able to modulate the load on our yum server. In our prod environment we did groups of 20 as this seemed to be a good compromise between wanting to deploy quicly while not overwhelming available bandwidth.

We deployed this application as an rpm because it integrates well with any distribution that uses yum, but the application itself is a massive war file which does a ton of stdout’ing. After in_place_deploy calls yum update it then redirects all of the initiating war file’s stdout to a log file on the box the script is being executed from. This is nice because when the deployment is complete we have 200 deployment log files each titled according to their corresponding VM’s hostname.

function in_place_deploy() {
CMD="sudo yum clean all && sudo yum -y update --disablerepo=*\
--enablerepo=example_engine_repo && sudo /opt/puppet/bin/puppet \
agent --test ; sleep 20 ; /etc/init.d/example-engine status"
#CMD="uptime"
for (( i = 0; i < ${#ips_array[*]}; i += n ))
    do
        for ip in ${ips_array[@]:i:n}
            do ssh "$ip" -A -n -oStrictHostKeyChecking=no -l "$ssh_user" $CMD \
              &> "${LOGS}"/"${ENV}"/"${hostnames_array["$j"]}"-${SFX} \
              & wait & ((j++))
            done
        status_check
    done
}

The status_check function reports if any file descriptors are in use within our log directory which allows me to know at a glance whether or not the entire deployment is complete.

function status_check() {
while :
    do
      if ! lsof +D $LOGS/$ENV | grep -q "example-$ENV"
          then
            break
      fi
      echo "Something is still working..."
      sleep 60
    done
echo "Update complete!"

I’ve included the core parts of the script here, but there are others. Check the whole thing out and its other iterations on my github.

Considerations

As mentioned in the in_place_deploy function explanation we set a counter variable that actually prevents overloading our yum server by updating 20 VMs at a time. This is because the first iteration of this script simply attempted 200 calls to our single prod yum server for a 2GB rpm(!) and this slowed the deployment.

Increasing the number of yum servers is not best practice. A better approach would be to have yum deploy the rpm to numerous S3 buckets which would distribute the load of large groups of VMs making the same API call.

comments powered by Disqus