Stop your business going backwards

I just had the pleasure of listening to Kevin Rowland from Ezibuy speak at the Retail NZ Summit in Auckland. My favourite quote from Kevin’s talk was, “Retail is like walking up a down escalator. If you’re standing still, you’re actually going backwards.”

This is true of most businesses and industries. If you’re not following Ezibuy’s lead and seeing how you can innovate with data and intelligent contact centres like Amazon Connect, you’re likely to be going backwards simply by sticking with the status quo.

It was interesting to see early incarnations of retail, before supermarkets, when owner-operators knew you personally and could make recommendations based on what they knew about you from your regular visits.

With technologies like Amazon Connect and its supporting AWS CX technologies, this level of genuine personalisation is possible in the modern era. One example is checking your order system based on caller ID to ascertain whether the customer is phoning about the status of an order.

The value-add for your call centre staff is increased call quality, because they no longer have to deal with the mundane, monotonous calls that a clever Lex bot can handle.

It’s a win for your business and your team, but most importantly, a win for your customers.

Find out how Consegna can integrate Amazon Connect with your existing CRM and support systems: connect@consegna.cloud

 

5 reasons appreciation ensures success

A person who feels appreciated will always do more than is expected

This week I put in an unusual sixty hours alongside some of my team members. We all had a number of high-pressure, time-constrained deliverables:

A critical workload to migrate during one of our client’s busiest sales periods of the year. An important presentation to wrap up some world-first specialised training the Consegna team have been privileged to receive with AWS. And to top it off, a trip south from Auckland to meet with another client after the 2am change the night before – with a “sleep-in” until 5am to catch the flight!

You often hear people on public transport griping about their job and how busy they are. They express how put out they are, how much of a personal burden it is and how much their job really sucks. 

Me? I absolutely loved every minute of it!

How could this be possible, you ask? Well, here are the five reasons appreciation ensures everyone’s success.

Acknowledgement of effort

Friday afternoon we had to present to the country manager for AWS in NZ, and some of the top ANZ AWS Professional Services team.

Before we got started, the AWS country manager personally thanked us for taking part in the programme we’d been training in, and for the massive effort we’d all put in on top of our regular client deliverables. I really appreciated the acknowledgement of effort. It underlined his leadership qualities, and he achieved all that simply with a heartfelt “thanks”.

The internal acknowledgement had well and truly been there, but to get this extra acknowledgement was greatly appreciated.

Having people’s trust

My team and I completed two 2am back-to-back changes after two busy days working well into the evening. We were at pains to make the migration to AWS services as seamless as possible. The pinnacle of the change – nobody even knew the site had switched services. 

Our client was ecstatic, and the careful approach that we’d used paid off.

Changes of this nature in the middle of peak business activity are extremely challenging. It’s only ever possible when you have the absolute trust of your team and your clients. When problems occur, you can’t successfully solve them if people don’t trust in your skills, expertise and ability to solve them. 

We were ultimately successful because even in the face of difficult circumstances, we had our client’s trust and we did everything in our power to validate that trust. Trust is a form of appreciation, and without it, you just can’t be successful.

Making a difference

Earlier this year AWS helped enable their partners and clients with the “Well Architected Review” (WAR) tool. This allows clients to work with AWS partners to review workloads and identify whether or not they are well architected to AWS best practice, and remediate in areas where they’re not.

Among the high-pressure deliverables the team had this week was a WAR with one of our key clients, which I led. The pivotal moment of that exercise arose when, towards the end of it, we could all see that there was a lot of work to be done. The client’s team were in a challenging spot, but then this magic moment took place.

We articulated the following:

“What we have here is a great opportunity to get you to a much better place than you are now. There’s a number of small easy changes we can make, and they’re going to get you a long way towards being well architected. Not only that, operationally your lives are going to be a lot better with these things improved.”

The uplift of demeanour in the room was palpable, and they knew that what was being suggested was achievable. Appreciation doesn’t always come in the form of words. It also comes from the expressions on people’s faces when you are able to deliver workable solutions. It provides a deep sense of satisfaction in what you’re doing, because of the difference it’s making to others, and it enables us to deliver better outcomes for our clients.

Team and Management Support

When you put in big hours and deliver to tight schedules, having the backing of your leadership and entire team is critical to your success. Knowing that they’re there when you need them is one of the subtler, but more powerful, forms of appreciation. It’s also the little things, like your MD buying you a triple-shot coffee because he knows you need it!

At Consegna we have a stellar team of people. People who are there even on their day off to support you, just to make sure things go smoothly. That’s true commitment.

It’s a two-way street

This week was unusual. We have no interest in our team burning out from long hours, and it was me having to convince my MD and COO that I was OK. If anything I was buzzing and thriving on one of the busiest weeks I’ve had this year. 

Some of the deliverables I’m not sure how I managed, like putting together a slide deck after a 10-hour day and still having the energy to present it the next day. Delivery is what we care about at Consegna, and that’s what we’d all been so busy with: delivering great outcomes for our clients.

Appreciation is quite simple at the end of the day. We get well looked after at Consegna by a committed senior leadership team who have the team and our clients at the forefront of their minds. It’s a necessary two-way street. You step up when you need to, and get looked after from end to end.

Personally I don’t see value in time-in-lieu or trying to account for all the time I’ve put in this week. I do it because I care and I love what I do! We get looked after well with team dinners and events, participation and inclusion in specialist AWS training, fantastic office locations and facilities, and flexible working conditions. We also get all-expenses-paid trips to key events like the AWS Summits and APN Ambassador events. These allow us to expand our knowledge and further hone our skills for the business.

When appreciation flows both ways, success is inevitable. With delivery a focal point at Consegna, appreciation and trust in our team is what makes us a success.

 

Use Lambda@Edge to handle complex redirect rules with CloudFront

Problem

Most mature CDNs on the market today offer the capability to define URL forwarding / redirects using path-based rules that get executed on the edge, minimising the wait time for users to be sent to their final destination.

CloudFront, Amazon Web Services’ CDN offering, provides out-of-the-box support for redirection from HTTP to HTTPS and will cache 3xx responses from its origins, but it doesn’t allow you to configure path-based redirects. In the past, this meant we had to configure our redirects close to our origins in specific AWS regions, which had an impact on how fast we could serve content to our users.

Solution

Luckily, AWS has anticipated this as a requirement for its users and provides other services in edge locations that can complement CloudFront to enable this functionality.

AWS Lambda@Edge is exactly that, a lambda function that runs on the edge instead of in a particular region. AWS has the most comprehensive Global Edge Network with, at the time of writing, 169 edge locations around the world. With Lambda@Edge, your lambda function runs in a location that is geographically closest to the user making the request.

You define, write and deploy them exactly the same way as normal lambdas, with an extra step to associate them with a CloudFront distribution which then copies them to the edge locations where they’ll be executed.

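Lambda@Edge can intercept requests at four stages of the request life-cycle: viewer request, origin request, origin response and viewer response. These are described in the AWS documentation:

https://docs.aws.amazon.com/lambda/latest/dg/lambda-edge.html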

For our use case, we want to intercept the viewer request and redirect the user based on a set of path based rules.

The following section has instructions on how to implement redirects at the edge using the Serverless Application Model, CloudFront and Lambda@Edge.

How to

Assumptions

This guide is written with the assumption that you have the following things set up:

Since we’ll be using the Serverless Application Model to define and deploy our lambda, we’ll need to set up an S3 bucket for sam package, so we have a prerequisites CloudFormation template.

Note: everything in this guide is deployed into us-east-1. I have included the region explicitly in the CLI commands, but you can use your AWS CLI config if you want (or any of the other valid ways to define region).

1) Create the following file:

lambda-edge-prerequisites.yaml

AWSTemplateFormatVersion: '2010-09-09'

Resources:
  RedirectLambdaBucket:
    Type: AWS::S3::Bucket

Outputs:
  RedirectLambdaBucketName:
    Description: Redirect lambda package S3 bucket name
    Value: !Ref RedirectLambdaBucket

We define the bucket name as an output so we can refer to it later.

2) Deploy the prerequisites CloudFormation stack with:

$ aws --region us-east-1 cloudformation create-stack --stack-name redirect-lambda-prerequisites --template-body file://`pwd`/lambda-edge-prerequisites.yaml

This should give you an S3 bucket we can point sam package to. Let’s save its name into an environment variable so it’s easy to use in future commands (you can also just get this from the AWS Console):

3) Run the following command:

$ export BUCKET_NAME=$(aws --region us-east-1 cloudformation describe-stacks --stack-name redirect-lambda-prerequisites --query "Stacks[0].Outputs[?OutputKey=='RedirectLambdaBucketName'].OutputValue" --output text)
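
The --query expression above is a JMESPath filter over the describe-stacks response. As an illustration of what it selects, here is a minimal Python sketch that does the same lookup on a response dictionary. The helper name stack_output and the sample bucket name are made up for this example; in practice the dictionary would come from a describe-stacks call rather than being written by hand.

```python
def stack_output(response, output_key):
    """Return the value of a named stack output, or None if it is absent."""
    stacks = response.get("Stacks", [])
    if not stacks:
        return None
    for output in stacks[0].get("Outputs", []):
        if output["OutputKey"] == output_key:
            return output["OutputValue"]
    return None


# Hand-written sample in the shape of a describe-stacks response.
# The bucket name is a placeholder, not a real generated name.
sample_response = {
    "Stacks": [
        {
            "StackName": "redirect-lambda-prerequisites",
            "Outputs": [
                {
                    "OutputKey": "RedirectLambdaBucketName",
                    "OutputValue": "example-redirect-lambda-bucket",
                }
            ],
        }
    ]
}

bucket_name = stack_output(sample_response, "RedirectLambdaBucketName")
```

Like the CLI filter, the lookup only considers the first stack in the response and matches on the OutputKey we defined in the prerequisites template.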

Now we’ve got our bucket name ready to use with $BUCKET_NAME, we’re ready to start defining our lambda using the Serverless Application Model.

The first thing we need to define is a lambda execution role. This is the role that our edge lambda will assume when it gets executed.

4) Create the following file:

lambda-edge.yaml

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Full stack to demo Lambda@Edge for CloudFront redirects

Parameters:
  RedirectLambdaName:
    Type: String
    Default: redirect-lambda

Resources:
  RedirectLambdaFunctionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - 'lambda.amazonaws.com'
                - 'edgelambda.amazonaws.com'
            Action:
              - 'sts:AssumeRole'
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole'

Notice that we allow both lambda.amazonaws.com and edgelambda.amazonaws.com to assume this role, and we grant the role the AWSLambdaBasicExecutionRole managed policy, which grants it privileges to publish its logs to CloudWatch.

Next, we need to define our actual lambda function using the Serverless Application Model.

5) Add the following in the Resources: section of lambda-edge.yaml:

lambda-edge.yaml

  RedirectLambdaFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: lambdas/
      FunctionName: !Ref RedirectLambdaName
      Handler: RedirectLambda.handler
      Role: !GetAtt RedirectLambdaFunctionRole.Arn 
      Runtime: nodejs10.x
      AutoPublishAlias: live

Note: we define AutoPublishAlias: live here which tells SAM to publish both an alias and a version of the lambda and link the two. CloudFront requires a specific version of the lambda and doesn’t allow us to use $LATEST.

We also define CodeUri: lambdas/ which tells SAM where it should look for the Node.js that will be the brains of the lambda itself. This doesn’t exist yet, so we’d better create it:

6) Make a new directory called lambdas:

$ mkdir lambdas

7) Inside that directory, create the following file:

lambdas/RedirectLambda.js

'use strict';

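exports.handler = async (event) => {
    console.log('Event: ', JSON.stringify(event, null, 2));
    // CloudFront passes the viewer request inside the first record
    const request = event.Records[0].cf.request;

    // Exact-match redirect rules: request path -> destination URL
    const redirects = {
        '/path-1':    'https://consegna.cloud/',
        '/path-2':    'https://www.amazon.com/',
    };

    if (redirects[request.uri]) {
        // Returning a response short-circuits the request at the edge
        return {
            status: '302',
            statusDescription: 'Found',
            headers: {
                'location': [{ value: redirects[request.uri] }]
            }
        };
    }
    // No rule matched: pass the request through to CloudFront unchanged
    return request;
};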

The key parts of this lambda are:

a) we can inspect the viewer request as it gets passed in via the event context,
b) we can return a 302 redirect if the request path meets some criteria we set, and
c) we can return the request as-is if it doesn’t meet our redirect criteria.

You can make the redirect rules as simple or as complex as you like.

You may have noticed we hard-code our redirect rules in our lambda. You may decide you’d rather keep your rules somewhere else, like DynamoDB or S3, but there are three main reasons we keep our redirect rules directly in the lambda:

a) the quicker we can inspect the request and return to the user the better, having to hit DynamoDB or S3 will slow us down
b) because this lambda is executed on every request, there will be cost implications to hit DynamoDB or S3 every time
c) defining our redirects via code means we can have robust peer reviews using things like GitHub’s pull requests
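
To illustrate how in-code rules could grow beyond exact matches, here is a minimal sketch of a viewer-request handler with both exact and prefix rules. It is written in Python (another runtime Lambda@Edge supports) rather than the Node.js used in this guide, and it is not part of the demo. The /blog/ prefix, the example.com target and the names resolve_redirect, EXACT_REDIRECTS and PREFIX_REDIRECTS are hypothetical.

```python
# Hypothetical rule tables; the paths and targets are placeholders.
EXACT_REDIRECTS = {
    "/path-1": "https://consegna.cloud/",
    "/path-2": "https://www.amazon.com/",
}
# (prefix, target base) pairs, checked in order; the rest of the path is kept.
PREFIX_REDIRECTS = [
    ("/blog/", "https://example.com/articles/"),
]


def resolve_redirect(uri):
    """Return the redirect target for a request path, or None if no rule matches."""
    if uri in EXACT_REDIRECTS:
        return EXACT_REDIRECTS[uri]
    for prefix, target in PREFIX_REDIRECTS:
        if uri.startswith(prefix):
            return target + uri[len(prefix):]
    return None


def handler(event, context):
    """Viewer-request handler: redirect on a rule match, else pass the request through."""
    request = event["Records"][0]["cf"]["request"]
    target = resolve_redirect(request["uri"])
    if target is None:
        return request
    return {
        "status": "302",
        "statusDescription": "Found",
        "headers": {"location": [{"key": "Location", "value": target}]},
    }
```

Keeping the matching logic in a plain function such as resolve_redirect means the rules can be unit tested without deploying anything, which fits the peer-review point above.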

Because this is a Node.js lambda, SAM requires us to define a package.json file, so we can just define a vanilla one:

8) Create the file package.json:

lambdas/package.json

{
  "name": "lambda-redirect",
  "version": "1.0.1",
  "description": "Redirect lambda using Lambda@Edge and CloudFront",
  "author": "Chris McKinnel",
  "license": "MIT"
}

The last piece of the puzzle is to define our CloudFront distribution and hook up the lambda to it.

9) Add the following to your lambda-edge.yaml:

lambda-edge.yaml

  CloudFront: 
    Type: AWS::CloudFront::Distribution 
    Properties: 
      DistributionConfig: 
        DefaultCacheBehavior: 
          Compress: true 
          ForwardedValues: 
            QueryString: true 
          TargetOriginId: google-origin
          ViewerProtocolPolicy: redirect-to-https 
          DefaultTTL: 0 
          MaxTTL: 0 
          MinTTL: 0 
          LambdaFunctionAssociations:
            - EventType: viewer-request
              LambdaFunctionARN: !Ref RedirectLambdaFunction.Version
        Enabled: true 
        HttpVersion: http2 
        PriceClass: PriceClass_All 
        Origins: 
          - DomainName: www.google.com
            Id: google-origin
            CustomOriginConfig: 
              OriginProtocolPolicy: https-only 

In this CloudFront definition, we define Google as an origin so we can define a default cache behaviour that attaches our lambda to the viewer-request. Notice that when we associate the lambda function to our CloudFront behaviour we refer to a specific lambda version.

SAM / CloudFormation template

Your SAM template should look like the following:

lambda-edge.yaml

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Description: Full stack to demo Lambda@Edge for CloudFront redirects

Parameters:
  RedirectLambdaName:
    Type: String
    Default: redirect-lambda

Resources:
  RedirectLambdaFunctionRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
          - Effect: Allow
            Principal:
              Service:
                - 'lambda.amazonaws.com'
                - 'edgelambda.amazonaws.com'
            Action:
              - 'sts:AssumeRole'
      ManagedPolicyArns:
        - 'arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole'

  RedirectLambdaFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: lambdas/
      FunctionName: !Ref RedirectLambdaName
      Handler: RedirectLambda.handler
      Role: !GetAtt RedirectLambdaFunctionRole.Arn 
      Runtime: nodejs10.x
      AutoPublishAlias: live

  CloudFront: 
    Type: AWS::CloudFront::Distribution 
    Properties: 
      DistributionConfig: 
        DefaultCacheBehavior: 
          Compress: true 
          ForwardedValues: 
            QueryString: true 
          TargetOriginId: google-origin
          ViewerProtocolPolicy: redirect-to-https 
          DefaultTTL: 0 
          MaxTTL: 0 
          MinTTL: 0 
          LambdaFunctionAssociations:
            - EventType: viewer-request
              LambdaFunctionARN: !Ref RedirectLambdaFunction.Version
        Enabled: true 
        HttpVersion: http2 
        PriceClass: PriceClass_All 
        Origins: 
          - DomainName: www.google.com
            Id: google-origin
            CustomOriginConfig: 
              OriginProtocolPolicy: https-only 

And your directory structure should look like:

├── lambda-edge-prerequisites.yaml
├── lambda-edge.yaml
└── lambdas
    ├── RedirectLambda.js
    └── package.json

Now we’ve got everything defined, we need to package it and deploy it. AWS SAM makes this easy.

10) First, create a new directory called packaged:

$ mkdir packaged

11) Using our $BUCKET_NAME variable from earlier, we can now run:

$ sam package --template-file lambda-edge.yaml --s3-bucket $BUCKET_NAME > packaged/lambda-edge.yaml

The AWS SAM CLI takes the local SAM template and parses it into a format that CloudFormation understands. After running this command, you should have a directory structure like this:

├── .aws-sam
│   └── build
│       ├── RedirectLambda
│       │   ├── RedirectLambda.js
│       │   └── package.json
│       └── template.yaml
├── lambda-edge-prerequisites.yaml
├── lambda-edge.yaml
├── lambdas
│   ├── RedirectLambda.js
│   └── package.json
└── packaged
    └── lambda-edge.yaml 

Notice the .aws-sam directory – this is produced by sam build, so you will only see it if you have run that command, and it contains your lambda code and a copy of your SAM template. You can use AWS SAM CLI to run your lambda locally, however this is out of the scope of this guide. Also notice the new file under the packaged directory – this contains direct references to your S3 bucket, and it’s what we’ll use to deploy the template to AWS.

You can find the full demo, downloadable in zip format, here: lambda-edge.zip

Finally we’re ready to deploy our template:

12) Deploy your template by running:

$ sam deploy --region us-east-1 --template-file packaged/lambda-edge.yaml --stack-name lambda-redirect --capabilities CAPABILITY_IAM

Note the --capabilities CAPABILITY_IAM flag. This tells CloudFormation that we acknowledge that this stack may create IAM resources that may grant privileges in the AWS account. We need this because we’re creating an IAM execution role for the lambda.

This should give you a CloudFormation stack with a lambda deployed on the edge that is configured with a couple of redirects.

When you hit your distribution domain name and append a redirect path (/path-2 – look for this in the lambda code), you should get redirected. Note that the lambda matches the path exactly, so /path-2/ with a trailing slash will not redirect.

Summary

AWS gives you building blocks that you can use together to build complete solutions, and these solutions are often much more powerful than what’s available out-of-the-box in the market. Consegna has a wealth of experience designing and building solutions for its clients, helping them accelerate their adoption of the cloud.

AWS SAM local – debugging Lambda and API Gateway with breakpoints demo

Overview

This new serverless world is great, but if you dive into it too fast you can end up getting caught up trying to get it all working, and forget to focus on having your local development environment running efficiently. This often costs developers time, and as a consequence it also costs the business money.

One of the things I see fairly often is developers adopting AWS Serverless technologies because they know that’s what they should be doing (and everyone else is doing it), but they end up sacrificing their local development flows to do so – and the most obvious casualty is running lambdas locally with breakpoints.

This post covers how to get local debugging working using breakpoints and an IDE from a fresh AWS SAM project using the Python 3.6 runtime.

I’m using the Windows Subsystem for Linux and Docker on Windows.

Prerequisites

Video Demo

There is a high level step-by-step below, but the video contains exact steps and a demo of this working using WSL and Docker for Windows.

Step by step

Assuming you’ve got the prerequisites above, the process of getting a new project set up and hooked up to your IDE is relatively straightforward.

1. Initialise a new python runtime project with:

$ sam init --runtime python3.6

2. Test we can run our API Gateway mock, backed by our lambda locally using:

$ sam local start-api

3. Hit our app in a browser at:

http://127.0.0.1:3000/hello

4. Add the debugging library to requirements.txt

hello_world/requirements.txt
requests==2.20.1
ptvsd==4.2.10

5. Import the debug library and have it listen on a debug port

hello_world/app.py

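import ptvsd

# Listen on all interfaces inside the container; 5890 must match --debug-port and launch.json
ptvsd.enable_attach(address=('0.0.0.0', 5890), redirect_output=True)
# Block until the IDE's debugger attaches
ptvsd.wait_for_attach()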
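
Because ptvsd.wait_for_attach() blocks until a debugger connects, you may not want it to run on every invocation. One possible way to guard it, sketched below, is to attach only when an environment variable is set. The variable name ENABLE_DEBUGGER and the helper names are hypothetical; they are not part of SAM or ptvsd.

```python
import os


def debugger_requested(env):
    """True when the (hypothetical) ENABLE_DEBUGGER variable holds a truthy value."""
    return env.get("ENABLE_DEBUGGER", "").strip().lower() in ("1", "true", "yes")


def maybe_attach_debugger(env=None, port=5890):
    """Attach ptvsd only on request, so normal invocations are never blocked."""
    env = os.environ if env is None else env
    if not debugger_requested(env):
        return False
    # Imported lazily so the dependency is only needed when debugging.
    import ptvsd

    ptvsd.enable_attach(address=("0.0.0.0", port), redirect_output=True)
    ptvsd.wait_for_attach()  # blocks until the IDE connects
    return True
```

With this in place, app.py would call maybe_attach_debugger() once at import time instead of calling ptvsd directly.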

6. Build the changes to the app in a container using:

$ sam build --use-container

7. Set a breakpoint in your IDE

8. Configure the debugger in your IDE (Visual Studio Code uses launch.json)

{
   "version": "0.2.0",
   "configurations": [
       {
           "name": "SAM CLI Python Hello World",
           "type": "python",
           "request": "attach",
           "port": 5890,
           "host": "127.0.0.1",
           "pathMappings": [
               {
                   "localRoot": "${workspaceFolder}/hello_world",
                   "remoteRoot": "/var/task"
               }
           ]
       }
   ]
}

9. Start your app and expose the debug port using:

$ sam local start-api --debug-port 5890

10. Important: hit your endpoint in the browser

This will fire up a docker container that has the debug port exposed. If you attempt to start the debugger in your IDE before doing this, you will get a connection refused error.

11. Start the debugger in your IDE:

This should drop you into an interactive debugging session in your IDE. Great!

Summary

While adopting new technologies can be challenging and fun, it’s important that you keep your local development as efficient as possible so you can spend your time on what you do best: developing.

Consegna helps its partners transition from traditional development processes to cloud development processes with minimal disruption to workflow. By demonstrating to developers that their day-to-day lives won’t change as much as they think, we find that widespread adoption of, and enthusiasm for, the cloud is the norm rather than the exception for developers in our customer engagements.

What does the CD in CICD actually mean?

It’s occurred to me fairly recently that there’s sufficient confusion about what the CD in CICD stands for, to warrant some simple explanation. I’m not even certain that people generally understand the CI part either. I’ve noticed on a few occasions developers tend to say, “A CICD pipeline is an automated way of delivering code into production.” I feel this is often interpreted as, “you commit code over here in your repository, and it automagically pops up in production over there.”

“What on earth could possibly go wrong with that?” the system operations team might ask sarcastically, turning distinctly green then pale at the thought of developers having ultimate control over production releases. I’ve also noticed non-developers ask, “Is it Continuous Delivery, or Continuous Deployment?” Is there even a difference? It seems a number of people interchange Delivery and Deployment without really understanding what each of them actually is. Isn’t this just typical developer double-speak for exactly the same thing?

To provide some historical context to this explanation, it’s useful to understand software development life cycles before Virtualisation and Cloud. Right up until the mid-2000s, software feature releases were typically very slow. Products might have gone four years before they got an update, and the savvy never installed a “dot-one” (.1) release, let alone a “dot-zero” (.0). Frequently they were buggy and unreliable. It often wasn’t until a “one-dot-two” (1.2) release that people started getting any confidence. Even then, that only ensured catastrophic bugs were eliminated. Other glitchy behaviours often still existed but didn’t contribute to a total loss of work.
“You do back up your work every half hour, right?” was the catch cry long before Cloud, autosave and versioning were as widespread as they are today.

A significant change that virtualisation helped bring about was the ability to run a non-production staging environment more cheaply: a place to test out new changes before releasing them into production. Cloud, via infrastructure as code, makes non-production environments even faster, easier and cheaper to provision for testing purposes. The key to an effective staging environment is that it’s as production-like as possible. This helps avoid embarrassing and costly “roll-backs” when code changes don’t behave in production as expected because there was too much variation between prod and non-prod.

Running in parallel to these infrastructure changes was the development and rapid adoption of software version control systems. In the bad old days, files were versioned either by commenting out old lines of code and introducing new lines, or by renaming old files on production servers and introducing new files in their place. It was less than ideal and alarmingly widespread. “When was the last backup done?” wasn’t something you wanted to hear in a development team. It often meant somebody had overwritten something they shouldn’t have.

Version control in the form of SVN, then later Git and other alternatives, allowed a new copy of a code file to be added to a repository while the old version was kept completely intact. What’s more, the developer could describe what had been changed in a comment attached to the commit.
This led to the practice of Continuous Integration (CI), where developers could collaborate more rapidly by sharing small code changes via the version control repository and, by doing so, minimise the impact each code change had on others. Everyone was effectively working on the same code base, rather than having separate copies that diverged more and more widely the longer each developer worked on their private copy of the code.

This brings us then to Continuous Delivery, which is the automatic build and test of committed code changes with the aim of having production-ready features available for release. Getting to Continuous Delivery after the successful implementation of Continuous Integration is relatively straightforward using AWS services. By using CodePipeline to automate the build and test of code commits, most of the tooling required is readily available.
AWS services like Elastic Beanstalk make it incredibly simple to replicate production application stacks into non-production environments for testing. AWS OpsWorks and CloudFormation can greatly simplify the reproduction of more complex application stacks for production-like staging.

Many organisations get to Continuous Delivery and don’t adopt Continuous Deployment. They either use a manual authorisation step to deploy changes into production, or a semi-automated approach to delivering code into production. Continuous Deployment, then, is the automatic deployment of production-ready code into production with no manual intervention. If changes fail the build and test processes, they are rejected and sent back to the development team for revision. If, however, all changes pass the tests, they are automatically deployed into production, and this is Continuous Deployment.
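
The difference can be sketched as a single gate at the end of the pipeline. The following is a toy Python model, not a real pipeline definition; the function name, parameter names and status strings are made up for illustration.

```python
def run_pipeline(tests_pass, manual_approval_required, approved=False):
    """Return what happens to a commit under a given pipeline policy."""
    if not tests_pass:
        # Both approaches reject changes that fail build and test.
        return "rejected"
    if manual_approval_required and not approved:
        # Continuous Delivery: production-ready, but waiting for a person.
        return "ready for release"
    # Continuous Deployment, or a delivery that has been approved: it goes live.
    return "deployed"
```

Under this model, Continuous Delivery is the policy with manual_approval_required set to True, and Continuous Deployment is the same pipeline with the gate removed.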

The fundamental key to all this working well is small, frequent changes. The historic issue with large, complex changes made over an extended period of time was that the root cause of any particular issue was extremely difficult to pin down. This made people reluctant to release changes unless it was absolutely necessary. The collaboration made possible by Continuous Integration ensures everyone is working on a single code base. This prevents the accumulation of errors that is common when everyone is working in isolation on big changes.

So there is an important distinction between Continuous Delivery and Continuous Deployment. The latter can be arrived at in an incremental manner after successfully adopting CI, then getting continuous delivery and testing to a robust enough point that well vetted, small feature changes can be continuously deployed into production.

Consegna have significant experience in helping organisations adopt CICD successfully. If you’d like to find out more information, email hello@consegna.cloud.

Cloud Migration War Stories: 10 Lessons learnt from the Lift-and-Shift migration of 100s of client servers from Traditional Data Centres into AWS.

Experience is a hard teacher because you get the test first, and the lesson afterwards.

I’ve always felt the best lessons are the ones learnt yourself, but to be honest, sometimes I would be more than happy to learn some lessons from others first. I hope the following can help you before you embark on your lift-and-shift migration journey.

Beware the incumbent

“Ok, so he won’t shake my hand or even look me in the eye, oh no this is not a good sign”. These were my initial observations when I first met with the representative of one of our clients’ Managed Service Provider (MSP). Little did I know how challenging, yet important, this relationship was to become.

This is how I saw it. All of a sudden, after years of this MSP giving your client pretty average service, they see you: this threat on their radar. Sometimes the current MSP is also getting mixed messages from the client. What’s wrong? Why the change? What does it mean for them?

I found it best to get the existing MSP on side early. If it’s an exit, then an exit strategy is needed between the client and the MSP. The best results happen when the MSP is engaged and ideally a Project Manager is put in place to assist the client with that exit strategy.

Most importantly, tell your MSP to heed the wise words of Stephen Orban: “Stop fighting gravity. The cloud is here, the benefits to your clients are transformational, and these companies need your help to take full advantage of what the cloud offers them. Eventually, if you don’t help them, they’ll find someone who will.”

Partner with your client

“Do cloud with your client, not to them”. Your client is going to have a certain way they work and are comfortable with. Your client will also have a number of Subject Matter Experts (SMEs), and to bring these SMEs on the journey too, it can be invaluable to have someone from your team on-site full time, paired up with each SME to learn from them.

There will be things they know that you don’t. A lot actually. I found it best to get your client involved and more importantly their input and buy-in. The outcome will be much better, as will your ability to overcome challenges when they come up.

Lay a good foundation

We spent a significant amount of time working with our client to understand what we were in for. We created an extensive list of every server (and there were 100s) in the Managed Service Provider’s data centre and then put strategies in place to migrate groups of servers.

We also set up our own version of an AWS Landing Zone as a foundational building block so best practices were set up for account management and security in AWS.

It’s important to lay this foundation and do some good analysis up front. Things will change along the way but a good period of discovery at the start of a project is essential.

But, don’t over analyse!

Do you need a plan? Absolutely. There are a number of good reasons why you need a plan. It sets a guideline and supports communication both within the team and outside it. But I think you can spend too much time planning and not enough time doing.

We started with a high-level plan with groups of servers supporting different services for our client. We estimated some rough timelines and then got into it. And we learnt a lot along the way and then adapted our plan to show value to our client.

Pivot

Mike Tyson once said “Everybody has a plan until they get punched in the mouth”

When things go wrong you need to adapt and change. When we started migrating one particular set of servers out of the incumbent data centre, we discovered their network was slow and things ground to a halt during the migration. So, like being punched in the mouth, you take the hit and focus on a different approach. We did get back to those servers and got them into the cloud, but we didn’t let them derail our plans.

Challenge the status quo

When I started working with a client migration project recently, the team had just finished migrating one of the key databases up into AWS, but the backup process was failing, as the backup window was no longer large enough.

After digging a little deeper, it was found that the backup process itself was very slow and cumbersome, but it had been working (mostly) for years, so ‘why change, right?’! The solution we put in place was to switch to a more lightweight process, which completed in a fraction of the time.

What’s my role, what’s your role?

It’s a really good idea to get an understanding of what everyone’s role is when working with multiple partners. We found that borrowing a useful idea from ITIL and creating a RACI matrix (https://en.it-processmaps.com/products/itil-raci-matrix.html) was a really good way to communicate who was responsible for what during the migration, and also for the support of services after the migration.

Not one size fits all.

There are a number of different ways to migrate applications out of data centres and into the cloud. We follow the “6 Rs” for different strategies for moving servers in AWS (https://aws.amazon.com/blogs/enterprise-strategy/6-strategies-for-migrating-applications-to-the-cloud/).

Although we used a “Lift and Shift” or “Rehosting” approach with most servers, in a number of cases we were also “Refactoring”, “Re-architecting” and “Retiring” where this made sense.

Go “Agile”.

In short, “Agile” can mean a lot of different things to different people. It can also depend on the maturity of your client and their previous experiences.

We borrowed ideas from the Kanban methodology such as using sticky notes and tools like Trello to visualise the servers being migrated and to help us limit tasks in progress to make the team more productive.

We found we could take a lot of helpful parts of Agile methodologies like Scrum including stand-ups which allowed daily communication within the team.

And finally but probably most important – Manage Up and Around!

My old boss once told me “perception is reality” and it has always stuck.

It’s critical senior stakeholders are kept well informed of progress in a concise manner and project governance is put in place. This way key stakeholders from around the business can assist when help is needed and are involved in the process.

So, how does this work in an Agile world? Communications are key. You can still run your project using an Agile methodology but it’s still important to provide reporting on risks, timelines and financials to senior stakeholders. This reporting, along with regular updates with governance meetings reinforcing these written reports, will mean your client will be kept in the loop and the project on track.

NZTA All-Access Hackathon

Consegna and AWS are proud to be sponsoring the NZTA Hackathon again this year. The event will be held the weekend of 21 – 23 September.

Last year’s event, Save One More Life, was a huge success and the winner’s concept has been used to help shape legislation in order to support its adoption nationally.
The information session held in the Auckland NZTA Innovation space last night provided great insight into this year’s event, which focuses on accessible transport options. In other words, it is about making transport more accessible to everyone, especially those without access to their own car, the disabled, and others in the community who are isolated due to limited transportation options.

The importance of diversity among the teams was a strong theme during the evening. For this event in particular, diverse teams are going to have an edge, as Luke Krieg, the Senior Manager of Innovation NZTA, pointed out, “Data can only tell you so much about a situation. It’s not until you talk to people, that the real insights appear – and what the data alone doesn’t reveal also becomes evident.”

Jane Strange, CX Improvement Lead NZTA, illustrated this point nicely with a bell curve that shows the relationship between users at each extreme of the transport accessibility graph.

Those on the right with high income, urban location, proximity to and choice of transport options invariably define transport policy for those to the left of the curve who are those with low income, located in suburban or rural areas, who are typically more isolated and have fewer transport options.

Luke also stressed how much more successful diverse teams participating in Hackathons usually are. As these are time-boxed events that require a broad spectrum of skills, technology in and of itself often doesn’t win out. Diverse skills are essential to a winning team.

For more information and registration to the event, please visit https://nzta.govt.nz/all-access

 

Cognito User Pool Migration

At Consegna, we like AWS and their services which are covered by a solid bench of documentation, blog posts and best practices. Because it is easy to find open source production ready code on GitHub, it is straightforward to deploy new applications quickly and at scale. However, sometimes, moving too fast may lead to some painful problems over time!

Deploying the AWS Serverless Developer Portal from GitHub straight to production works perfectly fine. Nevertheless, hardcoded values within the templates make it complicated to deploy multiple similar environments within the same AWS account. Introducing some parameterization is usually the way to solve that problem. But that leaves you with a production stack that is not aligned with the staging environments, which is, of course, not a best practice…

This blog post describes the solution we have implemented to solve the challenge of migrating Cognito users from one pool to another at scale. The extra step of migrating the API keys associated with those users is also covered in this blog post.

The Technology Stack

The deployed stack involves AWS serverless technologies such as Amazon API Gateway, AWS Lambda, and Amazon Cognito. It is assumed in this blog post that you are familiar with those AWS services but we encourage you to check out the AWS documentation or to contact Consegna for more details.

The Challenge

The main challenge is to migrate Cognito users and their API keys at scale without any downtime or requiring any password resets from the end users.

The official AWS documentation describes two ways of migrating users from one user pool to another:

  1. Migrate users when they sign in using Amazon Cognito for the first time with a user migration Lambda trigger. With this approach, users can continue using their existing passwords and will not have to reset them after the migration to your user pool.
  2. Migrate users in bulk by uploading a CSV file containing the user profile attributes for all users. With this approach, users will be required to reset their passwords.

We discarded the second option as we did not want our users to “pay” for this backend migration. So we used the following AWS blog article as a starting point, while keeping in mind that it does not cover the entire migration we need to implement. Indeed, by default, an API key is created for every user registering on the portal. The key is stored in API Gateway and is named after the user’s CognitoIdentityId attribute, which is specific to each user within a particular Cognito user pool.

The Solution

The Migration Flow

The following picture represents our migration flow with the extra API key migration step.

Migration Flow

Notes

  1. The version of our application currently deployed in production does not support the “Forgot my password” flow, so we did not implement it in our migration flow (but we should and will).
  2. When a user registers, they must submit a verification code to get access to their API key. In the very unlikely situation where a user has registered against the current production environment without confirming their email address, the user will be migrated with their email address automatically confirmed by the migration microservice. Based on the number of users and the low probability of this particular scenario, we considered this an acceptable risk. However, it might be different for your application.

The Prerequisites

In order to successfully implement the migration microservice, you first need to grant some IAM permissions and to modify the Cognito user pool configuration.

  1. You must grant your migration Lambda function the following permissions (feel free to restrict those permissions to specific Cognito pools using
    arn:${Partition}:cognito-idp:${Region}:${Account}:userpool/${UserPoolId}):
- Action:
  - apigateway:GetApiKeys
  - apigateway:UpdateApiKey
  - cognito-identity:GetId
  - cognito-idp:AdminInitiateAuth
  - cognito-idp:AdminCreateUser
  - cognito-idp:AdminGetUser
  - cognito-idp:AdminRespondToAuthChallenge
  - cognito-idp:ListUsers
Effect: Allow
Resource: "*"
  2. On both Cognito pools (the one you are migrating from and the one you are migrating to), enable the Admin Authentication Flow (ADMIN_NO_SRP_AUTH) to allow server-based authentication by the Lambda function executing the migration. You can do this via the Management Console or the AWS CLI with the following command:
aws cognito-idp update-user-pool-client \
    --user-pool-id <value> \
    --client-id <value> \
    --explicit-auth-flows ADMIN_NO_SRP_AUTH

More details about the Admin Authentication Flow are available here.

You are all set. Let’s get our hands dirty!

The Implementation (in JS)

At the Application Layer

To allow a smooth migration for our users, the onFailure callback of the login method should call our migration microservice instead of returning the original error to the user. An unauthenticated API Gateway client is initialized to call the migrate_user method on our API Gateway. The result returned by the backend is straightforward: RETRY indicates a successful migration, so the application must log the user in again automatically; otherwise it must handle the authentication error (user does not exist, username or password incorrect and so on).

onFailure: (err) => {
  // Save the original error to make sure to return appropriate error if required...
  var original_err = err;

  // Attempt migration only if old Cognito pool exists and if the original error is 'User does not exist.'
  if (err.message === 'User does not exist.' && oldCognitoUserPoolId !== '') {
    initApiGatewayClient()  // Initialize an unauthenticated API Gateway client
    
    var body = {
      // Prepare the body for the request for all required information such as
      // username, password, old and new Cognito pool information
    }
    
    // Let's migrate your user!
    apiGatewayClient.post("/migrate_user", {}, body, {}).then((result) => {
      resolve(result);
      if (result.data.status === "RETRY") {  // Successful migration!
        // user can now login!
      } else {
          // Oh no, status is not RETRY...
          // Check the error code and display appropriate error message to the user
      } 
    }).catch((err) => {
      // Handle err returned by migrate_user or return original error
    });
  } else {
    // Reject original error
  }
}
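
The branching on the backend's answer can be kept in one small function, which makes it easier to test than logic buried in nested callbacks. This is only a sketch: the helper name nextActionForMigrationStatus and the action labels are assumptions made for illustration, while the status strings are the ones returned by the migrate_user function shown later in this post.

```javascript
// Hypothetical helper (not part of the original portal code): map the status
// returned by /migrate_user to the action the application should take next.
// The action labels are illustrative assumptions.
function nextActionForMigrationStatus(status) {
  switch (status) {
    case 'RETRY':
      // Migration succeeded, so log the user in again automatically
      return 'RELOGIN';
    case 'NO_RETRY_PASSWORD_RESET_REQUIRED':
      // The pool requires a password reset before the user can be migrated
      return 'ASK_PASSWORD_RESET';
    default:
      // Any other status: fall back to the original authentication error
      return 'SHOW_ORIGINAL_ERROR';
  }
}

console.log(nextActionForMigrationStatus('RETRY'));
```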

The Migration microservice

API Gateway is used in conjunction with Cognito to authenticate the caller, but a few methods such as our migrate_user must remain unauthenticated. Here is the configuration of the migrate_user POST method on our API Gateway:

/migrate_user:
    post:
      produces:
      - application/json
      responses: {}
      x-amazon-apigateway-integration:
        uri: arn:aws:apigateway:<AWS_REGION>:lambda:path/2015-03-31/functions/arn:aws:lambda:<AWS_REGION>:<ACCOUNT_ID>:function:${stageVariables.FunctionName}/invocations
        httpMethod: POST
        type: aws_proxy
    options:
      consumes:
      - application/json
      produces:
      - application/json
      responses:
        200:
          description: 200 response
          schema:
            $ref: "#/definitions/Empty"
          headers:
            Access-Control-Allow-Origin:
              type: string
            Access-Control-Allow-Methods:
              type: string
            Access-Control-Allow-Headers:
              type: string
      x-amazon-apigateway-integration:
        responses:
          default:
            statusCode: 200
            responseParameters:
              method.response.header.Access-Control-Allow-Methods: "'DELETE,GET,HEAD,OPTIONS,PATCH,POST,PUT'"
              method.response.header.Access-Control-Allow-Headers: "'Content-Type,Authorization,X-Amz-Date,X-Api-Key,X-Amz-Security-Token'"
              method.response.header.Access-Control-Allow-Origin: "'*'"
        passthroughBehavior: when_no_match
        requestTemplates:
          application/json: "{\"statusCode\": 200}"
        type: mock

The implementation of migrate_user is simply added to our express-server.js, so there is no additional Lambda function to manage. The function is shown below and we will then dive into each step in detail:

app.post('/migrate_user', (req, res) => {
    // 1 -- Extract parameters from the body
    var username = req.body.username;
    var password = req.body.password;
    // etc ...

    var oldCognitoIdentityId = null;
    var cognitoIdentityId = null;
    var answer = { "status": "NO_RETRY" };

    const migrate_task = async () => {

        // 2 -- Check if migration is required
        let result = await isMigrationRequired(username, cognitoUserPoolId);
        if (result === false) return "NO_RETRY";

        // 3 -- Resolve the CognitoIdentityId of the user within the old pool
        result = await getCognitoIdentityId(username, password, oldCognitoUserPoolId, oldCognitoIdentityPoolId, oldCognitoClientId, oldCognitoRegion);
        if (result.error != null) {
            // Analyse error and return appropriate error code
            if (result.error.code === "PasswordResetRequiredException") return "NO_RETRY_PASSWORD_RESET_REQUIRED";
            else return "NO_RETRY";
        } else oldCognitoIdentityId = result.cognitoIdentityId;

        // 4 -- Extract the user's attributes to migrate from the old to the new pool
        var attributesToMigrate = await getUserAttributes(username, oldCognitoUserPoolId);

        // 5 -- Migrate user from old to new pool
        result = await migrateUser(username, password, cognitoUserPoolId, cognitoClientId, attributesToMigrate);
        if (result.error !== null) {
            // Something went wrong during the migration!
            return "NO_RETRY";
        }

        // 6 -- Resolve the CognitoIdentityId of the user within the new pool
        result = await getCognitoIdentityId(username, password, cognitoUserPoolId, cognitoIdentityPoolId, cognitoClientId, cognitoRegion);
        if (result.error !== null) {
            // Analyse error and return appropriate error code
            if (result.error.code === "PasswordResetRequiredException") return "NO_RETRY_PASSWORD_RESET_REQUIRED";
            else return "NO_RETRY";
        } else cognitoIdentityId = result.cognitoIdentityId;

        // 7 -- Migrate the user's API key
        result = await migrateApiKey(username, cognitoIdentityId, oldCognitoIdentityId);

        // 8 -- Migration complete!
        return "RETRY";
    }

    migrate_task()
        .then((value) => {
            answer.status = value;
            if (value === "RETRY") {
                res.status(200).json(answer);
            } else res.status(500).json(answer);
        })
        .catch((error) => {
            // `value` is not defined in this scope, so report a generic failure status
            answer.status = "NO_RETRY";
            res.status(500).json(answer);
        })
});
1 – Extract parameters from the body

All the data required for the migration has been passed by the application to our function via req, so we just extract it. Of course, do not log the password, otherwise it will appear in clear text in the execution logs of your Lambda.
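
One way to keep the password out of the logs is to mask it before anything is written. The following is a minimal sketch, assuming the request body is a flat object; the helper name redactForLogging and the list of sensitive fields are hypothetical and not part of the original code.

```javascript
// Hypothetical logging helper: copy the request body and mask sensitive
// fields before logging. The field list is an assumption for illustration.
const SENSITIVE_FIELDS = ['password'];

function redactForLogging(body) {
  // Shallow copy so the original body (still needed for the migration) is untouched
  const copy = Object.assign({}, body);
  SENSITIVE_FIELDS.forEach((field) => {
    if (field in copy) copy[field] = '***';
  });
  return copy;
}

console.log(JSON.stringify(redactForLogging({ username: 'jane', password: 'secret' })));
```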

Note: you might wish to inject the Cognito pool information directly to the Lambda via environment variables instead of passing via the body of the request.
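
As a sketch of that alternative, the pool settings could be read once from the environment and validated up front. The environment variable names below are assumptions chosen for this example; only the setting names on the right-hand side mirror the variables used by the migration function.

```javascript
// Hypothetical mapping from environment variable names (assumptions) to the
// setting names used by the migration function in this post.
const ENV_TO_SETTING = {
  COGNITO_USER_POOL_ID: 'cognitoUserPoolId',
  COGNITO_IDENTITY_POOL_ID: 'cognitoIdentityPoolId',
  COGNITO_CLIENT_ID: 'cognitoClientId',
  COGNITO_REGION: 'cognitoRegion',
  OLD_COGNITO_USER_POOL_ID: 'oldCognitoUserPoolId',
  OLD_COGNITO_IDENTITY_POOL_ID: 'oldCognitoIdentityPoolId',
  OLD_COGNITO_CLIENT_ID: 'oldCognitoClientId',
  OLD_COGNITO_REGION: 'oldCognitoRegion'
};

// Build the configuration from an environment object (for example process.env)
// and fail fast with a clear message if anything is missing.
function loadPoolConfig(env) {
  const config = {};
  const missing = [];
  Object.keys(ENV_TO_SETTING).forEach((name) => {
    if (env[name]) config[ENV_TO_SETTING[name]] = env[name];
    else missing.push(name);
  });
  if (missing.length > 0) {
    throw new Error('Missing environment variables: ' + missing.join(', '));
  }
  return config;
}
```

In the Lambda this would be called once at start-up, for example with loadPoolConfig(process.env), so the request body only needs to carry the username and password.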

2 – Check if migration is required

A migration is indicated as required only if the user does not already exist in the new pool. However be aware that this function does not verify the existence of the user in the old pool (the check is made during step 3.):

function isMigrationRequired(username, cognitoUserPoolId) {
  return new Promise((resolve, reject) => {
    var params = {
      Username: username,
      UserPoolId: cognitoUserPoolId
    };
    
    cognitoidentityserviceprovider.adminGetUser(params, function(lookup_err, data) {
      if (lookup_err) {
        if (lookup_err.code === "UserNotFoundException") {
          // User not found so migration should be attempted!
          resolve(true);
        } else {
          reject(lookup_err)  // reject any other error
        }
      } else {
        resolve(false);  // User does exist in the pool so no migration required
      }
    });
  })
};
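
The same check can also be written with async/await, which avoids wrapping the callback in a Promise by hand. This is a sketch rather than the code we deployed: it assumes the client passed in is an AWS.CognitoIdentityServiceProvider instance from the AWS SDK for JavaScript v2, whose request objects expose a .promise() method, and the name isMigrationRequiredAsync is hypothetical.

```javascript
// Sketch of an async/await variant. `client` is assumed to be an
// AWS.CognitoIdentityServiceProvider instance (AWS SDK for JavaScript v2).
// Passing the client in as a parameter also makes the function easy to test.
async function isMigrationRequiredAsync(client, username, cognitoUserPoolId) {
  try {
    await client.adminGetUser({
      Username: username,
      UserPoolId: cognitoUserPoolId
    }).promise();
    // User already exists in the new pool so no migration is required
    return false;
  } catch (err) {
    // User not found so migration should be attempted
    if (err.code === 'UserNotFoundException') return true;
    // Any other error is unexpected and is passed on to the caller
    throw err;
  }
}
```
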
3 – Resolve the CognitoIdentityId of the user within the old pool

Authenticate the user against the old pool using adminInitiateAuth and get their CognitoIdentityId via the getId method. This is required for the migration of the user’s API key. Of course, if the user cannot be authenticated against the old pool, they cannot be migrated so the function returns the error straight away.

function getCognitoIdentityId(username, password, cognitoUserPoolId, cognitoIdentityPoolId, cognitoClientId, cognitoRegion) {

  var params = {
    AuthFlow: 'ADMIN_NO_SRP_AUTH',
    ClientId: cognitoClientId,
    UserPoolId: cognitoUserPoolId,
    AuthParameters: {
      USERNAME: username,
      PASSWORD: password
    }
  };

  var result = {
    "cognitoIdentityId": null,
    "error": null
  }

  return new Promise((resolve, reject) => {
    cognitoidentityserviceprovider.adminInitiateAuth(params, function(initiate_auth_err, data) {
      if (initiate_auth_err) {
        // Error during authentication of the user against the old pool so this user cannot be migrated!
        result.error = initiate_auth_err;
        resolve(result);
      } else {
        // User exists in the pool so let's get their CognitoIdentityId
        var Logins = {};
        Logins["cognito-idp." + cognitoRegion + ".amazonaws.com/" + cognitoUserPoolId] = data.AuthenticationResult.IdToken;
        params = {
          IdentityPoolId: cognitoIdentityPoolId,
          Logins: Logins
        };
        cognitoidentity.getId(params, function(get_id_err, data) {
          if (get_id_err) {
            // The identity could not be resolved so report the error to the caller
            result.error = get_id_err;
          } else {
            result.cognitoIdentityId = data.IdentityId;
          }
          resolve(result);
        });
      }
    });
  });
}
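
The key of the Logins map is the part of this function that is easiest to get wrong, so it can help to isolate it. Below is a small sketch of that one step; the helper name buildLogins and the sample values are made up for illustration.

```javascript
// Hypothetical helper: build the Logins map expected by getId for a Cognito
// user pool. The key identifies the user pool, the value is the user's ID token.
function buildLogins(cognitoRegion, cognitoUserPoolId, idToken) {
  const provider = 'cognito-idp.' + cognitoRegion + '.amazonaws.com/' + cognitoUserPoolId;
  return { [provider]: idToken };
}

// Sample values are placeholders, not real identifiers
const logins = buildLogins('ap-southeast-2', 'ap-southeast-2_EXAMPLE', 'dummy-id-token');
console.log(Object.keys(logins)[0]);
// prints "cognito-idp.ap-southeast-2.amazonaws.com/ap-southeast-2_EXAMPLE"
```
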
4 – Extract the user’s attributes to migrate from the old to the new pool

Resolve the user’s attributes to migrate and force email_verified to true to avoid post-migration issues.

Note: all the attributes must be migrated except sub because this attribute is Cognito pool specific and will be created by the new pool.

function getUserAttributes(username, cognitoUserPoolId) {
  var user = null;
  var params = {
    UserPoolId: cognitoUserPoolId,
    Filter: "username = \"" + username + "\""
  };
  
  var result = [];

  return new Promise((resolve, reject) => {
    cognitoidentityserviceprovider.listUsers(params, function(list_err, data) {
      if (list_err) {
        // Serialize params, otherwise the log only shows "[object Object]"
        console.log("Error while listing users using " + JSON.stringify(params) + ": " + list_err.stack);
      } else if (data.Users.length > 0) {
        data.Users[0].Attributes.forEach(function(attribute) {
          if (attribute.Name === 'email_verified') {
            attribute.Value = 'true';
          }
          if (attribute.Name !== 'sub') result.push(attribute);
        });
      }

      resolve(result);
    });
  });
}
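
The attribute handling itself does not depend on Cognito at all, so it can be expressed as a pure function and tested without any AWS calls. This is a sketch under that assumption; the name prepareAttributesForMigration is hypothetical.

```javascript
// Hypothetical pure helper: take the attributes read from the old pool and
// return the list to write to the new pool. It drops `sub` (pool specific)
// and forces `email_verified` to 'true', without modifying the input.
function prepareAttributesForMigration(attributes) {
  return attributes
    .filter((attribute) => attribute.Name !== 'sub')
    .map((attribute) => ({
      Name: attribute.Name,
      Value: attribute.Name === 'email_verified' ? 'true' : attribute.Value
    }));
}

console.log(prepareAttributesForMigration([
  { Name: 'sub', Value: '1234' },
  { Name: 'email', Value: 'jane@example.com' },
  { Name: 'email_verified', Value: 'false' }
]));
```
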
5 – Migrate user from old to new pool

Our user is now ready to be migrated! So let’s use the admin features of Cognito (adminCreateUser, adminInitiateAuth, and adminRespondToAuthChallenge) to create the user, authenticate the user, and set their password.

function migrateUser(username, password, cognitoUserPoolId, cognitoClientId, attributesToMigrate) {
  var params = {
    UserPoolId: cognitoUserPoolId,
    Username: username,
    MessageAction: 'SUPPRESS', //suppress the sending of an invitation to the user
    TemporaryPassword: password,
    UserAttributes: attributesToMigrate
  };
  
  var result = {
    "error": null
  }
  
  return new Promise((resolve, reject) => {
    cognitoidentityserviceprovider.adminCreateUser(params, function(create_err, data) {
      if (create_err) {
        result.error = create_err;
        resolve(result);
      } else {
        // Now sign in the migrated user to set the permanent password and confirm the user
        params = {
          AuthFlow: 'ADMIN_NO_SRP_AUTH',
          ClientId: cognitoClientId,
          UserPoolId: cognitoUserPoolId,
          AuthParameters: {
            USERNAME: username,
            PASSWORD: password
          }
        };
        cognitoidentityserviceprovider.adminInitiateAuth(params, function(initiate_auth_err, data) {
          if (initiate_auth_err) {
            result.error = initiate_auth_err;
            resolve(result);
          } else {
            // Handle the response to set the password (confirm the challenge name is NEW_PASSWORD_REQUIRED)
            if (data.ChallengeName !== "NEW_PASSWORD_REQUIRED") {
              result.error = new Error("Unexpected challenge name after adminInitiateAuth [" + data.ChallengeName + "], migrating user created, but password not set");
              // Stop here, otherwise we would still respond to a challenge that was never issued
              return resolve(result);
            }

            params = {
              ChallengeName: "NEW_PASSWORD_REQUIRED",
              ClientId: cognitoClientId,
              UserPoolId: cognitoUserPoolId,
              ChallengeResponses: {
                "NEW_PASSWORD": password,
                "USERNAME": data.ChallengeParameters.USER_ID_FOR_SRP
              },
              Session: data.Session
            };
            cognitoidentityserviceprovider.adminRespondToAuthChallenge(params, function(respond_err, data) {
              if (respond_err) {
                result.error = respond_err;
              }

              resolve(result)
            });
          }
        });
      }
    });
  });
}
6 – Resolve the CognitoIdentityId of the user within the new pool

Our user is now created within the new pool so let’s resolve their CognitoIdentityId, which is required for migrating their API key.

7 – Migrate user’s API key

Migrate the user’s API key by renaming it to point to the user’s CognitoIdentityId resolved during step 6.

function migrateApiKey(username, cognitoIdentityId, oldCognitoIdentityId) {
  var params = {
    nameQuery: oldCognitoIdentityId
  };

  return new Promise((resolve, reject) => {
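    apigateway.getApiKeys(params, function(get_key_err, data) {
      if (get_key_err) return reject(get_key_err);
      if (!data.items || data.items.length === 0) {
        return reject(new Error("No API key found for [" + oldCognitoIdentityId + "]"));
      }
      // The key to rename is the one currently named after the old CognitoIdentityId
      var apiKeyId = data.items[0].id;
      params = {
        apiKey: apiKeyId,
        patchOperations: [{
          op: "replace",
          path: "/name",
          value: cognitoIdentityId
        }, {
          op: "replace",
          path: "/description",
          value: "Dev Portal API Key for " + cognitoIdentityId
        }]
      };
      // Update API key name and description to reflect the new CognitoIdentityId
      apigateway.updateApiKey(params, function(update_err, data) {
        if (update_err) return reject(update_err);
        console.log("API key (id: [" + apiKeyId + "]) updated successfully");
        resolve(true);
      });
    });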
  })
}
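
Two parts of this step can be separated from the API calls: choosing which key to rename, and building the patch operations. The sketch below shows both as plain functions. The helper names are hypothetical, and the exact-name check is a defensive assumption in case the name query returns more than one key.

```javascript
// Hypothetical helper: from the `items` returned by getApiKeys, pick the id of
// the key whose name is exactly the old CognitoIdentityId (null if none).
function selectApiKeyId(items, oldCognitoIdentityId) {
  const match = (items || []).find((item) => item.name === oldCognitoIdentityId);
  return match ? match.id : null;
}

// Hypothetical helper: build the patch operations that rename the key and
// update its description to reflect the new CognitoIdentityId.
function buildRenamePatch(cognitoIdentityId) {
  return [
    { op: 'replace', path: '/name', value: cognitoIdentityId },
    { op: 'replace', path: '/description', value: 'Dev Portal API Key for ' + cognitoIdentityId }
  ];
}

// Sample data is made up for illustration
const sampleItems = [
  { id: 'key-1', name: 'region:old-identity-extra' },
  { id: 'key-2', name: 'region:old-identity' }
];
console.log(selectApiKeyId(sampleItems, 'region:old-identity'));
```
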
8 – Migration complete, so return RETRY to indicate success

The migration is now complete, so return the RETRY status to tell the application that the user must be logged back in automatically.

Conclusion

By leveraging AWS serverless technologies we have been able to fully handle the migration of our client’s application users at the backend level. The customer was happy with this solution as it avoided sending requests to the users to reset their password and it realigned the production with staging.

It’s implementing solutions like this that helps set Consegna apart from other cloud consultancies — we are a true technology partner and care deeply about getting outcomes for customers that align with their business goals, not just looking after our bottom line.

What is your digital waste footprint?

How many times have you walked into your garage and taken stock of all the things you haven’t used in years? Those bikes that you bought for you and your partner that you haven’t used since the summer of ‘09, the fishing rods, the mitre saw, the boat (if you’re lucky) and the list goes on and on. Imagine if you didn’t have to pay for them all up front – and better yet, imagine if you could stop paying for them the moment you stopped using them!

Amazingly, that is the world we live in with the public cloud. If you’re not using something, then you shouldn’t be paying for it – and if you are, then you need to ask yourself some hard questions. The problem we’re seeing in customer-land is twofold:

  1. Technical staff are too far removed from whoever pays the bills, and
  2. It’s easier than ever to start new resources that cost money

Technical staff don’t care about the bill

Many technical staff that provision resources and use services on AWS have no idea what they cost and have never seen an invoice or the billing dashboard. They don’t pay the bills, so why would they worry about what it costs?

Working with technical staff and raising awareness around the consequences of their choices in the public cloud goes a long way to arresting the free-fall into an unmanageable hosting bill. By bringing the technical staff along on the optimisation journey, you’re enabling them to align themselves with business goals and feel the choices they make are contributing in a positive way.

It’s so easy to create new resources

One of the biggest strengths of the public cloud is how easy it is to provision resources or enable services. However, this appears to be one of its weaknesses as well. It’s because of this ease of use that time and time again we see serious account sprawl: unused, underutilised and over-sized resources litter the landscape, nobody knows how much Project A costs compared to Project B and there isn’t a clear plan to remediate the wastage and disarray.

Getting a handle on your hosting costs is an important step to take early on and implementing a solid strategy to a) avoid common cost related mistakes and b) be able to identify and report on project costs is crucial to being successful in your cloud journey.

Success stories

Consegna has recently engaged two medium-to-large sized customers and challenged them to review the usage of their existing AWS services and resources with a view to decreasing their monthly cloud hosting fees. By working with Consegna as an AWS partner and focusing on the following areas, one customer decreased their annual bill by NZD$500,000 and the other by NZD$100,000. By carefully analysing the following areas of your cloud footprint, you should also be able to significantly reduce your digital waste footprint.

Right-sizing and right-typing

Right-sizing your resources is generally the first step you’ll take in your optimisation strategy. This is because you can make other optimisation decisions that are directly related to the size of your existing resources, and if they aren’t the right size to begin with then those decisions will be made in error.

Right-typing can also help reduce costs if you’re relying on capacity in one area of your existing resource type that can be found in a more suitable resource type. It’s important to have a good idea of what each workload does in the cloud, and to make your decisions based on this instead of having a one-size-fits all approach.

Compute

Right-sizing compute can be challenging if you don’t have appropriate monitoring in place. When making right-sizing decisions there are a few key metrics to consider, but the main two are CPU and RAM. Because of the shared responsibility model that AWS adheres to, it doesn’t have access to RAM metrics on your instances out-of-the-box so to get a view on this you need to use third party software.

Consegna has developed a cross-platform custom RAM metric collector that ships to CloudWatch and has configured a third-party integration to allow CloudCheckr to consume the metrics to provide utilisation recommendations. Leveraging the two key metrics, CPU and RAM, allows for very accurate recommendations and deep savings.
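
To illustrate why both metrics matter, here is a deliberately simple sketch of a right-sizing hint. It is not how CloudCheckr or our collector produces recommendations: the function name, the input shape and the 40% / 80% thresholds are all assumptions chosen for the example.

```javascript
// Illustrative heuristic only. Thresholds and field names are assumptions,
// not the rules used by CloudCheckr or any AWS service.
function rightSizingHint(metrics, thresholds) {
  const limits = Object.assign({ low: 40, high: 80 }, thresholds);
  // Size on the busier of the two resources: an instance with idle CPU
  // but nearly full RAM should not be downsized.
  const peak = Math.max(metrics.cpuPeakPercent, metrics.ramPeakPercent);
  if (peak < limits.low) return 'downsize';
  if (peak > limits.high) return 'upsize';
  return 'keep';
}

console.log(rightSizingHint({ cpuPeakPercent: 10, ramPeakPercent: 20 }));
console.log(rightSizingHint({ cpuPeakPercent: 10, ramPeakPercent: 90 }));
```

With CPU alone, both sample instances would look like candidates for a smaller size; adding RAM shows that the second one is already under memory pressure.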

Storage

Storage is an area that gets overlooked regularly which can be a costly mistake. It’s important to analyse the type of data you’re storing, how and how often you’re accessing it, where it’s being stored and how important it is to you. AWS provides a myriad of storage options and without careful consideration of each, you can miss out on substantial decreases of your bill.

Database

Right-sizing your database is just as important as right-sizing your compute – for the same reasons there are plenty of savings to be had here as well.

Right-typing your database can also be an interesting option to look at as well. Traditional relational databases appear to be becoming less and less popular with new serverless technologies like DynamoDB – but it’s important to define your use case and provision resources appropriately.

It’s also worth noting that AWS have recently introduced serverless technologies to their RDS offering which is an exciting new prospect for optimisation aficionados.

Instance run schedules

Taking advantage of not paying for resources when they’re not running can make a huge difference to how much your bill is, especially if you have production workloads that don’t need to be running 24/7. Implementing a day / night schedule can reduce your bill by 50% for your dev / test workloads.

Consegna takes this concept to the next level by deploying a portal for non-technical users to control when the instances they deal with day-to-day are running or stopped. By pushing this responsibility out to the end users, instances that would have been running 12 hours a day based on a rigid schedule now only run for as long as they’re needed – an hour, or two usually – supercharging the savings.
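
The arithmetic behind these figures is straightforward. The sketch below assumes the cost of an instance is proportional to the hours it runs, which ignores charges that continue while it is stopped (attached storage, for example); the function name is made up for illustration.

```javascript
// Fraction of the always-on compute cost saved by a run schedule, assuming
// cost is proportional to running hours (storage and other fixed charges ignored).
const HOURS_PER_WEEK = 24 * 7;

function scheduleSavings(hoursPerDay, daysPerWeek) {
  const runningHours = hoursPerDay * daysPerWeek;
  return 1 - runningHours / HOURS_PER_WEEK;
}

console.log(scheduleSavings(12, 7));            // 0.5  (12 hours a day, every day)
console.log(scheduleSavings(12, 5).toFixed(2)); // 0.64 (12 hours a day, weekdays only)
console.log(scheduleSavings(2, 5).toFixed(2));  // 0.94 (2 hours a day, weekdays only)
```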

Identify and terminate unused and idle resources

If you’re not using something then you should ask yourself if you really need it running, or whether or not you could convert it to an on-demand type model.

This seems like an obvious one, but the challenge can actually be identification – there are plenty of places resources can hide in AWS, so being vigilant and using third-party software to help can be key.

Review object storage policies

Because object storage in AWS (S3) is so affordable, it’s easy to ignore it and assume there aren’t many optimisations to be made in this area. This can be a costly oversight: the type of storage you’re using matters, and so does how frequently you need to access the data.

Lifecycle policies on your object storage are a great way to automate rolling infrequently used data into cold storage, and can be key low-hanging fruit to nab early in your optimisation journey.
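To make this concrete, here is a minimal sketch of a lifecycle configuration that moves objects under a prefix to Infrequent Access and then to Glacier. The function name, the `logs/` prefix, the 30 and 90 day thresholds and the bucket name are assumptions for illustration; the right values depend on how your data is actually accessed.

```python
def build_lifecycle_configuration(prefix, ia_after_days=30, glacier_after_days=90):
    """Return a lifecycle configuration dict in the shape S3 expects."""
    if glacier_after_days <= ia_after_days:
        raise ValueError("glacier transition must come after the IA transition")
    return {
        "Rules": [
            {
                "ID": f"archive-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


config = build_lifecycle_configuration("logs/")

# Applying it (needs boto3, credentials and a real bucket name):
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-bucket", LifecycleConfiguration=config)
```

Once a rule like this is in place, S3 applies the transitions automatically, so there is no ongoing job to run or maintain.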

Right-type pricing tiers

AWS offers a robust range of pricing tiers for a number of their services and by identifying and leveraging the correct tiers for your usage patterns, you can make some substantial savings. In particular you should be considering Reserved Instances for your production resources that you know are going to be around forever, and potentially Spot Instances for your dev / test workloads that you don’t care so much about.

Of course, there are other pricing tiers in other services that are worth considering.

Going Cloud Native

AWS offers many platform-as-a-service offerings which take care of a lot of the day-to-day operational management that is so time consuming. Using these offerings as a default instead of managing your own infrastructure can provide some less tangible optimisation benefits.

Your operations staff won’t be bogged down with patching and keeping the lights on – they’ll be freed up to innovate and explore the new and exciting technologies that AWS are constantly developing and releasing to the public for consumption.

Consegna consistently works with its technology and business partners to bake this optimisation process into all cloud activities. By thinking of ways to optimise and be efficient first, both hosting related savings and operational savings are achieved proactively as opposed to reactively.

Slow DoS attack mitigation — a Consegna approach.

Recently we discovered that a customer’s website was being attacked in what is best described as a “slow DoS”. The attacker was running a script that scraped each page of the site to find PDF files to download, then initiated many downloads of each file.

Because the site was fronted by a Content Delivery Network (CDN), the site itself was fine and experienced no increase in load or service disruption, but it did cause a large spike in bandwidth usage between the CDN and the clients. The increase in bandwidth was significant enough to increase the monthly charge from around NZ$1,500 to over NZ$5,000. Every time the customer banned the IP address that was sending all the requests, a new IP would appear to replace it. It seems the point of the attack was to waste bandwidth and cost our customer money — and it was succeeding.

The site itself was hosted in AWS on an EC2 instance, but the CDN service the site was using was a third party, Fastly. After some investigation, it seemed that Fastly didn’t have any automated mitigation features that would stop this attack. Knowing that AWS Web Application Firewall (WAF) has built-in rate-based rules, we decided to investigate whether we could migrate the CDN to CloudFront and make use of these rules.

All we needed to do was create a CloudFront distribution with the same behaviour as the Fastly one, then point the DNS records to CloudFront — easy right? Fastly has a neat feature that allows you to redirect at the edge which was being used to redirect the apex domain to the www subdomain — if we were to replicate this behaviour in CloudFront we would need some extra help, but first we needed to make sure we could make the required DNS changes.

Pointing a domain managed by Route 53 at CloudFront is easy: you can just set an ALIAS record on the apex domain and a CNAME on the www subdomain. However, this customer’s DNS was managed by a third-party provider who they were committed to sticking with (this is a blog post for another day). The third-party provider did not support ALIAS or ANAME records and insisted that apex domains could only have A records, which meant we could only use IP addresses!

Because CloudFront has so many edge locations (108 at the time of writing), it wasn’t practical to get a list of all of them and set 108 A records — plus this would require activating the “static IP” feature of CloudFront which gives you a dedicated IP for each edge location, which costs around NZ$1,000 a month.

And to top all that off, whatever solution we decided to use would only be in place for 2 months as the site was being migrated to a fully managed service. We needed a solution that would be quick and easy to implement — AWS to the rescue!

So, we had three choices:

  1. Stay with Fastly and figure out how to ban the bad actors
  2. Move to CloudFront and figure out the redirect (bearing in mind we only had A records to work with)
  3. Do nothing and incur the NZ$5,000 cost each month — high risk if the move to a managed service ended up being delayed. We decided this wasn’t really an option.

We considered spinning up a reverse proxy and pointing the apex domain at it to redirect to the www subdomain (remember, we couldn’t use an S3 bucket because we could only set A records) but decided against this approach because we’d need to make the reverse proxy scalable given we’d be introducing it in front of the CDN during an ongoing DoS attack. Even though the current attack was slow, it could have easily been changed into something more serious.

We decided to stay with Fastly and figure out how to automatically ban IP addresses that were making too many requests. Aside from the DNS limitation, one of the main drivers for this decision was the rate of the DoS itself: it was so slow that it was below the minimum rate-based rule configuration that the AWS WAF allows (2,000 requests in 5 minutes). We would have needed to write our own rate-based rules anyway, so moving to CloudFront and WAF wouldn’t have solved our problem straight away.

Thankfully, Fastly had an API that we could hit with a list of bad IPs, so all we needed to figure out was how to:

  1. Get access to the Fastly logs,
  2. Parse the logs and count the number of requests,
  3. Auto-ban the bad IPs.

Because Fastly allows log shipping to S3 buckets, we configured it to ship to our AWS account in a log format that could be easily consumed by Athena, and wrote a couple of AWS Lambda functions that:

  1. Queried the Fastly logs S3 bucket using Athena,
  2. Inspected the logs and banned bad actors by hitting the Fastly API, maintaining state in DynamoDB,
  3. Built a report of bad IPs and ISPs and generated a complaint email.
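The core of the second function, deciding which IPs to ban, can be sketched in a few lines. This is an illustration rather than the deployed code: the function name, the threshold and the sample addresses are made up, the input stands in for rows returned by the log query, and a plain set stands in for the state the real solution kept in DynamoDB.

```python
from collections import Counter


def find_bad_ips(requests, max_requests, already_banned=frozenset()):
    """Return IPs that exceeded max_requests and are not banned yet.

    `requests` is an iterable of (client_ip, path) tuples covering one
    time window, e.g. the rows returned by a log query.
    """
    counts = Counter(ip for ip, _path in requests)
    return sorted(
        ip for ip, n in counts.items()
        if n > max_requests and ip not in already_banned
    )


# Hypothetical window of requests; a real run would read query results.
window = (
    [("203.0.113.7", "/files/report.pdf")] * 120
    + [("198.51.100.4", "/files/report.pdf")] * 300
    + [("192.0.2.10", "/index.html")] * 5
)
banned = {"198.51.100.4"}  # stand-in for previously banned IPs

to_ban = find_bad_ips(window, max_requests=100, already_banned=banned)
print(to_ban)  # ['203.0.113.7']
```

Tracking already banned IPs avoids sending the same address to the ban API on every run, which is one reason to keep state between invocations.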

The deployed solution looked something like this:

By leveraging S3, Athena, Lambda and DynamoDB we were able to deploy a fully serverless rate-based auto-banner for bad actors with a very short turnaround. The customer was happy with this solution as it avoided the NZ$5,000 a month cost, avoided changes to the existing brittle DNS setup and also provided some valuable exposure to how powerful serverless technology on AWS is.

It’s implementing solutions like this that helps set Consegna apart from other cloud consultancies — we are a true technology partner and care deeply about getting outcomes for customers that align with their business goals, not just looking after our bottom line.