AWS Summit ANZ 2022 - Migrating to a well-architected landing zone (SYS2)
Aug 16, 2023
Cloud environments can become quite large through organic growth or mergers and acquisitions, which can lead to management complexity. While this complexity can be simplified through multi-account best practices, a single AWS organisational hierarchy and AWS Control Tower, it can be daunting to migrate a large set of existing accounts to this model. This session explores learnings, essential decision points, and the tools available to help you prepare and execute your own smooth transition.
Content
15.12 -> My name is Chris Dorrington
16.88 -> and I’m a Principal Cloud Architect
at AWS Professional Services.
19.56 -> At Professional Services,
20.72 -> we are lucky enough to work
very closely with customers,
23.16 -> helping them along their cloud journey
on a variety of interesting topics.
26.92 -> Today, we are going to be talking about
migrating existing accounts
29.96 -> to a well-architected Landing Zone.
31.96 -> In particular, we will be talking about
the Control Tower Landing Zone
34.96 -> as a destination for your accounts.
36.92 -> But this is not a Control Tower 101
session.
39.44 -> Instead, we are going to go
a lot deeper.
41.48 -> And I'll take you through a recent
engagement where we helped a customer
44.4 -> perform such a migration.
46.88 -> So what are you going to learn?
48.72 -> Through my narrative about engaging
with this customer,
51.24 -> who I will now refer to as ACME,
53.12 -> I’ll detail what made them
want to make a change,
55.52 -> and which areas you should assess
on your own platform.
58.32 -> As the target design is a Control
Tower Landing Zone,
61.16 -> I’ll give a recap on what
that actually is,
63.36 -> and the topics you should
consider at design time.
66.16 -> Ultimately, I want you to walk away from
this session with the tools and tips
69.84 -> that we used at ACME to give you
confidence that you can plan
73.04 -> and execute a migration of your own.
75.56 -> So, why consider a change?
77.44 -> Why go to the effort of
restructuring all your accounts?
80 -> Because after all, it does take
some effort, as we shall see.
83.8 -> So, let’s look at what triggered
the project at ACME.
86.6 -> This was their situation.
88 -> They had three payer accounts
and three separate invoices.
91.36 -> I’ll show AWS Organizations
in the diagrams here,
94.12 -> but they were actually only using
the consolidated billing feature,
96.84 -> and not any of the other
goodness it can provide.
100.28 -> The first set of accounts
they called classic accounts,
102.6 -> as these accounts had been around
for ten years or more,
105.2 -> and had organically grown in numbers
along with the business.
109.64 -> However, the other two payers and
their associated accounts had been inherited
113.6 -> as ACME had also grown
through mergers and acquisitions.
116.76 -> As part of this process, they had also
inherited tooling.
119.56 -> So overall, they ended up with many
accounts, a few hundred in total,
122.88 -> being managed in slightly different ways.
125.68 -> They did have a competent
cloud platform team.
127.8 -> So, on the face of it,
everything was OK,
129.84 -> even if there was a lot of DevOps
scripting happening
131.92 -> to cater for the differences
amongst the accounts.
135.2 -> So, their reason for change:
136.8 -> ACME was about to venture
into a new product domain,
139.4 -> and as this would mean more accounts
being added to their footprint,
142.28 -> they asked AWS Professional Services
to perform what we call
145.56 -> an Executive Cloud Security Assessment,
or ECSA for short.
150.28 -> This is an engagement where we take a
deep look at the platform and how it is
153.44 -> run, asking the questions in six areas
that you can see on the screen here.
157.76 -> Based on the conversations that we had,
we were able to identify a range of risks,
161.68 -> ranging from low to
critical in severity.
165.12 -> I'll list here a selection of the
findings that we felt could be fixed
168 -> by having a better
Landing Zone in place.
169.96 -> A lot of the risks were quite common in nature,
in that there was
172.88 -> no uniform application of rules, or
a single place where they could view
176.32 -> conformance against compliance,
monitor threat detection, or other
179.72 -> things that needed to be mandated
across groups of accounts and workloads.
184.44 -> All up, there were 15 findings
that we found could be remediated
187.4 -> by a Control Tower Landing Zone.
189.76 -> So, let’s take a deeper look
into the Control Tower Landing Zone.
193 -> But in general, what is a Landing Zone?
195.16 -> Let’s have a recap on that.
197.52 -> The Landing Zone is named as such
199.56 -> because it is a zone to land
your AWS workloads.
202.6 -> And if you use a Landing Zone,
you can be sure that the accounts and
205.4 -> their workloads will meet your company’s
security and governance requirements.
210.32 -> This can be done by applying controls
and guardrails that can be configured
213.64 -> and managed centrally.
215.32 -> As I implied earlier, this was
what was lacking at ACME,
218.16 -> and was picked up in the
security assessment,
220.2 -> that lack of central management
and a centralised 'at a glance' view
223.68 -> into their AWS footprint.
226.8 -> According to multi-account best
practice, accounts should be used to
229.8 -> separate workloads, and a Landing Zone
should be designed with this in mind.
233.56 -> And grouping by organisational
units can help with this.
237.28 -> We decided at ACME that a new
Landing Zone was desirable.
240.28 -> And one that had been designed,
as opposed to their existing setup,
243.32 -> which could be best described as
having evolved over time,
246.4 -> and now not meeting the requirements
of the expanding company.
250.2 -> So what to do next?
251.72 -> Landing Zones can be built by hand
using whitepapers
254.16 -> and the Well-Architected Framework as
a guide, but that takes time and effort.
258.24 -> And this is where the Control Tower
service comes in.
260.76 -> The service was launched in 2019 with
the aim of automating nearly all of the
264.36 -> heavy lifting required to create
a best practice Landing Zone.
267.96 -> This base architecture you see here
is what AWS Control Tower
271 -> provides to customers within
a few clicks in the console.
274.64 -> Once enabled, AWS Control Tower provides
a framework to set up a well-architected,
279.04 -> multi-account AWS environment based on
security and compliance best practices.
283.8 -> It is launched in the top level account,
or the management account as it is known,
286.96 -> and tightly integrates
with AWS Organizations.
289.96 -> By using organisational units,
accounts can be separated and governed
293.44 -> by different sets of security policies,
as those groupings dictate.
297.8 -> This was perfect for ACME,
as it would quickly get them
300.48 -> up and running with a base platform.
302.92 -> For instance, out of the box,
they got SSO integration
305.64 -> and the ability to provision
a new account on demand
308.16 -> that would automatically have
the baseline controls applied to it.
312.08 -> You can see here the Control Tower
dashboard which enabled them
314.64 -> to easily view the compliance status
of their accounts
317.32 -> within the Landing Zone, according to
the controls and guardrails
320.2 -> that Control Tower provides
out of the box.
322.96 -> This was a good start, but to remediate
more of their security findings,
326.16 -> additional AWS services were required.
328.56 -> The caveat was that we still wanted
a single-pane-of-glass view for management.
333.6 -> Central visibility and management
is made easier now that more and more
336.72 -> AWS services are natively integrating
with AWS Organizations.
340.88 -> As long as all your accounts are
within a single AWS Organization,
344.36 -> you can enable services for all
the accounts within it
346.76 -> with just a few clicks or a line of code.
350.08 -> Examples include Amazon GuardDuty,
which can monitor network-level activity
353.84 -> for anomalies, and AWS Firewall Manager,
so that you can be alerted
357.48 -> to things like open security groups.
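As one illustration of the "line of code" point, GuardDuty lets the management account delegate administration for the whole organisation with a single API call, EnableOrganizationAdminAccount. The sketch below assumes you pass in a boto3 GuardDuty client created with management-account credentials; the helper name `delegate_guardduty_admin` and the account ID used are hypothetical. Because GuardDuty is a Regional service, the call would be repeated in each Region you operate in.

```python
from typing import Any


def delegate_guardduty_admin(guardduty_client: Any, security_account_id: str) -> None:
    """Make security_account_id the GuardDuty delegated administrator.

    guardduty_client is assumed to be boto3.client("guardduty", region_name=...)
    created with credentials for the organization's management account.
    """
    # AWS account IDs are 12-digit strings; catch obvious typos before calling AWS.
    if not (len(security_account_id) == 12 and security_account_id.isdigit()):
        raise ValueError(f"not a 12-digit account ID: {security_account_id!r}")
    # One call per Region registers the delegated administrator for the organization.
    guardduty_client.enable_organization_admin_account(AdminAccountId=security_account_id)
```

Usage, assuming boto3 is installed and credentials are configured: `delegate_guardduty_admin(boto3.client("guardduty", region_name="ap-southeast-2"), "111122223333")`. Auto-enablement for member accounts is then configured from the delegated administrator account rather than the management account.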
360.48 -> Integrating across these security
services can be done via AWS Security Hub.
364.8 -> This service really does give that
single pane of glass approach
367.56 -> by aggregating the findings from
all the various security services,
370.88 -> prioritising them, and putting
them in one comprehensive view.
374.64 -> This was really interesting to ACME
as they currently did not have
377.32 -> that one single feed of security info.
379.8 -> Through the mergers and acquisitions,
they had inherited different tool sets,
383 -> and although there had been attempts to
standardise, there were differences
385.84 -> in the formats and the commercial
software being used.
388.6 -> Security Hub would enable them to pipe all
the alerts
391.16 -> into the cyber team’s chosen tooling.
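Piping findings onward is typically done with Amazon EventBridge, which receives Security Hub findings as events with the source `aws.securityhub` and detail type `Security Hub Findings - Imported`, each carrying a list of findings in the AWS Security Finding Format (ASFF). The Python sketch below shows a filter that a forwarding function might apply before handing findings to a downstream tool. The function name, the HIGH default threshold and the trimmed-down output fields are assumptions, and the actual hand-off to the cyber team's tooling is left out.

```python
from typing import Any

# EventBridge rule pattern matching findings imported into Security Hub.
SECURITY_HUB_FINDINGS_PATTERN = {
    "source": ["aws.securityhub"],
    "detail-type": ["Security Hub Findings - Imported"],
}

# ASFF severity labels ranked from lowest to highest.
SEVERITY_RANK = {"INFORMATIONAL": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


def findings_to_forward(event: dict[str, Any], min_label: str = "HIGH") -> list[dict[str, str]]:
    """Return a slimmed-down record for each finding at or above min_label."""
    threshold = SEVERITY_RANK[min_label]
    selected = []
    for finding in event.get("detail", {}).get("findings", []):
        label = finding.get("Severity", {}).get("Label", "INFORMATIONAL")
        # Unknown labels are treated as informational rather than raising.
        if SEVERITY_RANK.get(label, 0) >= threshold:
            selected.append(
                {
                    "id": finding["Id"],
                    "account": finding["AwsAccountId"],
                    "title": finding["Title"],
                    "severity": label,
                }
            )
    return selected
```

Filtering on severity before forwarding keeps a single organisation-wide feed without swamping the receiving tool with informational findings.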
394.72 -> But in addition to findings aggregation,
Security Hub has another nice feature,
398.04 -> which would really help ACME close a few
more of their security recommendations:
402.48 -> the ability to turn on checks for
applicable benchmarks that they really
405.72 -> should have been following for the
types of work they were undertaking.
409.48 -> Within a click or two they could
turn on the CIS benchmark
412.64 -> or PCI DSS for their
payments-related accounts.
416.04 -> And not only would it give them a score,
but by highlighting the exact resources
419.2 -> that needed attention, it would give
them a good starting point
421.88 -> for targeted remediation and to
improve that compliance score.
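Turning a standard on can also be scripted. The sketch below builds the request for the Security Hub BatchEnableStandards API and sends it through a boto3 `securityhub` client passed in by the caller. The ARNs shown are the CIS AWS Foundations Benchmark v1.2.0 and PCI DSS v3.2.1 standards ARNs as we understand them; newer versions exist, so treat them as assumptions and confirm the current list with DescribeStandards. The helper names are hypothetical, and the call applies to the account and Region of the client's credentials.

```python
from typing import Any

# Standards ARNs as we understand them; confirm with DescribeStandards before use.
CIS_1_2_ARN = "arn:aws:securityhub:::ruleset/cis-aws-foundations-benchmark/v/1.2.0"
PCI_DSS_ARN_TEMPLATE = "arn:aws:securityhub:{region}::standards/pci-dss/v/3.2.1"


def standards_requests(region: str, payments_account: bool) -> list[dict[str, str]]:
    """Build StandardsSubscriptionRequests: CIS everywhere, PCI DSS only where needed."""
    arns = [CIS_1_2_ARN]
    if payments_account:
        arns.append(PCI_DSS_ARN_TEMPLATE.format(region=region))
    return [{"StandardsArn": arn} for arn in arns]


def enable_standards(securityhub_client: Any, region: str, payments_account: bool) -> None:
    # securityhub_client is assumed to be boto3.client("securityhub", region_name=region).
    securityhub_client.batch_enable_standards(
        StandardsSubscriptionRequests=standards_requests(region, payments_account)
    )
```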
426.64 -> At this point, most ACME stakeholders
were convinced that a single
429.8 -> AWS Organization and a Control Tower
Landing Zone were the way to go.
433.76 -> However, in the platform team
there was definitely a sense of,
437.8 -> "that sounds easy for a fresh install,
439.48 -> but we have this complex setup of three
different groups of accounts".
442.76 -> And I also heard "It’s not greenfields
accounts we are worried about here.
446.48 -> We have hundreds of accounts
that we would need to migrate."
449.92 -> And this might be exactly how you have
been thinking when you've heard about
453 -> the Control Tower service and read the
various blogs and best practices.
456.92 -> So let’s talk about how
we overcame this at ACME.
460.24 -> Planning and executing a migration.
462.12 -> As the title suggests,
463.2 -> this is definitely not about
greenfields accounts.
465.56 -> This is about reorganising
existing accounts.
468.64 -> For the rest of the presentation, I’ll
outline what we did and highlight
471.6 -> the tips and tricks that helped ACME
through a successful migration.
475.32 -> Hopefully, at the end of this, you’ll
be able to walk away with a plan
478.08 -> for how you can do this
in your own organisation.
481.28 -> So, here are the four phases
we went through at ACME,
483.68 -> and what you can use yourself as a
base for your own migration project.
488.48 -> First, we started with the Landing Zone
design and build phase,
491.44 -> the output of which meant we had
somewhere to migrate the accounts to.
495.2 -> But before we could do that, we had to
understand if moving them was going to
498.96 -> cause issues for the
workloads running inside them.
502.36 -> We wanted to minimise the
disruption as much as possible.
505.44 -> So this risk identification and migration
planning stage was critically important.
512.04 -> After this planning phase, we were
ready to move a couple of accounts across
515.28 -> in what we called the test
migration phase.
517.6 -> A phase where we could create and
refine a list of migration steps.
521.48 -> Once we had a solid plan in place,
we could then progress to move
524.68 -> the remainder of the accounts.
526 -> And I'm going to dive into each of
these phases, one by one.
530.28 -> Let’s start at the beginning.
531.4 -> Let’s dive deeper into the first phase,
533.28 -> which is the Landing Zone design
and build phase.
537.12 -> This phase is very important, because
to create the ideal Landing Zone,
540.44 -> we needed to consider the various
topics that I’m showing here.
544.16 -> We did this at ACME by running
a series of workshops
546.48 -> with the right stakeholders in the room.
548.12 -> For instance, the networking
team and the cyber security team.
552.12 -> These people need to be involved from
the beginning on a journey such as this,
555.56 -> as there are many decisions
that cannot be made
557.76 -> by a cloud platform team on their own.
560.72 -> The approach was to keep a decision log
and create a high-level design
564.24 -> based on the unique requirements of ACME.
566.6 -> There are many AWS services and
configuration options to choose from,
570 -> and not all were suitable for ACME,
571.64 -> just as they won’t be for
your organisation.
574.16 -> These workshops turned out to be very useful.
576 -> A time to outline the best practice
to the stakeholders,
578.48 -> and together, to choose the
right path going forward.
582.4 -> The workshops started with account
structure, where we designed the initial
585.6 -> set of organisational units that we were
going to use to group the workloads around.
589.88 -> We’ll dive deeper into
this topic very soon.
592.64 -> Next, we had security and governance,
594.76 -> where we detailed the available
guardrails and controls,
597.36 -> and captured which ones would
be enabled in the new Landing Zone.
601.44 -> There was a networking session,
602.88 -> as this was an area that could be
simplified for ACME.
606.12 -> Their networking architecture had
evolved through the years, and this
609.2 -> was shown by the different types
of services that were being implemented.
613.36 -> For instance, there were a lot of VPN
connections, Direct Connect virtual interfaces (VIFs),
616.48 -> and a mesh of VPC peering.
619.08 -> Whilst their platform team was
able to manage this,
621.36 -> it was getting a bit hard to scale.
623.56 -> Migrating to a new Landing Zone
gave them the opportunity
626.28 -> to simplify their networking
by utilising Transit Gateway
629.44 -> and a hub and spoke account model.
631.8 -> ACME decided they would approach this
networking transition in a second phase
635.4 -> once they had brought accounts into
Control Tower, as there were potential
638.44 -> complexities to ensuring that
there would be no disruption
641.16 -> to their running workloads.
643.24 -> Network migration is worthy of a talk
on its own,
645.4 -> so today we won’t be covering this.
648.04 -> On to Operations.
649.4 -> This is where we talked about things
such as patching, tagging and
651.96 -> backup strategies and how they could
be enforced in the new Landing Zone.
656.4 -> Again, these were things that were
desirable at ACME, yet across
659.52 -> the three collections of accounts,
they were not being uniformly implemented.
663.28 -> Finally, we covered
identity and user access.
665.84 -> ACME had a fairly permissive
set of roles given to developers
668.56 -> across all the accounts,
and this was highlighted
670.56 -> as a risk in the security assessment.
672.28 -> There was definitely no
standardisation across the board.
675.44 -> But using Control Tower and its
integration with SSO,
678.44 -> this issue could be
relatively easily fixed up.
681.68 -> In this workshop, we discussed
standardising on a set of roles,
684.52 -> and how this would work
with their identity provider.
687.08 -> So, we can’t cover all the topics today
but we will dive into a couple of them,
691.16 -> and also talk about how we actually
built out the Landing Zone
693.6 -> after the high-level design
was completed.
696 -> So, coming back to account structure.
697.84 -> This was something that ACME really didn't
have in place already.
700.72 -> Their accounts lived straight
under the root OU
702.68 -> of the three organisations.
704.48 -> They were not segregating workloads
706.04 -> based on security and
policy requirements.
708.04 -> Again, something that was highlighted
in the security assessment
711 -> and is against multi-account
best practice.
713.92 -> Actually, the multi-account best practice
white paper is a really good read and
717.24 -> I recommend it to you to understand the
reasoning behind a good OU structure.
720.76 -> I’ll share a link to this at
the end of the presentation.
723.92 -> After we completed the workshop,
we ended up with a design like this.
727.32 -> Each OU has a slightly different SCP
allocated to it via the
731.16 -> set of Control Tower guardrails enabled.
733.4 -> I won’t go through all the different
OUs that were created
735.44 -> but I will mention a couple which
turned out to be very useful.
738 -> Firstly, policy staging OU.
739.84 -> This was where ACME could try out
new policies on test accounts,
742.76 -> so they could be sure of the effect
744.08 -> before they applied it
to the destination OU.
746.6 -> This was useful because the last thing
they wanted to do was to roll out
749.24 -> a policy which affected the workloads
that were running within that OU.
752.92 -> Actually, ACME went one stage further
with their ability to test out changes,
756.36 -> and I will touch on this
a bit later on.
758.36 -> The other OU that I’d like to call
out was the migration OU.
761.76 -> This was needed because the existing
accounts had been created and used
764.84 -> with no enforcing policies via SCP,
767.52 -> nor was there any visibility
into their compliance status.
770.4 -> It was highly likely that when we moved
the accounts into their new OU,
773.24 -> lots of things would be reported as
non-compliant, or worse,
776.08 -> things might stop working altogether.
778.4 -> The migration OU was created
with a slightly less restrictive
781.4 -> set of guardrails enabled, and it
was a place where they could see
784.28 -> non-compliance alerts but still had
the wiggle room in the SCPs to fix them.
788.24 -> The intention was that
accounts would only reside in this OU
791.04 -> temporarily, whilst violations were
checked and remediated, at which point
794.76 -> each account could then be moved into
its destination OU.
798.24 -> So, moving on from the design
phase into the build phase.
801.6 -> One pressing question that we had to
tackle very early on was
804.16 -> where does this Landing Zone live?
806.36 -> There are two options available to
anyone doing this type of migration.
809.64 -> The first is to use an existing
AWS Organization
812.28 -> and launch Control Tower within it.
814.44 -> This is actually the best option
to take if you have just
816.96 -> a single organisation, because the
accounts you would move into
820.2 -> the Landing Zone do not need to come
from an external organisation.
823.64 -> However, because Control Tower is
launched from the management account,
826.32 -> which is the top-level payer account,
827.96 -> and because a lot of the critical
features of your Landing Zone
830.24 -> will be administered from this
account, you want to ensure that
832.76 -> as few people as possible
have access to this account.
836.92 -> As such, it is highly recommended by AWS
839.4 -> that no workloads are running
in the management account.
842.12 -> This was not the case at ACME.
843.8 -> Each of their three payer accounts
had workloads running within it.
847.08 -> Therefore, none of their existing
organisations were suitable
850.24 -> for them to use for Control Tower.
852.32 -> This left us with the second option:
854.72 -> Create a brand new AWS Organization
and launch Control Tower within that.
859.8 -> This is a slightly more complicated scenario,
as there are considerations
862.92 -> to be explored when moving accounts
from one organisation to another.
867.24 -> And it is these considerations we will
explore further in the rest of this talk.
871 -> Before I get to those, let’s talk
about how ACME went on to build
873.88 -> the Landing Zone once we had
the design locked in.
877.36 -> The Control Tower service is enabled with
a few clicks in the management account.
880.96 -> From the console, you can assign
guardrails and controls to specific OUs
884.64 -> and enrol accounts within them.
This is a relatively easy exercise.
888.24 -> However, at ACME, during the design phase,
we had decided to use some AWS services
892.08 -> within the Landing Zone that were not
controlled via the Control Tower console.
897 -> So, how did we enhance their Landing Zone
with the bespoke controls and services
900.76 -> that were needed to meet their
security and governance requirements?
904.84 -> This is where an AWS solution called
Customisations for Control Tower comes in
908.96 -> or CfCT for short.
910.8 -> This is a solution that’s launched in the
management account that enables you to
913.76 -> launch your own CloudFormation templates
and SCPs to specific organisational units.
919.24 -> For example, you can see the
YAML configuration file here,
922 -> called the manifest file.
923.6 -> We have defined an extra SCP to block
public access to S3 buckets,
927.4 -> and this will be applied to all the accounts
in the infrastructure non-prod,
930.64 -> prod, and sandbox OUs.
932.96 -> Below this, we have a template
that will enable the VPC flow logs
935.8 -> to be sent to the central logging account.
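The on-screen manifest is not reproduced in this transcript, so as a hedged sketch, below is one way an SCP in that spirit could look, written as a Python dict and serialised to the JSON file a manifest entry would point to. It denies `s3:PutAccountPublicAccessBlock` so member accounts cannot switch off the account-level S3 Block Public Access setting, while exempting the `AWSControlTowerExecution` role so the platform can still manage it. The manifest entry follows our reading of the CfCT 2021-03-15 manifest schema (`name`, `resource_file`, `deploy_method`, `deployment_targets`); the file path, policy name and OU names are hypothetical, so check the field names against the CfCT documentation before relying on them.

```python
import json

# Hypothetical SCP: block changes to the account-level S3 Block Public Access setting,
# except when made by the Control Tower execution role.
DENY_S3_PUBLIC_ACCESS_CHANGES = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyAccountPublicAccessBlockChanges",
            "Effect": "Deny",
            "Action": ["s3:PutAccountPublicAccessBlock"],
            "Resource": "*",
            "Condition": {
                "ArnNotLike": {
                    "aws:PrincipalArn": "arn:aws:iam::*:role/AWSControlTowerExecution"
                }
            },
        }
    ],
}

# Manifest entry as a dict; in CfCT this would live in manifest.yaml under "resources".
MANIFEST_ENTRY = {
    "name": "deny-s3-public-access-changes",
    "resource_file": "policies/deny-s3-public-access-changes.json",
    "deploy_method": "scp",
    "deployment_targets": {
        "organizational_units": ["Infrastructure-NonProd", "Infrastructure-Prod", "Sandbox"]
    },
}


def policy_json() -> str:
    """Serialise the SCP as it would be stored at MANIFEST_ENTRY["resource_file"]."""
    return json.dumps(DENY_S3_PUBLIC_ACCESS_CHANGES, indent=2)
```

Because this SCP only blocks changes, the account-level setting itself still has to be switched on separately, under the exempted role.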
938.24 -> The solution hooks into lifecycle
events from Control Tower,
940.88 -> such as a new account being created.
943 -> These events trigger the pipeline to process
the manifest file and deploy
946.28 -> the designated solutions, so you can be
sure that when an account is created,
949.84 -> the baseline controls you have
specified are almost immediately
953.08 -> applied to that account in addition
to the Control Tower guardrails
956 -> that you have enabled.
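For a sense of what such a hook sees: Control Tower emits lifecycle events to Amazon EventBridge, and a successful account creation arrives as a `CreateManagedAccount` event from the source `aws.controltower`. The sketch below pulls the new account ID and OU name out of such an event. The field path under `serviceEventDetails.createManagedAccountStatus` reflects our understanding of the documented event shape and should be checked against a real event; CfCT already handles these events itself, so this is only relevant to any extra automation you might add alongside it. The function name is hypothetical.

```python
from typing import Any, Optional


def new_managed_account(event: dict[str, Any]) -> Optional[tuple[str, str]]:
    """Return (account_id, ou_name) for a successful CreateManagedAccount event, else None."""
    detail = event.get("detail", {})
    if event.get("source") != "aws.controltower":
        return None
    if detail.get("eventName") != "CreateManagedAccount":
        return None
    status = detail.get("serviceEventDetails", {}).get("createManagedAccountStatus", {})
    # Ignore in-progress or failed provisioning; only act once the account exists.
    if status.get("state") != "SUCCEEDED":
        return None
    return (
        status["account"]["accountId"],
        status["organizationalUnit"]["organizationalUnitName"],
    )
```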
957.4 -> At ACME, this was a really good way
for the platform team
960.24 -> to centrally manage the accounts
in the company.
962.92 -> And I recommend that you take a look
at the CfCT solution yourselves
966 -> to help you build out and manage your
Control Tower Landing Zone.
970.12 -> Now, I just mentioned we used the CfCT
to turn on additional services
973.4 -> in the Landing Zone.
974.88 -> Most of these extra services
were security-related as this helped
977.96 -> to remediate the vast majority
of the security findings.
981.12 -> At ACME, we kickstarted
the implementation of these services
984.12 -> by using the examples in the AWS Security
Reference Architecture
987.64 -> or SRA as it’s known.
989.52 -> This goldmine of information describes
the best practices when it comes to
992.68 -> using AWS' security services.
994.72 -> Importantly, the SRA has a code base on
GitHub with examples specifically
998.56 -> tailored for deployment using the CfCT.
1001.08 -> This enabled us to very quickly set up
things like GuardDuty,
1004.16 -> IAM Access Analyzer, Macie, and
Security Hub, to name a few.
1007.4 -> And using the SRA, we were confident
1009.16 -> that it was set up in the correct
best practice way.
1011.68 -> So, with the Landing Zone built out
and ready to accommodate new accounts,
1014.28 -> it meant that ACME could now start
considering some migrations,
1017.4 -> which moves us into the
next phase of the project.
1020.8 -> The risk identification
and migration planning phase.
1024.16 -> Let’s recall that ACME were not able to
activate Control Tower in any of their
1027.52 -> existing AWS Organizations, so
they had to create a new one.
1030.6 -> This means that the accounts
were being migrated
1032.28 -> from one organisation to another,
1034.2 -> and this movement can cause issues.
1035.92 -> One of the most important goals
of a project like this
1038.16 -> is to not disrupt workloads at all
when moving accounts across.
1041.36 -> Hence, we have this second phase.
1043.08 -> Simply put, this is the time that you
should spend assessing existing accounts
1046.84 -> for what might break.
1048.04 -> I’ll talk about what we did at ACME,
but please note this cannot be taken as
1051.2 -> a definitive list, as everybody’s
existing AWS footprint is different.
1055.12 -> And therefore, you might have extra
things to consider for your migration.
1058.32 -> But the things that we're going to cover here
should be common for migrations
1061.92 -> between organisations.
1063.8 -> The most important thing we needed to do
was identify dependencies
1066.72 -> on AWS Organizations itself.
1069.04 -> Increasingly, services natively integrate
with AWS Organizations,
1072.12 -> as we have seen with the
security services.
1074.72 -> One such example is Resource Access
Manager, where you can share
1077.76 -> resources between accounts.
1079.64 -> For example, you can share a subnet
or transit gateway to all the
1082.64 -> accounts in an organisation.
1084.56 -> Another example is using the
CloudFormation StackSets feature
1087.4 -> of 'deploy to OU'.
1089.12 -> Also, organisation IDs can be used
within IAM resource policies.
1092.64 -> For example, S3 bucket policies
that allow read access to any account
1096.24 -> in an organisation
using the aws:PrincipalOrgID condition key.
1098.84 -> These are really useful and powerful features.
1100.76 -> But what this means when you remove
the account from the organisation
1103.72 -> is that those things will most likely break,
1105.84 -> and this is likely to have catastrophic
effects on your workloads.
1109.76 -> Anything discovered like this needs
to be remediated, so there is little
1112.84 -> to no downtime when an account
is moved across the orgs.
1116.04 -> This can require extra effort
and meticulous planning,
1119 -> depending on the type
of dependency discovered.
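For the S3 bucket policy case mentioned above, one common way to avoid downtime is a transitional policy: before the move, extend each `aws:PrincipalOrgID` condition so that it lists both the old and the new organisation ID (multiple values for one condition key are ORed), then remove the old ID once every account has moved. The Python sketch below applies that edit to a policy document. The function name and org IDs are hypothetical, and it only rewrites `aws:PrincipalOrgID`, not related keys such as `aws:PrincipalOrgPaths`.

```python
import copy
from typing import Any


def allow_both_orgs(policy: dict[str, Any], old_org_id: str, new_org_id: str) -> dict[str, Any]:
    """Return a copy of policy where every aws:PrincipalOrgID value list that
    contains old_org_id also contains new_org_id."""
    updated = copy.deepcopy(policy)
    statements = updated["Statement"]
    # A policy may hold a single statement object instead of a list.
    if isinstance(statements, dict):
        statements = [statements]
    for statement in statements:
        for conditions in statement.get("Condition", {}).values():
            for key, value in conditions.items():
                # Condition key names are case-insensitive.
                if key.lower() != "aws:principalorgid":
                    continue
                values = value if isinstance(value, list) else [value]
                # Works for StringEquals (either org matches) and StringNotEquals
                # (neither org matches), since multiple values are ORed.
                if old_org_id in values and new_org_id not in values:
                    conditions[key] = values + [new_org_id]
    return updated


BUCKET_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AllowOrgRead",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-shared-bucket/*",
            "Condition": {"StringEquals": {"aws:PrincipalOrgID": "o-aaaaaaaaaa"}},
        }
    ],
}
```

The function is idempotent, so it can safely be rerun across accounts as the migration progresses.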
1121.96 -> ACME were pretty sure that no one
was using any of these features,
1124.52 -> but they admitted they could not be
100% sure.
1127.24 -> Therefore, all accounts still needed
to be checked.
1130.32 -> Because there are so many services
where a dependency could be hiding,
1133 -> identifying them was going to be
a laborious task if done manually,
1136.16 -> not to mention error-prone, as it is
easy to miss something if you are
1139.52 -> hunting through settings in a console.
1141.36 -> At ACME, we were talking about
a few hundred accounts.
1144.04 -> Luckily, we’ve written some automation scripts
at AWS which we are able to use.
1148.68 -> It’s called the Organizations Dependency
Checker Tool.
1152.32 -> It’s a solution that is deployed to
a management account and, by utilising
1155.6 -> a role that’s deployed to your existing
accounts, it can iterate through each
1158.8 -> of them and look for the dependencies
within them programmatically.
1162.2 -> It is not an exhaustive solution,
1164.12 -> but it is being updated all the time
and does cater for a high percentage
1167.4 -> of the services you need to check.
1169.48 -> The nice thing about this solution
is that it produces an Excel spreadsheet
1172.84 -> where you can easily identify
1174.12 -> which resources you need to go
and take a look at.
1177 -> We used this to great effect at ACME,
where we did actually find some
1179.96 -> troublesome resources,
and as a consequence, had to come up
1182.8 -> with a remediation plan for each of them.
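The Organizations Dependency Checker Tool covers far more than this, but the core idea of a programmatic scan can be sketched in a few lines of Python. The sketch takes resource policies you have already collected (for example bucket, KMS key or role trust policies, keyed by ARN), flags any that mention an organisation-related global condition key or contain an organisation ID, and writes the hits to a CSV that opens in Excel. This is not the tool's code; the function names are hypothetical, and collecting the policies from each account is left out.

```python
import csv
import io
import re

# Global condition keys that tie a policy to an organisation (compared in lower case).
ORG_CONDITION_KEYS = (
    "aws:principalorgid", "aws:principalorgpaths",
    "aws:resourceorgid", "aws:resourceorgpaths",
    "aws:sourceorgid", "aws:sourceorgpaths",
)
# Organisation IDs look like "o-" followed by 10 to 32 lowercase letters or digits.
ORG_ID_RE = re.compile(r"\bo-[a-z0-9]{10,32}\b")


def find_org_dependencies(policies: dict[str, str]) -> list[dict[str, str]]:
    """policies maps resource ARN -> policy JSON text; returns one row per hit."""
    rows = []
    for resource_arn, policy_text in sorted(policies.items()):
        text = policy_text.lower()
        keys = [k for k in ORG_CONDITION_KEYS if k in text]
        org_ids = sorted(set(ORG_ID_RE.findall(text)))
        if keys or org_ids:
            rows.append({
                "resource": resource_arn,
                "condition_keys": ";".join(keys),
                "org_ids": ";".join(org_ids),
            })
    return rows


def to_csv(rows: list[dict[str, str]]) -> str:
    """Render rows as CSV text, ready to save and open in a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["resource", "condition_keys", "org_ids"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```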
1185.44 -> Now, about alert floods.
1187.48 -> Because there had been no way to
assess compliance against standards,
1190.52 -> such as those now being implemented
via Security Hub,
1193.36 -> it turns out that some of the accounts
we were migrating at ACME
1196.56 -> were actually quite non-compliant.
1198.64 -> Because of this, we caused an alert flood
on one occasion
1201.04 -> because many of the compliance checks
immediately triggered as soon as we
1204.24 -> brought that account into
the new Landing Zone.
1206.32 -> This was not a pleasant experience
for the cyber team, and consequently,
1209.28 -> it was not a very good experience
for the platform team either.
1212.6 -> So, we improved the process
and decided that it would be a good idea
1215.92 -> to check compliance before
the migration occurred.
1218.36 -> The way we implemented this was to use
a supporting feature of AWS Config,
1221.68 -> called Conformance Packs.
1223.08 -> These are sets of Config rules
1224.32 -> that can be applied to an account
at any point in time.
1226.88 -> There are conformance packs for the
Control Tower detective controls,
1230.08 -> and also for other standards we were using,
such as the CIS Benchmark.
1233.52 -> By knowing what issues there were
pre-migration, we found it a much better
1236.88 -> experience to proactively reduce
non-compliance to an acceptable level
1240.76 -> before migration,
and therefore avert an alert flood.
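Conformance pack results can then be turned into a simple go/no-go check per account. The Python sketch below assumes the evaluation results have already been exported (for example via AWS Config's GetConformancePackComplianceDetails) and flattened into hypothetical records with `rule`, `resource_id` and `compliance` fields. The threshold and the set of must-pass rules are placeholders for whatever "acceptable level" your cyber team agrees on.

```python
from collections import Counter
from typing import Iterable


def noncompliance_by_rule(results: Iterable[dict[str, str]]) -> Counter:
    """Count NON_COMPLIANT resources per Config rule."""
    return Counter(r["rule"] for r in results if r["compliance"] == "NON_COMPLIANT")


def ready_to_migrate(
    results: list[dict[str, str]],
    max_noncompliant: int = 0,
    must_pass: frozenset[str] = frozenset(),
) -> bool:
    """Go/no-go: no failures on must-pass rules and total failures within budget."""
    counts = noncompliance_by_rule(results)
    # Counter returns 0 for rules with no recorded failures.
    if any(counts[rule] for rule in must_pass):
        return False
    return sum(counts.values()) <= max_noncompliant
```

Sharing the per-rule counts with account owners, rather than just the go/no-go answer, is what gives them something concrete to remediate.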
1245.04 -> It soon turned out that this approach
was beneficial in more than one way.
1248.56 -> Not only were we averting alert floods,
we also found that making the results
1252.44 -> available to the account team owners,
for them to see and to remediate,
1255.68 -> gave them ownership and buy-in
to the migration project.
1258.48 -> Previously, it was the platform team
doing the bulk of the migration work
1261.56 -> and the account owners were not
being notified or even aware
1264.08 -> that their account was being migrated
until right at the last minute.
1267.6 -> So, conformance packs are a really powerful
tool, and I would recommend
1270.56 -> looking into these
for your own migration.
1272.72 -> A non-technical but nonetheless very
important thing we had to do
1276.12 -> was to update some of the administration
that was in place.
1279.32 -> In the case of ACME, they had reserved
instances and savings plans,
1282.72 -> which were helping them to reduce
their monthly bills.
1285.48 -> To continue with these discounts,
they needed to be updated
1287.8 -> with the new Organization ID.
1289.84 -> A thing to note is the effect is quite
different on these two features.
1293.28 -> Reserved Instances can be moved easily,
and they’ll keep their existing terms,
1296.64 -> but savings plans get cancelled.
1298.28 -> You’ll get a credit for the unused time,
and then that will start a new term,
1301.16 -> effectively resetting the clock.
1303.12 -> If you have anything like this in place,
I suggest you talk to the account team
1306.44 -> or AWS Support, like we did.
They’ll help you with your changes.
1310.28 -> Now, back to the more technical stuff.
1311.92 -> I talked earlier about how ACME were using
the policy staging OU
1315.12 -> to test out changes before they were deployed
into production.
1318.44 -> This was and still is a good idea.
1320.56 -> However, ACME wanted a completely
separate environment,
1323.36 -> where they could train up their new
team members on Control Tower
1325.96 -> and the CfCT solution.
1327.84 -> So, what they did was to create
a separate Dev Landing Zone.
1330.76 -> Remember, you can only have one
Control Tower landing zone activated
1333.68 -> in an organisation.
1335.04 -> So, as they had no spare organisation,
they used a credit card
1337.16 -> to create a new standalone account
and launch Control Tower within that.
1340.64 -> The intention was that they were
going to use this
1342.52 -> for the duration of the migration project only.
1344.92 -> However, Control Tower itself
is actually free,
1347.64 -> and it is only the underlying resources
that you pay for.
1350.6 -> ACME decided it was worth the small cost
to gain the extra agility
1354.16 -> and the ability to give new team
members confidence
1356.48 -> before they started using the
production Landing Zone.
1359.12 -> And their Dev Landing Zone
is still in use today.
1362.24 -> After inspecting the accounts for issues,
we now had a few accounts that were
1365.36 -> non-critical and required little to no
remediation work.
1369.12 -> We were ready to head into
the test migration phase.
1372.04 -> This is a phase where all eyes
are on an account migration as it happens,
1375.24 -> and we build out and refine
a migration runbook.
1377.96 -> A runbook is a set of repeatable
instructions to perform a task.
1381.52 -> In our case, the runbook was to contain
all the instructions required
1384.48 -> to move an account successfully
to the new Landing Zone.
1387.28 -> An example of what we used at ACME
is on the screen.
1389.88 -> It’s not a complete set, but you could
use this as a base for your own.
1392.96 -> A lot of the information for the steps
was copied and pasted from the various
1396.08 -> AWS service documentation pages,
so it was handy to have it all in one place.
1400.6 -> We included the once-off prerequisites
and also the per-account migration steps.
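The runbook at ACME was a document, but the same split between once-off prerequisites and per-account steps can also be kept as data, which makes it easy to generate a checklist per account and to append steps as test migrations uncover them. This is a hypothetical Python sketch; the `Runbook` class is invented for illustration, and the step wording is only loosely based on publicly documented requirements (such as the `AWSControlTowerExecution` role needed to enroll an existing account), not on ACME's actual runbook.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Runbook:
    # Done once, before any account is moved.
    prerequisites: List[str] = field(default_factory=list)
    # Repeated for every account that is migrated.
    per_account_steps: List[str] = field(default_factory=list)

    def add_step(self, step: str) -> None:
        """Append a step discovered during a test migration, ignoring duplicates."""
        if step not in self.per_account_steps:
            self.per_account_steps.append(step)

    def checklist(self, account_id: str) -> List[str]:
        """Render the per-account steps as a checklist for one account."""
        return [f"[ ] {account_id}: {step}" for step in self.per_account_steps]


runbook = Runbook(
    prerequisites=["Create the target OUs in the new Landing Zone"],
    per_account_steps=[
        "Create the AWSControlTowerExecution role",
        "Leave the old organisation",
        "Accept the invitation to the new organisation",
        "Enroll the account in Control Tower",
    ],
)

# Hypothetical step found to be missing during the first test migration.
runbook.add_step("Verify workloads after enrollment")
for line in runbook.checklist("111111111111"):
    print(line)
```

Keeping the steps as data also makes it easier to see, later on, which ones could be automated.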
1405.44 -> When we moved the first account across,
we found things that we had missed out,
1408.68 -> but that was OK.
1409.76 -> We added the missing steps
to the runbook, and this helped
1412.44 -> improve it for the next time.
1414.2 -> By the time we moved the third account
across, we found that we didn’t have
1416.88 -> anything else to add to the runbook.
1418.84 -> This was a really good indication
that we were ready to move
1421.08 -> into the core migration phase.
1423.24 -> That’s not to say
that we captured everything.
1425.32 -> The runbook is a living document,
and we kept updating it throughout.
1429.68 -> This is a tip for you and your migration.
1431.84 -> Use a runbook and iterate on it,
1433.56 -> and you’ll be much more likely
to repeat your success.
1436 -> You can even use it to spot areas
where you can automate some of the steps
1439 -> using AWS APIs, rather than doing it
all as click-ops in the console.
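To give a flavour of what automating part of the runbook might look like, here is a hedged Python sketch of the AWS Organizations calls involved in moving one account between organisations (`leave_organization`, `invite_account_to_organization`, `accept_handshake`, `list_roots`, `move_account`). It assumes two boto3-style Organizations clients are passed in: one with the new management account's credentials and one with credentials inside the member account. The function name, the OU ID and the exact sequencing are assumptions, and Control Tower enrollment is deliberately left out.

```python
from typing import Any


def move_account_to_new_org(new_org: Any, member_org: Any, account_id: str, target_ou_id: str) -> str:
    """Hypothetical runbook automation: move one account into a new organisation.

    new_org:    Organizations client using the NEW management account's credentials.
    member_org: Organizations client using credentials inside the member account.
    Returns the handshake ID so it can be recorded in the runbook log.
    """
    # 1. The member account leaves its old organisation. It must meet the
    #    requirements for a standalone account (e.g. payment details) first.
    member_org.leave_organization()

    # 2. The new management account invites the now-standalone account.
    resp = new_org.invite_account_to_organization(Target={"Id": account_id, "Type": "ACCOUNT"})
    handshake_id = resp["Handshake"]["Id"]

    # 3. The member account accepts the invitation.
    member_org.accept_handshake(HandshakeId=handshake_id)

    # 4. A newly joined account lands under the root; move it into the target OU.
    root_id = new_org.list_roots()["Roots"][0]["Id"]
    new_org.move_account(
        AccountId=account_id,
        SourceParentId=root_id,
        DestinationParentId=target_ou_id,
    )
    return handshake_id
```

In practice the clients might come from `boto3.client("organizations")` and a `boto3.Session(...)` for a profile or assumed role in the member account. Passing them in keeps the function testable with fakes and lets a human run each step manually until the automation is trusted.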
1443.6 -> On to the core migration phase.
1445.32 -> This was now the rinse-and-repeat phase
of the project, where we used the runbook
1448.44 -> to migrate all the remaining accounts.
1450.76 -> But what order did we migrate them in?
It wasn't random.
1454.24 -> What we did at ACME was to prioritise
all the accounts, and we did this
1457.6 -> based on a set of criteria similar to the ones
on the screen.
1460.96 -> Effectively, we went from the least
complex accounts to the most complex,
1463.96 -> buying some time for the latter.
1466.36 -> Target OU suitability, i.e. does this account
belong in non-prod or prod?
1470.96 -> We found a few that had both non-production
and production workloads within them.
1474.68 -> This made it hard to decide where the
accounts should live in the new
1477.48 -> Landing Zone, and we actually ended up
splitting each of them into two new accounts.
1481.24 -> As such, these accounts ended up
much further down the priority list,
1484.72 -> so there was more time
to make the changes.
1487.12 -> Conformance packs - the worst-offending
accounts were pushed down the list.
1491.72 -> Organisational dependencies
were approached on a case-by-case basis -
1495.16 -> some were easy to fix, and some less so.
1497.8 -> Lastly, networking -
1499.6 -> ACME had a two-phase approach
to networking changes,
1502.08 -> but your approach might be different,
1503.68 -> so do consider whether networking
affects the order.
1506.32 -> Again, this is not an exhaustive list,
but hopefully it can start you on your way.
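The ordering described above, least complex first, can be sketched as a simple weighted score. This is a hypothetical Python illustration: the `AccountProfile` fields mirror the criteria from the talk (target OU suitability, conformance pack results, organisational dependencies, networking), but the weights are invented and would need tuning to whatever is actually hard in your own estate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class AccountProfile:
    account_id: str
    mixed_prod_nonprod: bool    # target OU unclear; the account may need splitting
    non_compliant_rules: int    # from conformance pack results
    org_dependencies: int       # e.g. SCP reliance, delegated admin, org-wide sharing
    needs_network_change: bool  # affected by the networking changes


# Hypothetical weights: a mixed prod/non-prod account dominates the score.
WEIGHTS = {"mixed": 100, "rule": 5, "dependency": 10, "network": 20}


def complexity(p: AccountProfile) -> int:
    """Higher score means more remediation work before the account can move."""
    return (
        WEIGHTS["mixed"] * int(p.mixed_prod_nonprod)
        + WEIGHTS["rule"] * p.non_compliant_rules
        + WEIGHTS["dependency"] * p.org_dependencies
        + WEIGHTS["network"] * int(p.needs_network_change)
    )


def migration_order(profiles: List[AccountProfile]) -> List[str]:
    """Least complex first; ties broken by account ID for a repeatable order."""
    ranked = sorted(profiles, key=lambda p: (complexity(p), p.account_id))
    return [p.account_id for p in ranked]


profiles = [
    AccountProfile("333333333333", True, 0, 0, False),   # score 100
    AccountProfile("111111111111", False, 2, 0, False),  # score 10
    AccountProfile("222222222222", False, 0, 1, True),   # score 30
]
print(migration_order(profiles))  # ['111111111111', '222222222222', '333333333333']
```

A score like this is only a starting point; as at ACME, organisational dependencies in particular may still need a case-by-case decision.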
1511.08 -> And that’s a wrap!
1512.12 -> With migration in full swing,
it took a couple of months
1514.52 -> to migrate a few hundred accounts.
1516.72 -> The duration of a project
like this is, of course,
1519 -> going to depend on the state
of your existing accounts.
1521.8 -> It’s quite possible that you are not
using any organisational features,
1525.04 -> and your accounts are in a good state
compliance-wise.
1527.52 -> In that case, it could take much less time
to move to a new Landing Zone.
1531.24 -> But I don’t want you to walk away
thinking there is no effort at all.
1534.72 -> It should be a well-planned project
following at least the phases above,
1538.32 -> and you should take your time at the beginning
of the project
1540.04 -> so that you can reap the benefits later on.
1542.48 -> Before the project at ACME, there were three disparate collections
1545.8 -> of accounts which, as a platform, had 15
medium and high risks called out
1550.24 -> as part of the security assessment.
1552.24 -> After creating a new Control Tower
Landing Zone, it went to this:
1556.52 -> A single organisation in AWS Organizations
for all their accounts,
1559.68 -> and using Control Tower
and other AWS services,
1562.64 -> ACME were now confident that any account
within this Landing Zone was compliant
1566.48 -> and in line with their organisational
policies.
1569.04 -> We fixed all 15 of the security findings.
1571.4 -> Overall, it was a huge success, and
because of the single-pane-of-glass
1575.2 -> management approach, their cloud team
now has a much easier job
1578.36 -> of managing the platform.
1580.4 -> So, what do I want you to take away
from the session?
1582.84 -> Hopefully, from following the journey
we had at ACME,
1585.44 -> you can see what benefits an enhanced
Control Tower Landing Zone can bring.
1589.56 -> We talked about the reasons why
you might want to make the switch.
1592.68 -> In the case of ACME, it was discovering
that their security posture was suboptimal.
1596.88 -> In your case, you might not need anything
as formal as a security assessment,
1600.52 -> but do take some time to assess
your platform’s single-pane-of-glass view
1603.72 -> and central management capability.
1606.72 -> If you find that you don’t have one,
or have one only for certain aspects and not others,
1610.2 -> it could be worth considering a migration
to a Control Tower Landing Zone.
1613.88 -> I talked through each of the four phases
we followed,
1616.24 -> but I’d like to stress the importance
of the Landing Zone design phase.
1619.92 -> This is your chance to do things
according to best practice,
1622.88 -> and there are lots of areas to consider.
1624.56 -> So, don’t skimp on this one;
it will help you down the line for sure.
1628.56 -> Lastly, I highlighted a few of the tools
and processes that we used at ACME.
1632.28 -> And I recommend looking into these,
as they will greatly help
1634.84 -> speed up your migration.
1636.88 -> Now, you might have noticed a 'top tips'
logo on some of the slides.
1640.08 -> If you didn’t, hit rewind,
and go and have a look.
1642.44 -> They are the things that I think
could really help you.
1644.92 -> For convenience, I’ve put QR codes here
that will take you straight
1647.2 -> to the official documentation
for three of them.
1649.48 -> I am pretty sure you will find them useful.
1651.6 -> I will also be publishing a blog
on this migration topic very soon,
1654.64 -> so keep an eye out for that one.
1656.6 -> And in addition to those specific
Landing Zone links,
1659 -> there is also a vast trove of training resources
for you to officially
1662.2 -> skill up on your cloud journey.
1663.76 -> And you can bookmark these
at your leisure.
1665.72 -> And that’s it, thank you for listening!
1667.8 -> I hope you found it useful, and I wish you well
on your cloud migration journey,
1671.16 -> should you choose to accept it.
1673.08 -> One last thing,
1674.08 -> I'd really appreciate it if you could fill out
the session survey
1676.76 -> as feedback is always welcome
and helps us improve our talks.
1679.52 -> Thank you!
Source: https://www.youtube.com/watch?v=L0cJPmkFDg8