AWS re:Invent 2020: Choosing the right modern deployment strategy


There are a variety of modern approaches to automating deployment that are popular on AWS. But should you do a blue/green, canary, rolling, or other deployment? And what is the best method to implement the strategy you choose? In this session, explore the options to help you choose and see how to implement the more popular strategies on AWS.

Learn more about re:Invent 2020 at http://bit.ly/3c4NSdY



Content

1.52 -> hey everybody my name is andrew baird
3.439 -> i'm a principal solutions architect with
5.12 -> aws and welcome to devops 303
7.759 -> where we're going to talk about how to
8.96 -> choose the right modern deployment
10.32 -> strategy it's a topic i'm really
11.599 -> passionate about we're going to cover
13.28 -> all sorts of different deployment
14.719 -> mechanics and options you have on aws
17.039 -> regardless of the
18.16 -> the compute service that your
19.199 -> application has been deployed to um so
21.199 -> let's jump in we've got 30 minutes
23.199 -> so i'm going to go through a general
26 -> refresher first so it's going to give
27.599 -> you some kind of baseline terminology so
29.519 -> that we're on the same page about ci/cd
32.239 -> then i'm going to describe some overall
34.32 -> tenets some goals you know some
35.92 -> foundational things that your deployment
37.6 -> should be aware of when you're defining
39.12 -> what your deployment strategy is
41.04 -> and then i'm going to talk through what
42.96 -> i'm calling considerations these are
44.879 -> kind of within any deployment strategy
47.12 -> these are some kind of
48.559 -> core options that you have to choose
50 -> between and your preference in those
51.44 -> considerations may help you choose which
53.36 -> option is right for you
54.64 -> and then i'm going to go through the the
56 -> specific actual deployment options
57.76 -> available for you
59.199 -> these modern deployment options that are
60.559 -> available on the aws platform
62.399 -> and some details about how to implement
64 -> each of them on top of aws
66.88 -> uh before wrapping it up all together
68.4 -> and kind of you know reminding you the
70.159 -> things we talked about already
71.84 -> hopefully you're thinking about your own
73.28 -> application throughout those
75.2 -> those initial slides and then when we
76.479 -> wrap it all together you'll know which
77.759 -> one might be the best fit for you
79.92 -> so let's jump in so first again we're
82.72 -> going to refresh some general
83.84 -> terminology about ci/cd
85.759 -> we've broken the uh software development
87.92 -> life cycle into these four phases
90 -> source build test and production and
91.84 -> there's some specific activities that
93.28 -> are typically happening
94.32 -> at each one of those phases source is you
96.64 -> know the active development that's
97.759 -> happening and when the code gets checked
99.36 -> into a repository
100.64 -> some human processes that are generating
102.56 -> the code reviewing the code
104.24 -> but nothing's running yet it's just the
105.92 -> you know the code related
107.36 -> mechanisms that you have and then the
109.92 -> build process is taking that code and
111.92 -> actually compiling it making sure that
114.159 -> the style meets your standards that the
116.479 -> you know code coverage from a test
118.079 -> perspective is is there and meeting
120.079 -> whatever policies you set forth
122.079 -> and that the code that's been written is
123.6 -> able to be compiled into some type of
126.079 -> deployable unit you've got
127.759 -> an artifact or a set of artifacts that
130 -> have been
130.8 -> compiled together and able to be
132.4 -> deployed now
133.84 -> and then a test phase where you take
135.44 -> that deployable artifact and you
136.959 -> integrate it into
138.239 -> a running environment there's other
139.76 -> maybe services that are going to
141.52 -> integrate with this deployable unit
143.36 -> you've just deployed
144.72 -> make sure that those integrations
146.08 -> between them are are working that
148 -> any dependencies you have or things
150.56 -> that depend on you
151.599 -> are happy with the changes that have
152.959 -> been made to this new deployable
154.4 -> artifact
155.36 -> and just go through a slew of different
156.8 -> testing to give your team and your
159.2 -> business confidence that when it gets
160.72 -> deployed to your customers or into the
162.239 -> production environment
163.28 -> that things are going to be successful
165.44 -> and they're going to be secure
166.72 -> and then finally once you've been
168.08 -> satisfied that all the testing has given
169.92 -> the confidence you require
171.519 -> you're going to take that deployable
172.72 -> unit and put it into a production
174 -> environment whatever that means for you
175.76 -> and it's going to be you know getting
177.12 -> accessed by your customers it's going to be
179.12 -> something your business is running on
180.64 -> top of and then continually you're going
183.2 -> to be you know monitoring the success of
185.12 -> that code that you've written
186.72 -> through various metrics and monitoring
188.48 -> available and each one of these phases
190.879 -> is all about creating those feedback
192.4 -> loops every one of these
194.239 -> different phases of the sdlc and the
196.239 -> activities that happen within them
198.08 -> are about creating feedback loops so
199.76 -> that you're able to catch problems early
202.239 -> to identify fixes early you can
205.28 -> gather insights about the way the
206.799 -> application is behaving or the way your
208.4 -> users are interacting with it
210.319 -> so uh building in feedback loops
212.48 -> throughout um and aws has got
214.72 -> a slew of offerings a slew of services
217.12 -> that uh provide capabilities for each
219.2 -> one of those individual moments of the
221.04 -> sdlc all the way from
222.959 -> code repositories with our code commit
224.72 -> service and the ide
226.239 -> to to help you develop code that's going
227.92 -> to be contributed there on our cloud 9
229.599 -> service
230.72 -> all the way through the actual
232.159 -> mechanisms of deployment and subsequent
234.08 -> monitoring inside of
235.36 -> your production environment and all your
236.64 -> other environments there within so a
238.4 -> slew of services is available here
240.08 -> i'm not going to go through all of them
241.2 -> today obviously but just know that for
244.159 -> each one of those moments in the sdlc
246.08 -> you've got
247.12 -> service native capabilities on aws to
250 -> you know bring
251.2 -> additional automation enhancements
254 -> better visibility transparency
257.359 -> throughout your application's life cycle
260.56 -> but really today since we're talking
262.079 -> deployment the service we're mostly
264 -> going to focus on the context of is aws
266.08 -> code deploy
266.96 -> um so code deploy is the service that
268.8 -> helps take those deployable artifacts or
271.12 -> those changes that need to occur in a
272.639 -> running environment and helps
274.16 -> instrument uh those changes taking place
277.199 -> um so it's available uh at any scale
280.639 -> this this is you know whether your
281.84 -> application is
282.72 -> running on a single server um or as a
284.96 -> single container or it you know is
286.8 -> comprised of tens of thousands of
288.32 -> servers
288.96 -> it's a fully scalable service um and
291.36 -> it's able to support
292.479 -> applications that have been deployed on
294.8 -> you know these various compute types
296.24 -> that exist these paradigms where your
297.68 -> application could be running
299.12 -> on servers it could be running on
300.639 -> containers it could be
302.16 -> running serverlessly as part of aws
303.919 -> lambda and code deploy provides
306.96 -> these programmatic mechanisms to make
308.8 -> deployments be safe
310.24 -> and automated regardless of what the
313.52 -> compute paradigm is that your
314.8 -> application's running within and
317.12 -> a slew of different hooks to you know
318.88 -> allow you to decide what types of tests
321.039 -> should occur when testing
323.12 -> what the behavior should be when those
324.32 -> tests succeed or fail
326.56 -> the monitors that the deployment process
328.72 -> should be aware of
329.84 -> to know when rollback should occur just
331.44 -> a slew of different features
333.6 -> related to each of those things that
334.96 -> happened within the the active
336.639 -> deployment of an application
338.639 -> so that's code deploy um and we'll talk
342.08 -> about
342.479 -> uh once we get to the different options
344.24 -> how code deploy relates to those
345.6 -> different options that are available but
347.28 -> but first i'm going to take a step back
348.56 -> again to talk about kind of the general
350.08 -> tenets
350.88 -> that are that you should be thinking
352.639 -> about when you're pursuing a modern
353.919 -> deployment strategy it's not just about
355.6 -> automating deployment for the sake of
357.199 -> automation in itself
358.639 -> there's things you should be striving
360.16 -> for from a you know a business
362.16 -> perspective or a a team perspective um
365.84 -> to to make sure that those deployments
367.28 -> are successful and safe
369.28 -> so these are the tenets i want to
370.4 -> highlight that i think are true for
372.16 -> every modern deployment that exists
373.68 -> today
374.4 -> um maybe you've got you know a way that
376.16 -> these are going to be unique to your
377.28 -> specific application but we think these
378.72 -> are
378.96 -> these are pretty universal um there
380.56 -> should always be a goal within
382.639 -> uh modern deployment that there's not
385.199 -> going to be any disruption to the
386.319 -> business clearly the
387.44 -> the money shower is turned on you're
388.96 -> generating revenue and this guy's very
391.28 -> happy smile would look a lot less so if
393.52 -> your deployment causes
394.96 -> orders to stop on your website or for
397.44 -> your your
398.4 -> website visitors to no longer be able to
400.16 -> utilize a feature that's important to
401.6 -> the money shower
402.639 -> um so making sure there's no disruption
405.84 -> is clearly priority number one any type
408.479 -> of
408.88 -> modern deployment that requires downtime
410.96 -> or business disruption is clearly not
412.96 -> going to
413.919 -> be as good as it could be next making
416.639 -> sure they're iterative and frequent
419.36 -> the more you're able to reduce the risk
421.039 -> of a deployment by making changes
422.88 -> smaller
423.919 -> you're able to you know ensure that the
426.24 -> types of
427.12 -> um you know possible bugs that could
428.96 -> exist are able to be
430.4 -> you know you're able to have a really
431.599 -> narrow scope of investigation for the
433.199 -> types of changes that occurred
434.88 -> and by doing so and by making them
437.039 -> iterative you're able to make them more
438.479 -> frequent because
439.84 -> they're able to be done in smaller
441.039 -> batches you know quicker development
442.639 -> cycles that result in deployment to
444.16 -> production
445.039 -> means you can deliver features for your
446.56 -> business partners faster for your
448.16 -> product managers faster and all those
449.599 -> things so
450.319 -> striving to be iterative and frequent in
451.919 -> your deployment is important
453.599 -> and one of the things that enables that
455.039 -> is having really hardened versions so
457.28 -> whether you're thinking about specific
459.599 -> you know image versions of docker
460.96 -> containers
462.4 -> or you know named versions of a of a you
464.96 -> know
466 -> a commit against a code repository um
468.96 -> the ability to kind of treat those
470.4 -> things not just as versions on their own
472.16 -> but as a holistic version of your
473.68 -> application so that
475.039 -> um should you need to do a rollback in
476.8 -> the future or go back to some other
478.4 -> prior state
479.84 -> or be able to assert what what the
482.08 -> future state is going to be so that you
483.68 -> know dependencies that you have or that
485.36 -> depend on you
486.319 -> are speaking in the same terms of what
488 -> those versions mean
489.52 -> you talk about a holistic version of an
491.599 -> application deployment so that
493.68 -> all of those dependencies therein are
496.639 -> following the same type of versioned
498.879 -> process so that if i'm on version two
501.599 -> today and i need
502.56 -> to roll back to version 1.9 1.9 brings
506 -> with
506.24 -> not just the version of my code that's
507.44 -> running but any other dependencies it
508.96 -> might have had within the operating
510.319 -> system
511.44 -> other services i'm depending on perhaps
513.2 -> even but you're able to talk about these
515.2 -> hardened versions that
517.039 -> that you know give you confidence about
518.839 -> state and and the state of a deployment
521.839 -> um clearly i don't need to say much more
523.519 -> about automation and that how these
524.959 -> deployments should be automated no
526.24 -> operations team or development team
528.88 -> clearly wants to have to care and feed
530.64 -> for a deployment while it's occurring
532.64 -> so making sure they're automated is
533.92 -> super important and then uh
536.08 -> auditability something that often gets a
537.76 -> little forgotten and is not just
539.44 -> important during
540.64 -> um you know audit in the the security
542.56 -> sense so that you can see who made
544.08 -> changes and when they occurred but
545.6 -> in the operational sense too um so that
547.839 -> when you need to find the the the
549.519 -> smoking gun
550.24 -> so to speak of when a bug occurred um
552.48 -> and what's causing it within an
553.68 -> environment
554.399 -> um a good solid audit trail that's you
556.48 -> know being preserved and is available
559.04 -> in all of your log analysis tools and
560.64 -> your operational uh
562.24 -> logging tools gives you that ability to
564.8 -> dive in really fast and understand where
567.279 -> um you know errors might have begun
569.279 -> within an environment which deployment
570.8 -> they might be associated with
572.48 -> and thus what code changes you know
574.32 -> underneath the covers or configuration
575.92 -> changes may have been related to
578.24 -> that audit trail and that trail of
579.839 -> breadcrumbs you're looking at
581.92 -> so these are the deployment tenets
583.12 -> we're talking about again they kind of
584.72 -> all feed into each other but the goal is
586.399 -> to have
586.959 -> a really nice and clean safe automated
588.88 -> deployment so that you can iterate fast
590.959 -> you can iterate with business confidence
592.64 -> and that you're not requiring a lot of
594 -> you know manual man hours in order to
595.68 -> achieve that
596.48 -> um with with the pace of innovation and
598.24 -> how often deployments are happening uh
599.839 -> in a modern application
602.48 -> so now i'm going to talk about
603.36 -> considerations so these are like i said
605.36 -> regardless of which option you choose
606.959 -> there's some
607.76 -> kind of core decisions you'll be making
610.24 -> about how you treat your infrastructure
612.079 -> during a deployment
613.279 -> and how that might inform which
615.44 -> deployment options are best fit for you
617.839 -> so the first thing i'm going to talk
619.36 -> about is metrics tests and alarms so
622.16 -> this is bare bones no matter which
623.519 -> option you choose
624.56 -> this is this should be kind of seen as a
626.079 -> prerequisite to having a good modern
627.839 -> deployment approach
629.36 -> just simply implementing
631.6 -> automated deployments on their own
633.2 -> without having
634.079 -> all three of these things you know a
636.56 -> mature approach to these three things
638.24 -> already baked into your application
640 -> is going to lead potentially to some
641.44 -> pretty bad failures from a deployment
643.279 -> perspective
644.64 -> you can only trust your deployments as
646.399 -> much as your metrics give you
648.079 -> visibility into what the real health of
649.6 -> your application is
650.959 -> uh you're only going to have confidence
652.88 -> that the deployment's going to be
654 -> successful if your test suites
655.839 -> um are covering enough in terms of
657.839 -> functionality
659.36 -> to know that you're you're checking the
661.12 -> right things before your customers are
663.36 -> the real test cases and that alarms
665.519 -> exist so that when those deployment
667.44 -> uh you know potentially those
669.36 -> deployments potentially go wrong or bugs
671.279 -> exist
671.92 -> and metrics now show you know
673.36 -> fluctuations that maybe aren't
675.04 -> um aren't what they should be um alarms
677.12 -> are able to catch those things at the
678.48 -> right thresholds
679.68 -> and you know notify deployment
681.12 -> mechanisms that rollback should occur
683.04 -> um quickly and safely before your
684.72 -> customers are really you know having
686.24 -> having a lot of pain for a long duration
688.16 -> and we provide you a lot of these things
689.519 -> out of the box and really easy ways to
691.279 -> take advantage of them using built-in
692.88 -> metrics for
694.079 -> a lot of our services almost all of our
695.519 -> services have a slew of built-in metrics
697.6 -> that are relevant for operations
699.44 -> but the idea is to not only depend on
701.36 -> those it's important to use those but
703.2 -> you know your business context and your
705.04 -> application context best
706.56 -> and there's going to be metrics that you
707.92 -> have to gather related to your own
709.6 -> applications that we're not going to
710.88 -> give you out of the box so things like
712.8 -> how many orders are completing as part
714.32 -> of your ecommerce website
716.079 -> how many users are are visiting a
718.24 -> specific page within your application
720.8 -> how many errors are being generated by a
722.639 -> specific line of code within your
724.16 -> application that is important maybe
726.16 -> related to revenue being generated those
728.48 -> are the types of things that live inside
730 -> your application that you should be
731.36 -> generating metrics on and creating your
733.12 -> own tests around
734.079 -> your own alarms around so regardless of
736 -> what deployment methodology
737.44 -> you end up choosing these are bare bones
739.68 -> requirements that all of them will will
741.519 -> really require
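
As a rough illustration of the application-level metrics described above, here is a minimal Python sketch that builds a request shaped like CloudWatch's PutMetricData call for an "orders completed" metric. The namespace, metric name, and dimension values are hypothetical and not from the talk; the commented-out boto3 call assumes boto3 is installed and credentials are configured.

```python
from datetime import datetime, timezone


def build_order_metric(order_count, environment):
    """Build keyword arguments shaped like a CloudWatch PutMetricData request."""
    return {
        "Namespace": "MyShop/Orders",  # hypothetical namespace
        "MetricData": [
            {
                "MetricName": "OrdersCompleted",  # hypothetical metric name
                "Dimensions": [
                    {"Name": "Environment", "Value": environment},
                ],
                "Timestamp": datetime.now(timezone.utc),
                "Value": float(order_count),
                "Unit": "Count",
            }
        ],
    }


payload = build_order_metric(42, "production")

# Assumption: with boto3 available, the payload could be published with
#   boto3.client("cloudwatch").put_metric_data(**payload)
print(payload["MetricData"][0]["MetricName"], payload["MetricData"][0]["Value"])
```

An alarm on a metric like this one is the kind of business-level signal that a deployment can watch in addition to the built-in service metrics.
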
743.6 -> um a newer feature it's it's it's been
746.32 -> uh released as part of cloudwatch that i
747.92 -> want to point people to that might not
749.12 -> be taking advantage of it already
750.72 -> um that just released this year is a
752.639 -> cloudwatch composite alarm so this is
754.56 -> the ability to take
755.92 -> single cloudwatch metrics and alarms
757.839 -> and aggregate them into these logical
759.92 -> conditional statements like i have
761.519 -> highlighted on the right of the slide
762.88 -> here
763.519 -> um and combine them into single alarms
765.519 -> and and really for
766.72 -> um any type of rollback um any alarm
769.76 -> that's going to inform a rollback
770.959 -> decision or
772 -> give you a sense of the overall health
773.6 -> of your application
775.04 -> it really should be one of these
776.16 -> aggregate alarms right that's that's the
778.079 -> the kind of model we follow here at aws
779.92 -> too where you have a slew of different
782.16 -> metrics that
782.88 -> talk about the overall health talk about
784.639 -> specific health parameters within your
786.399 -> application like
787.68 -> you know latency or the number of errors
789.76 -> that are occurring or how many requests
791.279 -> you're receiving or
792.56 -> uh cpu utilization a slew of different
794.48 -> things and if any one of those metrics
796 -> is out of balance it could cause
797.76 -> uh you know a need for an alarm to fire
799.839 -> and taking advantage of composite alarms
801.839 -> lets you create
802.959 -> those really nice coarse-grained uh you
805.92 -> know
806.399 -> uh alarm statements that you know
808.88 -> there's something going wrong
810.079 -> it could be any one of these individual
811.6 -> things but rather than have to
813.279 -> you know have a bunch of fine-grained
814.88 -> alarms that exist way down at the
816.48 -> individual metric level you can
817.76 -> aggregate them together into this
819.519 -> full picture of health within your
820.88 -> application and uh and take advantage of
823.44 -> composite alarms if you're not familiar
824.8 -> with composite alarms we're going to
826.399 -> talk deployment a lot more don't worry
827.68 -> but hopefully this is a nice little
829.279 -> um treat an extra bonus piece of
830.8 -> knowledge you can take with you so take
832.079 -> a look at composite alarms in
833.199 -> cloudwatch
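
The slide's rule expression is not captured in the transcript, so the following is only a sketch of the idea: a composite alarm rule that fires when any one of several underlying alarms is in the ALARM state. The alarm names are hypothetical. The rule string uses CloudWatch's `ALARM("name")` and `OR` rule syntax, and the commented-out call assumes boto3 is installed and that the child alarms already exist.

```python
def any_in_alarm(alarm_names):
    """Build a composite alarm rule that is true if any child alarm is in ALARM."""
    if not alarm_names:
        raise ValueError("at least one alarm name is required")
    return " OR ".join(f'ALARM("{name}")' for name in alarm_names)


# Hypothetical child alarms, one per health signal mentioned in the talk.
rule = any_in_alarm([
    "api-latency-high",
    "api-5xx-errors-high",
    "cpu-utilization-high",
])
print(rule)

# Assumption: with boto3 available, the composite alarm could be created with
#   boto3.client("cloudwatch").put_composite_alarm(
#       AlarmName="app-overall-health", AlarmRule=rule)
```

A single coarse-grained alarm like this can then be the one signal a deployment watches when deciding whether to roll back.
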
834.8 -> uh the next the next consideration
836.399 -> that's going to help inform what
837.36 -> decisions you make for deployment is
838.88 -> whether you want to pursue mutable
840.24 -> versus immutable infrastructure
842.399 -> there's been a big push among a lot of
844.24 -> our customers for immutable
845.199 -> infrastructure because there's a lot of
846.32 -> benefits for it but there might be good
847.68 -> reasons why
848.88 -> immutable infrastructure doesn't make
850.48 -> sense for your application there's some
851.76 -> pros and cons for both
853.36 -> so what this really refers to immutable
855.36 -> infrastructure means that after
857.12 -> a a a deployment has occurred a piece of
859.839 -> infrastructure has been created
861.68 -> and is active within the environment
863.199 -> nothing can change it again and that
864.88 -> means
865.279 -> you know no human access to the
866.72 -> operating system no deployment of
869.12 -> configuration changes
870.56 -> to you know a live server uh
873.6 -> no deployment artifacts can change at
875.44 -> all once that piece of infrastructure is
877.44 -> out and open in the environment and
878.959 -> serving its purpose in the in the
880.399 -> architecture
881.36 -> it can't change anymore nothing can
882.72 -> access it and nothing can change it
884.8 -> whereas mutable is obviously the the
886.959 -> inverse of that where
888.16 -> on a running server you still have the
889.76 -> ability if you need to change something
891.36 -> in place
892.639 -> access that server for some reason
895.12 -> you'll be able to do that
896.56 -> so i've got some pros and cons
897.68 -> highlighted here um on the mutable
899.44 -> infrastructure side
900.639 -> uh we find that you know if you're the
902.8 -> type of environment where your
904.16 -> operational processes for whatever
905.76 -> reason
906.48 -> really favor the the um you know have a
908.72 -> culture of favoring hot fixes
910.56 -> of you know having folks jump into the
912.959 -> an active server or deploying
914.32 -> configuration changes to the active
915.839 -> environment quickly
917.12 -> um for whatever reason you know that's a
918.72 -> it's a pretty dangerous um thing from a
920.48 -> security perspective but there might be
921.68 -> a reason why you have to do it
923.519 -> encouraging mutable infrastructure is is
925.44 -> kind of a requirement you want to be
926.8 -> able to you know quickly jump on a
928.16 -> server or make a change
929.6 -> um whereas the alternative to that on
932.399 -> the
932.72 -> operational side is there might be a lot
934.079 -> of complexity because you're allowing
936.16 -> changes to occur outside the normal
937.92 -> boundaries of
938.88 -> automation or outside the you know the
941.759 -> coarse grained activities of
943.04 -> provisioning infrastructure or
944.8 -> um you know new pieces of infrastructure
946.639 -> that are you know can trust
948.399 -> can be trusted to be hardened uh so to
950.32 -> speak um so it could make the
951.839 -> investigations
952.8 -> you know more complex um whereas on the
955.6 -> immutable side
956.72 -> uh you can have a lot of confidence that
958.24 -> this piece of infrastructure hasn't
959.6 -> changed and the behavior
960.88 -> that you're seeing is associated with
962.88 -> you know that that
963.92 -> you know all of those things that are
965.12 -> baked into it already and there hasn't
966.56 -> been any additional changes that have
967.839 -> occurred to it since
969.6 -> which is you know able to you know
971.44 -> simplify a lot of times you identifying
973.44 -> when the
974.16 -> problem or root cause may have started
975.839 -> occurring within your environment
977.6 -> um a couple other things i'd like
979.6 -> to highlight is the
981.199 -> the cost difference if you deploy very
983.839 -> frequently or you have a reason to have
985.519 -> a very large amount of infrastructure
987.44 -> running alongside
989.199 -> each other during a deployment running
990.72 -> immutable infrastructure could be cost
992.24 -> prohibitive
994.079 -> or run into scaling concerns that you
995.759 -> might have whereas
997.279 -> mutable infrastructure uh because
999.279 -> you're able to make those changes in
1000.56 -> place
1001.199 -> uh may be more cost effective for the
1002.88 -> way your application deployments occur
1005.759 -> but it also means you might be able to
1007.12 -> roll back really quickly too because you
1008.8 -> don't need to reprovision
1010.079 -> infrastructure anymore and wait for
1011.36 -> servers to come up potentially if
1013.12 -> you're dependent on servers um okay so
1016.48 -> let's move forward and talk about what
1017.92 -> the
1018.24 -> the actual options of modern deployment
1020.56 -> might be so the first one i'm going to
1021.68 -> highlight is called rolling deployment
1023.199 -> or linear deployment
1024.48 -> so this is i've got a running
1025.6 -> application and i'm going to
1027.12 -> incrementally increase
1028.559 -> uh the percentage of of the environment
1031.839 -> that's running the new application
1033.679 -> in comparison to the old application so
1036 -> a lot of reasons folks like this is it
1037.679 -> adds
1038.079 -> risk incrementally we're going to be
1040.799 -> deploying to
1041.679 -> you know small units of infrastructure
1043.28 -> one by one uh
1044.72 -> it limits how many changes are happening
1046.48 -> concurrently if something goes wrong i
1048.079 -> can detect it pretty quickly and roll
1049.84 -> back
1050.64 -> without the entire environment having
1052.4 -> been changed and it lets me reuse the
1054.559 -> infrastructure that's already there in
1055.84 -> the environment
1056.72 -> um some cons of this is sometimes uh if
1059.2 -> you have a very large number of servers
1061.36 -> or
1062.16 -> application components and you're a very
1063.679 -> risk-averse organization might take a
1065.6 -> really long time for a linear deployment
1067.36 -> to complete
1068.88 -> you know if you want to deploy one
1070 -> server at a time over you know 100
1071.76 -> servers
1072.48 -> and it's going to take several minutes
1073.84 -> for that deployment to occur those
1074.96 -> things can add up pretty pretty quickly
1076.799 -> and it could be a really long time
1078 -> before the deployment is considered
1079.44 -> successful
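
To make the arithmetic above concrete, here is a small Python sketch of how total rolling deployment time grows with the number of batches. The figure of 3 minutes per batch is an assumption standing in for the "several minutes" mentioned in the talk.

```python
import math


def rolling_deploy_minutes(total_hosts, batch_size, minutes_per_batch):
    """Estimate wall-clock minutes for a rolling deployment done batch by batch."""
    batches = math.ceil(total_hosts / batch_size)
    return batches * minutes_per_batch


# 100 servers, one at a time, assuming 3 minutes per server.
one_at_a_time = rolling_deploy_minutes(100, 1, 3)
# Same fleet, ten servers per batch.
ten_at_a_time = rolling_deploy_minutes(100, 10, 3)

print(one_at_a_time)  # 300
print(ten_at_a_time)  # 30
```

Under these assumptions, going one server at a time takes five hours, which is the trade-off a risk-averse team accepts for the smallest possible blast radius.
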
1080.64 -> uh on the same you know the same token
1082.96 -> the the rollback
1084 -> can take just as long a period of time
1085.919 -> you can obviously change your
1087.2 -> your strategy of how quickly you might
1088.64 -> want to roll back um rolling back at a
1090.64 -> higher percentage rate than you did on
1092.32 -> the way forward but
1093.76 -> there might be reasons why you can't
1095.12 -> maybe you're you know tightly managing
1096.88 -> how connections to dependencies work and
1098.96 -> you need to make sure that
1100.16 -> you know it's a it's a nice gradual
1101.919 -> rolling change rather than
1103.36 -> opening a floodgate on one side or the
1105.12 -> other so in those rollback cases it
1107.039 -> might
1107.6 -> lead to really long rollback times and
1110.08 -> this idea of a heterogeneous environment
1112 -> can be
1112.4 -> a complex thing to manage and deal with
1114.24 -> as well in the midst of a deployment
1116.32 -> you have a single environment that is
1117.84 -> running multiple versions of your
1119.039 -> application
1120.4 -> they could be interacting with you know
1122.24 -> a single set of dependencies and
1123.84 -> creating
1124.799 -> um an operational investigation scenario
1127.52 -> where odd things are maybe happening
1129.12 -> with how state is being stored
1131.12 -> um how object definitions are changing
1133.44 -> and your you know your various
1134.88 -> dependencies need to be aware that i've
1136.48 -> got multiple
1137.44 -> uh versions of my application running at
1139.12 -> the same time and how traffic gets
1140.559 -> routed to each version that's running
1142.24 -> just an extra layer of complexity this
1144 -> idea that you've got multiple
1145.76 -> versions running within within one
1147.36 -> environment okay
1149.44 -> um so if if this is the right you know
1151.44 -> type of approach for you there's a
1152.72 -> couple different ways you're going to
1153.6 -> implement it so for the ec2 service
1155.6 -> um if you're using code deploy as your
1157.2 -> deployment mechanism um
1158.799 -> i've got cloud formation snippets here
1160.48 -> and yaml kind of highlighting
1162.4 -> where the various properties associated
1164.72 -> with this with the configuration of code
1166.559 -> deploy relates to
1167.919 -> choosing a linear or rolling deployment
1169.919 -> um so this idea of minimum healthy hosts
1172.48 -> it's it's basically informing code
1173.919 -> deploy that i always want to make sure
1175.679 -> that 90 percent
1177.039 -> of my hosts are healthy and if they're
1179.36 -> in the middle of a deployment we kind of
1180.72 -> consider them unhealthy right because
1182.08 -> they're not able to actively serve
1183.36 -> requests because the deployment's in the
1184.72 -> midst of happening
1185.84 -> so you're able to define what is the
1187.2 -> minimum percent that are that are still
1189.039 -> stable and healthy and serving traffic
1191.2 -> on the old version or the new version
1192.799 -> and what percent therefore
1194.48 -> is able to be taken down to have a
1196.48 -> deployment occur against that that piece
1198.16 -> of infrastructure
1199.919 -> um so that percentage uh
1202.24 -> you know somewhere between
1203.919 -> zero and 100
1206.64 -> will define
1207.44 -> uh how many hosts are able to be taken
1209.28 -> offline and how quickly that rolling
1210.64 -> deployment occurs
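
The relationship between the minimum healthy percentage and the pace of the rollout can be sketched as follows. This is a simplified model, not CodeDeploy's implementation: it assumes the percentage is converted to a host count by rounding up, and it does not model how the service treats the edge case where no host may be taken offline.

```python
import math


def rolling_batches(total_hosts, min_healthy_percent):
    """Return (min healthy hosts, hosts offline per batch, number of batches)."""
    # Integer ceiling of total_hosts * percent / 100 (assumed rounding rule).
    min_healthy = (total_hosts * min_healthy_percent + 99) // 100
    max_offline = total_hosts - min_healthy
    if max_offline < 1:
        # Edge case deliberately not modeled in this sketch.
        raise ValueError("no host can be taken offline at this percentage")
    batches = math.ceil(total_hosts / max_offline)
    return min_healthy, max_offline, batches


print(rolling_batches(100, 90))  # (90, 10, 10)
print(rolling_batches(9, 50))    # (5, 4, 3)
```

With 100 hosts and 90 percent minimum healthy, ten hosts are updated at a time and the rollout takes ten batches; lowering the percentage makes each batch larger and the deployment faster, at the cost of less serving capacity while it runs.
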
1211.76 -> um the second box i have below is
1213.52 -> related to load balanced applications
1215.52 -> where code deploy will help take control
1218.08 -> of
1218.88 -> registering and deregistering instances
1221.2 -> from their their uh
1222.48 -> their network load balancer or
1223.76 -> application load balancer um so that
1225.919 -> traffic is routed appropriately as
1227.6 -> versions change
1228.96 -> and then on the on the right side uh
1230.88 -> right side of the slide
1232 -> you can actually use auto scaling as a
1233.36 -> deployment mechanism as well you don't
1234.64 -> need code deploy for this
1236.48 -> type of deployment mechanic where as you
1239.679 -> use auto scaling you can design
1242.559 -> another launch configuration where
1243.84 -> you've got a new server image that's
1245.28 -> going to be introduced into the auto
1246.88 -> scaling group and auto scaling itself
1249.28 -> will roll that new launch
1251.6 -> configuration image into the group
1253.76 -> at the rate that you're
1255.6 -> defining here for what the batch size of
1257.44 -> that rolling update should be
1259.679 -> and what type of cool down and pause
1262.799 -> times exist
1264 -> between updates that are occurring
1265.84 -> within the group so you can use auto
1267.28 -> scaling even to achieve
1268.72 -> the rolling deployment type
1270.32 -> within ec2
1271.76 -> um if you're using our ecs service for
1274.32 -> container-based applications
1276.559 -> you have a property called deployment
1278.24 -> configuration where similar to the
1280.32 -> you know the code deploy option within
1282.64 -> ec2
1283.6 -> uh this one will be about you know the
1285.44 -> maximum and minimum percentage of
1287.039 -> healthy containers that are running as
1288.559 -> part of your service
1290 -> where you can inform ecs in this
1292.4 -> case i want to make sure that
1294 -> my desired number of tasks is always
1297.44 -> running at 100 percent
1298.64 -> and never less than it so i can satisfy
1300.24 -> the traffic demand i expect
1301.679 -> but i'm willing to go over that amount
1303.28 -> by 10 percent up to 110 percent
1305.12 -> so that ecs will be introducing another
1307.6 -> 10 percent
1308.32 -> of the new version of
1311.76 -> that image of my container into the
1313.6 -> service in a rolling way
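To make the 100 percent and 110 percent figures concrete, here is a small Python sketch that turns a desired task count and the two percentages into task-count bounds. The function name `task_bounds` is hypothetical, and the rounding (minimum rounded up, maximum rounded down) is an assumption for illustration rather than a statement of how the service computes it.

```python
def task_bounds(desired_count: int,
                minimum_healthy_percent: int,
                maximum_percent: int) -> tuple[int, int]:
    """Return (lowest, highest) number of tasks allowed during a deployment.

    Hypothetical helper for illustration only.
    """
    if desired_count < 0:
        raise ValueError("desired_count must not be negative")
    if maximum_percent < minimum_healthy_percent:
        raise ValueError("maximum_percent must be >= minimum_healthy_percent")

    # Assumption: the lower bound rounds up, the upper bound rounds down.
    lowest = -(-(desired_count * minimum_healthy_percent) // 100)
    highest = (desired_count * maximum_percent) // 100
    return lowest, highest


# 20 desired tasks, never below 100 percent, allowed up to 110 percent.
lowest, highest = task_bounds(20, 100, 110)
print(lowest, highest)            # 20 22
print(highest - 20, "extra tasks may run the new version at a time")
```

With these settings the old tasks are never stopped before replacements are running, so capacity stays at or above the desired count throughout the rollout.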
1316 -> and then if you're running serverlessly
1317.76 -> and want a rolling or linear deployment
1319.36 -> there is a property as well available
1321.2 -> for your lambda function
1322.48 -> um called deployment preference where
1324.4 -> you define the type and there are
1326 -> specific named types of
1329.44 -> deployment preference available to
1331.039 -> you by the lambda service one of them is
1332.799 -> a linear option where you define
1335.039 -> the percent by which you'd like that
1337.039 -> linear deployment to occur
1339.919 -> over what period of time so i want an
1341.52 -> additional 10 percent of traffic shifted
1343.6 -> to the new version of my lambda function
1345.2 -> the new alias
1346.48 -> if you're familiar with um
1349.52 -> serverless deployments using lambda it's
1351.039 -> all based on alias how traffic gets
1352.559 -> shifted
1353.36 -> there'll be a 10 percent shift to your new
1355.2 -> alias every three minutes so it starts
1356.88 -> at ten percent
1357.84 -> then three minutes later twenty percent then
1359.36 -> thirty percent so on and so forth until you
1361.2 -> reach a hundred percent
1362.4 -> um so you've got the option in all three
1363.84 -> of those servers containers and
1365.84 -> serverless to achieve the rolling
1367.36 -> deployment style
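The linear shift described for lambda (10 percent every three minutes) can be written out as a schedule. The sketch below is a plain Python model, not a call into any AWS API; `linear_schedule` is a hypothetical name, and it assumes the first increment is applied at minute zero.

```python
def linear_schedule(step_percent: int,
                    interval_minutes: int) -> list[tuple[int, int]]:
    """Return (minute, percent_on_new_version) pairs for a linear shift.

    Hypothetical helper: assumes the first step happens at minute 0.
    """
    if not 0 < step_percent <= 100:
        raise ValueError("step_percent must be between 1 and 100")
    if interval_minutes <= 0:
        raise ValueError("interval_minutes must be positive")

    schedule = []
    percent = 0
    minute = 0
    while percent < 100:
        # Never overshoot 100 percent on the final step.
        percent = min(100, percent + step_percent)
        schedule.append((minute, percent))
        minute += interval_minutes
    return schedule


for minute, percent in linear_schedule(10, 3):
    print(f"minute {minute:2d}: {percent:3d}% of traffic on the new version")
```

Under that assumption, a 10 percent step every three minutes reaches 100 percent at minute 27, after ten steps.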
1368.88 -> next is the blue green deployment so
1370.96 -> blue green is about provisioning
1373.12 -> a net new infrastructure set that's
1375.679 -> running the new version of your
1376.799 -> application that's going to exist
1378.08 -> alongside
1379.28 -> the infrastructure that's running the
1383.039 -> old version of your application so i
1384.559 -> have
1385.12 -> here a blue version starting on the
1387.12 -> left side
1388.64 -> i'm going to provision a green version
1391.039 -> and where the blue version is receiving
1392.88 -> traffic that arrow coming from above is
1394.64 -> representing requests
1396 -> incoming to my application there's going
1398 -> to be a period of time where
1399.6 -> both of them are running simultaneously
1401.44 -> and i'm able to
1402.88 -> cordon off a small percentage of traffic
1405.28 -> or maybe it's just test traffic and it's
1406.88 -> not even live traffic
1408.159 -> but i'm able to send requests to that
1410 -> green stack and the green stack is
1411.36 -> receiving
1412.24 -> those requests and
1414.72 -> we're able to get more
1415.919 -> confidence that this newly provisioned
1417.44 -> green environment is behaving
1418.64 -> healthily
1419.52 -> um and we're confident that now the
1422.799 -> blue traffic can be shifted to green
1424.799 -> and we've made that shift one set
1426.96 -> of images over and you see only traffic
1428.72 -> being sent to the green stack now
1430.48 -> and the blue stack remains available for
1432.24 -> some period of time
1433.6 -> such that if we decide a rollback needs
1435.2 -> to occur all i need to do is shift
1436.88 -> request traffic usually through dns or
1438.799 -> some mechanism like it
1440.159 -> in service discovery if
1442.08 -> you're running a microservices environment
1444.159 -> where you're going to shift traffic back
1445.44 -> to that blue
1446.559 -> stack that's still up and running and
1447.84 -> it's going to give you a really quick
1448.799 -> roll back experience
1450.4 -> but if everything went smoothly
1451.679 -> eventually we'll be confident
1453.12 -> we can get rid of the blue stack
1454.64 -> and spin it down and we're just left
1456.159 -> with the new green version of our
1457.44 -> application
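The blue green life cycle described above (provision green, cut traffic over, keep blue around for rollback, then retire it) can be modeled as a tiny state holder. This is a conceptual sketch only: the `BlueGreenRouter` class and its method names are hypothetical and stand in for whatever dns or service discovery mechanism actually moves the traffic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlueGreenRouter:
    """Hypothetical model of which environment receives live traffic."""
    live: str = "blue"
    standby: Optional[str] = None

    def provision(self, name: str) -> None:
        # A fresh environment comes up alongside the live one.
        self.standby = name

    def cut_over(self) -> None:
        # Swap roles: the standby takes live traffic, the old live
        # environment stays running as the rollback target.
        if self.standby is None:
            raise RuntimeError("nothing provisioned to cut over to")
        self.live, self.standby = self.standby, self.live

    def roll_back(self) -> None:
        # Rolling back is the same swap in the other direction.
        self.cut_over()

    def retire_standby(self) -> None:
        # Once confident, spin down the environment that is not live.
        self.standby = None


router = BlueGreenRouter()
router.provision("green")
router.cut_over()
print(router.live, router.standby)   # green blue
router.retire_standby()
print(router.live, router.standby)   # green None
```

The point of the model is that rollback is only a pointer swap while the old environment is still running, which is why it is fast, and why keeping that environment up is what costs extra.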
1458.559 -> so some pros here by taking this kind of
1461.12 -> whole infrastructure
1462.48 -> approach i'm able to keep my environment
1465.52 -> consistent as
1466.48 -> it travels through my various life cycle
1468.32 -> environments i always produce
1470.72 -> a new full set of infrastructure and i
1473.36 -> can be
1473.919 -> you know very confident that that entire
1475.84 -> set of infrastructure
1477.2 -> is going to be you know self-sufficient
1479.279 -> to satisfy the application and any tests
1481.44 -> that i run against it
1482.88 -> are going to be the same it's going to
1484.159 -> be against the same environment
1485.919 -> that my customers are having their
1487.2 -> requests routed to eventually there are
1489.039 -> no in-place changes that are
1490.799 -> occurring um so
1492.159 -> you know maybe if there are you know
1493.679 -> external variables that affect the way
1495.2 -> that automation occurs
1496.48 -> um i don't have to worry about that in a
1497.84 -> blue-green deployment because it's
1498.96 -> always a fresh set of infrastructure
1501.039 -> and that deployment mechanism can be
1503.039 -> really fast the downside is
1505.12 -> because you're going to operate these
1506.64 -> multiple environments for some period of
1508.24 -> time
1508.799 -> you're going to potentially incur more
1510.48 -> costs depending on how long they stay up
1512.08 -> and running and how large the
1513.2 -> application environment is and how much
1514.64 -> it costs
1515.679 -> the other downside is that
1517.919 -> applying hotfixes can be a really
1519.76 -> difficult task and potentially slow
1522.4 -> because
1523.2 -> if you wanted to go from blue to green
1524.88 -> but not necessarily roll back to blue
1526.559 -> but
1527.279 -> instrument a very quick hotfix
1529.679 -> deployment to the environment you
1531.2 -> can't just
1531.76 -> deploy that quickly into the green
1533.279 -> environment if you're running immutable
1534.48 -> infrastructure it means you're gonna
1535.44 -> have to bring up
1536.32 -> uh yet another you know increment of
1538.559 -> your blue environment or call it another
1540.48 -> color
1540.96 -> that's going to represent that to be
1542.48 -> deployed new version
1544.159 -> of the infrastructure and
1545.76 -> provisioning infrastructure can often
1547.279 -> take a lot more time than just making a
1548.799 -> quick code change
1550 -> um and then last but not least
1552.48 -> you're gonna have to think about what
1553.52 -> cold infrastructure means when it
1554.88 -> receives requests if you're
1556.32 -> dependent upon things like in-memory
1557.919 -> caching or session state within your
1559.679 -> application
1561.12 -> and the green version that hasn't
1562.48 -> received any requests yet doesn't have
1564.08 -> those things populated
1565.36 -> that might impact performance for some
1566.799 -> period of time while those things get
1568 -> populated as requests roll in
1570.559 -> okay so if blue green deployment's the
1572.4 -> right method for you how do you
1573.279 -> implement it
1574 -> very similar type of properties being
1575.6 -> highlighted here so in the ecs front
1577.76 -> um you've got the option of having a
1581.44 -> built-in blue-green deployment
1584.24 -> capability
1585.6 -> where you're going to define a new set
1587.039 -> of container images for your task and
1588.64 -> when that deployment occurs
1590.559 -> ecs will provision the new
1593.679 -> green environment but linearly
1595.84 -> shift traffic to the green
1597.36 -> set of images um so even though
1599.6 -> it's a call out of a linear
1601.039 -> configuration here
1602.4 -> it's really going to provision that new
1603.84 -> set of images because there's no
1605.039 -> in-place deployment with the container
1606.64 -> image right it's always going to be so
1607.84 -> to speak a green
1609.039 -> fresh container image that's been
1610.88 -> deployed so i put it in this blue green
1612.32 -> category
1613.12 -> but you're able to kind of linearly
1614.64 -> shift traffic over to that green image
1617.2 -> over time and then on lambda the same
1620.32 -> type of idea
1621.279 -> as before with ecs you're not making
1623.36 -> code changes within
1625.12 -> a lambda function itself too it's always
1626.96 -> a new lambda function
1628.32 -> alias that's serving traffic so that
1630.32 -> same idea of it being
1632 -> really a linear shift of traffic but to
1634.32 -> a green environment
1635.679 -> is also how you'd implement it in
1637.279 -> lambda with ec2
1639.279 -> it's going to be required to
1642.32 -> implement that fully fresh environment
1645.679 -> to have a new set of ec2 instances
1647.84 -> servers that get deployed as part of
1649.919 -> your application and you're going to
1651.2 -> instrument
1652 -> the blue green deployment mechanic
1653.679 -> through something like dns shift of
1655.36 -> traffic
1656.32 -> um or you know another service discovery
1658.48 -> mechanism that's going to allow traffic
1659.919 -> to be served by those new
1661.52 -> server images because on code deploy
1664.159 -> when you're implementing deployments
1665.36 -> with servers you're going to be making
1666.48 -> changes within those server images
1668 -> themselves you could still use code
1669.76 -> deploy to help you deploy your code to
1671.6 -> that new green environment but if you
1672.96 -> want the blue green experience
1674.48 -> you're going to have to do the traffic
1675.679 -> shifting through dns or another
1678 -> mechanism
1679.679 -> last option i'm going to highlight is
1681.279 -> canary or one box deployments
1683.279 -> so this is the ability to change the
1685.6 -> smallest unit of infrastructure possible
1687.279 -> within your environment
1688.88 -> be confident that that one unit of
1691.12 -> infrastructure is behaving healthily
1692.72 -> and then flip the entire rest of the
1694.72 -> application to
1696.399 -> the new version of
1698.88 -> your application
1699.679 -> so you're able to really minimize risk
1701.279 -> focus on that tiny portion of your
1702.72 -> environment
1703.919 -> and it allows you to experiment too you
1705.919 -> could use that tiny environment where
1707.52 -> there's a tiny piece of infrastructure
1708.96 -> running
1709.6 -> to experiment on new features a b
1711.919 -> testing
1713.039 -> or just reduce deployment risk like
1714.72 -> here um some cons of
1717.6 -> the canary approach if
1720 -> you're going to create
1721.039 -> a one box or a canary environment that
1723.2 -> represents your application
1725.44 -> you're going to have to you know have a
1727.279 -> new environment you're supporting that
1729.52 -> may be serving production traffic
1732.399 -> as part of your application so a
1733.84 -> rollback
1734.96 -> is now going to be multi-stage just like
1736.96 -> a deployment's going to be multi-stage
1738.799 -> you've got a brand new type of
1740.72 -> production environment
1742.32 -> that lives on its own that you're
1743.679 -> deploying to independently
1745.36 -> so you need to be aware of that
1746.48 -> additional staging that's going to come
1748 -> into
1748.399 -> play as you're kind of going
1750.88 -> through your deployment steps
1752.48 -> and that might bring additional
1753.679 -> complexity with the idea that
1755.679 -> these two environments that are always
1756.96 -> going to be serving production traffic
1758.64 -> um are going to need to be kept in sync
1760.64 -> um you know or other tooling that's
1762.159 -> aware that there's now an independent
1763.679 -> environment you've got to be aware of
1764.88 -> that
1766.32 -> so how you implement canary or one box
1768.64 -> deployment on
1770.08 -> ec2 is similar as before you're going to
1772.799 -> need to create a new environment
1774.96 -> which is required for ec2 but you always
1776.88 -> have the ability to create that new
1778.32 -> environment with ecs and lambda as well
1780.64 -> so rather than just have a production
1782.08 -> environment create another
1784.48 -> named environment for your containers or
1786.24 -> lambda function and then i've got
1787.679 -> properties called out here
1789.36 -> to do the canary type of deployment
1790.96 -> where 10 percent goes to my container image
1793.039 -> of the new version which is going to
1794.799 -> receive traffic before the other 90
1796.799 -> percent
1797.679 -> of the traffic is shifted to the new
1800 -> application version
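The canary idea (send a small slice to the new version, check that it is healthy, then flip the rest) can be sketched as a single gated step. This is a conceptual Python model with hypothetical names: `run_canary` and the `is_healthy` callback are not part of any AWS API, and in practice the health signal would come from your own metrics and alarms.

```python
from typing import Callable


def run_canary(canary_percent: int,
               is_healthy: Callable[[], bool]) -> list[int]:
    """Return the sequence of traffic percentages sent to the new version.

    Hypothetical helper: one canary step, then either full promotion
    or a rollback to zero depending on the health check.
    """
    if not 0 < canary_percent < 100:
        raise ValueError("canary_percent must be between 1 and 99")

    steps = [canary_percent]          # stage 1: small slice of traffic
    if is_healthy():
        steps.append(100)             # stage 2: shift everything else
    else:
        steps.append(0)               # rollback: new version gets nothing
    return steps


print(run_canary(10, lambda: True))    # [10, 100]
print(run_canary(10, lambda: False))   # [10, 0]
```

Unlike the linear schedule, there are only two stages here, which is also why both the deployment and the rollback become multi-stage operations.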
1801.2 -> and similarly on lambda shift 10 percent
1804 -> over
1804.32 -> the course of 10 minutes to my new
1806 -> alias but after that 10 minutes is up
1808.08 -> the other 90 percent is going to immediately
1809.76 -> shift over to the new version of
1811.84 -> my
1812.159 -> application alias okay so i've walked
1815.36 -> through the different options available
1816.799 -> to you and kind of some details of how
1818.32 -> to implement them so how do you choose
1819.6 -> between them remember step one
1821.52 -> make sure your metrics tests and alarms
1823.279 -> are in place um the second step
1825.279 -> is to remember to embrace automation the
1827.52 -> core of all of these options is that it
1829.2 -> should be automated
1830.799 -> and the third is whichever option you
1832.72 -> choose start small and think of it
1834.48 -> think of your deployment process as
1836 -> another piece of software it's another
1837.84 -> type of application you're supporting
1839.279 -> almost the deployment mechanics for you
1841.36 -> and be iterative you know you don't have
1843.039 -> to solve all of your deployment
1844.24 -> requirements
1845.039 -> um right from the beginning um and
1847.52 -> you know think about ways in which your
1849.039 -> deployment can mature over time in an
1850.72 -> iterative way just like you build
1852 -> application
1852.88 -> changes into the actual
1854.799 -> application that's being deployed to
1856.799 -> um and then finally think about which
1858.48 -> downsides i've kind of described might
1860 -> be most impactful
1861.679 -> to the way that you operate your culture
1863.2 -> and that might help you choose which
1864.72 -> type of
1865.679 -> application deployment methodology
1868.64 -> would avoid those downsides the best
1870.88 -> that would impact you and if
1874.08 -> a little teaser here if you want to get
1875.36 -> a sense of how aws does deployments
1877.44 -> ourselves and
1878.08 -> we have a pretty modern approach i would
1879.519 -> say to deployment and we embrace
1882 -> a bit of all of those methodologies i've
1883.76 -> described from team to team
1885.36 -> there's a really great blog post
1886.72 -> available in our builders library that
1888.159 -> describes exactly how
1889.76 -> our safe hands-off deployments are
1892.24 -> automated on top of aws and this is a
1894.08 -> little image
1894.88 -> to define for you how deployment really
1896.96 -> does occur on top of aws within our own
1899.2 -> environment
1899.919 -> and if you want to learn more in detail
1901.44 -> there's actually another session
1902.559 -> available to you called
1904.24 -> in the builders library session um 207
1907.6 -> where claire is going to dive really
1909.279 -> deeply into all of the things that the
1910.88 -> blog post talks about
1912.08 -> and if you want to learn more about how
1913.279 -> aws deploys i really recommend you check
1915.2 -> out that session as well
1918.08 -> and again i thank you very much for
1919.6 -> joining devops 303
1921.519 -> um hopefully you've learned a little bit
1922.799 -> about the deployment options available
1924.08 -> to you my name is andrew baird again
1926 -> um and hope you're enjoying re:invent
1927.6 -> have a great conference
1930.76 -> thanks

Source: https://www.youtube.com/watch?v=-55YIDf0Z-E