AWS re:Invent 2022 - Building connected vehicle and mobility platforms with AWS (IOT311)

By 2030, it’s projected that 100 percent of new vehicles sold will ship with connectivity platforms. In this session, learn about the services and solutions that AWS provides to help OEMs, Tier 1 suppliers, fleet telematics solution providers, and automotive ISVs build and deploy systems that securely connect vehicle fleets to the cloud. Learn more about AWS’s vision, newest capabilities, and best practices for connected vehicle platforms.

Learn more about AWS re:Invent at https://go.aws/3ikK4dD.

ABOUT AWS
Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts.

AWS is the world’s most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

#reInvent2022 #AWSreInvent2022 #AWSEvents


Content

0.21 -> - So my name is Katja,
1.86 -> I'm an IoT Specialist Solutions Architect.
4.17 -> I have been with the company since 2019, but IoT's so cool.
8.58 -> So I decided to switch over to IoT beginning of this year.
12.75 -> I'm joined by two amazing people today,
16.41 -> first is Mike,
17.31 -> who is General Manager in the automotive space at AWS.
22.2 -> And very exciting as well,
23.67 -> we have a company that you might have heard of,
25.74 -> so Mercedes is joining us with the Chief Architect, Kevin,
29.88 -> to tell you about the exciting story
32.07 -> to connect vehicles using AWS.
36.57 -> Before we dive into the topic
39.277 -> "How to build connected vehicle platforms on AWS,"
44.79 -> I wanna take a step back and talk to you about scale.
49.02 -> So in 2020, a level 1 vehicle model
54.3 -> might produce up to one gigabyte of data per hour.
58.98 -> We're going a step further, 2022,
62.25 -> we're talking about already level 2 plus vehicle models,
65.76 -> which generate more than one terabyte of data per hour.
70.8 -> Looking at the whole market
72.33 -> of a hundred million vehicles per year,
75.18 -> that might be driving around four hours per day,
78.3 -> 300 days a year,
80.1 -> we're going up to 120 zettabytes of data.
85.41 -> That's massive scale,
87.66 -> but we don't end in 2022,
89.7 -> so let's go further.
92.943 -> And in 2030, a level 3 plus vehicle model
97.14 -> will produce more than 10 terabytes of data
101.64 -> in a single hour.
103.71 -> Multiplying the 120 zettabytes we had before by 10,
108.3 -> we end up with 1.2 yottabytes of data in total.
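The fleet-wide figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch that assumes decimal (SI) units and uses exactly the inputs quoted in the talk; the variable and function names are illustrative.

```python
# Back-of-the-envelope check of the fleet data volumes quoted in the talk.
# Assumption: decimal units, so 1 TB = 10**12 bytes, 1 ZB = 10**21, 1 YB = 10**24.
TB, ZB, YB = 10**12, 10**21, 10**24

vehicles = 100_000_000   # vehicles in the market
hours_per_day = 4        # driving time per vehicle
days_per_year = 300


def fleet_bytes_per_year(tb_per_vehicle_hour: int) -> int:
    """Total bytes produced by the whole fleet in one year."""
    return vehicles * hours_per_day * days_per_year * tb_per_vehicle_hour * TB


volume_2022 = fleet_bytes_per_year(1)    # level 2+ vehicle: about 1 TB per hour
volume_2030 = fleet_bytes_per_year(10)   # level 3+ vehicle: about 10 TB per hour

print(volume_2022 / ZB, "ZB per year")
print(volume_2030 / YB, "YB per year")
```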
115.41 -> So there is no compression algorithm for experience.
119.94 -> So AWS, since day one,
123.06 -> has built scalable platforms and services,
126.6 -> and we wanna talk to you today
129.3 -> about which of these you can actually leverage
132.51 -> to focus on your benefits and leave the heavy lifting to us.
140.07 -> Before we can actually dive into how and what to build,
144.06 -> let's start with the challenges and opportunities.
147.66 -> So first of all,
148.65 -> instrumenting the physical world takes time.
152.64 -> So there is a need for organizations
155.04 -> to automate vehicle onboarding and testing.
159.6 -> Second would be that future vehicle models,
162.21 -> as we just heard,
163.95 -> will generate multiple terabytes of data per hour.
169.08 -> So we need better tools to discover
172.2 -> and collect high-volume data.
176.07 -> Last but not least,
177.78 -> with massive scale comes more responsibility.
181.86 -> So let's earn customer trust with robustness,
185.58 -> which would be high availability,
188.46 -> security, as well as privacy.
194.13 -> Let's look at what this architecture
197.01 -> for a connected vehicle platform might look like.
200.19 -> We start at the very bottom with a device layer,
203.19 -> which is about functions for hardware abstraction,
207.21 -> the operating system,
208.74 -> as well as device onboarding and management,
211.62 -> it's the very foundation.
213.57 -> On top, we have the connectivity layer,
216.81 -> which concerns itself with the connection,
219.33 -> the secure connection between vehicle and cloud
223.29 -> using protocols like MQTT or HTTPS.
228.24 -> Between vehicle and cloud, we have two abstraction layers,
231.66 -> which are software and data abstraction,
234.57 -> which should provide a unified view
237.09 -> of software and data afterwards.
240.33 -> In the cloud we have the operations layer,
243.27 -> which is about monitoring, performing customer support,
246.87 -> as well as managing deployments.
250.08 -> And then last but not least,
251.91 -> the one layer that differentiates your product
254.55 -> and your business is the applications layer.
259.35 -> And we at AWS recommend customers
261.78 -> to leave the undifferentiated heavy lifting to AWS.
265.74 -> So everything from the device layer up to the operations layer
269.91 -> is something that we can help you with
272.73 -> to focus on the level
275.61 -> or give you the opportunity to focus on the level
278.04 -> that differentiates your business.
281.49 -> Because of the time we have today,
283.05 -> we are going to pick a few of these areas and dive deep
287.31 -> and show you how that can be done.
289.89 -> So we're talking about device layer, connectivity layer,
293.34 -> as well as data abstraction.
298.05 -> In a little more detail,
299.7 -> that means that I'm going to talk today or start
303.45 -> with connected vehicle device management.
306.48 -> So we are leveraging AWS cloud services
310.38 -> to issue and register certificates
313.68 -> for secure communication from the vehicle,
317.1 -> or from the vehicle gateway, over to AWS IoT Core.
322.38 -> Second is connected vehicle device defender.
325.59 -> So we are leveraging a service
327.27 -> called AWS IoT Device Defender
330.36 -> to detect revoked certificates
333.33 -> as well as certificates that are about to expire
336.48 -> and perform rotation of those certificates
341.43 -> in an automated way using AWS IoT managed services.
346.95 -> Last but not least,
348.36 -> we launched MQTT 5 for AWS IoT Core just a few days ago,
354.36 -> and this is such exciting news that we want to share today
358.05 -> and dive into accelerating and optimizing
361.2 -> the communication between vehicle and cloud
363.78 -> with you in this session today.
369.99 -> So as I said,
370.823 -> we're starting with securing ECU communication.
374.37 -> This is the very foundation, we saw that before.
377.79 -> The device layer
379.5 -> builds the foundation for everything
382.08 -> when it comes to the connected vehicle platform,
385.83 -> as well as us considering that we have a market
389.58 -> of a hundred million vehicles per year,
392.82 -> so this is something that actually has to happen at scale.
397.32 -> We're going to look at the architecture for this now.
400.5 -> So you see, on the left, we have the vehicle itself
405.12 -> with a vehicle gateway
406.8 -> which manages the communication from vehicle to cloud.
411.78 -> We have an attestation certificate,
413.01 -> so a long-lived certificate, on the vehicle itself already,
417.33 -> and are using an SDK, so the AWS IoT SDK or a third-party SDK,
423.81 -> for the communication over to cloud.
428.76 -> On premises, we have a root, like a trust anchor,
432.75 -> that we wanna leverage
434.79 -> for issuing a subordinate CA within the cloud
438.84 -> using the service AWS Private CA.
442.98 -> And we have AWS IoT Core,
445.65 -> where we need to import the certificate
448.95 -> of the subordinate CA for the later flow
452.91 -> for issuing and registering the certificate.
456.6 -> This one is also going to include
459.66 -> the just-in-time registration flow
462.3 -> we have in AWS IoT Core,
465.15 -> which is, when we talk to OEMs,
468.93 -> the flow we usually land on
471.66 -> when it comes to a best practice for device provisioning.
480.48 -> So the next step we're going to go is, first of all,
482.64 -> we want to have the operational certificate we use
485.49 -> for the mutual TLS communication on the vehicle.
488.85 -> There are two options. With the first, we have the option
492.27 -> to send over a CSR,
494.55 -> so use a client-generated private key
499.02 -> to issue the CSR,
500.31 -> send it over to an interface we have on premises,
503.19 -> so a certificate broker,
505.05 -> which then communicates with the subordinate CA
508.05 -> to issue the certificate and get it back
511.26 -> over a secure channel you have to have established before
514.95 -> to the vehicle itself.
517.29 -> A more batch-oriented approach
519.06 -> would be a server-generated private key,
522.09 -> so you can issue multiple certificates
524.07 -> using the same interface with AWS Private CA
527.52 -> and communicate those back to vehicle
530.13 -> also over a channel that is secured.
533.67 -> When we store the operational certificate,
536.07 -> the vehicle gateway should check for the authenticity
539.31 -> as well as the integrity of the certificate before.
544.2 -> We're then going to use the certificate
546.12 -> to connect over to AWS IoT Core
549.84 -> and kick off this just-in-time flow I mentioned before,
553.77 -> the best practice to register the certificate,
558.21 -> because it has been signed by subordinate CA before.
562.11 -> And then we trigger a Lambda function,
564.27 -> which then does custom testing
566.91 -> if you want to check with the database or something else
569.67 -> to make sure everything is valid
571.71 -> and then create the resources of an IoT thing,
576.78 -> an IoT policy,
578.13 -> which is a fine-grained policy for your vehicle,
581.28 -> what it actually can do in AWS IoT,
584.64 -> and activate the certificate.
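The registration steps just described (thing, policy, certificate activation) can be sketched as a small handler. This is a minimal sketch, not the speakers' implementation: the `iot` argument is assumed to be a boto3 IoT client (`boto3.client("iot")`) passed in by the caller, the thing naming scheme and the topic prefix in the policy are hypothetical, and the event is assumed to carry the `certificateId` and `awsAccountId` fields of a just-in-time registration message.

```python
import json


def build_vehicle_policy(region: str, account_id: str) -> str:
    """Least-privilege policy document: a vehicle may connect only under its
    own thing name and publish only under its own (hypothetical) topic prefix."""
    base = f"arn:aws:iot:{region}:{account_id}"
    thing = "${iot:Connection.Thing.ThingName}"  # IoT policy variable
    return json.dumps({
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": "iot:Connect",
             "Resource": f"{base}:client/{thing}"},
            {"Effect": "Allow", "Action": "iot:Publish",
             "Resource": f"{base}:topic/vehicles/{thing}/*"},
        ],
    })


def register_vehicle(iot, event: dict, region: str) -> str:
    """Create thing and policy, attach both to the certificate, then activate it."""
    cert_id = event["certificateId"]
    account_id = event["awsAccountId"]
    cert_arn = f"arn:aws:iot:{region}:{account_id}:cert/{cert_id}"
    thing_name = f"vehicle-{cert_id[:12]}"   # hypothetical naming scheme
    policy_name = f"{thing_name}-policy"

    # Custom validation (for example a lookup in a vehicle database) would go here.

    iot.create_thing(thingName=thing_name)
    iot.create_policy(policyName=policy_name,
                      policyDocument=build_vehicle_policy(region, account_id))
    iot.attach_policy(policyName=policy_name, target=cert_arn)
    iot.attach_thing_principal(thingName=thing_name, principal=cert_arn)
    iot.update_certificate(certificateId=cert_id, newStatus="ACTIVE")
    return thing_name
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before it is wired to a real account.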
590.31 -> Once we have a certificate, we are not done,
593.43 -> because in the automotive space,
596.97 -> it is seldom the case that the car owner
601.71 -> actually owns the vehicle from manufacturing time
606.12 -> till end of life.
608.01 -> Often, people own cars for about six years.
611.52 -> So we actually need the opportunity to rotate certificates
617.25 -> also for the end of life, of course,
620.43 -> if we wanna revoke certificates after that.
623.91 -> So that's what we're going to look at.
626.91 -> We're going to use AWS IoT Device Defender,
631.47 -> which gives us the option to use automated audit checks,
636 -> which look, first of all,
637.95 -> one of those checks could be looking for revoked certificates
642.69 -> or certificates that are about to expire
646.11 -> within the next 30 days or have expired.
651.33 -> Also with this AWS IoT Device Defender,
654.72 -> we offer mitigation actions out of the box.
658.68 -> First of those would be deactivating the certificate,
662.7 -> which would make sense if the CRL
665.61 -> actually had a positive result
667.71 -> for the certificate being on there.
671.04 -> And second would be publishing the finding to SNS,
675.21 -> so our notification service,
677.49 -> invoking a Lambda function,
679.35 -> which then uses AWS IoT Jobs,
682.89 -> which make it easy for you
685.14 -> to create and register a new certificate
689.07 -> and then deactivating the old one.
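The pairing of audit findings with mitigation actions can be sketched as a simple lookup. This is a sketch under assumptions: the two check names and the parameter shapes follow what AWS IoT Device Defender uses as far as I know, while the function name and the SNS topic ARN are hypothetical.

```python
# Audit check names, assumed to match those used by AWS IoT Device Defender.
EXPIRING = "DEVICE_CERTIFICATE_EXPIRING_CHECK"
REVOKED_STILL_ACTIVE = "REVOKED_DEVICE_CERTIFICATE_STILL_ACTIVE_CHECK"


def choose_mitigation(check_name: str, sns_topic_arn: str) -> dict:
    """Return mitigation action parameters for a finding of the given check."""
    if check_name == REVOKED_STILL_ACTIVE:
        # A revoked certificate must not stay usable: deactivate it.
        return {"updateDeviceCertificateParams": {"action": "DEACTIVATE"}}
    if check_name == EXPIRING:
        # Publish the finding so a downstream function can start a rotation job.
        return {"publishFindingToSnsParams": {"topicArn": sns_topic_arn}}
    raise ValueError(f"no mitigation configured for {check_name}")


# Hypothetical topic ARN, for illustration only.
ROTATION_TOPIC = "arn:aws:sns:eu-central-1:123456789012:cert-rotation"
print(choose_mitigation(EXPIRING, ROTATION_TOPIC))
```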
693.36 -> Also, there might be the case
696.18 -> that your operational certificate
698.64 -> was valid for a certain period of time,
701.1 -> but your vehicle actually was switched off longer,
704.49 -> so you don't have a valid operational certificate anymore.
708.57 -> In this case,
709.403 -> you still have the attestation certificate here
712.5 -> that you can leverage
713.43 -> to go back to the first flow we saw before
716.7 -> to restart the communication with the certificate broker
720.09 -> and get a valid operational certificate again,
723.18 -> just as a plan B.
727.89 -> As I said, we launched MQTT 5.
731.04 -> I think we have all been waiting for this.
732.93 -> So this is really, really exciting news.
735.78 -> And while it's suited for many markets,
738.3 -> the automotive industry has embraced MQTT 5
743.61 -> as its new standard for connected vehicle platforms.
748.2 -> The ease of implementation, enhanced security,
752.13 -> as well as the ability to deliver large amounts of data
756.72 -> with a simple publish and subscribe method
759.99 -> have enabled MQTT 5 to supplant older technologies
764.85 -> like HTTP or SMS.
769.53 -> We're going to look at three of the features.
772.29 -> So we're selective here again,
774.15 -> three of the features because of time,
776.16 -> we're looking at shared subscriptions,
779.31 -> we're looking at request response,
781.11 -> as well as the header fields
783.24 -> that add benefit to your connected vehicle platform.
791.1 -> First of all, for shared subscriptions,
795.99 -> so that's what we're going to start with,
798.42 -> imagine, you might have one,
800.46 -> but if you don't,
801.57 -> imagine you have a huge fleet of vehicles connected,
807.27 -> and the vehicles in that fleet,
809.46 -> and there are a lot of them,
813.06 -> publish similar messages requesting configuration
818.46 -> or publishing status updates.
822.9 -> Now, before we launched MQTT 5,
825.24 -> you could publish all of these messages to one topic
829.47 -> and have one subscriber
831.3 -> that actually listens in and works on those messages.
836.25 -> But we talked about scale before,
838.41 -> so now, you can actually publish those
841.44 -> to a shared subscription topic
845.28 -> which has multiple subscribers
848.28 -> listening to the incoming messages.
853.08 -> This means we are going to load balance
855.24 -> the messages for you.
856.44 -> So only one of the receivers,
858.21 -> well, actually our subscribers,
859.65 -> will actually receive the message,
862.62 -> which also means that in case something is wrong
866.1 -> with one of your backend services,
868.29 -> you can actually take that one offline.
870.48 -> You still have multiple alternatives
873.6 -> as receivers for the message,
875.64 -> and your vehicle isn't influenced by a failing backend.
882.69 -> So that's shared subscriptions.
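The load-balancing behavior of a shared subscription can be illustrated with a tiny in-memory stand-in for a broker. This is only a sketch: MQTT 5 shared subscriptions use topic filters of the form `$share/<group>/<filter>`, but the round-robin distribution, the class name, and the topic and worker names here are assumptions for illustration; a real broker chooses its own distribution strategy.

```python
from collections import defaultdict


def parse_shared_filter(topic_filter: str):
    """Split '$share/<group>/<filter>' into (group, filter); (None, filter) otherwise."""
    if topic_filter.startswith("$share/"):
        _, group, inner = topic_filter.split("/", 2)
        return group, inner
    return None, topic_filter


class TinyBroker:
    """In-memory stand-in for a broker: exact-match topics only, no wildcards."""

    def __init__(self):
        self.groups = defaultdict(list)     # (group, topic) -> subscriber names
        self.counters = defaultdict(int)    # (group, topic) -> messages seen
        self.delivered = defaultdict(list)  # subscriber -> received payloads

    def subscribe(self, subscriber: str, topic_filter: str) -> None:
        group, topic = parse_shared_filter(topic_filter)
        # A non-shared subscription behaves like a group with a single member.
        key = (group if group is not None else "solo:" + subscriber, topic)
        self.groups[key].append(subscriber)

    def publish(self, topic: str, payload: str) -> None:
        for key, members in self.groups.items():
            if key[1] != topic:
                continue
            # Exactly one member of each group receives the message.
            receiver = members[self.counters[key] % len(members)]
            self.counters[key] += 1
            self.delivered[receiver].append(payload)


broker = TinyBroker()
for worker in ("worker-1", "worker-2", "worker-3"):
    broker.subscribe(worker, "$share/backend/vehicles/status")
for n in range(6):
    broker.publish("vehicles/status", f"status-{n}")
print(dict(broker.delivered))
```

Taking one worker out of the group only changes how messages are spread across the remaining members; the publishing vehicles keep using the same topic.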
886.92 -> Second, we wanna look at request response.
892.05 -> So if we look at this request response pattern,
896.52 -> today, there's no simple way
899.34 -> of getting positive acknowledgements back from the vehicle
903.03 -> that you told that something should happen
905.64 -> and it actually has happened.
908.13 -> So it received a command and it executed it.
912.69 -> Now we have, with MQTT 5,
915.21 -> the option to specify a response topic.
917.55 -> So you send from, in this case, a Lambda function,
920.22 -> the message to a topic
921.3 -> and have the vehicle receive the message,
924.06 -> and then on completion of the command that it receives,
927.84 -> it goes back to the response topic with a success message.
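The request/response pattern can be sketched without a real broker. In MQTT 5 a published message can carry a response topic and correlation data; the sketch below models those two properties on a synchronous in-memory bus. The `Bus` class, the topic names, and the command payload are hypothetical and only illustrate the flow.

```python
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    topic: str
    payload: str
    response_topic: Optional[str] = None      # MQTT 5 "Response Topic" property
    correlation_data: Optional[bytes] = None  # MQTT 5 "Correlation Data" property


class Bus:
    """Synchronous in-memory stand-in for a broker (exact-match topics)."""

    def __init__(self):
        self.handlers = {}

    def subscribe(self, topic, handler):
        self.handlers.setdefault(topic, []).append(handler)

    def publish(self, message):
        for handler in self.handlers.get(message.topic, []):
            handler(message)


def attach_vehicle(bus, command_topic):
    """Vehicle side: execute the command, then acknowledge on the response topic."""
    def on_command(message):
        # ... the command would be executed here ...
        bus.publish(Message(topic=message.response_topic,
                            payload="success",
                            correlation_data=message.correlation_data))
    bus.subscribe(command_topic, on_command)


def send_command(bus, command_topic, response_topic, payload):
    """Backend side: publish a command and collect the matching acknowledgements."""
    correlation = uuid.uuid4().bytes
    replies = []

    def on_reply(message):
        if message.correlation_data == correlation:
            replies.append(message)

    bus.subscribe(response_topic, on_reply)
    bus.publish(Message(command_topic, payload, response_topic, correlation))
    return replies


bus = Bus()
attach_vehicle(bus, "vehicles/VIN123/commands")
replies = send_command(bus, "vehicles/VIN123/commands",
                       "backend/VIN123/responses", "unlock_doors")
print([reply.payload for reply in replies])
```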
932.88 -> Just a note here, in case you want to have this combined
936.6 -> or want to have something for your commands
938.97 -> that's more stateful,
940.47 -> look at the service called AWS IoT Device Shadows.
947.52 -> Last topic I wanna cover with you today
950.25 -> is optimizing message routing.
952.89 -> So the rules engine in AWS IoT supports JSON out of the box,
957.75 -> which allows you to make full use of the rules engine
961.92 -> and specify when you wanna route a message
965.46 -> to which destination.
968.13 -> In case we have a message like this
970.11 -> which is protobuf encoded,
972.69 -> we actually can't make full use of the rules engine
976.14 -> because protobuf isn't natively supported.
978.84 -> Same problem if you send a compressed message
981.39 -> where you can't inspect the message in the rules engine.
985.32 -> Now with the header fields,
987.48 -> you can actually determine
989.07 -> in the information you get additionally to the payload
993.03 -> where you want to store a message.
995.76 -> We're looking at two flows here for a short example.
1000.35 -> First of all, you store a message in a data lake,
1004.94 -> and second, you also have a protobuf decoder,
1008.33 -> which would receive a message if it's protobuf.
1011.36 -> And then we republish to a topic
1013.13 -> and also store it in a data lake.
1017.48 -> I'm giving you an example query here
1020.48 -> to show you the benefit now
1022.85 -> is that you can take from the header,
1025.91 -> content type in this case as well as format indicator,
1030.08 -> which allow you to see
1031.46 -> that the message actually is protobuf,
1034.4 -> and instead of sending it to the same destination,
1037.43 -> to the same backend,
1038.54 -> which then has to figure out where to put the message to.
1042.92 -> You can use this very powerful rules engine to send it now
1050.073 -> in this rule to the protobuf decoder.
1055.64 -> This allows you to take weight off your backend,
1059.24 -> make full use of the rules engine,
1061.52 -> and optimize your routing
1063.11 -> because the message goes directly to the destination
1067.76 -> it's built for.
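Header-based routing can be sketched as a function that looks only at the MQTT 5 header fields and never at the payload. In an AWS IoT rule, these headers are, as far as I know, read with the SQL function `get_mqtt_property` (for example with the arguments `'content_type'` and `'format_indicator'`); the sketch below mimics that decision locally. The content type strings and destination names are assumptions.

```python
from typing import List, Optional

# Content types treated as protobuf; these strings are assumptions for illustration.
PROTOBUF_TYPES = {"application/x-protobuf", "application/protobuf"}


def route(content_type: Optional[str],
          format_indicator: str = "UNSPECIFIED_BYTES") -> List[str]:
    """Pick destinations from header fields alone; the payload stays opaque."""
    destinations = ["data_lake"]  # flow 1: every message is stored
    is_binary = format_indicator != "UTF8_DATA"
    if is_binary and content_type in PROTOBUF_TYPES:
        destinations.append("protobuf_decoder")  # flow 2: decode, then republish
    return destinations


print(route("application/x-protobuf"))
print(route("application/json", "UTF8_DATA"))
```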
1071.75 -> So now we covered the device layer
1075.56 -> as well as the connectivity layer,
1078.17 -> and I'm handing over to the data geek, Mike,
1081.86 -> who's going to talk to you about the data abstraction layer.
1085.43 -> Thank you, everybody.
1087.802 -> (audience applauding) - Thank you, Katja.
1092.87 -> Hi, everyone.
1093.8 -> Happy to be here and geek out about data.
1099.11 -> About a year ago,
1101.57 -> we started thinking very seriously
1104.15 -> about some of the biggest challenges with data.
1107.87 -> And as Katja already mentioned before,
1110.6 -> we have a few large, large problems ahead of us,
1115.55 -> especially in the automotive and transportation industries.
1119.81 -> We have problems around data fragmentation,
1123.08 -> AKA data that come in in different formats, types,
1127.88 -> proprietary encodings of sorts, and so on.
1131.84 -> We have a tsunami of data coming up,
1135.08 -> yottabytes, as Katja estimated, in 2030.
1139.22 -> And we have still ways to go
1142.22 -> before we can solve data delays,
1145.85 -> AKA how do you synchronize a number of different signals
1150.95 -> in order to get to the right answer.
1154.46 -> So we started spinning our head around,
1157.227 -> "What's going on, how can we solve this problem?"
1160.7 -> And the very first area that we found room for improvement
1166.91 -> was the traditional architectures that we see
1170.6 -> in the broader automotive industry today with ETL.
1176 -> Pretty much, I'm sure, every one of you here
1178.88 -> has used in one way or another
1180.86 -> some form of data extraction,
1184.34 -> data transformation, data loading, like architecture.
1189.35 -> And this has served us well for many, many years
1193.7 -> and is starting to show its age.
1198.23 -> So we took a different stance,
1201.29 -> a different view, if you want,
1203.18 -> on what the future architectures
1206.42 -> around all this complex data that we are dealing with
1210.92 -> in the connected vehicle space should be served with.
1215.84 -> So we thought that instead of spending a little bit of time
1220.1 -> in that extraction,
1222.11 -> some time in transformation,
1223.85 -> and a ton of time in data loading and processing,
1227.9 -> what if we were to do some more pre-work in data modeling,
1235.28 -> then focus in data selection.
1238.61 -> And the last part,
1240.29 -> the data analytics, insights, and viewing
1243.35 -> then becomes something
1245.3 -> that is as simple as pressing a button.
1249.71 -> So we thought hard
1251.54 -> about how we're gonna make this transition,
1254.24 -> and we launched the service in preview at re:Invent
1258.14 -> about a year ago called AWS IoT FleetWise,
1262.64 -> that is a data abstraction layer
1266.24 -> to collect, transform, and transfer data
1270.62 -> from the vehicle to the cloud.
1272.99 -> So as Katja said,
1274.37 -> we are already using IoT device management,
1278.54 -> IoT Core, MQTT 5, it's wonderful
1281.69 -> and will empower a lot of scaled-up use cases.
1287.12 -> And we build on top of that
1289.49 -> this model select view architecture.
1294.47 -> So what we do with the modeling side
1297.56 -> is that we create a set of APIs,
1301.34 -> we call them the IoT FleetWise Designer APIs
1305.93 -> that, in essence, create an ontology,
1310.46 -> a semantic ontology of what is a car or a fleet of cars,
1316.79 -> and what it's like to be working with thousands
1319.49 -> or tens of thousands of signals that are slightly different,
1324.32 -> every so often between every car model out there.
1330.05 -> So that's the modeling part.
1331.91 -> The second part, FleetWise service,
1334.88 -> is around data connection,
1337.16 -> and to connect, of course, we need an agent,
1340.25 -> and this is something that is open source, is on GitHub,
1344.24 -> and we work with our partners and our customers
1347.54 -> in order to fine tune that open source code
1352.01 -> to their end point,
1354.59 -> whether it's a tiny dongle, an automotive
1358.04 -> TCU, a telecommunications unit,
1360.47 -> or an ADAS supercomputer sitting on the level 3 vehicle.
1365.93 -> So that's part two.
1367.64 -> Part three is when we select the data,
1370.91 -> and selecting the data
1372.77 -> is where we create certain data collection campaigns
1379.4 -> that describe what is of interest.
1383.57 -> As I said before,
1384.8 -> we can't keep on having an all you can eat mindset,
1389.06 -> it just doesn't scale.
1390.92 -> And already many of you, I'm sure, are feeling the pains
1394.88 -> of what I call the big, bad data.
1399.68 -> So that's your friend here.
1400.97 -> The data selection APIs that we provide there.
1404.66 -> And now that you have done this pre-work, data modeling,
1408.65 -> data connection, and data selection,
1411.71 -> the rest of the pieces,
1412.94 -> data collection and data processing,
1415.55 -> storage, visualization, et cetera
1418.34 -> becomes something much, much easier than before.
1422.9 -> So let's dive in a little bit more
1426.11 -> on what do we mean when it comes down to data modeling.
1431.21 -> So we started with standardizing and using something
1436.46 -> as commonly accepted as,
1440.12 -> you can think of it, a generalized format of sorts.
1443.72 -> And we pick something that happens to be used
1446.06 -> by many automakers already today,
1449.12 -> from our friends at the COVESA consortium,
1453.11 -> COVESA has proposed a format
1455.9 -> called the vehicle signal specification, VSS,
1460.76 -> in order to create this representation of the vehicle.
1467.21 -> So when we talk about the vehicle,
1469.91 -> at the end of the day, we talk about a sum of parts.
1474.92 -> So a vehicle at the top level is a model
1478.88 -> of something that can have a body, a cabin, a chassis,
1484.4 -> general attributes like the VIN number,
1488.57 -> a powertrain, ADAS systems, and even the driver.
1493.91 -> So at the very top level of the tree,
1496.43 -> think of this as a tree,
1498.65 -> a vehicle has certain subsystems.
1503.42 -> Now, inside each and every one of these subsystems,
1506.9 -> there are further branches, attributes, AKA metadata,
1513.86 -> and sensors and signals, AKA the real data.
1519.8 -> So the VSS format now gives you this ability
1525.08 -> to start zooming down inside your vehicle.
1528.83 -> And in my example there,
1531.02 -> I extracted what is inside the powertrain subsystem.
1537.05 -> So in the powertrain subsystem,
1539.42 -> you will find in the VSS format today
1542.75 -> also available on GitHub, right?
1545.96 -> Also available for anyone of you to use, to extend,
1550.67 -> and to contribute back if you wish to.
1554.51 -> So inside the powertrain today,
1556.4 -> there are a few dozen attributes.
1559.1 -> I picked a few of them just to give you this example.
1562.94 -> So inside powertrain, you can have branches around,
1566.457 -> "Is this a combustion engine, is this a transmission,
1569.607 -> "is this an electric motor, is this a battery?"
1573.83 -> And you can go even further inside this tree,
1577.34 -> inside battery,
1578.72 -> you can have a number of different attributes and branches,
1582.32 -> like ID, capacity, charging, et cetera.
1587.06 -> And now inside even that layer,
1589.34 -> you can have all sorts of different measurements
1593.15 -> like, is the battery charging, discharging,
1596.42 -> or any other one of the values that you see there.
1600.5 -> At the end of the day,
1603.47 -> a signal example will be a fully qualified name
1608.45 -> that will read something
1609.74 -> like Vehicle.Powertrain.Battery.Charging.ChargeVoltage
1618.288 -> as one of the many, of course, examples there.
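The tree structure and the fully qualified names can be illustrated with a nested dictionary that is flattened into dotted paths. This is a small sketch: the excerpt below is hand-written for illustration and is not copied from the VSS specification, apart from the `ChargeVoltage` path named in the talk.

```python
from typing import Iterator

# Hand-written excerpt of a VSS-like tree. Branches are dicts; leaves are
# marked as "attribute" (metadata) or "sensor" (the real data).
VSS_EXCERPT = {
    "Vehicle": {
        "Powertrain": {
            "Battery": {
                "Id": "attribute",
                "Capacity": "attribute",
                "Charging": {
                    "IsCharging": "sensor",
                    "ChargeVoltage": "sensor",
                },
            },
        },
    },
}


def fully_qualified_names(tree: dict, prefix: str = "") -> Iterator[str]:
    """Walk the tree depth-first and yield the dotted path of every leaf."""
    for name, node in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        if isinstance(node, dict):
            yield from fully_qualified_names(node, path)
        else:
            yield path


for signal in fully_qualified_names(VSS_EXCERPT):
    print(signal)
```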
1621.62 -> Now we have some sort of agreement,
1624.56 -> some sort of baseline of where to begin with
1628.43 -> when we talk about vehicle data modeling.
1632.84 -> Let's see then what we can do
1635.6 -> with that representation schema.
1639.62 -> For starters, our business as automakers
1643.52 -> or fleet managers of sorts
1645.77 -> is that we have a number of different vehicle models.
1651.62 -> We don't just make one car and one time only,
1654.77 -> we make multiple.
1657.05 -> So imagine that you have two vehicle models,
1660.2 -> as I have in the example over there,
1662.57 -> Model M1, Model M2.
1666.05 -> And for those of you that have worked
1668.78 -> in anything behind a vehicle,
1671.72 -> probably you know
1672.56 -> that you will have your own proprietary signals
1676.25 -> for that vehicle,
1678.08 -> also known as CAN DBC or VSPEC
1683.39 -> or many, many other proprietary formats.
1688.64 -> So what you have now is that internally,
1692.69 -> you have a number of different private signals
1696.53 -> that represent each and every one of the systems
1701.72 -> and each and every one of the vehicle models that you produce.
1708.02 -> Again, two layers, right?
1709.4 -> There is the systems that can be different,
1711.5 -> and then there is the complete vehicle models
1714.26 -> that are different.
1715.61 -> And again, all of those can be all over the place.
1719.09 -> You can have one model, you can have two,
1721.61 -> you can have a hundred vehicle models.
1725.72 -> So now what FleetWise does behind the scenes for you
1729.8 -> when you use the API
1731.57 -> is that it takes all of these inputs that you have,
1736.58 -> let's say you have 10, 20 CAN database files,
1740.39 -> you have a hundred Excel files
1743.24 -> that have a key value pair somewhere.
1746.33 -> So we take this as inputs
1748.88 -> and then we map them behind the scenes
1752.42 -> to the VSS global signal catalog.
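The mapping from proprietary signal names to one shared catalog can be pictured as a per-model lookup table. This is a minimal sketch, not how FleetWise implements it: the model names M1 and M2 come from the talk, while the proprietary signal names and the helper function are made up for illustration.

```python
CHARGE_VOLTAGE = "Vehicle.Powertrain.Battery.Charging.ChargeVoltage"

# Per-model mapping of (hypothetical) proprietary signal names to catalog names.
SIGNAL_MAP = {
    "M1": {"BMS_ChgVolt": CHARGE_VOLTAGE},
    "M2": {"HV_BATT_U_CHG": CHARGE_VOLTAGE},
}


def to_catalog(model: str, readings: dict) -> dict:
    """Rename proprietary signals to catalog names; unmapped signals are dropped."""
    mapping = SIGNAL_MAP[model]
    return {mapping[name]: value
            for name, value in readings.items() if name in mapping}


m1 = to_catalog("M1", {"BMS_ChgVolt": 398.5, "BMS_Internal": 7})
m2 = to_catalog("M2", {"HV_BATT_U_CHG": 401.0})
print(m1, m2)
```

Once both models report under the same catalog name, a query across the fleet no longer needs to know which supplier-specific name each model uses internally.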
1758.36 -> Does this remind you of something?
1762.29 -> It's something very similar
1764.48 -> to private public key cryptography.
1768.8 -> Think of your CAN DBC,
1771.26 -> think of the things that you know about your vehicles
1774.68 -> as private information you only know
1777.95 -> and you get to keep in your account.
1781.85 -> And then VSS becomes your public key
1785.66 -> that you want others to know
1788.51 -> in order to be able to communicate with you.
1792.62 -> For us, when we designed the APIs,
1795.77 -> we were thinking what will make your life easier
1800.03 -> in order to merge and communicate at the end of the day?
1805.4 -> Data flows across your different vehicle models and types.
1812.18 -> So this is the superpower that, in some ways,
1816.08 -> this part of the vehicle modeling API can provide you.
1822.5 -> So now that you have these multiple vehicle models
1827.24 -> that have magically been mapped
1830.36 -> into a standardized language
1833.12 -> that can cut across all your vehicle models,
1837.17 -> so that when you're asking what's wrong with a tire,
1840.98 -> you don't need to worry
1842 -> about what is the exact name of the signal
1844.73 -> in one vehicle model versus the other,
1847.43 -> or what count is it at or what metric it's using
1851.06 -> or what range it's using
1852.32 -> or what supplier has created this tire
1855.92 -> for one car versus the other.
1857.72 -> All of these are mapped eventually
1860.69 -> to the same entry in the VSS tree.
1865.61 -> So that's quite powerful
1867.32 -> for those that have worked, again, with data,
1869.66 -> you understand that now I'm describing in essence a schema,
1873.86 -> an index there of sorts that makes life easier
1877.7 -> while pre-processing the data.
1881.06 -> So now I have these multiple vehicle models.
1883.82 -> What I can do next is translate these multiple vehicle models
1890.81 -> into decoders of data.
1893.93 -> And here is how you can think of a vehicle decoder,
1897.35 -> a vehicle model decoder.
1900.925 -> It's an API that is translating
1905.87 -> each and every one of your in-vehicle proprietary signals
1912.77 -> into the model that you have created
1917.6 -> from your global VSS tree,
1921.11 -> think of the global VSS tree
1923.6 -> as the dictionary of all languages in the world.
1928.64 -> And a vehicle decoder is that,
1931.137 -> "I'm gonna only take English now
1933.837 -> "and have this information being available in the vehicle
1939.657 -> "to decode the proprietary signals."
1944.36 -> So the decoders are, in essence,
1946.67 -> very small configuration files that you can create,
1952.7 -> and you can create as many as you want,
1956.45 -> even in the same vehicle.
1959.27 -> The same vehicle can have multiple decoders
1963.02 -> for different purposes.
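A decoder of this kind can be pictured as a small record that says where a signal sits inside a raw CAN frame and how to scale it. The sketch below is a simplified illustration under assumptions: it handles only unsigned little-endian signals, and the message IDs, bit positions, and scaling factors for the two battery suppliers are invented.

```python
from dataclasses import dataclass
from typing import Optional

CHARGE_VOLTAGE = "Vehicle.Powertrain.Battery.Charging.ChargeVoltage"


@dataclass(frozen=True)
class CanSignalDecoder:
    catalog_name: str  # name in the shared signal catalog
    message_id: int    # CAN frame ID that carries the signal
    start_bit: int     # position of the least significant bit (little-endian)
    length: int        # number of bits
    factor: float      # physical value = raw * factor + offset
    offset: float


def decode(decoder: CanSignalDecoder, message_id: int, data: bytes) -> Optional[float]:
    """Extract and scale one signal; returns None if the frame is not relevant."""
    if message_id != decoder.message_id:
        return None
    raw_frame = int.from_bytes(data, "little")
    raw = (raw_frame >> decoder.start_bit) & ((1 << decoder.length) - 1)
    return raw * decoder.factor + decoder.offset


# Two hypothetical suppliers encode the same catalog signal differently.
supplier_a = CanSignalDecoder(CHARGE_VOLTAGE, 0x101, start_bit=8, length=16,
                              factor=0.5, offset=0.0)
supplier_b = CanSignalDecoder(CHARGE_VOLTAGE, 0x2A0, start_bit=0, length=16,
                              factor=0.25, offset=0.0)

frame_a = bytes([0x00, 0x20, 0x03, 0, 0, 0, 0, 0])  # raw value 0x0320 at bit 8
frame_b = bytes([0x40, 0x06, 0, 0, 0, 0, 0, 0])     # raw value 0x0640 at bit 0
print(decode(supplier_a, 0x101, frame_a), decode(supplier_b, 0x2A0, frame_b))
```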
1965.21 -> You could be having a vehicle, by the way,
1967.463 -> that has different suppliers for the same system.
1972.08 -> As in, in my example,
1974.21 -> a vehicle can have a battery supplier A
1977.45 -> and a battery supplier B.
1980.57 -> What is even more challenging in data,
1982.91 -> you could be having different branches of the VSS tree
1988.46 -> that you have chosen
1991.07 -> to give you the same piece of information.
1994.76 -> In my example,
1996.11 -> the case here will be that
1997.76 -> you can ask the powertrain ECU for,
2001.607 -> "What is the status of my battery at any given second?"
2007.15 -> But you can also ask
2009.01 -> the onboard diagnostics component of your car
2012.31 -> to do the same.
2014.32 -> So in some ways, what this decoder capability does for you
2019.36 -> is that it helps you make the best use
2024.22 -> of the actual intention of your campaign
2028 -> in order to collect your data.
2031.63 -> Vehicle decoders can be created
2035.38 -> and can be updated on demand,
2037.84 -> near real time, over MQTT.
2042.13 -> So now, we have done the format,
2044.8 -> we have created the models,
2046.99 -> we have created the decoders.
2049.36 -> You can think of them again as the runtime translator
2053.68 -> between the public and the private keys
2056.83 -> that you can deploy all the time.
2059.11 -> The rest is literally the easy part.
2062.29 -> So the data collection now is, in essence, a set of APIs,
2068.02 -> where you create a campaign,
2071.08 -> you can think of it as a set of programmable rules,
2075.82 -> and deploy this campaign to a fleet of vehicles.
2081.52 -> You can select one or you can select a million vehicles
2086.32 -> to deploy a data collection campaign.
2089.71 -> What does it look like?
2091.12 -> A campaign is, in essence, as you can see over there,
2094.99 -> a small snippet that defines the signals of interest
2101.38 -> and the rules behind their collection,
2105.55 -> things that can be time-based, event-based,
2108.94 -> or complex logic based.
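A campaign as described here, a list of signals plus the rules for collecting them, can be modeled locally in a few lines. This is a sketch under assumptions and not the FleetWise API: the `Campaign` class, its field names, the threshold, and the `IsCharging` signal name are hypothetical and only show the difference between a time-based and an event-based rule.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

CHARGE_VOLTAGE = "Vehicle.Powertrain.Battery.Charging.ChargeVoltage"
IS_CHARGING = "Vehicle.Powertrain.Battery.Charging.IsCharging"


@dataclass
class Campaign:
    name: str
    signals: List[str]                                  # signals of interest
    period_ms: Optional[int] = None                     # time-based rule
    condition: Optional[Callable[[dict], bool]] = None  # event-based rule


def collect(campaign: Campaign, now_ms: int, last_sent_ms: int,
            snapshot: dict) -> Optional[dict]:
    """Return the selected signals if the campaign fires, otherwise None."""
    due = (campaign.period_ms is not None
           and now_ms - last_sent_ms >= campaign.period_ms)
    triggered = campaign.condition is not None and campaign.condition(snapshot)
    if not (due or triggered):
        return None
    return {name: snapshot[name] for name in campaign.signals if name in snapshot}


heartbeat = Campaign("battery-heartbeat", [CHARGE_VOLTAGE], period_ms=10_000)
overvoltage = Campaign("battery-overvoltage", [CHARGE_VOLTAGE, IS_CHARGING],
                       condition=lambda s: s.get(CHARGE_VOLTAGE, 0.0) > 420.0)

snapshot = {CHARGE_VOLTAGE: 431.0, IS_CHARGING: True, "Other.Signal": 1}
print(collect(heartbeat, now_ms=4_000, last_sent_ms=0, snapshot=snapshot))
print(collect(overvoltage, now_ms=4_000, last_sent_ms=0, snapshot=snapshot))
```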
2112.75 -> I like to think of our data collection campaigns,
2117.49 -> in essence, as small,
2120.94 -> maybe Kubernetes orchestration instructions,
2126.46 -> where instead of actually looking
2128.8 -> at how am I gonna be deploying the software,
2132.31 -> you look at what are the data
2135.67 -> that I want to have in an enclave
2139.18 -> around the end-to-end use case of interest.
2144.4 -> So the data collection campaign
2146.44 -> enables end-to-end data management.
2151.81 -> You don't have anymore to be thinking,
2154.547 -> "Where is the CSV file that I can load
2158.447 -> "and where is the header
2160.277 -> "so that I can understand what the heck is going on."
2163.63 -> With data collection campaigns, you have an agile,
2168.04 -> and in case I forgot to mention it,
2170.35 -> you can have thousands of these campaigns
2172.51 -> running at any given time in a single car,
2175.93 -> in a fleet of cars,
2177.49 -> or across all of your cars.
2180.43 -> So data collection campaigns become now
2183.04 -> your auto scaling Swiss knife,
2187.69 -> where you can say,
2188.867 -> "I want just this for now or I want all of that later."
2194.86 -> So data collection campaigns do enable you
2198.43 -> to create these results all the way from the inception.
2205.15 -> So you wake up in the morning,
2206.89 -> you think about, "What is my problem?
2209.057 -> "I wanna figure out whether there is a cell in batteries
2213.287 -> "that may be having higher possibility
2215.537 -> "of going out of range, of operational range,
2218.087 -> "within the next three months, within the next nine months?
2221.717 -> "How am I gonna solve that?"
2223.72 -> You create a campaign,
2225.37 -> you create a programmable event campaign,
2228.82 -> where you can say,
2229.757 -> "For this hundred cars over here randomly sampled,
2233.057 -> "give me smaller frequency data,
2237.047 -> "but for these five cars over there,
2239.537 -> "I want to know everything about that."
2242.14 -> And then you can do AB testing,
2244.57 -> you can do anything you wish
2246.7 -> with the data that are coming in,
2248.83 -> because they are standardized,
2251.68 -> they are organized already for you.
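The "hundred cars randomly sampled, five cars in full detail" split can be sketched as a cohort assignment. The function name, cohort labels, and vehicle IDs below are hypothetical; how cohorts map onto real campaigns is left out.

```python
import random


def assign_cohorts(vehicle_ids, sample_size, deep_dive_ids, seed=0):
    """Pick a random low-frequency cohort and a fixed full-detail cohort."""
    deep = set(deep_dive_ids)
    # Sort so the result depends only on the seed, not on input ordering.
    candidates = sorted(v for v in vehicle_ids if v not in deep)
    rng = random.Random(seed)
    sampled = rng.sample(candidates, min(sample_size, len(candidates)))
    plan = {v: "low_frequency" for v in sampled}
    plan.update({v: "full_detail" for v in deep_dive_ids})
    return plan
```

Using a seeded generator keeps the sample reproducible, which matters if the same cohort needs to be revisited months later.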
2255.52 -> In reality, they look something like,
2260.11 -> next slide, like this, on Amazon Timestream,
2263.47 -> you literally have the attributes
2266.89 -> that you actually prescribed
2269.59 -> in your data collection campaign
2272.08 -> coming in all the way to a time-based indexer,
2277.75 -> which is the Amazon Timestream database.
2280.39 -> And in there, you can fully prescribe,
2284.32 -> as I said, what is the schema
2287.53 -> that connects all the way back
2290.32 -> to the actual VSS fully qualified name that we started from.
2297.31 -> So finally, all this work that we did at the beginning
2300.85 -> is paying off,
2302.29 -> because now we have standardized names,
2306.04 -> programmable, accessible, fully-qualified entities
2311.17 -> that you can run your analytics,
2314.44 -> and what you can do with that, it's completely up to you.
2317.98 -> One example is Grafana,
2320.41 -> the Amazon Managed Grafana tool,
2322.72 -> where with no code,
2324.61 -> there is literally zero code here
2326.98 -> other than connecting Grafana to Timestream
2330.31 -> in order for you to be able to see dashboards,
2333.52 -> insights, and all kinds of other widgets therein.
2338.17 -> So we hope our customers
2340.27 -> will use some of our data abstraction capability
2344.47 -> for a number of cases.
2346.21 -> In my case, in my example around batteries,
2348.97 -> this can be false warnings, performance degradation,
2352.6 -> anomaly detection, and so on and so forth.
2356.59 -> I hope I gave you a little bit of a teaser
2359.17 -> around what the data world looks like,
2363.52 -> but I do wanna save the best for last.
2366.88 -> I wanna invite Kevin from Mercedes to tell you more
2370.69 -> about some real life lessons learned
2373.72 -> from our customer using AWS.
2376.45 -> Thank you.
2377.746 -> (audience applauding)
2385.69 -> - Thanks, Mike, and good morning.
2388.78 -> So I'm Kevin, and I'm here to tell you about a project
2391.3 -> where we migrated our fleet of cars in the Americas,
2395.17 -> and we, at least, will talk about the attempt in Europe.
2401.76 -> And that's the story I wanna tell you
2403.39 -> of the Europe rollout.
2406.63 -> That is the second thing that we did.
2407.86 -> So I have a project that I'm gonna talk about,
2409.66 -> but I really want to tell you a story,
2411.7 -> and it begins with our heroes on the cloud foundation team.
2415.75 -> They're in a Teams call in late summer,
2419.68 -> and they're talking with our colleagues
2423.73 -> about why we're making this change
2425.95 -> and about the rollout that we hope to do
2428.56 -> in late October, 2022.
2431.68 -> So we have completed the rollout in the Americas,
2435.22 -> and we've gotten the benefits that we're looking for.
2438.43 -> I trust everyone in the audience's literacy,
2440.32 -> so I'm not gonna read this slide to you,
2442.21 -> but one of our colleagues sums it up
2444.34 -> as seven figures of cost reduction and hope for the future.
2449.89 -> That's what we're looking for here. (chuckling)
2451.72 -> We've tried explicitly
2453.31 -> to keep the scope of the current changes very, very small.
2456.43 -> We say a lot of,
2457.277 -> "We're gonna do surgery and then we're gonna do more."
2460.63 -> So when we talk about what are we actually changing,
2464.35 -> this is the scope of changes to the system.
2467.41 -> You can see on the left there, the TCU,
2469.03 -> that's our telematics control unit.
2470.68 -> That's the component in the car
2472 -> that kind of functions as a modem.
2475.18 -> And that's the thing that communicates over MQTT,
2477.25 -> and actually, that's out of scope.
2479.32 -> The cars are not changing,
2480.58 -> that was part of the directives for our project.
2482.92 -> We wanna make a very small change
2484.27 -> to a very large fleet of vehicles.
2487.72 -> What we call our vehicle anchor point in the previous system
2491.38 -> is what we're moving to IoT Core.
2493.99 -> So everything that the cars communicate to
2496.09 -> will communicate directly to AWS on IoT Core.
2499.93 -> And then we have some components that we use
2501.55 -> to communicate to the rest of the organization.
2505.54 -> You can think of it as the vehicles coming from the car
2508 -> go through the telematic protocol gateway,
2510.01 -> there in the middle.
2511.69 -> Outbound messages go through the one
2513.28 -> that's named more expectedly
2514.89 -> in the outbound communication gateway.
2516.58 -> And then we have a connection monitor
2517.72 -> to tell the systems in our intelligent cloud
2520.69 -> whether you should try
2522.13 -> or whether you should use some sort of backup fashion.
2525.1 -> So our heroes are in the meeting,
2528.28 -> and they ask, "How is the EMEA rollout
2532.967 -> "going to go smoother than AMAP went?"
2536.32 -> And without getting into too many of the details,
2538.63 -> the AMAP rollout was a little rough,
2542.77 -> and we had some rollbacks,
2545.2 -> we had some issues that we had to fix.
2547.6 -> And some of those are in the details of implementation.
2549.94 -> So looking here, kind of digging into that box,
2554.47 -> from IoT Core, when we get messages on topics,
2557.26 -> we send those through our Lambdas.
2559.12 -> This is primarily a stateless system.
2561.49 -> So you might be surprised to find databases
2563.56 -> on the diagram in the stateless system.
2565.54 -> We talk through
2566.38 -> that one of the things that the system needs to do
2568.93 -> is understand where to send a connection from a vehicle.
2572.38 -> There are internal systems for that.
2573.76 -> So we use DynamoDB and a DAX cache
2577.18 -> that has its own messages for caching.
2579.58 -> Every time a vehicle comes in,
2580.93 -> we need to do a lookup.
2581.95 -> We can't do that lookup dynamically,
2583.72 -> and by the way, this right side of the lambda,
2586.48 -> that's a cloud boundary,
2588.16 -> there's another cloud provider who will not be named,
2591.04 -> is where we're communicating to on the other side of that.
2593.38 -> So we needed to do very efficient caching
2595.57 -> so that we don't add another back and forth on that
2597.82 -> for latency.
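The routing lookup described here is a read-through cache in front of a slow call. The sketch below uses an in-memory dictionary as a stand-in; it does not use DynamoDB or DAX, and `TTLCache`, `loader`, and the TTL value are assumptions for illustration.

```python
import time


class TTLCache:
    """Read-through cache: call the slow loader only on a miss or after expiry."""

    def __init__(self, loader, ttl_s, clock=time.monotonic):
        self._loader = loader    # slow lookup, e.g. a call across the cloud boundary
        self._ttl_s = ttl_s
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # key -> (expires_at, value)
        self.misses = 0

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        self.misses += 1
        value = self._loader(key)
        self._entries[key] = (now + self._ttl_s, value)
        return value
```

The trade-off is staleness: a longer TTL saves more cross-cloud round trips but delays the moment a changed routing entry takes effect.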
2599.65 -> To make this more concrete,
2600.79 -> what are we talking about in these messages?
2602.35 -> There's telematics data for sure,
2604.12 -> but the most concrete example that I can give you
2606.22 -> for thinking about what are these cars doing
2608.35 -> is remote starts and door lock and unlock.
2610.99 -> So when we're talking about what's our disruption
2612.88 -> to our customers,
2613.713 -> because we're driven by customers who drive,
2615.88 -> nobody says that but me, but I like it.
2618.79 -> That's what we're thinking about. (chuckling)
2620.62 -> So people can't use the expectations,
2625.27 -> they can't have that experience
2626.26 -> that we're trying to give them.
2627.61 -> So when we're talking about these disruptions,
2631.78 -> that's what the stakes are.
2634.39 -> So then the conversation for our heroes
2635.83 -> turns into what are you tracking?
2638.35 -> And we bring up our single pane of glass mostly as a show,
2643.27 -> because our single pane of glass
2644.74 -> for the metrics for this system is a single pane,
2648.25 -> it's a single pane that's like seven stories tall,
2651.01 -> and you'd need a skyscraper to project it on,
2653.14 -> but that's sort of the point.
2654.79 -> And some of the conversation here
2656.71 -> is about how the system functions
2659.5 -> and how we're measuring that, so the internals,
2661.54 -> you can see the lambda's there, interesting,
2664 -> and this is an interesting tie in to what Katja said
2666.97 -> about MQTT 5 and shared subscriptions,
2669.34 -> which we didn't have
2670.27 -> because we did this project over the last year and a half
2673.27 -> and rolled it out, our release was in mid-October.
2676.84 -> We created Lambdas for those outgoing messages.
2679.75 -> The previous system was using a JMS queue.
2682.66 -> So that was another approach
2684.43 -> when you don't have those shared subscriptions,
2686.14 -> and the JMS queue, if you're familiar with those,
2688.87 -> it has a mechanism for accepting a message,
2693.22 -> and then if there's some problem
2694.63 -> as you're communicating with the rest of the cloud,
2696.58 -> you can basically abort the transaction,
2699.31 -> and someone else will pick up that message.
2701.44 -> But in our mechanism, that's making an HTTP call,
2704.8 -> if that HTTP call fails,
2706.54 -> whether it's on that line
2707.68 -> or something downstream of the box there,
2709.78 -> especially look at the TPG,
2712.63 -> we don't have a retry mechanism.
2714.37 -> So we put that into an SQS queue
2716.95 -> so that we can build our own retry mechanism on that side
2719.71 -> as a way to handle that,
2720.82 -> prior to the introduction of shared subscriptions.
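A rough sketch of the "queue it, then build our own retry" pattern. An in-memory `deque` stands in for the SQS queue, and `send` stands in for the HTTP call; the function name and the attempt limit are assumptions, not details from the talk.

```python
from collections import deque


def drain_with_retries(messages, send, max_attempts=3):
    """Deliver each message, re-queuing failures until max_attempts is reached."""
    queue = deque((m, 1) for m in messages)   # (message, attempt number)
    delivered, dead = [], []
    while queue:
        message, attempt = queue.popleft()
        try:
            send(message)                      # stand-in for the downstream HTTP call
            delivered.append(message)
        except Exception:
            if attempt < max_attempts:
                queue.append((message, attempt + 1))   # retry later, behind other work
            else:
                dead.append(message)           # give up; park it for inspection
    return delivered, dead
```

A production version would also add a delay between attempts and make the receiver idempotent, since a retry can deliver a message whose first attempt actually succeeded downstream.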
2724.87 -> All right, so there's a conversation
2726.31 -> about what is it that causes challenges.
2728.53 -> Before we had some scaling challenges
2730.18 -> that we think are fixed,
2731.62 -> and the skeptical folks on the call with our heroes ask,
2736.667 -> "Well, you fixed it in AMAP,
2738.013 -> "are you sure that you fixed it in EMEA?"
2740.86 -> And we say, "Yeah, we practice infrastructure as code.
2743.117 -> "We've rolled out the same infrastructure configurations
2746.237 -> "to both areas so that those scaling issues,
2748.457 -> "we're pretty darn confident that that's done."
2750.34 -> That's the benefit that we're looking for from that.
2753.07 -> Okay, some more questions.
2754.99 -> It's looking good for our heroes.
2756.31 -> Then they ask, "How fast are you gonna scale this up?"
2760.84 -> Because in the Americas,
2762.88 -> we scaled up over the course of a week,
2764.86 -> we started at 1%, and 10%.
2766.99 -> And then as our confidence grew, we went up,
2769.06 -> and there was a couple cases where we came back
2770.74 -> and said, "Well, we think maybe there's a problem,"
2773.05 -> you wanna address things when there might be a problem
2775.18 -> rather than to definitely have an issue.
2779.77 -> And we said,
2780.603 -> "Yeah, we're gonna scale it from zero to a hundred
2782.897 -> "in an hour."
2783.85 -> That's what we're gonna do this time around.
2785.14 -> Our colleagues laugh,
2786.46 -> our heroes don't, they're serious.
2788.47 -> That's the plan, 0 to 100 in 60 minutes, (chuckling)
2791.47 -> because one of the other quirks
2793.3 -> that we learned in doing this cutover is our ability,
2796.66 -> since we're not changing the car,
2798.28 -> our ability to change the vehicles between brokers
2801.28 -> is just with DNS.
2803.53 -> So when we go to 50% on DNS,
2806.47 -> that means it's a coin flip,
2808.45 -> when the car reconnects, which broker you connect to.
2813.1 -> Brokers are stateful systems,
2815.05 -> it doesn't behave very well
2816.76 -> if you send a message on one broker,
2818.83 -> get disconnected because you drove through a tunnel,
2821.59 -> and then reconnect to the other broker
2823.15 -> and look for that message,
2824.08 -> that was an artificial source of errors.
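The "coin flip" can be put into numbers. Assuming each reconnect independently lands on the new broker with probability equal to the DNS weight (a simplification that ignores DNS caching and TTLs), the chance that two consecutive connections land on different brokers is 2w(1 - w):

```python
def broker_switch_probability(new_broker_weight):
    """Chance that two consecutive connections land on different brokers.

    Assumes each connection independently picks the new broker with
    probability new_broker_weight (ignores DNS caching and TTL effects).
    """
    w = new_broker_weight
    if not 0.0 <= w <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    # old-then-new plus new-then-old
    return 2.0 * w * (1.0 - w)
```

Under this assumption the switch probability is roughly 2% at a 1% weight, peaks at 50% when the weight is 50%, and drops to zero at 0% or 100%, which is one way to see why a fast ramp spends less time producing these artificial errors.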
2826.12 -> So we think because we practice infrastructure as code,
2829.69 -> that we've addressed the issues that we've had before,
2832.45 -> and that will give us a cleaner data set.
2835.93 -> We spend a lot of time looking at things and saying,
2838.217 -> "Is it really, there's errors here, but are they really?"
2841.21 -> And looking at those door unlocks.
2843.7 -> So what are the differences that we know?
2846.88 -> We went through our key metrics,
2848.71 -> and one fun detail here is
2850.99 -> the other kind of concrete end user metric
2853.84 -> that we tracked for the Americas and Pacific
2857.05 -> was remote starts,
2859.18 -> it was summertime, but people were still doing it,
2860.95 -> so that was nice.
2862.45 -> And while we were doing that we said,
2864.737 -> "Well, as we look forward,
2865.787 -> "what's the rate of remote starts
2868.607 -> "that we're looking at in EMEA?"
2871.18 -> And it's nearly zero.
2873.13 -> Anybody in the audience know it?
2875.2 -> Remote start's illegal in most of Europe,
2877.75 -> so that's not a metric (chuckling)
2879.31 -> that's gonna translate from one to the other,
2881.23 -> but the rest of them still look good.
2882.52 -> And we have all of our metrics from both systems,
2885.28 -> from the previous broker and the current IoT Core
2888.4 -> in Datadog.
2889.39 -> So we can see them together,
2890.898 -> and we can see here's the traffic from this week.
2892.66 -> It's always fun working in automotive where you have to say,
2894.437 -> "Do you mean traffic
2895.577 -> "or do you mean traffic, traffic, traffic?
2897.827 -> "Which kind of traffic are we talking about?"
2899.38 -> 'Cause they scale together.
2901.24 -> And we can see, "Okay, this is what the scale is this week,
2904.127 -> "this is what the scale is last week.
2905.567 -> "It's not one to one but it looks pretty good.
2907.667 -> "What's our tolerance there?"
2909.16 -> All right, so the agreement,
2910.87 -> our heroes are on the call, this is late summertime,
2913.66 -> it looks good, we should roll out,
2915.49 -> we're on board with your crazy 0 to 100 in an hour,
2919.42 -> let's do it, but we think that your team,
2922.81 -> my team is based in Seattle.
2924.79 -> We think that you should really do this from Europe.
2927.97 -> You should come to the headquarters
2929.68 -> where most of the other systems that are in the cloud
2933.55 -> from the previous slide, they're in Germany.
2936.76 -> And so you should come here and do it from here
2938.23 -> so we're able to have a fast turnaround
2939.79 -> and very little impact to our customers
2941.35 -> if something does happen, okay.
2944.65 -> Not on the call, this guy named Kevin, who is on vacation,
2948.19 -> great planning, I recommend that for all of your rollouts.
2951.43 -> And one of the teammates can't travel for happy reasons,
2955.66 -> being a new parent.
2956.493 -> So four of the heroes,
2958.09 -> Neesham, Jeremy, Brian, and Torsten book flights,
2962.2 -> they fly to Frankfurt,
2963.13 -> they drive down to Stuttgart,
2964.54 -> and they go to the heart of Benztown to do this cutover
2967.99 -> in mid-October.
2973.48 -> So in mid-October, not very many weeks ago,
2978.01 -> in a conference room, we throw the switch,
2979.72 -> and it's not a switch,
2981.28 -> it's console keystrokes in a conference room,
2983.92 -> and at 1%, we check the error rate.
2987.25 -> It's low, it's not zero,
2988.57 -> because even at 1%, you get this broker ping ponging,
2991.57 -> we ramp up, we ramp up, it looks good, it's looking good,
2994.84 -> the errors are not there.
2996.58 -> The account level throttles are not being hit,
2999.61 -> which is one of the other things
3000.57 -> that we checked before,
3002.46 -> that we'd gone through before,
3004.95 -> our infrastructure as code has paid off,
3007.41 -> that downstream system is scaling up correctly,
3010.62 -> and we go to bed.
3011.67 -> We get woken up, but it's a false call,
3013.53 -> 'cause we have the thresholds really high, (chuckling)
3015.81 -> we go back to bed.
3019.02 -> A few days pass, everything is looking good, too good,
3022.83 -> is it gonna be okay?
3024.15 -> And then there's a realization,
3027.96 -> it looks like that some traffic,
3031.92 -> as we're looking at these rate of door unlocks,
3035.55 -> like I described, you get your curve over time,
3039.9 -> and of course, it's not exactly the same
3042.36 -> from the previous week, traffic is different.
3044.73 -> We notice, it certainly varies,
3046.71 -> and everybody has been saying that this is fine.
3049.35 -> But as we take a closer look into the details,
3052.14 -> we notice that this week's traffic
3054.24 -> has never exceeded last week's traffic.
3058.86 -> That means that there, we think, are some transactions
3064.89 -> missing or failing or not happening,
3066.96 -> because you should get that variance,
3068.64 -> you know, the two scribbles,
3069.99 -> as they go along, you should get that.
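The "two scribbles" check can be written down as a one-sided comparison: if this week's curve is above last week's in almost no time buckets, something is probably being dropped. The function names and the 10% threshold are assumptions for this sketch, and the numbers in the usage note are made up.

```python
def fraction_above_baseline(current, baseline):
    """Share of time buckets where the current value exceeds the baseline."""
    if not current or len(current) != len(baseline):
        raise ValueError("series must be non-empty and of equal length")
    above = sum(1 for c, b in zip(current, baseline) if c > b)
    return above / len(current)


def looks_suppressed(current, baseline, min_fraction=0.1):
    """Flag a series that almost never rises above its baseline.

    Two healthy, noisy series should cross each other regularly; a series
    that stays below suggests missing or failing transactions.
    """
    return fraction_above_baseline(current, baseline) < min_fraction
```

For example, with last week at `[100, 120, 90, 110]` door unlocks per bucket, a week of `[95, 118, 88, 101]` is flagged even though every individual bucket looks "close enough", while `[105, 118, 93, 108]` is not.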
3071.85 -> So what's happening and if it really is a problem,
3075.21 -> that means we would need to do a rollback.
3078 -> And at the point where we're seeing this,
3080.01 -> we've got about a day
3081.03 -> before our heroes need to get on their flight.
3083.13 -> So the question is, "If we have to do a rollback
3084.997 -> "and the context for that rollback right is,
3087.667 -> "is this whole thing worth it?
3089.077 -> "Is six, seven figures of cost savings and hope
3092.437 -> "worth this kind of disruption to our customers
3094.477 -> "which we're very fiercely advocates for?
3098.527 -> "Have we gotten the surgery right?"
3099.84 -> So we look at these downstream systems again,
3102.39 -> is there some error rate?
3103.41 -> No, the error rates are not there.
3105.3 -> We're not seeing.
3106.65 -> Is it a problem as we're crossing the cloud?
3108.75 -> It's not, we're not seeing the timeouts.
3110.7 -> We definitely have positive data there.
3113.28 -> We go back to certificates.
3116.88 -> Are we getting rejected at the front door?
3118.89 -> And a challenge for us just in terms of transparency
3122.28 -> is that the benefit that we want,
3124.35 -> we don't run anything on the front door.
3126.27 -> AWS handles this line from TCU to the green box
3129 -> totally without us,
3129.9 -> we don't have some sort of F5 networking appliance
3132.03 -> or something in there that we don't have to deal with.
3134.19 -> But that means that it's AWS's general infrastructure
3137.28 -> that's handling that certificate,
3140.16 -> and we don't really see the certificate errors
3141.87 -> if they were gonna happen,
3142.8 -> like this is a thing that we wouldn't have visibility into,
3145.62 -> because it's gonna be full of like,
3147.277 -> "I don't know, Russian hackers and all that kind of stuff,"
3149.82 -> they're taken care of for us, that's the point.
3151.56 -> So we reach out to our teammates on the AWS side.
3156.72 -> We say we need to know,
3157.98 -> we've gotta figure out if this is the problem,
3159.75 -> because if we don't find this, we've gotta roll back,
3162.48 -> and we're running out of hours
3163.65 -> before our heroes have to get on the flight.
3166.14 -> So we turn on some additional logging,
3171.48 -> we get that widget enabled
3174.03 -> so that we can get that temporarily.
3177.09 -> There's no certificate failures, it's not a problem.
3179.34 -> It's within normal parameters.
3181.68 -> So what's the issue now?
3183.03 -> We're really running out of time.
3186.03 -> And we remember that one aspect of that line
3189.27 -> involves your friend and mine IP whitelisting,
3192.72 -> which I've put into place and regretted so many times,
3197.22 -> like that would match,
3198.93 -> because we're not seeing errors,
3200.1 -> we've checked every line,
3201.42 -> and there's more that aren't on this slide
3203.52 -> for some sort of error rate,
3205.02 -> some sort of message that's being sent incorrectly.
3207.09 -> We checked for if the cars are publishing on a topic,
3210.48 -> which you can see in between,
3212.13 -> are they publishing on some sort of unexpected topic
3214.41 -> that's getting rejected for the security system
3216.39 -> that's making sure that I can't steal my cert out of the car
3219.3 -> and unlock yours with it?
3221.01 -> No, that's all working fine.
3222.87 -> IP whitelisting would explain it,
3224.25 -> but here's the skeptical part,
3226.71 -> because when we talk to our other teammates,
3230.07 -> is it IP whitelisting?
3231.15 -> 'Cause that would be not our fault,
3234.577 -> and that's wonderful news for us. (chuckling)
3237.15 -> So the first thing we need to do
3238.5 -> is we need to check to see if we have all of these blocks,
3241.98 -> what's the representation?
3242.97 -> We do the data mining through the system.
3245.01 -> We get that out of the logs and in Datadog,
3248.07 -> and we do discover that there is a block missing,
3250.65 -> there is a hole.
3251.91 -> When we graph the data that way, there is a hole,
3255.33 -> an IP-sized hole in the number of requests
3258.12 -> that are or in the sort of distribution of IP addresses
3262.77 -> that are coming in.
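Finding that hole amounts to asking which expected address blocks produced no requests at all. A minimal sketch using Python's standard `ipaddress` module follows; the function name is hypothetical, and the CIDR blocks in the usage note are reserved documentation ranges, not real carrier ranges.

```python
import ipaddress


def blocks_without_traffic(expected_cidrs, observed_ips):
    """Return the expected CIDR blocks that no observed source IP falls into."""
    networks = [ipaddress.ip_network(c) for c in expected_cidrs]
    counts = {str(n): 0 for n in networks}
    for ip in observed_ips:
        addr = ipaddress.ip_address(ip)
        for n in networks:
            if addr in n:
                counts[str(n)] += 1
                break
    return [cidr for cidr, seen in counts.items() if seen == 0]
```

Given expected blocks `192.0.2.0/24`, `198.51.100.0/24`, and `203.0.113.0/24`, and observed requests only from `192.0.2.10` and `203.0.113.7`, the function returns `["198.51.100.0/24"]`. The same check does not say who dropped the block, only that traffic from it never arrives.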
3264.03 -> So now everybody's like blood runs cold,
3266.28 -> and there's four of our heroes
3267.48 -> huddling over the single laptop
3268.77 -> to see, "Did we request this block
3271.417 -> "when we requested the IP whitelisting months ago?"
3276.345 -> 'Cause it's a very long lead process
3277.68 -> for the folks who are involved.
3280.74 -> Is the block in the email?
3282.63 -> That's where we're at. (chuckling)
3283.463 -> We've switched out of Datadog
3285 -> and our highest statistical analysis,
3287.25 -> and now we're Control-F in Outlook.
3289.41 -> It's there, okay, it's not our fault but it still happened.
3292.53 -> We reach out to our partners.
3293.76 -> How quickly can you turn this around because it's Friday,
3297.66 -> the flight is at 6:00 PM in Frankfurt.
3301.32 -> Our heroes have to drive from Stuttgart area
3304.11 -> up to Frankfurt.
3305.85 -> Can you get this in?
3307.71 -> And what minute of this hour can you do it in?
3310.77 -> That's the question.
3311.94 -> We get through to the right people,
3313.11 -> we've got the right escalations, they find the issue,
3315.27 -> they've got it in, okay,
3317.55 -> but we have to get in the car,
3319.02 -> literally, you have to get in the car,
3320.58 -> and we know that there is nothing in life
3323.04 -> guaranteeing you that you have one problem at a time.
3326.49 -> So we have to see this, and it's not instant.
3329.43 -> We don't get to see errors go away.
3330.81 -> We just have the squiggly line,
3332.55 -> gets a little closer to the other squiggly line,
3335.37 -> it's gonna take a couple of hours to even know.
3337.65 -> So our heroes get in the car,
3340.02 -> and they've got the laptop on the hotspot,
3342.96 -> and one person is driving. In the backseat,
3346.5 -> we're looking to see if these come together,
3348.24 -> because if this really is the fix,
3350.82 -> then we have done a rollout in EMEA,
3354.39 -> where we scaled up 0 to 60 in 60 minutes
3357.66 -> or 0 to 100 in 60 minutes.
3360 -> And this will have been the only issue
3361.92 -> in the entire rollout.
3365.49 -> Halfway between Stuttgart and Frankfurt,
3368.37 -> there's a picture of four good friends
3371.43 -> stopped at a convenience store,
3373.83 -> and they all are holding paper bags,
3377.61 -> and there are smiles on all the faces (chuckling)
3380.64 -> as we've exceeded the previous week's numbers
3383.73 -> for the first time, and it's completed,
3386.07 -> and they fly home as conquering heroes,
3388.89 -> being able to put this all together.
3391.68 -> So that's our story of doing the cutover.
3399.99 -> We put it together, we did it through the Americas,
3402.78 -> and we have it in EMEA.
3408.63 -> And what we're looking for in the future
3411.99 -> is to be able to get a better view of this data
3415.65 -> as we go along,
3416.483 -> especially integrating better with our in-house tools.
3419.01 -> So we did these kinds of challenges
3422.19 -> through a lot of Datadog
3423.54 -> and by being the team that built it.
3425.4 -> So one of the takeaways for this is that if you are not us,
3429 -> this is really hard to do.
3430.38 -> It's really hard to figure out what's going wrong.
3433.26 -> And we get a lot of calls still on
3435.18 -> in the Americas and Pacific of,
3438.247 -> "Oh, we think the hub is down,
3440.257 -> "we think all the messages are broken."
3441.99 -> And we say, well there are several million cars
3444.15 -> that are successfully communicating.
3446.34 -> Yours is not.
3448.92 -> Statistics suggest that the problem is not in the broker.
3452.64 -> And that's a clever thing to say.
3454.98 -> It doesn't make you very many friends,
3456.75 -> because their side is like,
3457.627 -> "Well, great, but it doesn't help me.
3459.757 -> "You're correct but not helpful."
3461.49 -> So where do we go from there?
3462.81 -> And being able to have much more of this integration
3465.63 -> and being able to do things
3467.43 -> to leverage more of that pub sub architecture,
3470.34 -> we think we can split that off into some debug streams
3473.94 -> and be able to give people much more a view of the traffic
3477.51 -> and especially the traffic of just their cars.
3479.64 -> Almost no one should be able
3480.72 -> to see a live view of traffic for all cars,
3483.54 -> but if we can give you just your test car,
3485.34 -> that's gonna be really powerful.
3487.62 -> The streaming architecture,
3489.33 -> especially looking at some of the MQTT 5 aspects
3492.99 -> are gonna be powerful,
3494.82 -> and again, that pub sub architecture in there
3497.25 -> is gonna be really useful
3499.5 -> as well as being closer to other Amazon tools like Kinesis
3504.15 -> and being able to do something with that
3505.56 -> where we didn't have that possibility on the other side.
3508.44 -> And if we choose to migrate more of the services
3511.77 -> that are on the far side of the cloud
3513.33 -> to this side of the cloud,
3514.53 -> that's gonna be much more straightforward,
3516.06 -> because we've shown
3518.07 -> that we can get the data to the other side,
3519.87 -> and we've got it on this side.
3521.34 -> So that's our hope.
3523.44 -> And I think that's all for me.
3526.53 -> I'll hand this over to Mike.
3529.121 -> (audience applauding)
3534.54 -> - So that's all, folks.
3536.88 -> Thank you for your time.
3538.05 -> If you have any questions,
3539.49 -> we're gonna be standing right by the side of the stage
3542.58 -> for any follow up topics you may have.
3545.55 -> If you wanna see, by the way, some demos around IoT Core,
3549.87 -> FleetWise, or anything else,
3551.37 -> at Caesars Forum, we have a number of kiosks
3554.16 -> that you can talk to some of the people there.
3557.01 -> Thanks again for your time, hope you enjoy.

Source: https://www.youtube.com/watch?v=Oaw_cpLBpoI