AWS re:Invent 2022 - [NEW LAUNCH!] Privacy-enhanced collaboration with AWS Clean Rooms (ADM305)


In this session for developers and analysts, get a first look at how AWS Clean Rooms can help you more easily collaborate with your partners without sharing raw data with each other. Hear from AWS experts and customers on how you can use AWS Clean Rooms to create your own clean rooms in minutes, add participants, and start analyzing your collective datasets. You’ll learn how AWS Clean Rooms helps you protect consumer data and add restrictions on queries run by each AWS Clean Rooms participant with built-in, customizable analysis rules and privacy-enhancing controls.

Learn more about AWS re:Invent at https://go.aws/3ikK4dD.



Content

1.56 -> - Hi everyone and welcome to the breakout session
4.05 -> on Privacy Enhanced Collaboration with AWS Clean Rooms.
8.76 -> We will be discussing AWS's new Clean Room service
12 -> that Adam announced yesterday during his keynote.
16.652 -> I'm Shaila Mathias, Senior Business Development Manager
19.98 -> for AWS Clean Rooms focused on advertising and marketing.
24.21 -> I'll start this session with an overview of the new service,
27.48 -> what it is, why it was created, how it works,
31.08 -> and the use cases it will support for our AWS customers.
35.04 -> Next, Ankur Agarwal,
37.05 -> Principal Product Manager for AWS Clean Rooms,
40.17 -> will walk you through a demo
41.46 -> so you can see the service in practice.
44.64 -> After that, we will hear from an AWS customer, Comscore,
48.12 -> a leader in media measurement and analytics.
51.48 -> Brian Pugh, Chief Information Officer
54.51 -> will share how he sees clean rooms fitting in
56.88 -> and solving challenges for Comscore and their customers.
60.78 -> Last, we will wrap up
62.1 -> and share more information on AWS Clean Rooms.
67.054 -> Let's start off with a quick audience poll,
70.26 -> get everyone moving a bit more after three days at re:Invent.
74.52 -> My first question is,
75.96 -> whose company has faced challenges in securely collaborating
80.01 -> on data with entities outside of your enterprise?
83.7 -> And if you can just raise your hand.
86.22 -> Okay, I think I see 75% of hands up.
89.73 -> My second question is, who knows what a data clean room is?
96.63 -> Maybe less hands than that, 50%.
99.24 -> My final question is, whose business has tested
102.24 -> or is currently using a data clean room?
106.86 -> Maybe about 15 hands.
108.72 -> So I think we have the right audience for this session.
112.68 -> I'll begin with the AWS landscape,
115.295 -> the landscape AWS customers are currently navigating
119.04 -> as they wanna collaborate on shared data
122.07 -> with a vast number of partners,
123.81 -> but face significant challenges.
127.32 -> Data is siloed across business units within an enterprise
130.98 -> with different standards and approaches being developed.
134.4 -> This creates interoperability and scale challenges
137.61 -> in being able to incorporate past insights
140.64 -> into future business goals.
144.96 -> And companies have an unprecedented amount of data.
148.92 -> According to market intelligence firm IDC,
151.803 -> it is estimated that the amount of data created
155.01 -> over the next five years
156.793 -> will be greater than two times the amount of data created
160.5 -> since the advent of digital storage.
165.06 -> And while consumers value relevant experiences,
168.66 -> companies are faced with challenges
170.7 -> on how to better manage how data is collected, stored,
173.58 -> and used to protect consumer privacy.
178.95 -> As an example,
180.06 -> an advertising and marketing customer came to us
182.91 -> and told us they wanna better understand
185.34 -> their advertising effectiveness
187.29 -> and customer behavior by running analytics on data they own,
192.24 -> combined with their partners' data
194.46 -> without either party revealing
196.86 -> or sharing their raw data with the other.
201.72 -> In light of these challenges, we created AWS Clean Rooms.
206.927 -> AWS Clean Rooms helps advertising, financial services,
210.6 -> and healthcare companies
212.37 -> easily and securely match, analyze,
215.52 -> and collaborate on combined data sets
218.415 -> without sharing or revealing underlying data.
222.48 -> With AWS Clean Rooms,
224.04 -> customers can create a secure data collaboration in minutes
227.94 -> and collaborate with any other company on the AWS cloud
232.11 -> to generate unique insights around advertising campaigns,
236.1 -> investment decisions, and research and development.
241.59 -> Any AWS customer can achieve the service's benefits
245.279 -> through its unique features,
247.44 -> supporting multi-party collaboration
250.53 -> for up to five members in a single collaboration.
254.277 -> Minimal data movement
256.56 -> through direct permissioning of data tables
258.991 -> from a customer's Amazon S3 Data Lake
262.74 -> to the Clean Room collaboration.
265.5 -> Easily configurable privacy controls,
268.08 -> restricting the type of analysis allowed on data,
271.35 -> for example,
272.183 -> only allowing aggregate statistics with a minimum output.
276.93 -> The option to pre-encrypt.
278.76 -> So only encrypted data is used within the clean room,
282.33 -> even when the analysis is being run.
285.15 -> And the opportunity to automate
287.64 -> and integrate AWS Clean Room technology
290.638 -> into existing workflows
292.83 -> to create your own white-labeled clean room.
298.71 -> For the purposes of today's session,
300.54 -> we're gonna focus on how AWS Clean Rooms
302.91 -> can help advertising and marketing customers
305.52 -> given the specific challenges that brands, media publishers,
309.93 -> ad technology and measurement companies are facing.
313.5 -> Let's take an airline as an example.
315.96 -> A brand marketer at an airline
317.91 -> wants to know which of their ad creatives
319.95 -> that ran on the media publisher's platform
322.5 -> led to the most ticket sales.
325.8 -> The publisher wants to provide that insight
328.17 -> back to the airline
329.55 -> so the airline can determine which creative
331.89 -> was most effective in driving those ticket sales.
335.07 -> And ultimately the airline can invest more in that creative
339.3 -> to deliver more effective
341.19 -> and personalized messaging to their customers.
344.49 -> But neither the airline nor the media publisher
346.92 -> wants this to come at the cost of consumer privacy,
350.16 -> nor do they wanna move what can be terabytes of data
353.67 -> given the time it will take and the risk in data exposure.
360.527 -> AWS is uniquely positioned to help with this customer ask
364.92 -> with 15 years experience
367.02 -> working with all the critical entities
369.36 -> across advertising and marketing.
372.51 -> Brands, agencies, advertising technology companies,
378.6 -> data and measurement, media publishers,
381.96 -> and divisions of Amazon have data workloads on AWS.
386.79 -> They are using Amazon services
388.86 -> to help with business challenges
390.81 -> around first-party data management,
393.12 -> advertising intelligence, and digital customer experience.
397.8 -> These entities
398.633 -> are already working with each other's shared data,
400.71 -> but that process isn't ideal,
402.87 -> there are compromises as it relates to data privacy,
406.05 -> security and data usability.
409.53 -> One party is sending data to the other
411.6 -> with a legal doc governing usage,
414.33 -> or companies are creating their own custom solutions
417.48 -> which take development time, resources,
419.76 -> and constant upkeep.
422.13 -> Or companies are turning to third parties,
424.95 -> which requires data movement,
426.72 -> increasing the risk of data exposure.
431.927 -> AWS Clean Rooms offers an easier, quicker
435.48 -> and more secure solution.
437.85 -> Let's talk through a high level architecture
440.34 -> of how AWS Clean Rooms works using that airline
443.55 -> and that media publisher as an example.
447.99 -> The airline wants to perform measurement analysis
450.51 -> using their own data and the media publisher's,
453.33 -> with all parties keeping their data
455.49 -> in their own respective Amazon S3 Data Lake.
459.51 -> In a few clicks, the airline can initiate
462.09 -> a collaboration inviting that media publisher
464.73 -> sending the invitation to their AWS account.
470.19 -> The media publisher receiving the invite can accept.
473.1 -> Once accepted,
474.12 -> they are prompted to associate data to the collaboration.
478.23 -> They can configure and associate any data tables
481.8 -> from their Amazon S3 data lake
484.29 -> specifying their privacy controls,
486.417 -> which are called "analysis rules" in an AWS Clean Room.
491.67 -> Once the analysis rules
493.41 -> including output constraints are set,
495.668 -> the media publisher can complete association of their data
498.81 -> to that collaboration.
501.54 -> The airline can configure and associate their own data
504.761 -> from their own S3 following the same process,
508.74 -> but determining their own specific analysis rules.
513.21 -> Please note that associating data to a collaboration
516.54 -> does not mean you are moving that data
518.79 -> from your Amazon S3 data lake.
521.34 -> Rather, you are giving AWS Clean Rooms permissions
524.82 -> to allow the query to run
526.329 -> as long as it meets the privacy controls set
529.199 -> by each data owner.
532.65 -> Once all members in the collaboration
534.81 -> have completed association of their data,
537.128 -> the airline in this example
539.01 -> can start running their analysis,
541.2 -> the output going to their specified S3 bucket.
545.61 -> That analysis can be visualized
547.98 -> or used for further analysis.
555.899 -> AWS Clean Rooms drives business value
558.75 -> for all members in a collaboration.
562.35 -> In my example, it offers the airline interoperability
566.28 -> in analyzing their ticket sales
568.837 -> that's stored in their Amazon S3 data lake
572.07 -> against the media publisher's exposure data
574.71 -> stored in their Amazon S3 data lake
577.71 -> without either party revealing the raw data to the other
581.49 -> or moving their data
582.72 -> from their own respective Amazon S3 data lake.
587.7 -> For the airline, this translates to specific insights
591.54 -> into what media placements
593.007 -> or ad creatives are driving ticket sales
596.79 -> to help the airline make more efficient
598.86 -> media investment decisions in the future,
601.44 -> benefiting the airline's end customers
603.33 -> with more personalized advertising.
607.272 -> For the media publisher, AWS Clean Rooms creates a new,
611.351 -> more secure and monetizable offering for the airline
615.84 -> and any other brand
617.34 -> or agency customer of that media publisher
620.19 -> to extract their own insights
622.26 -> without directly accessing the media publisher's raw data.
627.93 -> Although we are focused on brands
630.09 -> and media publishers in this example, AWS Clean Rooms
633.581 -> can provide business value for all AWS customers,
637.5 -> including other advertising and marketing personas.
641.22 -> You'll hear more about the benefits for other entities
644.04 -> such as measurement companies very shortly in this session.
647.61 -> I'm now gonna turn it over to Ankur for a service demo
650.58 -> so you can see some of the reviewed use cases in practice.
657.36 -> - Thank you, Shaila.
659.939 -> Hi everyone, I'm Ankur Agarwal.
662.048 -> I'm a product manager supporting AWS Clean Rooms.
665.889 -> And I'm really excited to show you
668.907 -> AWS Clean Rooms in action.
671.04 -> For the demo today,
672.06 -> we have an airline who is launching an ad campaign
674.82 -> for its frequent business travelers
676.56 -> on a publisher's platform.
678.75 -> They're trying to glean two types of insights
681.06 -> that require data from both parties.
683.79 -> Pre-campaign,
684.623 -> they would like to understand how many of their users
687.831 -> that are business travelers are also active
690.63 -> on the publisher's platform recently, and post-campaign,
694.41 -> they would like to be able to understand
696.305 -> the creative performance
698.25 -> and see which creatives are leading to highest sales.
702.51 -> In order to do this,
703.65 -> they will need their data in Amazon S3 Data Lake
706.681 -> and the publisher will associate impressions data
710.04 -> and users data,
712.05 -> and the airline will associate its ticket sales data
715.68 -> and their customer data from their CRM systems.
721.763 -> To recap what Shaila demonstrated earlier,
724.59 -> we'll have four distinct steps
726.69 -> in how we will go through the demo today.
729.27 -> The first one would be that one of the collaboration members
732.42 -> would initiate a collaboration
733.92 -> and invite the other member to join the collaboration.
737.97 -> Once all the members have joined the collaboration,
741.21 -> they would need to create what is called a configured table.
744.78 -> A configured table
745.77 -> is a first class resource in AWS Clean Rooms,
749.526 -> which holds the reference to your Amazon S3 data source,
754.32 -> as well as contains analysis rules that determine exactly
758.19 -> how your data would be used inside a collaboration.
760.86 -> I'll talk more about it during the demo.
763.98 -> Once each member has created their configured table,
766.627 -> like in this case, we'll have four configured tables,
769.431 -> they would then associate it
771.279 -> with an AWS Clean Rooms collaboration.
776.031 -> Once that association is done,
778.115 -> then the member that has the permissions to query the data
782.19 -> will be able to query the data
783.63 -> using either AWS Clean Rooms APIs
787.08 -> or through the AWS Management Console.
789.84 -> All right, so then let's jump right into it.
794.25 -> Right.
795.083 -> So the first step is for the airline
797.52 -> to create a collaboration and invite the publisher.
800.34 -> I'm currently in the airline's account in the dark mode,
804.21 -> I'll go ahead and create a collaboration.
806.183 -> I'll give it a name and I will give it a description,
811.545 -> the same description.
814.002 -> Next, I will add the collaboration members.
816.813 -> I'm AirlineCo, and I will add SocialCo.
820.747 -> And what I would need
823.14 -> is the AWS account ID of the publisher.
827.76 -> So I have it here and I'm gonna go ahead and add that here.
832.535 -> For the purposes of this demo,
834.63 -> we have two collaboration members,
836.19 -> but AWS Clean Rooms
837.36 -> supports up to five collaboration members.
839.55 -> For example,
840.383 -> you can have third party data measurement companies,
842.82 -> identity providers, or even multiple publishers.
845.85 -> Each collaboration member can associate data
848.94 -> and one collaboration member can run the analysis.
851.82 -> So in the next step,
852.72 -> I will select who will be the collaboration member
855.24 -> that will be running the analysis.
857.13 -> In this case, it's the airline that's looking at
859.59 -> extracting the insights from this analysis.
861.87 -> So I'll go ahead and select the airline.
864.9 -> You can then select whether you want to enable
867.3 -> query logging for the collaboration.
869.31 -> If enabled, detailed logs of the queries,
871.59 -> including query text, are sent to Amazon CloudWatch,
875.79 -> which is the centralized
877.05 -> logging and monitoring service for AWS.
879.57 -> Each collaboration member gets to pick this setting
882.57 -> for their own while joining or creating a collaboration
885.75 -> so that they get their copy of logs
888 -> for the queries that reference their data.
892.47 -> I'll go ahead and enable this and say yes.
895.26 -> Finally,
896.31 -> what I have is cryptographic computing for AWS Clean Rooms.
899.3 -> So this allows you to pre-encrypt some or all of your data
903.03 -> before even associating it
904.41 -> within AWS Clean Rooms collaboration.
907.35 -> If enabled, AWS Clean Rooms will run the queries
910.35 -> on encrypted data based on the parameters
912.96 -> that are selected by the collaboration creator.
916.08 -> For the purposes of this demo,
917.67 -> I will disable this and go ahead
919.35 -> and create this collaboration.
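The settings chosen in this step can be summarized as a small data model. The following is a minimal sketch in plain Python, not the AWS Clean Rooms API: the class names, field names, and account IDs are hypothetical placeholders. Only the five-member limit, the single member who runs queries, and the two toggles (query logging, cryptographic computing) come from the session.

```python
from dataclasses import dataclass, field

MAX_MEMBERS = 5  # limit stated in the session


@dataclass
class Member:
    name: str
    account_id: str          # placeholder AWS account ID
    can_query: bool = False  # one member is chosen to run the analysis


@dataclass
class Collaboration:
    name: str
    description: str
    members: list = field(default_factory=list)
    query_logging: bool = False
    cryptographic_computing: bool = False

    def validate(self):
        """Check the constraints described in the talk and return the query runner."""
        if len(self.members) > MAX_MEMBERS:
            raise ValueError("a collaboration supports up to five members")
        runners = [m for m in self.members if m.can_query]
        if len(runners) != 1:
            raise ValueError("exactly one member must be selected to run queries")
        return runners[0]


# Mirrors the demo: the airline creates the collaboration and runs the analysis.
collab = Collaboration(
    name="campaign planning",
    description="campaign planning",
    members=[
        Member("AirlineCo", "111111111111", can_query=True),
        Member("SocialCo", "222222222222"),
    ],
    query_logging=True,             # enabled in the demo
    cryptographic_computing=False,  # disabled in the demo
)
query_runner = collab.validate()
```

In this sketch, `validate` raises an error if more than five members are added or if no single query runner is selected, which matches the rules described above.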
924.06 -> So once done,
925.2 -> I can see that collaboration has been created
927.48 -> and an invitation should have been sent.
929.4 -> I'm gonna go and move ahead to the publisher account
932.76 -> in a lighter mode.
934.186 -> So let's reload this
935.88 -> to see if an invitation has indeed come through.
939.15 -> All right, I see campaign planning
941.149 -> and I open the collaboration invitation.
944.91 -> And I can see that, all the details
947.28 -> around who the invitation is coming from,
950.73 -> whether or not cryptographic computing is enabled,
953.1 -> whether query logging is supported, and who exactly are all the members.
956.67 -> If everything looks good, I'm gonna create a membership.
960.12 -> As a collaboration member contributing data,
962.28 -> I also have an option to enable query logging.
965.1 -> If I enable it,
965.94 -> logs for queries that reference my data tables
968.16 -> will be sent to my Amazon CloudWatch service,
970.65 -> which is separate from the one
971.7 -> that the airline had configured.
974.1 -> So I'll go ahead and say yes and create a membership.
978.695 -> So this concludes my first step of creating a collaboration.
982.77 -> We have a collaboration now with two members in it.
986.91 -> I can go and see it in the members tab,
989.43 -> and both the members are there.
992.37 -> You can see that there's no data in it yet,
994.2 -> there's no data that has been associated with it yet.
996.63 -> In the collaboration,
998.685 -> the next step for me would be to create a configured table
1003.2 -> using the impressions data that is stored in Amazon S3.
1006.5 -> I'll go ahead and select that
1008.197 -> and I can select the AWS Glue catalog
1011.9 -> that is used to generate the schema.
1014.54 -> I'll go ahead and select this.
1016.25 -> I'll look at,
1017.209 -> I can also view the schema right from within the console,
1020.66 -> so I'll go ahead and look at that,
1022.46 -> it contains the identifier, impression state ID,
1025.58 -> all of this looks good,
1028.07 -> and I'm gonna go ahead and move to the next step.
1032.877 -> I also have an option to allowlist
1034.76 -> all the columns in my underlying AWS Glue table
1037.363 -> or only allowlist a subset of these.
1040.31 -> This allows me the flexibility
1042.23 -> that I can use the same underlying AWS Glue catalog
1045.05 -> without having to create a new table
1046.79 -> for every single combination of the columns.
1049.19 -> For the purposes of this demo,
1050.399 -> I'm going to select
1051.899 -> a lot of these, including creative ID and campaign.
1055.79 -> They look good.
1057.283 -> Impressions ID is something
1058.7 -> that's actually internal to my system,
1060.5 -> so I'm not gonna allowlist it for this configured table.
1065.39 -> I'll leave the other details as is,
1066.915 -> and I'm gonna go ahead and create a configured table.
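The column allowlisting just described can be pictured as a projection over the underlying table. This is a minimal sketch in plain Python with made-up rows. The column names (`identifier`, `creative_id`, `campaign`, `impressions_id`) are assumptions based on the names mentioned in the demo, and the function is hypothetical, not part of any AWS SDK.

```python
def configured_view(rows, allowed_columns):
    """Return copies of the rows that contain only the allowlisted columns."""
    return [{col: row[col] for col in allowed_columns} for row in rows]


# Hypothetical rows from the publisher's impressions table.
impressions = [
    {"identifier": "u1", "creative_id": "c1", "campaign": "biz", "impressions_id": 9001},
    {"identifier": "u2", "creative_id": "c2", "campaign": "biz", "impressions_id": 9002},
]

# The internal impressions ID is left out of the allowlist, as in the demo.
allowed = ["identifier", "creative_id", "campaign"]
visible = configured_view(impressions, allowed)
```

The same underlying table can back several configured tables, each with its own allowlist, which is the flexibility the speaker mentions.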
1070.73 -> You can see that a table has been created,
1072.8 -> but it is not yet enabled for querying
1074.87 -> and that is because we haven't yet specified
1076.88 -> how exactly it will be used within a clean room,
1079.7 -> including
1080.533 -> and what type of queries can be performed on this table.
1082.97 -> So we will go ahead and do that
1084.95 -> using the create configured table,
1087.157 -> create analysis rule workflow.
1092.501 -> So analysis rules are configured in three steps.
1096.2 -> First you select a query template
1098.33 -> to specify what are the types of queries
1101 -> and analysis that you want to run within a collaboration.
1104.12 -> Second, you configure fine-grained column-level controls
1107.33 -> on the data using query controls,
1109.1 -> and finally you select output constraints.
1113.875 -> AWS Clean Rooms provides two flexible query templates.
1117.26 -> The first one is of type aggregate.
1119.57 -> Aggregate template allows queries
1121.55 -> that output aggregate statistics
1123.14 -> such as counts, sums, and averages on collective data.
1128.57 -> This can be used to support use cases such as reach
1131.21 -> measurement, attribution,
1133.01 -> or even finding the overlap of the user segment,
1136.52 -> which we'll be doing in our use case today.
1140.7 -> The second is of type list.
1143.077 -> List allows queries that output row-level data
1146.487 -> for the overlap between the data tables
1148.43 -> from different collaboration members.
1150.5 -> The list can be used to enrich data
1152.24 -> with additional attributes
1153.47 -> by combining it with a common match key
1155.69 -> or output a list of IDs that can be used for activation.
1159.17 -> For the purposes of our demo,
1160.49 -> we are looking for aggregated insights.
1162.14 -> So I'm gonna go ahead and select the aggregate template.
1167.12 -> So the next step is for me to configure query controls.
1170.06 -> We start with specifying exactly which functions can be used
1173.695 -> on which columns within this configured table.
1176.87 -> For my demo, I know that the airline wants to measure
1179.24 -> the size of the overlapping segment,
1181.19 -> so I'm gonna allow count distinct of the identifier.
1184.88 -> I can also select other aggregate functions
1186.83 -> such as sum, averages, or some of the other functions,
1191.63 -> but for the purposes of my demo, I only need count distinct.
1196.4 -> Next, I can specify join controls
1198.83 -> to control whether this table can be joined directly
1202.22 -> by the airline without them
1205.07 -> having to join with their own table,
1206.9 -> or do I only want to allow them to,
1209.416 -> or do I want to require them to have an inner join?
1212.727 -> I want to enable the airline
1214.07 -> to only perform analysis on the intersection of the data.
1216.77 -> So I'm gonna go ahead
1217.603 -> and require a join for this analysis to be performed
1221.51 -> and I will select the identifier as the join key.
1225.347 -> So go ahead and select that.
1229.28 -> Next I can select which columns do I wanna make available
1232.64 -> to be used as dimensions in the analysis.
1235.46 -> I know that the airline
1236.45 -> wants to understand the creative performance,
1238.19 -> so they'd want to group by the creative,
1240.26 -> so I'm gonna go ahead and enable that.
1242.27 -> I also know that
1243.53 -> they'd want to be able to use impressions data as filters
1246.23 -> in their queries to better attribute
1248.36 -> and have attribution logic.
1249.62 -> So I'll enable that as well.
1253.16 -> Finally, I can also select a custom list of scalar functions
1256.407 -> or allow all supported scalar functions.
1259.64 -> I'm gonna go ahead and allow all of them for this demo.
1264.35 -> The last step in the analysis rules configuration workflow
1267.41 -> is specifying aggregation constraints.
1270.53 -> This allows you to automatically filter out rows
1273.08 -> that do not meet a certain minimum threshold
1276.02 -> for aggregated numbers.
1277.88 -> This can be used to further mitigate the risk
1280.1 -> that information about a small group of individuals
1282.83 -> would be released through the analysis.
1285.08 -> For instance,
1285.913 -> if you group the data by US states for a specific query,
1290.06 -> some of the largest states like California
1291.92 -> may have a large number of users,
1293.66 -> but some of the smallest states like North Dakota
1295.73 -> may have a small number of users
1297.17 -> and you may want to protect that information
1299.127 -> in the analysis.
1301.85 -> So for the purpose of this one,
1303.44 -> I'm gonna use it on identifier and select a value of 25.
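The effect of an aggregation constraint can be illustrated with a small example. This is a minimal sketch in plain Python, not how the service is implemented: the function name and sample rows are hypothetical, and it assumes that a group is dropped when its distinct count falls below the minimum of 25 chosen in the demo.

```python
from collections import defaultdict


def count_distinct_with_threshold(rows, group_col, id_col, minimum):
    """Count distinct ids per group and drop groups below the minimum."""
    groups = defaultdict(set)
    for row in rows:
        groups[row[group_col]].add(row[id_col])
    return {
        group: len(ids)
        for group, ids in groups.items()
        if len(ids) >= minimum
    }


# Made-up data: a large state and a small state, as in the speaker's example.
rows = (
    [{"state": "California", "identifier": f"ca-{i}"} for i in range(40)]
    + [{"state": "North Dakota", "identifier": f"nd-{i}"} for i in range(3)]
)

result = count_distinct_with_threshold(rows, "state", "identifier", minimum=25)
```

Here the North Dakota row is filtered out of the output because only three distinct identifiers fall in that group, so the small group is not revealed.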
1309.08 -> So that concludes the analysis rules workflow.
1313.34 -> I can review all the details
1315.47 -> and go ahead and create configured table.
1322.04 -> All right, I can see that it can now be used for querying,
1324.71 -> which means that it can now be associated
1326.6 -> to a collaboration as well.
1328.46 -> So I'm already on my third step,
1330.26 -> I'm starting to associate tables,
1331.778 -> but before I do that,
1334.52 -> I'm gonna go ahead to the collaboration.
1337.58 -> I am going to say "Associate Table".
1340.91 -> I can select my table that I just created from impressions
1344.575 -> and I can review the schema
1346.7 -> to make sure everything looks good.
1348.26 -> I can see that the join column is indeed the identifier
1351.895 -> and I'll leave all the details as default.
1356.87 -> I'm gonna go ahead and say associate table.
1358.94 -> In the background, what AWS Clean Rooms is doing
1361.34 -> is creating a scoped-down IAM role
1364.975 -> to get read only access to this table
1367.61 -> only when the queries will be run.
1369.26 -> At this time there is no data that is moving.
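To make the idea of read-only, scoped-down access concrete, here is an illustrative IAM policy document built as a Python dictionary. It is only a sketch of what "read-only access to one table location" can look like. It is not the policy AWS Clean Rooms actually generates, and the bucket name and prefix are hypothetical.

```python
import json

BUCKET = "example-publisher-bucket"  # hypothetical bucket name
PREFIX = "impressions/"              # hypothetical table location

# Illustrative read-only policy limited to one prefix in one bucket.
read_only_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": [f"arn:aws:s3:::{BUCKET}/{PREFIX}*"],
        },
        {
            "Effect": "Allow",
            "Action": ["s3:ListBucket"],
            "Resource": [f"arn:aws:s3:::{BUCKET}"],
            "Condition": {"StringLike": {"s3:prefix": [f"{PREFIX}*"]}},
        },
    ],
}


def is_read_only(policy):
    """Return True if every allowed action is a Get* or List* action."""
    read_prefixes = ("Get", "List")
    return all(
        action.split(":", 1)[1].startswith(read_prefixes)
        for statement in policy["Statement"]
        for action in statement["Action"]
    )


policy_json = json.dumps(read_only_policy, indent=2)
```

The point of the sketch is that nothing in such a policy permits writing, copying, or deleting data, which is consistent with the statement that no data moves at association time.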
1371.455 -> So we've associated the first table.
1377.66 -> I also created a few configured tables already
1380.24 -> using the same analysis rules workflow
1382.7 -> so that I don't have to do it four times in this demo.
1385.31 -> So I'm gonna go ahead and select that.
1387.62 -> I'm gonna show you the users table,
1389.57 -> I can see that it has some attributes, including country,
1392.831 -> the analysis rules specify identifier as a join column.
1398.57 -> Everything looks good
1399.95 -> and I'm gonna go ahead and associate this one as well.
1404.379 -> And at this time, we are just giving information
1407.99 -> to the AWS Clean Rooms service
1409.4 -> that this is where you go look for data
1411.08 -> when a query is actually initiated.
1414.71 -> So this association will take a couple of seconds.
1417.08 -> All right, so we are done from the publisher side,
1419.72 -> we created a configured table,
1421.34 -> created analysis rules and associated both the tables.
1426.08 -> So we are ready to go back to the airline's account
1428.12 -> and I'm gonna refresh
1430.28 -> to see whether this data shows up in the collaboration.
1434.57 -> And indeed there are two tables in the collaboration,
1438.535 -> and you can see that those two have been associated.
1443.24 -> I'm gonna go ahead and associate the first party data now
1446.24 -> that I have from the airline side,
1447.65 -> including the ticket sales and the CRM data.
1450.8 -> So I have those two tables that I configured earlier
1453.474 -> and we can see the schema here
1456.62 -> and it has a lot of rich details about the price
1459.44 -> and the transaction ID and a lot of other columns
1463.28 -> that would be used in the analysis.
1465.14 -> I can see the analysis rules
1466.7 -> and I can see that the join column, again,
1469.04 -> is of type identifier
1470.323 -> and I also am allowing sum in this case on the price
1475.55 -> so that I can understand the revenue impact.
1479.03 -> So I'll go ahead and associate this table,
1482.12 -> it'll take a few more seconds
1484.7 -> and then we'll be ready to associate our last table.
1488.48 -> All right, so we have three tables in this collaboration
1491 -> and we'll associate our fourth one now,
1493.34 -> which is going to be the airline CRM data.
1496.22 -> So go ahead and select that,
1498.38 -> and I can see that it has a lot of rich information about
1501.763 -> whether or not the user is of type business
1504.523 -> and a lot of other demographic information about the user.
1509.095 -> Again, this is my first party data.
1511.663 -> I'm still configuring analysis rules,
1513.86 -> but I'm configuring it in a way that is more permissive.
1517.451 -> So I'm gonna go ahead and say associate.
1526.54 -> Should be another second or two.
1531.92 -> All right, so we have our collaboration,
1534.26 -> all the data has been associated
1536.03 -> and since airline is the one that is running the analysis,
1539.6 -> we have a queries tab as well.
1541.37 -> So I'll go there and I can see on the left hand side
1544.37 -> right alongside my code editor,
1546.26 -> I can see that the analysis rules are configured,
1549.92 -> the airline tables do not require an overlap,
1554.72 -> because again, it's my first party data.
1556.7 -> I can see all the information about
1559.61 -> which are the join columns, what are the dimension columns.
1563.18 -> I can see here that the publisher requires a join.
1567.05 -> And again, I can see all the details of how analysis rules
1570.38 -> have been configured for this table as well.
1573.02 -> The last step is for me to specify the S3 bucket
1575.39 -> for where I will send the output of the queries.
1580.55 -> So I have something called ACR demo that I created earlier,
1584.54 -> I'm gonna select that and we can start running queries now.
1589.225 -> So I have something that I had written earlier.
1594.62 -> So let's try this particular query.
1598.16 -> Why don't we try to
1599.617 -> output a list of identifiers from this analysis?
1603.44 -> So I'm gonna try to do that
1606.393 -> by joining the two tables of customers and users.
1611.3 -> As you can see,
1612.133 -> it has been immediately rejected by AWS Clean Rooms
1614.75 -> and that is because the analysis type is aggregation
1618.44 -> and we are trying to output row-level information
1620.9 -> in this query.
1622.837 -> Let's try to do something else.
1625.79 -> Let's try to query just the publisher's table.
1628.61 -> So I'm gonna comment out the other part
1630.457 -> and I'm gonna try to find the count distinct,
1633.23 -> which is an allowable operation on the identifier.
1636.95 -> And I'm gonna try to run this without a join.
1639.989 -> Let's try to do that.
1642.23 -> And when we do that, again, it has been immediately rejected
1647.36 -> and that is because a join was required by the publisher
1650.665 -> and this can only be,
1652.329 -> only the queries that have a join will be allowed
1656.197 -> in this collaboration.
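The two rejections and the eventual acceptance can be reproduced with a toy rule checker. This is a minimal sketch in plain Python under stated assumptions: queries are represented as simple structures rather than parsed SQL, and every class, field, and message is hypothetical. It is not how AWS Clean Rooms validates queries internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregationRule:
    allowed_aggregates: frozenset  # (function, column) pairs
    join_required: bool
    join_columns: frozenset
    dimension_columns: frozenset


@dataclass(frozen=True)
class Query:
    aggregates: tuple = ()   # (function, column) pairs in the SELECT list
    raw_columns: tuple = ()  # columns output without aggregation or grouping
    group_by: tuple = ()
    join_on: tuple = ()      # columns of the protected table used in a JOIN


def check(query, rule):
    """Return 'accepted' or a rejection reason for the protected table."""
    if query.raw_columns:
        return "rejected: row-level output is not allowed"
    if rule.join_required and not query.join_on:
        return "rejected: a join is required"
    if not set(query.join_on) <= rule.join_columns:
        return "rejected: join column is not allowed"
    if not set(query.aggregates) <= rule.allowed_aggregates:
        return "rejected: aggregate is not allowed on that column"
    if not set(query.group_by) <= rule.dimension_columns:
        return "rejected: dimension column is not allowed"
    return "accepted"


# Rule configured by the publisher in the demo (column names assumed).
publisher_rule = AggregationRule(
    allowed_aggregates=frozenset({("count_distinct", "identifier")}),
    join_required=True,
    join_columns=frozenset({"identifier"}),
    dimension_columns=frozenset({"creative_id"}),
)

list_ids = Query(raw_columns=("identifier",), join_on=("identifier",))
count_without_join = Query(aggregates=(("count_distinct", "identifier"),))
count_with_join = Query(
    aggregates=(("count_distinct", "identifier"),),
    join_on=("identifier",),
)

results = [check(q, publisher_rule)
           for q in (list_ids, count_without_join, count_with_join)]
```

The three entries in `results` correspond to the three attempts in the demo: listing identifiers, counting without a join, and counting with a join.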
1658.1 -> So let's try something that might work.
1659.96 -> Let's try count distinct.
1662.12 -> Okay, we need to fix that syntax error.
1664.94 -> But as soon as we do that, we run it again
1666.89 -> and we can see that the query has been accepted
1668.69 -> and it's running
1669.86 -> and it's gonna take some time for it to run completely.
1672.14 -> So what I'm gonna do is
1673.07 -> I'm gonna switch to another collaboration
1675.08 -> that has results from the same queries
1677.224 -> and a couple other queries run from earlier.
1681.21 -> As you can see here, this is a number of users
1684.65 -> that are common between
1686.538 -> frequent business class travelers on the airline
1690.8 -> and those who were recently active
1692.18 -> on the publisher's platform.
1693.23 -> So this gives me an idea
1694.67 -> what my addressable user segment size is
1697.88 -> for this particular campaign.
1699.38 -> And as before,
1701.75 -> you can see all the information
1704.09 -> from the analysis rules window.
1707.111 -> I can also use other types of queries,
1710.09 -> like I can group by the creative
1711.74 -> and I can run the analysis by joining the impressions table.
1716.452 -> And here I can see that some of the value oriented ones
1720.83 -> are not working that well in terms of driving sales,
1723.77 -> but the aspirational one is working particularly well.
1726.558 -> So this kind of helps me inform
1729.329 -> the types of creatives that I want to build.
1732.177 -> I can also, for instance, do the sum of price
1736.19 -> to understand what was the total impact of the revenue
1738.59 -> that was generated as a result of this ad campaign
1741.29 -> by better attributing it through a where clause
1744.44 -> where I'm making sure
1745.28 -> that the impression occurred before the sale was made,
1748.4 -> and I can get more specific there as well.
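The attribution query described here can be sketched locally with SQLite. The table names, column names, and sample rows below are assumptions made for illustration; the actual demo schema is not shown in the transcript, and this runs against an in-memory database rather than a clean room collaboration.

```python
import sqlite3

# In-memory stand-ins for the advertiser's sales table and the
# publisher's impressions table (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE impressions (user_id TEXT, creative TEXT, impression_ts TEXT);
CREATE TABLE sales (user_id TEXT, price REAL, sale_ts TEXT);
INSERT INTO impressions VALUES
    ('u1', 'aspirational', '2022-11-01'),
    ('u2', 'value',        '2022-11-02'),
    ('u3', 'aspirational', '2022-11-05');
INSERT INTO sales VALUES
    ('u1', 500.0, '2022-11-03'),
    ('u2', 200.0, '2022-11-01'),
    ('u3', 300.0, '2022-11-06');
""")

# Sum of price per creative, counting a sale only when the impression
# happened before it (ISO date strings compare correctly as text).
rows = conn.execute("""
SELECT i.creative, SUM(s.price) AS attributed_revenue
FROM impressions AS i
JOIN sales AS s ON i.user_id = s.user_id
WHERE i.impression_ts < s.sale_ts
GROUP BY i.creative
ORDER BY i.creative
""").fetchall()

print(rows)
```

In this sample data, the sale by `u2` is excluded because it happened before the impression, so only the aspirational creative receives attributed revenue.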
1751.19 -> So analysis rules really allow you to
1753.26 -> write a lot of different types of queries
1755.24 -> within the constraints of what's defined.
1756.68 -> And because the data is in S3,
1758.36 -> you can really very easily plug it
1760.43 -> into AWS analytics tools such as Amazon QuickSight
1765.08 -> and SageMaker.
1766.924 -> So here I've plotted the graph of the impressions
1771.62 -> and the sales that they're driving on a daily basis.
1775.55 -> And this really helps me
1777.05 -> better visualize the relationship between the two.
1780.11 -> I can also easily visualize
1781.624 -> how the creatives have been performing.
1784.04 -> I can slice it by daily, weekly numbers
1787.28 -> to understand how the creatives have been trending.
1790.13 -> Assuming the same exposure, I can see that certain creatives
1793.82 -> are doing better than the others.
1795.56 -> So that really concludes our demo today.
1800.863 -> These are just some of the examples
1803.48 -> of how you can use AWS Clean Rooms
1805.61 -> to extract insights from collective data
1808.148 -> without sharing raw data
1811.328 -> or moving it outside your AWS account.
1815.465 -> We are really excited to put it in your hands
1818.18 -> and hear about all your interesting use cases.
1820.91 -> Although I went over the console today,
1823.22 -> the entire process can be automated,
1825.77 -> because every operation that I performed on the console today
1830.18 -> will be available through the AWS SDK and APIs.
1834.05 -> You can use AWS ETL tools
1838.34 -> for automated data ingestion.
1840.68 -> You can use AWS Clean Rooms APIs
1843.11 -> to create and manage collaborations and run the analysis
1847.22 -> and you can easily feed it into analytics services
1850.01 -> such as Amazon SageMaker for machine learning
1853.16 -> or Amazon QuickSight for better visualization
1855.728 -> or Amazon Redshift for advanced analytics.
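As a rough sketch of what that automation could look like from Python: the example below builds a request for a protected query and passes it to the boto3 `cleanrooms` client. The parameter names reflect the `start_protected_query` call as published in the SDK after this session and should be checked against current documentation. The membership ID, bucket, key prefix, and SQL text are placeholders, and `build_protected_query_request` and `run_protected_query` are hypothetical helper names.

```python
def build_protected_query_request(membership_id, sql, bucket, key_prefix):
    """Assemble keyword arguments for a protected SQL query (assumed shape)."""
    return {
        "type": "SQL",
        "membershipIdentifier": membership_id,
        "sqlParameters": {"queryString": sql},
        "resultConfiguration": {
            "outputConfiguration": {
                "s3": {
                    "resultFormat": "CSV",
                    "bucket": bucket,
                    "keyPrefix": key_prefix,
                }
            }
        },
    }


def run_protected_query(request):
    """Submit the request; needs boto3 and AWS credentials to be configured."""
    import boto3  # imported lazily so the builder works without AWS access

    client = boto3.client("cleanrooms")
    return client.start_protected_query(**request)


# Placeholder values only; nothing is sent to AWS until
# run_protected_query(request) is called.
request = build_protected_query_request(
    membership_id="example-membership-id",
    sql=(
        "SELECT COUNT(DISTINCT c.identifier) "
        "FROM customers c JOIN users u ON c.identifier = u.identifier"
    ),
    bucket="example-results-bucket",
    key_prefix="clean-rooms/results",
)
print(request["sqlParameters"]["queryString"])
```

Because results land in S3, downstream tools such as QuickSight or SageMaker can read them from the configured bucket and prefix.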
1859.91 -> We are really excited to hear about
1861.26 -> all the things that you would do with AWS Clean Rooms.
1863.708 -> Thank you for being here.
1865.222 -> I would now like to invite Brian Pugh from Comscore
1869.048 -> to talk about industry trends
1870.95 -> and share about some of the ways
1872.36 -> that they're excited to use Clean Rooms.
1878.54 -> - Thank you.
1882.53 -> Am I on?
1885.29 -> Can you hear me okay?
1886.43 -> All right.
1888.14 -> Hi everybody, nice to see everyone here.
1890.48 -> Thanks Ankur, and it's a pleasure to be here.
1893.69 -> Data clean rooms
1895.046 -> are an area that I have a lot of interest in.
1898.91 -> I'm CIO at Comscore,
1900.95 -> I'll talk a little bit about what Comscore is
1903.728 -> and some of the changes that we're seeing.
1906.47 -> I mean, there is massive change happening
1909.26 -> very quickly in the media space,
1912.86 -> and that's where Comscore operates.
1914.87 -> We'll talk about a clean room use case that we designed
1918.156 -> that is very illustrative
1920.51 -> of how we would use this type of technology.
1923.39 -> And then the types of offerings
1926.45 -> that data clean rooms can help us implement.
1930.44 -> So who is Comscore?
1932.909 -> We are a media ratings company,
1935.84 -> so we provide different ratings
1940.58 -> around how many people are visiting websites,
1943.37 -> what is the total reach of an advertising campaign,
1947.206 -> how many people viewed television?
1949.25 -> We're a digital ratings and TV ratings company
1951.86 -> and also movie ratings.
1953.57 -> And if you read about what
1955.49 -> the top movies were this past weekend,
1957.848 -> it's probably Comscore numbers that are being used.
1960.52 -> We are in the business of measurement
1964.13 -> and our data's used for things like
1966.77 -> planning an advertising campaign
1969.11 -> or how can I possibly reach the best audience.
1972.23 -> We're a neutral third party that can provide those insights
1975.23 -> between advertiser buyers and publishers
1978.98 -> who are selling inventory,
1980.87 -> and then evaluating whether a campaign was successful,
1984.62 -> whether I reached my audience successfully.
1988.7 -> The thing with measurement, though, is that
1992.99 -> it's huge. Most media,
1996.41 -> before the internet, was TV and radio,
1999.5 -> and you didn't need a whole lot of data
2002.25 -> or as much data to be able to measure.
2004.69 -> But now there's so much cardinality, so many niche audiences,
2007.84 -> so many different ways to consume media
2010.036 -> that collaboration really is how we work with the industry
2015.49 -> in order to be able to provide measurement.
2018.916 -> So we know media companies
2021.7 -> capture a lot of data about their audiences
2024.092 -> and they have analytics data
2026.02 -> about people interacting with their websites.
2028.876 -> We integrate data with MVPDs for television,
2032.83 -> and we have billions of measurable events
2036.79 -> and impressions every single day that we ingest to measure.
2041.476 -> And the collaboration piece is absolutely critical,
2045.58 -> but one of the things that's happening
2047.88 -> is more awareness around how that's being done
2052.93 -> and how measurement is happening
2054.28 -> and people don't want to be tracked and things like that.
2056.59 -> And that's really where data clean rooms can help out.
2061 -> We're doing an audience analysis:
2062.716 -> the type of audience brings context to the measurement.
2066.1 -> It's not just counting impressions.
2068.468 -> Are you reaching your target audience?
2071.26 -> Are they auto intenders?
2075.092 -> Are they intending to buy a car?
2076.15 -> Are they mothers?
2078.4 -> Those types of things for demographics
2080.56 -> are really important to advertisers.
2082.81 -> And then how do you reach those people?
2084.79 -> Where do they spend their time?
2085.99 -> Where are they engaged?
2087.67 -> Where are they most likely
2088.81 -> to interact with an advertisement?
2092.344 -> The data we ingest is pretty much this:
2097.068 -> when you interact with different websites,
2099.22 -> and I'll show an example in a second,
2100.66 -> there's a measurement that's sent to Comscore.
2103.16 -> That is the collaboration.
2105.13 -> We work with different media companies
2106.908 -> and they integrate Comscore measurement
2110.65 -> and that measurement is ingested,
2113.05 -> and we use that to create insights.
2118.87 -> And we combine this with first party data that Comscore has,
2121.9 -> panel data, we recruit people around the US and the world
2127.78 -> to interact with us as part of the panel.
2131.17 -> We offer incentives, we get to monitor the behavior
2134.11 -> and then we integrate that with all this other data,
2136.954 -> which I'll call census data.
2139.78 -> That census data is data we're getting from media,
2142.354 -> from different media companies.
2144.88 -> That is how we provide the context.
2147.1 -> That is the data that we're joining together.
2149.89 -> Today we do that all on our infrastructure.
2154.184 -> But as we put these pieces together,
2157.57 -> there's some sensitivity
2159.04 -> around how that's been done in the past
2160.9 -> and how we need to move forward.
2162.94 -> We believe at Comscore,
2164.23 -> that privacy is a right that should be supported.
2167.341 -> So if there are
2170.29 -> artifacts of advertising technology
2173.08 -> that consumers aren't comfortable with,
2175.06 -> how do we adapt measurement
2176.77 -> so that we can provide to our customers what they need
2180.31 -> to be able to do media planning, evaluate audiences
2184.204 -> and respect consumers as far as
2187.082 -> what they would like to have happen
2189.352 -> when they engage with these different media properties.
2194.049 -> So this is the way
2196.48 -> what we call unified digital measurement works today.
2199.21 -> We have a panel, we can observe everything that's happening
2202.356 -> for integrated devices on that panel.
2205.42 -> It's all opt-in, people join it.
2207.94 -> And then we have our census network,
2210.31 -> which are measurements that are integrated
2212.56 -> all over these different media properties.
2215.23 -> And we actually merge these two,
2216.79 -> we have an intersection of what our panel sees
2220.6 -> and what comes through the census data.
2222.07 -> It's a tiny percentage of all that data,
2224.59 -> but we get all of the census data as well.
2227.2 -> We add that up, we have methodologies,
2229.66 -> algorithms that turn that into a rating.
2232.6 -> We use our panel
2234.25 -> and other data sources to tell us the context,
2237.07 -> what type of people are interacting with that media.
2241.27 -> And then this is the type of report that we provide.
2244.15 -> The scale is huge, right,
2246.07 -> we're basically trying to measure ratings,
2247.81 -> audiences for the entire internet.
2250.214 -> So that's why we're ingesting so much data.
2253.879 -> We're reporting on tens of thousands of websites,
2257.68 -> we're reporting on all national and local television.
2260.99 -> And so that's where the integrations come in:
2263.59 -> a panel cannot do this on its own.
2265.324 -> So we collaborate, we work with our media partners.
2269.26 -> They want us to work with them
2271.51 -> so that we can provide
2272.5 -> the most accurate measurement possible.
2275.29 -> And we're also a neutral third party,
2277.6 -> we're treating everybody the same.
2279.19 -> That's really important as far as media goes.
2283.63 -> Buyers want to trust that the measurement
2286.18 -> is representing everything without bias.
2291.19 -> So use cases for buyers, meaning agencies
2295.96 -> and advertisers, include reaching their target audiences,
2300.19 -> evaluating different sites, competitive sets,
2302.62 -> those types of things.
2304.33 -> And expanding their reach,
2305.71 -> like how do they reach the most possible people
2308.5 -> for an advertising campaign.
2310.72 -> And then for publishers that's really around benchmarking.
2314.464 -> What is the size of the audience?
2316.54 -> How do they compare to the rest of their category,
2320.05 -> or who they're selling against?
2321.16 -> How do they show that their audience has a lot of value
2325.99 -> so that they can increase the value of their inventory?
2331.18 -> So what's changing?
2334.459 -> Measurement needs to be privacy forward
2338.53 -> in that things like cookies and mobile ad IDs
2344.02 -> are advertising technology artifacts
2347.74 -> that are commonly used in measurement.
2349.3 -> They're used for analytics, they're used for integrations,
2353.53 -> they're used for counting.
2355.552 -> And Comscore has used those in the past.
2358.822 -> That has been a trusted
2360.79 -> way to be able to measure these things with accuracy.
2365.74 -> But there is sensitivity, right?
2367.96 -> Consumers and governments have started to realize
2373.33 -> that there is a lot of power in that data
2375.49 -> and it needs to be protected.
2377.86 -> It cannot be misused.
2379.99 -> So not all measurement is opt-in, right?
2383.56 -> So how do you make it so that we can work with our partners
2387.37 -> and they trust, or they feel, that the data
2390.516 -> that we're integrating cannot be misused?
2396.07 -> Well, we do have to support collaboration.
2398.59 -> There's so much fragmentation.
2400.6 -> It is impossible for a measurement company
2404.14 -> to measure everything by itself and have it be great.
2407.83 -> Collaboration is where it all comes together
2410.71 -> because we can see exactly what the media companies see
2414.43 -> and we can integrate that into our measurement,
2416.44 -> and that gives the best possible result.
2420.04 -> With all these integrations, though,
2422.56 -> there's a lot of friction,
2423.49 -> there's a lot of different integrations
2425.47 -> and we want to be interoperable,
2427.93 -> we want to go where our customers are going,
2429.996 -> but there's a lot of ETL happening, and
2434.23 -> if we're sharing data server to server, let's say,
2438.012 -> we're integrating at a lot of different points.
2442.24 -> And with something like a data clean room,
2443.77 -> we want it to be where our customers' data is,
2447.82 -> just to reduce all that friction,
2449.11 -> so we're not moving data around
2450.7 -> and having to absorb all these costs.
2453.64 -> And then that's where the interoperability comes in.
2457.328 -> How do we operate where our customers are operating
2461.02 -> and reduce all of that friction?
2465.28 -> So clean rooms can enable these things:
2467.32 -> they can enable Comscore
2468.58 -> to provide the best possible measurement,
2470.452 -> and they can enable our data partners
2473.05 -> to trust that the data that they're providing
2474.97 -> is safe and protected.
2478.12 -> So here is the way data sharing works with Comscore today:
2481.15 -> this is just an example of a pixel tag
2483.49 -> that's sitting on our website.
2484.75 -> We have them all over the web.
2486.731 -> What it's doing is sending
2488.14 -> just a little piece of information back to Comscore
2491.05 -> that says somebody visited Comscore.com
2493.42 -> and there's an HTTP header that has cookies
2497.47 -> and IP addresses and all that stuff in it
2500.74 -> because that's how the internet works.
2503.47 -> And that's also some of the elements
2505.15 -> that we use in the measurement.
2506.98 -> We also work with a lot of companies
2508.57 -> on server to server integrations
2510.34 -> where they're pushing that type of data to us.
2512.89 -> And that's happening in a lot of different ways,
2515.23 -> FTP, S3 integrations, I mean,
2518.855 -> whatever our customers are comfortable with
2521.2 -> because we're interoperable.
2523.878 -> I will go back for a second.
2526.21 -> I mean, if you look at our whole AWS pipeline,
2529.886 -> we're using CloudFront as our CDN
2534.01 -> and we're ingesting everything through AWS
2536.47 -> and just landing in S3.
2538.143 -> And then we're running analytics
2540.07 -> on top of that to create our results.
2542.62 -> So our data is already there, right?
2544.66 -> And in this case, it's the same thing:
2548.8 -> we're integrating data, maybe S3 to S3,
2551.315 -> and then we're putting that into our data lake
2555.7 -> and then we're creating reports
2557.391 -> or our ratings off of that on the AWS infrastructure.
2564.22 -> So in a clean room,
2566.53 -> if you could think about that server to server integration
2569.62 -> where a company might be providing data to us
2573.01 -> through an S3 bucket, and it has all those things
2576.7 -> that they might not be comfortable sharing.
2579.04 -> You know, there's a lot of data like cookies,
2585.28 -> first party IDs,
2588.07 -> IP addresses that are useful to connect to our panel
2592.01 -> so we can provide that context.
2594.076 -> But in a data clean room environment,
2597.79 -> we can actually do that merger
2599.571 -> inside of the data clean room.
2602.44 -> And then those sensitive elements
2604.24 -> never need to leave the data partner's infrastructure, right?
2611.103 -> We're gonna run an analysis,
2613 -> we're gonna get the pieces we need
2614.35 -> to perform our ratings measurement,
2616.6 -> and then that's gonna be the result.
2620.41 -> So this is how it'll work.
2622.24 -> And this was the test that we designed
2624.67 -> because we have all these elements,
2626.11 -> we have those integrations on websites
2629.17 -> and we have our panels
2630.243 -> so we can find a key that would be the same key
2635.782 -> that we might use with a media partner, right?
2638.47 -> Which would be, let's say a cookie.
2641.92 -> And then instead of us ingesting all that information
2646.51 -> and doing all the analysis all behind our firewall,
2650.38 -> we can join those things in the data clean room
2652.87 -> and then get back what we need.
2654.1 -> Let's say it's an aggregate.
2656.943 -> A good example of the type of aggregation that we would do
2661.09 -> is this: a media company may have demographics on their side.
2666.55 -> Comscore has demographics because of our panels.
2669.55 -> And we can join based on the intersection of our panel
2673.72 -> with the media company
2675.34 -> and the media company will have their demographics,
2678.34 -> we'll have ours, we can create a matrix,
2680.23 -> which is basically like an error correction matrix
2683.14 -> that we can use to integrate in our measurement downstream.
2686.703 -> That is one really good example.
2689.65 -> And then to get the rest of the measurement,
2692.02 -> it could be sent to us without those sensitive keys
2697.051 -> or it could be provided in aggregate potentially.
2700.33 -> And that would reduce the amount of data
2703.12 -> that's being ingested.
2706 -> So this is an example similar to what Ankur went through.
2711.347 -> When I looked at this, it looked and felt like Athena:
2715.15 -> you know, you're writing SQL,
2716.32 -> you've set up a configuration based on
2719.47 -> which party can have access to what,
2721.667 -> and you can run SQL on it.
2723.97 -> And if I just added a demographic
2726.28 -> to that statement with a GROUP BY,
2728.86 -> the demographic from the partner
2730.291 -> and from our Comscore panel,
2733.36 -> then I could create that matrix I was talking about
2735.67 -> and I never see the keys,
2736.9 -> I never see what the intersection is,
2738.43 -> I don't know which data,
2740.996 -> or I won't need to see all the data from the media partner.
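The demographic matrix described here can be sketched with SQLite. This is an illustration under stated assumptions, not Comscore's methodology: the table names, the `cookie_id` join key, the age-group columns, and the sample rows are all invented, and the query runs on a local in-memory database rather than in a clean room.

```python
import sqlite3

# Hypothetical stand-ins for a media partner's table and the panel table,
# sharing a cookie identifier as the join key.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE partner (cookie_id TEXT, partner_age_group TEXT);
CREATE TABLE panel (cookie_id TEXT, panel_age_group TEXT);
INSERT INTO partner VALUES
    ('c1', '18-34'), ('c2', '18-34'), ('c3', '35-54'),
    ('c4', '35-54'), ('c9', '55+');
INSERT INTO panel VALUES
    ('c1', '18-34'), ('c2', '35-54'), ('c3', '35-54'),
    ('c4', '35-54'), ('c7', '18-34');
""")

# Group by both parties' demographic labels over the intersection.
# Only counts come back; the join keys never appear in the output.
matrix = conn.execute("""
SELECT p.partner_age_group, c.panel_age_group,
       COUNT(DISTINCT p.cookie_id) AS matched_ids
FROM partner AS p
JOIN panel AS c ON p.cookie_id = c.cookie_id
GROUP BY p.partner_age_group, c.panel_age_group
ORDER BY p.partner_age_group, c.panel_age_group
""").fetchall()

for partner_label, panel_label, matched_ids in matrix:
    print(partner_label, panel_label, matched_ids)
```

Off-diagonal cells, such as the one ID the partner labels 18-34 and the panel labels 35-54, are the disagreements that a correction matrix of this kind would capture.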
2747.58 -> Or we could create a list,
2749.05 -> we could join our panel to that information,
2751.66 -> we could pull back at the row level,
2753.88 -> at the panelist level information that might be useful,
2757.15 -> but we wouldn't need like the sensitive keys
2760.383 -> associated with it.
2766.06 -> So we look at data clean rooms as
2769.756 -> absolutely necessary for the future of measurement.
2775.154 -> We want to find solutions like Amazon's data clean room,
2779.77 -> which are where our customers already are,
2782.11 -> and then we can scale that.
2784.48 -> One of the challenges Comscore has
2786.16 -> is scaling all of these integrations
2788.751 -> as our customers' needs evolve
2792.843 -> around which data they can share or will share.
2796.547 -> And we want them to trust
2798.764 -> that the information that they're providing to us
2801.67 -> is exactly what they're willing to provide
2804.22 -> and it's gonna be used in the right way.
2806.62 -> They can set up the configuration
2808.545 -> and then we can automate ingesting that data
2814.39 -> or the results of those data clean room queries.
2817.93 -> And that will eventually reduce friction,
2820.18 -> we can standardize those things
2821.77 -> and we can approach these integrations
2823.39 -> with literally thousands of media partners.
2827.5 -> And the types of things we can do,
2830.5 -> I've talked about validating the data,
2832.09 -> the demographic data sets, that's a really good example,
2834.91 -> something the Comscore panel does very well.
2838.21 -> We can standardize and scale these types of integrations
2842.043 -> and then we at Comscore can offer things back.
2845.86 -> You know,
2847 -> we can help with predicting what type of audience
2850.083 -> could be on the publisher site
2851.896 -> by basically sending the data back to them
2854.8 -> that we're comfortable sharing.
2855.97 -> We're also not really willing to share who our panel is.
2859.9 -> That would be bad for Comscore, right?
2862.57 -> But we can share insights back to our customers
2867.07 -> and we'd like to do that, and do that in a way that
2871.06 -> is easy for us and for them, but also provides that privacy
2875.419 -> and security that everybody requires.
2879.01 -> So.
2882.04 -> I just kind of summed all of this up.
2886.81 -> I'm gonna invite Shaila back up to close us out.
2891.43 -> Thank you very much.
2893.747 -> (crowd applauding)
2901.57 -> - Thank you, Brian, for sharing your perspective
2903.85 -> on how you see clean rooms fitting in
2905.83 -> and bringing value to both Comscore and your customers.
2909.318 -> We focused a lot on advertising and marketing
2911.89 -> throughout this session,
2913.03 -> but the use cases for AWS Clean Rooms can extend broadly
2916.72 -> to other industries as well, such as financial services,
2919.69 -> as well as healthcare.
2922.03 -> Before we wrap, let's review
2924.285 -> AWS Clean Rooms benefits for AWS customers one last time.
2930.49 -> Create your own clean room,
2931.9 -> add participants and start collaborating with a few clicks.
2938.05 -> Collaborate with hundreds or thousands of customers on AWS
2942.43 -> without sharing or revealing underlying data.
2947.11 -> Protect data with a broad set of privacy enhancing controls
2951.13 -> for clean rooms.
2953.44 -> And use flexible, easy to configure analysis rules
2957.4 -> to tailor your queries
2958.87 -> according to your specific business needs.
2964.42 -> The service will be available in preview in a few weeks.
2968.08 -> You can get more information on our website,
2970.72 -> the URL is listed here.
2975.76 -> And on behalf of Brian, Ankur and myself,
2977.92 -> thank you very much for attending our session
2980.14 -> and for your attention, and enjoy the rest of your event.
2983.283 -> (crowd applauding)

Source: https://www.youtube.com/watch?v=YxWYEeEAvv4