Punit 00:00
Data governance creates value. We all know this, but what we usually miss is that not all data needs to be governed. One of the key aspects is finding the right data. If you focus on the right data, put the right controls on it, and then manage it in the right way, you create value for your organization. And when you say data, you need to look at it from different perspectives: the perspective of the consumer, the perspective of the organization, and also the perspective of the law. And if you are an intermediary or a data broker, from that perspective too. So, depending on who sees the data and what data it is, there is value in data. If you are leveraging data in the right way, you are creating value for the organization and also creating trust. That trust is not only from the consumer's perspective but also from the organization's perspective, because the organization needs the right data and needs the confidence that the data is trustworthy, or that there is sufficient quality in the data. Now, all this gets very fascinating, because it's easy to say data creates trust, data creates value, data is gold. But not all data. And hence I have chosen an expert who lives and breathes data, comes from a security background, and is now the founder of what we call Data Village. So, let's go and talk to him, and I'm talking about none other than Frederic Lebeau.
Fit4Privacy 01:25
Hello and welcome to the Fit4Privacy podcast with Punit Bhatia. This is the podcast for those who care about their privacy. Here, your host Punit Bhatia has conversations with industry leaders about their perspectives, ideas and opinions relating to privacy, data protection and related matters. Be aware that the views and opinions expressed in this podcast are not legal advice. Let us get started.
Punit 01:54
So here we are with Frederic. Frederic, welcome to the Fit4Privacy podcast.
Frederic 01:58
Good afternoon. Thank you.
Punit 02:00
Thank you so much. Just a quick question to start the conversation. You work a lot in the space of data, and we've been talking about data for, I think, almost 10 to 12 years. Back then nobody wanted to listen; then the GDPR came and more people wanted to work on data; then AI came, and people were saying AI will kill us and everything. And now, all of a sudden, we are talking about data privacy and AI security, leading us to something called trust, or digital trust. So how would you put digital trust in a few words?
Frederic 02:37
Well, let me think about digital trust in the context of data. I think you can look at it from two angles. First, from the angle of the data holder, the data owner, or whatever you want to call it. From that angle, I would define digital trust on two elements. The first is to whom your data is shared: a company, an individual? Trust is knowing exactly to whom your data is shared and by whom it is then used. The other key element is for which purpose. That's also key to trust, because knowing who is using the data is one thing, but knowing for which purpose is, of course, the other thing you need to know in order to trust what you could call the data user. That's from the angle of the one who manages, or owns, the data. On the other side, you also have the trust needed by the one who is using the data, and there it's more about trust in the data itself. Is the data coming from the organization or the individual it is expected to come from? Is it really the data owned by that company? Has this data been changed, tampered with, or altered by someone else? So it's really a two-way relationship between what you can call the data holder and the data user.
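A minimal sketch of the integrity question raised above (has the data been changed, tampered with, or altered?), assuming the data holder publishes a SHA-256 digest of each data set through a channel the data user already trusts. The function names are hypothetical and this is not presented as Data Village's mechanism.

```python
import hashlib
import hmac


def fingerprint(payload: bytes) -> str:
    """SHA-256 hex digest that the data holder publishes alongside the data."""
    return hashlib.sha256(payload).hexdigest()


def is_untampered(payload: bytes, published_digest: str) -> bool:
    """Recompute the digest on the data user's side and compare in constant time."""
    return hmac.compare_digest(fingerprint(payload), published_digest)


if __name__ == "__main__":
    original = b"customer_id,segment\n42,gold\n"
    digest = fingerprint(original)  # published by the data holder
    print(is_untampered(original, digest))  # True
    print(is_untampered(original.replace(b"gold", b"silver"), digest))  # False
```

A bare hash only detects alteration; proving who the data actually came from would additionally need a signature (for example a keyed HMAC or a public-key signature) from the data holder.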
Punit 04:07
I think that's very well put, because typically we look at the consumer, the individual who is sharing the data, and we look at two dimensions: who has the data and what do they do with it. But you put a nice angle on it. If I, as an organization, am processing the data, I want the data to be of a certain quality, and I want to have confidence that the data is accurate and also legitimate. So if a third party is giving me data, I want the assurance that it's legitimate. Yeah, let's put it like that.
Frederic 04:35
Exactly. Yeah.
Punit 04:36
That's a very interesting view, and I am now tempted to ask you something, because trust, as we say, has many elements. You can look at trust from many angles, but rather than asking what trust is, I'd ask how you constitute trust. We talked about the legitimacy of data, or the confidentiality or integrity of data. We can look at these as components that create trust. So, bottom up: what creates trust? What components create trust?
Frederic 05:10
Yeah, so the first obvious one that comes to my mind is control. Controlling the data is the key element of trust. Of course, there are multiple means and multiple technologies nowadays to control data, but keeping control over the data is key. And I'm a strong believer in making the link between this notion of control and the way the data is hosted. What I mean is that, as a data holder, if you see your data moving to the data user and then being copied and duplicated on the data user's side, I think at that step you start losing control, because you are no longer in control of how the data will be stored and how it will be managed. Today there are a lot of leaks caused by third-party companies that do not properly manage data copied from a data holder, which then ends in leakage of that data. So, for me, one of the key elements is control. It's definitely not implemented enough today, but it is being implemented more and more, typically in data platforms: making sure that the data stays on the side of the data holder. That's the first key element for me. The second element is transparency, and how the data user can define the way the data is used. And on the other side, when we talk about an individual data subject, consent is, of course, a key element: making sure, as we all know, that consent has been given for that specific goal, and then making things fully transparent. There, having a component that defines and describes what is done with the data, through an algorithm descriptor or whatever, is a key component of trust, because it makes the data usage fully transparent. So: control, transparency, and, last but not least, from my point of view, confidentiality and privacy.
Confidentiality and privacy are not exactly the same, as you know, but you can put them in the same bucket: making sure that the data, especially sensitive data, high-value data, proprietary data or whatever, stays confidential. That's also very important. Typically, if a user shares personal data, privacy is key. If a company shares proprietary or strategic data, confidentiality is key. That's, for me, the third component of trust that needs to be implemented for any data.
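To make the three components concrete, here is a hypothetical sketch (the `Grant` and `DataHolder` names are illustrative, not taken from any product): control as a check that a data user holds a matching consent for a stated purpose, transparency as an audit log of every request, and confidentiality as releasing only the fields agreed for that purpose.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Grant:
    """Hypothetical consent record: who may use the data, for what, and which fields."""
    data_user: str
    purpose: str
    fields: frozenset[str]


@dataclass
class DataHolder:
    """Hypothetical holder that decides on every request for its own records."""
    records: list[dict]
    grants: list[Grant]
    audit_log: list[tuple[str, str, bool]] = field(default_factory=list)

    def request(self, data_user: str, purpose: str) -> list[dict] | None:
        grant = next(
            (g for g in self.grants
             if g.data_user == data_user and g.purpose == purpose),
            None,
        )
        # Transparency: every request is recorded, granted or refused.
        self.audit_log.append((data_user, purpose, grant is not None))
        if grant is None:
            return None  # Control: no matching consent for this purpose, no data.
        # Confidentiality: release only the fields agreed for this purpose.
        return [{k: v for k, v in r.items() if k in grant.fields}
                for r in self.records]


if __name__ == "__main__":
    holder = DataHolder(
        records=[{"name": "Ann", "age": 34, "postcode": "1000"}],
        grants=[Grant("insurer", "risk-pricing", frozenset({"age"}))],
    )
    print(holder.request("insurer", "risk-pricing"))  # [{'age': 34}]
    print(holder.request("insurer", "marketing"))  # None
    print(holder.audit_log)
```

Note that this sketch still hands a copy of the selected fields to the data user, which is exactly the step described above as the point where the data holder starts to lose control.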
Punit 08:07
I think, again, that's very well defined. Control: both parties feeling in control of what's happening. Consent: both parties, or at least the user who is giving the data, having a choice. And then, yes, transparency around what is happening to my data, where it is going, where it is flowing. That's very well articulated, I would say. And the next question would be, since you work a lot in the domain of data: typically, we say data needs to be managed, or data needs to be governed. When we talk about digital trust, we can talk about control, transparency and confidentiality, but the root thing that we manage or govern is data. So I firmly believe that data governance, or data management, has a role to play in this digital trust environment, in the creation of digital trust. How do you see the role of data governance in this?
Frederic 09:03
Yeah, definitely, I fully agree. The thing with data governance is that the term is now used for a lot of things, which, in a way, means it doesn't mean much anymore. It has been used for everything, so it's not so easy to talk about data governance, and I'm definitely not a deep specialist in it. Nevertheless, there is another element that makes data governance very complex and expensive, or at least that's how the market and big organizations look at it. If you look at it only from the governance point of view, with the setup it brings, especially the way data governance is typically implemented, starting from defining all the roles, etc., and then making sure everything stays in sync with reality, there is a lot of admin to keep it in sync. So I think today the value of data governance is not well perceived, even though it is very, very important, especially in the space of AI, to make sure that everything goes the right way. It is perceived today as overhead, expensive, and not bringing much value. But I'm a big fan of data mesh. I don't know if you've heard about data mesh, but okay, yeah, I'm sure you know about it. Data mesh brings that perspective of data ownership and all these things, which, of course, I would not advise applying everywhere. It's like APIs and microservices in the past: the message was, yeah, you need to do everything in microservices. After that wave, microservices are still there and bring a lot of value, but not all those monoliths should have been transformed into microservices, for different reasons. It's the same with data mesh, data products and data contracts. They bring a lot of value, but they should not be applied to everything, at least not when you start.
My belief is that this makes a lot of sense when you connect data from entities that are not used to talking to each other, meaning when you don't have the data in one big data lake. You have one entity, which could be a department or a team or whatever, and another one, and they each have their own data lake. How can you then govern this data? Because today, typical data catalogs and data governance are applied to different databases, data lakes, etc., within one single organization. That's where the concept of data mesh brings, I think, a lot of interesting ideas. One of them is the data contract, which also brings a lot of value to data governance, because the data contract is linked to the team that defines the data they want to share with the external world, exactly like an API contract, by the way. This makes governance much more practical and much more actionable, exactly as happened with APIs and API contracts, because it's not a top-down approach where you set up the governance and then everyone needs to follow the rules. No, you have the team defining the data contract because they want to make sure that whoever consumes the data will do it in a way that fits what they can provide to the external world. And there, of course, you also have a new way to manage data governance based on data contracts, data contract catalogs and so on, where I think you see the value very fast, again exactly like the value of the API contract, which came after APIs were created and then standardized via Swagger files and so on. Today it's a no-brainer: everyone first writes an API contract to expose a service and its API to the external world, because it makes sense.
And then, by design, governance is implemented through that kind of mechanism. For me, that's the future of data governance: applied, practical, bottom-up data governance via data contracts and data products.
Punit 13:34
So, I understand what you're saying, but I'm not sure everyone listening will understand it in the same way. So I'd ask you to clarify two things for people who are new to these terms. First, data mesh. Second, the contract, because a contract, from a legal standpoint, is something legally written. From your standpoint, I think it's more a description of what I expect to give you, and you telling me whether that fits your needs, more like the requirements in the old days of software development. And there's a third term I'd like you to clarify alongside data mesh and contract: data fabric, which is sometimes used interchangeably. Are you okay with that?
Frederic 14:15
Yeah, data fabric I'm maybe less able to explain. I know the difference, but let's focus on the first two. The first one is data mesh. Data mesh is, all in all, like a service mesh: in the services space, it's about having different services able to talk to each other; here it's about different data products able to talk to each other. A data product is a data set coming from a database, from an algorithm, whatever. You have data that is exposed to the external world, and it is exposed through what is called a data product. Then, in the data mesh, you have one, two, three, four, five other data products consuming this data product, and such a new data product will typically consume the data to create an AI model. Creating an AI model is also a data product, and applying an AI model to data is a data product whose output is the insight coming from that model. So the data mesh is all the links, the mesh, between all these different data products. That's one thing. Now for the contract: yeah, you are right, it's definitely not a legal contract, even if it can be used as one, because a typical data contract, let's call it a technical data contract, can contain SLAs, so at some point it could also become legal. But the data contract contains all the information that is useful to describe your data. Typically, you have the usual metadata, like the name, the location, the format, the model, but more and more you also have information like quality and quality checks. I don't know if you know Soda; it's a Belgian-based company that provides data quality checks and a language to define your quality checks within the data contract. A quality check is not only "this field should be a string" or whatever; it's much more than that, like "my data set should contain more than 1,000 rows".
That kind of thing is part of the data contract. And the data contract is, let's say, the responsibility of the data holder. As a data holder, if I expose my data product, I commit myself, or my team, to comply with this data contract. So if someone else in the data mesh, meaning another data product, consumes my data, they can read the contract and see: ah yeah, this data will comply with these rules. They can run such a quality check, confirm that the data set complies with the rules shared by the data holder, and then process the data. That's a data contract.
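A minimal sketch of a technical data contract as described above, assuming a plain Python representation rather than any standard contract format or Soda's own check language: the data holder publishes a schema and a row-count rule, and a consuming data product runs the checks before processing. The `DataContract` class and its fields are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class DataContract:
    """Hypothetical minimal technical data contract published by a data holder."""
    name: str
    owner: str
    schema: dict[str, type]  # column name -> expected Python type
    min_rows: int  # the data set must contain MORE than this many rows

    def violations(self, rows: list[dict]) -> list[str]:
        """Return human-readable rule violations; an empty list means compliant."""
        problems: list[str] = []
        if len(rows) <= self.min_rows:
            problems.append(
                f"expected more than {self.min_rows} rows, got {len(rows)}")
        for i, row in enumerate(rows):
            for column, expected in self.schema.items():
                if column not in row:
                    problems.append(f"row {i}: missing column '{column}'")
                elif not isinstance(row[column], expected):
                    problems.append(
                        f"row {i}: '{column}' should be {expected.__name__}, "
                        f"got {type(row[column]).__name__}")
        return problems


if __name__ == "__main__":
    # The consuming data product reads the contract and checks a batch first.
    contract = DataContract(
        name="customer_segments",
        owner="marketing-data-team",
        schema={"customer_id": int, "segment": str},
        min_rows=1000,
    )
    batch = [{"customer_id": i, "segment": "gold"} for i in range(1001)]
    issues = contract.violations(batch)
    print("contract satisfied" if not issues else issues)
```

Returning a list of violations, rather than raising on the first one, lets the consumer report every broken rule back to the owning team in one pass.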
Punit 17:08
Now, that makes sense. What I wanted to bring out is that people, especially some legal or privacy colleagues, should not start interpreting a data contract as another legal document to be signed. It's not. It's more about explaining the expectations on both sides: what do you expect the API, the data exchange or the catalog to deliver to each other? And knowing the way we are having this conversation, if we start going deep into data, we could be here for a week. So let's get into another dimension of data. You mentioned earlier that investments in data can go in any direction: you can go for data mesh, data lakes, data catalogs, data contracts. But it has to be situation-based, and, more importantly, even if it's situation-based, it requires a lot of investment. So there usually comes the question. I know the GDPR and the EU AI Act have made it a little easier, because both essentially ask you to have data governance in place, especially the AI Act, but I'm still tempted to ask: do you see a return on these data investments, and how do you justify to your clients, or to the businesses you advise, making investments in this area?
Punit 23:20
Fully with you. When we say data is gold, not all data is gold; the data that is relevant in that context is gold. But again, as you mentioned, data can be looked at from different sides or angles, and not every company is able to leverage it. If you are able to leverage the right data in the right way, that's where the value is. That's what we usually, and wrongly, call data monetization; it's really leveraging the value of data, and that is the hard part. I think that's probably where your role comes in, in the form of Data Village. Would you get into that dimension, explaining what Data Village is and how you contribute to this journey of data governance, data management, data whatever?
Frederic 24:05
Yeah, with pleasure. What we provide is a solution that lets companies access data they are not able to access today: typically high-value data, sensitive data, proprietary data. Usually it's data they don't have in their own company, or data held in different subsidiaries within the same company, but outside their boundaries, so they cannot get access to it. The main reason is trust: the data holder does not want to share this data because it's too sensitive. So we provide a solution to access and use this data in a way that the data always stays confidential. That's the value proposition. Our solution brings the trust, and we call it trusted data collaboration: the company is able to use the data without seeing it. We often summarize what we do in three words: sharing without showing. Key principles, such as the data not being centralized, are fully embedded in the solution. On top of that, of course, we use advanced technology like confidential computing, advanced encryption and so on. And a very important part of it is governance, because in this type of relationship everyone needs to know which data is used for which purpose, and all these things. So governance is also something we provide. With that approach, we open up new opportunities and new perspectives within the same industry, but also across industries. That's where we think the future is. And we are not the only ones; thankfully, Europe is also looking in that direction. You mentioned the GDPR and the AI Act, but there is also everything around the Data Act and the Data Governance Act. All of this is promoted by Europe to foster that.
They call it data sharing, or data monetization, but the aim is to foster it. Some will say Europe is too regulated, and I agree. But this perspective, if it flies, is very interesting, because in Europe we are not good at creating big players, not to mention Google and the like, but also the new data platforms like Snowflake, Databricks and so on. We are not good at that, meaning that in Europe there isn't one place holding all the data for, say, advertising. If you look at the US, most of the data for advertising sits at Google and Facebook. I'm exaggerating a bit, but look at Europe: that's not the case. And that's more or less true in every dimension and for every type of data. I think what's smart from Europe is to say: if we are not able to achieve that alone, let's foster collaboration and sharing, and create a dynamic where data is shared to bring that value, rather than concentrated in one single place. At Data Village, we have been strong believers in that from day one. But of course, you then need to overcome all the challenges related to trust, because this kind of collaboration will happen between, let's say, friends or partners, but also between competitors. We are working on projects where competitors work together, and of course no competitor will let its data be shared with another competitor. So the level of trust needs to be very high. That's what we are doing at Data Village.
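The "sharing without showing" idea can be illustrated, very loosely, with a toy compute-to-data pattern: data holders load rows into a component that never returns them, and the data user only receives an aggregate, which is suppressed when it covers too few records. This is an assumption-laden sketch in plain Python; it is not confidential computing, provides no real isolation, and is not how Data Village implements its platform. `CleanRoom` and `MIN_GROUP_SIZE` are hypothetical.

```python
from __future__ import annotations

from statistics import mean

MIN_GROUP_SIZE = 5  # hypothetical threshold: refuse answers over fewer records


class CleanRoom:
    """Toy stand-in for a trusted environment: accepts rows, never returns them."""

    def __init__(self) -> None:
        self._rows: list[dict] = []

    def load(self, rows: list[dict]) -> None:
        # Each data holder contributes rows; there is no method to read them back.
        self._rows.extend(dict(r) for r in rows)

    def average(self, column: str, where: dict | None = None) -> float | None:
        """Release only an aggregate, and only over a large enough group."""
        where = where or {}
        values = [r[column] for r in self._rows
                  if column in r and all(r.get(k) == v for k, v in where.items())]
        if len(values) < MIN_GROUP_SIZE:
            return None  # too few records: the answer could expose individuals
        return mean(values)


if __name__ == "__main__":
    room = CleanRoom()
    room.load([{"segment": "smb", "fraud_loss": x} for x in (120, 80, 100)])  # holder A
    room.load([{"segment": "smb", "fraud_loss": x} for x in (90, 110)])  # holder B
    print(room.average("fraud_loss", {"segment": "smb"}))  # 100
    print(room.average("fraud_loss", {"segment": "retail"}))  # None
```

In Python the underscore on `_rows` is only a convention, which is precisely why real deployments rely on hardware-backed isolation and encryption rather than on code structure alone.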
Punit 28:11
Fully with you. The EU strategy in fact facilitates data sharing and data collaboration, but in doing so it brings a lot of rules and laws, which makes it feel like we have too many laws to comply with and are heavily regulated. But if you pay attention, it actually makes data sharing and data collaboration very clear: this is what you have to do, this is what you cannot do. That's where the regulation comes in. But that leads me to a question you may not like, but I'll ask anyway. You said you facilitate data collaboration, sharing without showing. If it's sharing without showing, that means there is access to the data before the outcome or analysis is shared. So is there a legitimate basis that you rely upon when analyzing that data?
Frederic 29:01
So you mean a legitimate basis as mentioned under the GDPR, meaning you can process the data without the consent of the individual? Is that what you mean?
Punit 29:13
I mean both. For personal data, yes, in the context of the GDPR, what's the basis? But for non-personal data, people sometimes also bring in the ethics dimension, which is very broad and very generic. They just bring in ethics without saying more, and that complicates things. So what is your rationale for processing such data? Because typically, if you go to a company and say, "We will analyze it and share some outcomes," they say, "No, there is the GDPR, and ethically we shouldn't do it." So how do you handle that?
Frederic 29:41
No, no, fine. Typically, we work with a law firm, Osmond Class, to make sure that what we do is in line with the rules, typically the GDPR, and we cover that in such a way that data is never processed if consent has not been given. That's one thing. Then ethics is a bit more complex and difficult, of course, because it can also be subjective, but yes, we take it into account, and we take the legal aspect and the legal basis into account. Typically, in fraud detection, when sharing sensitive data to fight fraud, there is a legitimate basis, but there are things you cannot do. Not everything is legitimate, even if it's to detect fraud. And there are, of course, some gray zones, fewer and fewer, but they exist, and there we need to align with lawyers and legal. Another thing that's maybe also important is that we provide a way to process the data, but the processing as such, and its purpose, are defined by the company. We don't say, "We do this or that processing on the Data Village platform." It's the company that needs to access the data that defines the processing. We provide the tools to let them define the processing, the purpose and so on, then to connect the data and make things transparent to the data holder. And if they need assistance on the legal aspect, which is usually not the case, because large organizations are quite well equipped with DPOs and so on, we can help.
Punit 31:26
No, I wanted to address the elephant in the room straight away: it's being done in a legitimate way. You do look at the legal dimension and the legal applicability; it's not bypassing the laws. It's complying with the laws and finding ways in which the data can be leveraged, and if it can't, then so be it.
Frederic 31:45
Exactly, exactly. Something else that is interesting to look at is the notion of a data intermediary in the Data Governance Act. These are now regulated entities whose role is to facilitate this type of exchange, but they have to comply with some rules. A typical one is that they can facilitate data exchange, but they cannot process the data for their own business. Another rule, of course, is that what they do must comply with all the other regulations. So no, it's definitely not about bypassing anything. We bring technology that facilitates compliance but does not override the GDPR. It makes things compliant by design and secure by design, but it does not override any regulation.
Punit 32:39
Good. I think we have had a very good conversation, and I'm sure the listeners have learned a lot. Is there one final message you would like to pass on? And please also share how people can reach out and connect with you if they want to know more or use the Data Village platform.
Frederic 32:58
There is our website, www.datavillage.ai, and there is also email: they can contact us at contact@datavillage.ai. They can also reach us on LinkedIn; we are on LinkedIn, of course. So yeah, those are the main channels they can use. And just to say: always happy to exchange, especially on use cases and the opportunities we see, but also about the opportunities the market is seeing today around data collaboration.
Punit 33:36
So with that, Frederic, thank you so much for your time, inputs and insights. It was wonderful to have you.
Frederic 33:42
Thank you very much. Same here. Thank you.
Fit4Privacy 33:45
Thanks for listening. If you liked the show, feel free to share it with a friend and write a review. If you have already done so, thank you so much. And if you did not like the show, don't bother and forget about it. Take care and stay safe. Fit4Privacy helps you create a culture of privacy and manage risks by creating, defining and implementing a privacy strategy that includes delivering scenario-based training for your staff. We also help those who are looking to get certified in CIPP/E, CIPM and CIPT through on-demand courses that help you prepare and practice for the certification exams. Want to know more? Visit www.fit4privacy.com, that's www, fit, the number four, privacy .com. If you have questions or suggestions, drop an email at hello(@)fit4privacy.com. Until next time, goodbye.