ChatGPT is a very popular and hot topic. Is there any risk in using it? Is it a big privacy risk? In this episode, Punit talks with Patricia Thaine, Co-Founder & CEO of Private AI, a Microsoft-backed startup, and discusses the risks, what makes ChatGPT different from other AI, and what companies can do to mitigate those risks.
- What comes to mind when you think of the GDPR
- Views on ChatGPT
- Does ChatGPT pose privacy risks?
- How do companies mitigate the risks?
- Private AI
Transcript of the Conversation
ChatGPT, does it create a privacy risk? And is that risk similar to other AI technologies? Well, that's an interesting question indeed. ChatGPT is a hot topic; everyone is talking about ChatGPT and AI. So why not talk about ChatGPT today and understand a new technology in the market from a person who is a specialist in it. Her name is Patricia Thaine, and she's the CEO of an interesting, innovative company called Private AI. So let's go and talk to her and learn about ChatGPT, the risks it creates, how those risks can be mitigated by companies and individuals, and what Private AI does.
So here we are with Patricia. Welcome to Fit4Privacy podcast.
Thank you. It's such a pleasure to be here.
Punit 01:28 It's a pleasure indeed. And let's start with a quick question. When you think of the GDPR, what's the one word that comes to your mind?
Revolutionary.

That's very true. Do you want to elaborate on that?
Absolutely. It's revolutionary in multiple ways. One way in particular: it forced the creation of a bunch of technology that was absolutely necessary for compliance to be possible, and there's still even more technology in the works to make it possible. The second is that companies had to take stock of what kind of data they actually had. That caused many internal revolutions for anybody that took it seriously, because they needed to rebuild their understanding of their data and their data processing processes, and that turned complete chaos in certain organizations into order.
That's very true. And one of the technologies that's creating a bit of chaos, or at least conversation, these days is ChatGPT. You've heard of that as well, right?
Maybe a stupid question, but sometimes it helps to connect. What are your views on ChatGPT, then? Because there are a lot of views in the market: positive, negative, scary, and all that.
Yeah, the technology that makes ChatGPT possible has been around for a few years now. It's nice that now, with the user interface, people worldwide have more access to that technology. Is it scary? I think it really depends what it ends up being used for. I think we're having very good conversations around responsible AI. I wouldn't say the technology behind ChatGPT is particularly scary compared to other technologies that are out there. But yeah, it really depends how you use it. Some concerns might be, for example, creating malicious code with it or, you know, tricking people into believing certain information. But that has already been happening without the technology as well; this just makes it a little bit more efficient.
Absolutely. Technology by itself is agnostic; it's what we make the technology do that creates the risk. But here we are also talking about technology learning by itself and becoming more independent. More importantly, the question is: does ChatGPT create a privacy risk? And how does it create a privacy risk?
Sure, it does, in a few ways. It's not necessarily just ChatGPT itself, but any large language model that you're sending conversational data to. When you're having a conversation with a chatbot, you might not be thinking about what kind of information you're including. That can include things like passwords and usernames, it could include things like your address, but it could also include confidential information from a corporation. I think there are really positive steps being taken to reduce the privacy risk of these various large language models when you're sending your data to them. However, both consumers and companies have to be very aware of what kind of information they're sending to what kind of system. And that's pretty complicated, because if a system says it's deleting your data once it's sent, for example, do they have the right security in place? You as a consumer don't really have much insight into that; you can go and ask them about their cybersecurity practices, but it can get quite tricky to figure out the privacy of these systems.
That's absolutely true. But as you said, ChatGPT is only part of it, and there are other technologies. So let's look at it a little more broadly: AI in the broad sense, which is itself quite complex to call just "AI". Are the risks of AI in general similar to those of ChatGPT, or are the risks different?
Yeah, that's a good question. It really depends, because AI can encompass so many different things. AI can introduce physical risk as well, right? If you're talking about drones, or automatic surveillance on the street, that's a different kind of risk than ChatGPT risk. The kinds of risk you might be concerned about with ChatGPT are: one, what data was used to train the models, and was any data used that you may not have explicitly given a company the rights to use for this purpose? There is currently a lawsuit against OpenAI regarding the copyright of the information they trained on. And if they trained on, for example, EU citizens' data without their consent for this purpose, that does raise concerns around the GDPR; hopefully they put constraints in place when they were training that removed the personal information before doing so. The second concern is if you're fine-tuning the model for a particular purpose within an organization, for example with customer data: if you are training it with personal data, you have to be concerned about what kind of access controls you have around the model. The same access controls that you had on the original data have to be in place for the model that was trained on that data, because it can spew out that personal information in production. And then the other privacy concern is when you are sending your data to a third party, period; that's the same privacy concern you'd have in any scenario where you're sending the kind of information you're sending to ChatGPT.
And I think the biggest concern from an individual perspective is awareness. Those of us in privacy are usually more aware of what the risk is, what's happening, and who is getting the data. From a layman's perspective, ChatGPT, or any AI technology, is so fascinating that people tend to use it without realizing the complexity and the risk behind it. For example, take Zoom or Teams: these and many other websites are now offering to transcribe your meetings. What they do is take the meeting link, join the meeting where you and Patricia are talking, and model the voices: this is Punit's voice, this is Patricia's voice. They keep that record, and over time they are able to identify who you are. Then they profile us: what's our background, what's our ethnicity, how do we speak? They build models saying people from this kind of background speak like this, and they train the AI to become more efficient and more accurate. From the perspective of improving that AI technology to do better transcription, that's good. From the perspective of generalizing, that's also okay. But if they're going to use it the next time I'm on another device, sharing the data and recognizing that it's Punit or Patricia, that's not okay. And people don't realize this, because most businesspeople and entrepreneurs who need transcription just think: I go to this app, I pay so much a month, and all my meetings are transcribed; I have recorded, written evidence. But that's where the risk is.
Absolutely. Yeah, the data sharing aspect of it from whether you're sharing models or whether you're sharing the raw data is definitely a concern.
Yeah. And then if we go technical or contractual: where's that data going? To whom, to which country? It gets complex, but maybe let's not scare people with all these risks, because everyone, at least in our field, knows them. The question is, there are two broad entities who are impacted. One is the companies, and they need to mitigate those risks. The second is the individual. So let's take them one by one and see how companies and individuals can mitigate risk, starting with companies. If I'm a company operating in any field, any industry, of course I would love to use AI, I would love to use ChatGPT, because it makes life simple; nothing against technology. But how do I mitigate the risks as a company? What measures do I put in place?
Very well said; you've covered almost the entire GDPR. Look at where it is deployed; from a company perspective, limit the personal information, limit the purposes, make sure it's secure, train your staff, and monitor the risk. I think that's what the GDPR says must be done, and that's also essential for being a responsible or accountable company, whether it's responsibility in AI, in privacy, or in ethics. And from a user's perspective, you're saying: go out, read the privacy notice, get a sense of what they are doing, and also look at the settings, because a lot of the time settings give you means to have some sort of control; maybe not as much as we want, but enough control. And with all this, where does your company, Private AI, fit in?
Yeah, in a few ways. What we do at Private AI is identify and remove personal information across multiple types of data (text, audio, images, documents), and we do so across 40 different languages. So it allows an organization to reduce risk and also to identify risk in the first place. Where we fit in is in reducing, for example, the amount of personal information that's going to data science teams, or crossing boundaries within an organization, or before it reaches a third-party API. There are also many use cases for training machine learning models without the personal information being memorized by those models. And on the flip side, a lot of the time when you want to make a privacy assessment, or identify what the damage was from a cybersecurity breach, you need something really accurate and multimodal to do so, and we have basically the best technology in the world for that. On the consumer side, we also recently launched PrivateGPT, which allows you to communicate with ChatGPT while preserving user privacy.
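The "privacy layer in front of a third-party API" pattern Patricia describes can be sketched in a few lines. This is purely illustrative and is not Private AI's actual implementation: real systems use ML-based entity detection rather than regexes, and the function names here are hypothetical.

```python
import re

# Toy entity detectors; stand-ins for an ML detection model.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
    "PHONE": re.compile(r"\+?\d[\d -]{8,}\d"),
}

def redact(prompt: str) -> str:
    """Replace detected PII with entity-type placeholders before the
    prompt ever leaves the organization's boundary."""
    for label, pattern in PATTERNS.items():
        prompt = pattern.sub(f"[{label}]", prompt)
    return prompt

def ask_llm(prompt: str, send) -> str:
    """Privacy layer: `send` is whatever function calls the third-party
    LLM API; it only ever sees the redacted prompt."""
    return send(redact(prompt))
```

For example, `redact("Email me at jane@example.com")` yields `"Email me at [EMAIL]"`, so the third-party service never receives the raw address.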
So, as the name says, it's basically private artificial intelligence for the company. Is that the right way of saying it, Private AI for the company?
Yes, that's right. Well, our name is Private AI, and we enable privacy-preserving AI as well.
Nice, nice. And when you do the identification and replacement, do you then make the PII pseudonymized or anonymized, or reduce the risk by de-identifying it?
All of the above, that's right. The interesting thing is that when you think about the kind of data you're sharing, the extent to which you might want to de-identify really depends on the type of data. You might want to completely anonymize the data if it's medical information, for example. But if it's you having a customer service call with someone about a vacuum cleaner you purchased, you might not really care if somebody knows what your identity is, but you will definitely care if you're sharing your credit card number, for example. At that point, the de-identification can be redaction of just the key elements that might get you into trouble through identity theft, for example. I think that's something that data protection regulations don't always take into account, which is complex and interesting to think about.
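The idea that the de-identification level should depend on the sensitivity of the data can be expressed as a policy mapping risk tiers to entity types. Again an illustrative sketch under assumed names, not Private AI's product: the name detector is a toy regex standing in for an NER model.

```python
import re

# Toy detectors; a production system would use ML models here.
DETECTORS = {
    "NAME": re.compile(r"\b(?:Patricia|Punit)\b"),  # stand-in for NER
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

# Which entity types to remove depends on the risk tier of the data:
POLICY = {
    "medical": ["NAME", "CREDIT_CARD"],   # full de-identification
    "customer_call": ["CREDIT_CARD"],     # identity may stay; card must go
}

def deidentify(text: str, tier: str) -> str:
    """Redact only the entity types the policy requires for this tier."""
    for label in POLICY[tier]:
        text = DETECTORS[label].sub(f"[{label}]", text)
    return text
```

So the same utterance is treated differently: a customer call keeps the speaker's name but drops the card number, while medical data loses both.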
And one of the positive use cases of AI that we always talk about: let's say I have a production environment with a lot of customer data. That's sensitive, and I should not use it for other purposes. Now I want to do testing, or acceptance of new environments or new products which I'm going to launch. Traditionally, and this has been the case in IT for the last 30 years or so, you'd use a dump of that data. Now we have this new concept of synthetic data, where you artificially generate data based on the data set in production. Essentially, you replace Patricia with Jennifer and Punit with Tom, and so on, and also some of the other elements in a systematic way, so that the data becomes completely unrecognizable. Does Private AI also do that, the generation of synthetic data?
We do generation of synthetic PII specifically, not fully synthetic data. There are some very good companies that do fully synthetic data. The way to think about it is: if you don't have enough data for a particular use case, you generally want to create fully synthetic data to complement that data set. What we find is that a lot of the time companies have enough data and want to take advantage of the full context of that data. For example, it might be customer service calls about a particular product, or chats with their own customers; they might have a ton of that data, and what they want to do is reduce the risk of sharing it with their teams. That's where synthetic PII makes sense. So they're complementary solutions for different problems.
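The "replace Patricia with Jennifer" idea above hinges on one detail: the same real person must always map to the same fake name, or the text stops making sense. A minimal sketch, with a toy regex detector standing in for a real NER model (this is not Private AI's actual code):

```python
import itertools
import re

# Toy name detector; a real system would use an NER model.
NAME_RE = re.compile(r"\b(?:Patricia|Punit)\b")
FAKE_NAMES = itertools.cycle(["Jennifer", "Tom", "Maria", "Alex"])

def synthesize(text: str, mapping: dict) -> str:
    """Swap each detected name for a synthetic one, consistently:
    the same real name always gets the same replacement."""
    def swap(match):
        real = match.group(0)
        if real not in mapping:          # same person -> same fake name
            mapping[real] = next(FAKE_NAMES)
        return mapping[real]
    return NAME_RE.sub(swap, text)
```

With an empty mapping, `synthesize("Patricia asked Punit, and Patricia laughed.", {})` gives "Jennifer asked Tom, and Jennifer laughed.": both mentions of Patricia become the same Jennifer, preserving the conversational structure while removing the real identities.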
Sure. And when I was going through your website, I read that you're offering your product for free to nonprofit organizations. Is that true?
Yes, yes, that is true, as long as they're not selling it and it's for their own nonprofit purposes. We do offer it for research purposes and for helping nonprofits accomplish various goals, like data acquisition or, for example, de-identifying sources if you're a journalist. There are lots of good use cases in the nonprofit world.
Sure, that's a very good thing. We also offer training to not-for-profits, usually at a very reduced cost, so that's a good way to contribute to society. Now, if based on this conversation someone says, let's contact Patricia, we want to know more about Private AI, what's the best way?
To know more about Private AI, I'd recommend going to the website: private-ai.com. To contact me, please do so at patricia@private-ai.com, or reach me on LinkedIn or on Twitter at Private NLP.
That's good. It was wonderful to have you and to have this interesting conversation about privacy, ChatGPT, and the AI world. So thank you so much for your time.
Thank you so much for the invitation.
ABOUT THE GUEST
Patricia Thaine is the Co-Founder & CEO of Private AI, a Microsoft-backed startup. With a decade of research and software development experience, she is a Computer Science PhD Candidate at the University of Toronto and a Vector Institute alumna. She founded Private AI to help companies unlock the value of unstructured data while maintaining customer privacy and compliance. Its latest launch, PrivateGPT, serves as a privacy layer for ChatGPT, redacting sensitive information from your prompts before sending them through the chatbot.