Jan 10 / Punit Bhatia

ChatGPT & Privacy Risks


ChatGPT is a very popular and hot topic. Is there any risk in using it? Is it a big privacy risk? In this episode, Punit talks with Patricia Thaine, Co-Founder & CEO of Private AI, a Microsoft-backed startup, and discusses the risks, what makes ChatGPT different from other AI, and what companies can do to mitigate those risks.

  • What comes to mind when it comes to GDPR
  • Views on ChatGPT
  • Does ChatGPT pose privacy risks?
  • How do companies mitigate the risks?
  • Private AI 

Transcript of the Conversation

Punit 00:00

ChatGPT, does it create a privacy risk? And does it create a privacy risk similar to other AI technologies? Well, that's an interesting question indeed. ChatGPT is a hot thing; everyone is talking about ChatGPT and AI. So why not talk about ChatGPT today, and also understand this new technology that is in the market from a person who is a specialist in this. Her name is Patricia Thaine, and she is the CEO of a company called Private AI, an interesting, innovative company. So let's go and talk to her and learn about ChatGPT, the risks it creates, how those risks can be mitigated by companies and individuals, and what Private AI does.

Punit 01:21
So here we are with Patricia. Welcome to Fit4Privacy podcast.

Patricia 01:26
Thank you. It's such a pleasure to be here.

Punit 01:28
It's a pleasure indeed. And let's start with a quick question. When you think of the GDPR, what's the one word that comes to your mind?

Patricia 01:37
Revolutionary.
Punit 01:39
Revolutionary? That's very true. Do you want to elaborate on that?

Patricia 01:44
Absolutely. It's revolutionary in multiple ways. One way in particular: it made a bunch of technology that was absolutely necessary get created in order for compliance to be possible, and there's still even more technology in the works to make it possible. The second is that companies had to take stock of what kind of data they actually had. That caused many internal revolutions for anybody that took it seriously, because they needed to recreate their understanding of their data and their data processing processes, and that made complete chaos in certain organizations turn into order.

Punit 02:31
That's very true. And one of the technologies that's creating a bit of chaos, or at least conversation, these days is ChatGPT. You've heard of that as well, right?

Patricia 02:44

Punit 02:46
A stupid question, perhaps, but sometimes it helps you connect. So what are your views on ChatGPT? Because there are a lot of views in the market: positive, negative, scary, and all that.

Patricia 02:58
Yeah. The technology to make ChatGPT possible has been around for a few years now. It's nice that now, with the user interface, people have more access to that technology worldwide. Is it scary? I think it really depends on what it ends up being used for. I think we're having very good conversations around responsible AI. I wouldn't say the technology itself behind ChatGPT is particularly scary compared to other technologies that are out there. But yeah, it really depends how you use it. Some concerns might be, for example, creating malicious code with it, or, you know, tricking people into believing certain information. But that has already been happening without the technology as well; this just makes it a little bit more efficient.

Punit 03:57
Absolutely. Technology by itself is agnostic; it's what we make of the technology that creates the risk. But here we are also talking about technology learning by itself and then becoming more independent. More importantly, the question is: does ChatGPT create a privacy risk? And how does it create a privacy risk?

Patricia 04:20

Sure, it does, in a few ways. It's not necessarily just ChatGPT itself, but any large language model that you're sending conversational data to. When you're having a conversation with a chatbot, you might not be thinking about what kind of information you're including. That can include things like passwords and usernames, it could include things like your address, but it could also include confidential information from a corporation. I think there are really positive steps being taken to reduce the privacy risk of these various large language models when you're sending your data to them. However, both consumers and companies have to be very aware of what kind of information they're sending to what kind of system. And that's pretty complicated, because if you have a system that says it deletes your data once it's sent, for example, do they have the right security in place? You as a consumer don't really have much insight into that; you can go and ask them for their cybersecurity practices. So it can get quite tricky to figure out the privacy of these systems.

Punit 05:40

That's absolutely true. But as you said, ChatGPT is only part of it, and there are other technologies. So maybe let's look at it a little more broadly, say AI technology in the broad sense, which again is quite complex to call just "AI". Are the risks in the context of AI similar to those in the context of ChatGPT, or are the risks different?

Patricia 06:10
Yeah, that's a good question. I think it really depends, because AI can encompass so many different things. AI can introduce physical risk as well, right? If you're talking about drones, or about automatic surveillance on the street, that's a different kind of risk than ChatGPT's risk. The kinds of risk you might be concerned about with ChatGPT are, one: what data was used to train the models, and was any data used that you may not have explicitly given a company the rights to use for this purpose? There is currently a lawsuit against OpenAI regarding the copyright of the information they trained on. And if they trained on, for example, EU citizens' data without their consent for this purpose, that does raise concerns around the GDPR; hopefully they put some constraints in place when they were training that removed the personal information before doing so. The second concern is, if you're fine-tuning the model for a particular purpose, or within an organization, for example with customer data, you have to be concerned about what kind of access controls you have around the model if you are training it with personal data. The same access controls that you had for the original data have to be in place for the model trained on that data, because it can spew out that personal information in production. And then the other privacy concern is when you are sending your data to a third party, period; that's the same privacy concern you'd have for any scenario where you're sending the same kind of information you're sending to ChatGPT.

Punit 08:12
And I think the biggest concern from an individual perspective is that most people, well, we in privacy are usually more aware of what the risk is, what's happening, who is getting the data. From a layman's perspective, ChatGPT or any AI technology is so fascinating that they tend to use it without realizing the complexity and the risk behind it. For example, Zoom or Teams and many other websites are now offering to transcribe your meetings. What they do is join the meeting via a link while you and Patricia are in that meeting talking, and then they model: this is Punit's voice, this is Patricia's voice. They keep that record, and over time they are able to identify who you are. And then what they do is profile us: what's our background, what's our ethnicity, how do we speak? They build models saying people from this kind of background speak like this, and then they train the AI to become more efficient and more accurate. From the perspective of improving that AI technology to do better transcription, that's good. From the perspective of generalizing, that's also okay. But if the next time I'm on another device they're sharing the data and they recognize it's Punit or Patricia, that's not okay. And people don't realize it, because most business people and entrepreneurs who have to do transcription think: this is nice, I go to this app, I pay so much a month, and all my meetings are transcribed; I have recorded evidence, written evidence. But that's where the risk is.

Patricia 10:09
Absolutely. Yeah, the data sharing aspect of it, whether you're sharing models or sharing the raw data, is definitely a concern.

Punit 10:18
Yeah. And then if we go technical or contractual: where is that data going? To whom, to which country? It gets complex. But maybe let's not scare people with all these risks, because everyone, at least in our field, knows them. The question is, there are two broad entities who are impacted. One is the companies, and they need to mitigate those risks. The second is the individuals. So let's take them one by one and see how companies and individuals can mitigate the risks, starting with companies. If I'm a company operating in any field, any industry, of course I would love to use AI. I would love to use ChatGPT, because it makes life simple; nothing against technology. But how do I, as a company, mitigate these risks? What measures do I put in place?

Patricia 11:11
Yeah, there are a few, and it really depends, of course, on which models and which services you're using. One, make sure that the services you are using deploy either within your environment or within a secure infrastructure. Two, even if they are deployed within your environment or within a secure infrastructure, it's still very good to limit the amount of personal information that you are sharing with the service, because you want to limit the extent to which that information is shared within the organization, period. There are methods for removing the personal information while still making these services very useful. In addition to that, make sure the security is in place; there's no privacy without security. Make sure there are frequent security checks and that the scanners you run against these services are good cybersecurity scanners. And make sure that your employees are properly trained on the best ways to use AI. I think there's still a big question mark around how to measure risk; there's a huge conversation going on about what kinds of risk we should even consider for these systems, and we're just at the beginning of what that's going to look like. Eventually that will likely make its way into legislation. But we can look at regulations on automated decision systems from previous legislation, which might say: make sure you are maintaining true data that is actually reflective of reality; make sure you give users the right to view what kind of data you're storing on them and to correct it. Those kinds of indications can give guidance as to what organizations can do as well.
And then, as an individual, look at the settings. Always see if you can opt out of sharing your data with third parties, and see whether they're storing your data on their servers, for how long, and for what purposes. One good way of doing it is going to their privacy policy and Ctrl+F-ing for "data" and seeing what shows up. You don't have to read through the entire thing.
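Patricia's second point, limiting the personal information that leaves the organization, can be sketched in a few lines. This is a hypothetical illustration using two simple regular expressions; a production system (such as the ML-based detection Private AI provides, which is not shown here) would recognize far more PII types and rely on trained models rather than patterns.

```python
import re

# Illustrative patterns for two common PII types. These are assumptions
# for the sketch, not an exhaustive or production-grade detector.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def redact(text: str) -> str:
    """Replace matched PII spans with placeholder tags before the text
    leaves the organization (e.g. before calling a chatbot API)."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text

prompt = "Contact me at jane.doe@example.com or +1 416 555 0199."
print(redact(prompt))  # Contact me at [EMAIL] or [PHONE].
```

The redacted prompt can then be sent to a third-party model without exposing the raw identifiers, which is one way to keep such services useful while limiting what they see.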

Punit 13:53
Very well said. You covered almost the entire GDPR: look at where it is deployed from a company perspective, limit the personal information, limit the purposes, make sure it's secure, train your staff, and monitor the risk. I think that's what the GDPR says must be done, and that's what is also essential for being a responsible or accountable company, whether responsible in AI, responsible in privacy, or ethical. And from a user's perspective, you're saying: go read the privacy notice, get a sense of what they are doing, and also look at the settings, because a lot of times settings give you means to have some sort of control; maybe not as much as we want, but enough control. And with all this, where does your company, Private AI, fit in?

Patricia 14:46
Yeah, in a few ways. What we do at Private AI is identify and remove personal information across multiple types of data, text, audio, images, and documents, and we do so across 40 languages. It allows an organization to reduce risk and also to identify risk in the first place. Where we fit in is in reducing, for example, the amount of personal information that's going to data science teams, or crossing boundaries within an organization, or before it reaches a third-party API. There are also many different use cases for training machine learning models without the personal information being memorized by those models. And on the flip side, a lot of the time when you want to make a privacy assessment, when you want to identify what the damage was from a cybersecurity breach, you need something really accurate to do so, and you need something that's multimodal, and we have basically the best technology in the world to do that. On the consumer side, we also have PrivateGPT, which we recently launched; it allows you to communicate with ChatGPT while preserving user privacy.

Punit 16:06
So, as the name says, it's basically private artificial intelligence for the company. Sorry, is that the right way of saying it: private AI for the company?

Patricia 16:21
Yes, that's right. Well, our name is Private AI, and we enable privacy-preserving AI as well.

Punit 16:31
Nice, nice. And when you do the identification and replacement, do you then make the PII pseudonymized, or anonymized, reducing the risk by de-identifying it? Is that right?

Patricia 16:49
All of the above, that's right. The interesting thing is that when you think about the kind of data you're sharing, the extent to which you might want to de-identify really depends on the type of data. You might want to completely anonymize the data if it's medical information, for example. But if it's a customer service call with someone about a vacuum cleaner you purchased, you might not really care if somebody knows what your identity is, but you will definitely care if you're sharing your credit card number, for example. At that point, the de-identification can be redaction of the key elements that might get you into trouble through identity theft. I think that's something data protection regulations don't always take into account, which is complex and interesting to think about.
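The two de-identification levels Patricia describes, full anonymization for sensitive data versus targeted redaction of identity-theft vectors like card numbers, could be sketched as follows. The function names and patterns are illustrative assumptions, not Private AI's API; real name detection would use an NER model rather than a hard-coded stand-in.

```python
import re

# 13-16 digit sequences with optional space/dash separators, starting
# and ending on a digit. A rough stand-in for card-number detection.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,15}\b")
# Stand-in for an NER model: in reality, names are found by ML, not regex.
NAME = re.compile(r"\bJane Doe\b")

def redact_high_risk(text: str) -> str:
    """Targeted redaction: strip only identity-theft vectors."""
    return CARD.sub("[CREDIT_CARD]", text)

def anonymize(text: str) -> str:
    """Full anonymization: also remove direct identifiers like names."""
    return NAME.sub("[NAME]", redact_high_risk(text))

print(redact_high_risk("card 4111 1111 1111 1111 ok"))
print(anonymize("Jane Doe paid with 4111111111111111"))
```

The design choice mirrors Patricia's point: the same pipeline can apply stricter or looser de-identification depending on the data's risk profile.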

Punit 17:53
And one of the positive use cases of AI we always talk about: let's say I have a production environment with a lot of customer data. That's sensitive, and I should not use it for other purposes. Now I want to do testing, or acceptance of new environments or new products which I'm going to launch. For the last 30 years or so in IT, the practice has been to use a dump of that data. And now we have this new concept of synthetic data: you artificially generate data based on the data set in production. Essentially, you replace Patricia with Jennifer and Punit with Tom, and so on and so forth, so that you make the data completely unrecognizable, and even some of the other elements, in a systematic way. So does Private AI also do that, the generation of synthetic data?

Patricia 18:50
We do generation of synthetic PII specifically. For fully synthetic data, there are some very good companies that do this. The way to think about it is: if you don't have enough data for a particular use case, you generally want to create fully synthetic data to complement that data set. What we find is that a lot of the time companies have enough data and want to take advantage of the full extent of its context; for example, it might be customer service calls about a particular product, or chats with their own customers, or anything like that. They might have a ton of that data, and what they want to do is reduce the risk of sharing that data with their teams. That's where synthetic PII makes sense. So they're complementary solutions for different problems.
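The synthetic-PII substitution Punit describes, replacing Patricia with Jennifer and Punit with Tom consistently across a data set, can be sketched as a simple consistent-mapping pseudonymizer. The `make_pseudonymizer` helper and name pool are hypothetical; in practice the entity spans would come from a PII-detection model, and the substitutes from a larger synthetic generator.

```python
import itertools

def make_pseudonymizer(pool=("Jennifer", "Tom", "Maria", "Alex")):
    """Return a function that maps each real name to the same synthetic
    stand-in every time, so dialogue structure is preserved."""
    fake = itertools.cycle(pool)
    mapping = {}
    def replace(name: str) -> str:
        if name not in mapping:
            mapping[name] = next(fake)
        return mapping[name]
    return replace

pseudo = make_pseudonymizer()
line = "Patricia asked Punit about the invoice."
# In practice the names would come from an NER model; here they are given.
for real in ["Patricia", "Punit"]:
    line = line.replace(real, pseudo(real))
print(line)  # Jennifer asked Tom about the invoice.
```

Because the mapping is stable, "Patricia" becomes "Jennifer" everywhere in the data set, which keeps the data usable for testing while making it unrecognizable, as Punit describes.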

Punit 19:43
Sure. And when I was going through your website, I read that you're offering your product for free to nonprofit organizations. Is that true?

Patricia 19:54
Yes, yes, that is true, as long as they're not selling it and it's for their own nonprofit purposes. We do offer that for research purposes, and for helping nonprofits accomplish various goals like data acquisition, or, for example, de-identifying sources if you're a journalist. There are lots of good use cases in the nonprofit world.

Punit 20:26
Sure, that's a very good thing. We also offer training to not-for-profits, usually at a very reduced cost, so it's good to contribute to society. Now, if, based on this conversation, someone wants to contact you or know more about Private AI, what's the best way?

Patricia 20:51
To know more about Private AI, I'd recommend going to the website: it's private-ai.com. To contact me, please do so at patricia@private-ai.com, or reach me on LinkedIn, or on Twitter at PrivateNLP.

Punit 21:09
That's good. It was wonderful to have you and to have this interesting conversation about privacy, ChatGPT, and the AI world. So thank you so much for your time.

Patricia 21:23
Thank you so much for the invitation.


Patricia Thaine is the Co-Founder & CEO of Private AI, a Microsoft-backed startup. With a decade of research and software development experience, she is a Computer Science PhD Candidate at the University of Toronto and a Vector Institute alumna. She founded Private AI to help companies unlock the value of unstructured data while maintaining customer privacy and compliance. Its latest launch, PrivateGPT, serves as a privacy layer for ChatGPT, redacting sensitive information from your prompts before sending them through the chatbot. 


About Punit Bhatia

Punit Bhatia is one of the leading privacy experts who helps CXOs and DPOs to identify and manage privacy risks by creating a privacy strategy and implementing it through setting and managing your privacy program and providing scenario based training to your key staff.  In a world that is digital, AI-driven, and has data in the cloud, Punit helps you to create a culture of privacy by establishing a privacy network and training your company's management and staff. 
For more information, please click here.

Listen to the top ranked EU GDPR based privacy podcast...

Stay connected with the views of leading data privacy professionals and business leaders in today's world on a broad range of topics like setting global privacy programs for private sector companies, role of Data Protection Officer (DPO), EU Representative role, Data Protection Impact Assessments (DPIA), Records of Processing Activity (ROPA), security of personal information, data security, personal security, privacy and security overlaps, prevention of personal data breaches, reporting a data breach, securing data transfers, privacy shield invalidation, new Standard Contractual Clauses (SCCs), guidelines from European Commission and other bodies like European Data Protection Board (EDPB), implementing regulations and laws (like EU General Data Protection Regulation or GDPR, California's Consumer Privacy Act or CCPA, Canada's Personal Information Protection and Electronic Documents Act or PIPEDA, China's Personal Information Protection Law or PIPL, India's Personal Data Protection Bill or PDPB), different types of solutions, even new laws and legal framework(s) to comply with a privacy law and much more.