The early days of YouTube were filled with cute cat videos. ChatGPT, by contrast, aced medical exams and earned an MBA within its first three months. Thankfully, it failed the accounting exam, proving once and for all that balancing the books is a fiction even a fairly advanced AI model has a hard time comprehending (exhibit A: FTX).
On a more serious note, the possibilities of a conversational AI model are immense. But ChatGPT is not just a conversational AI model, i.e. the friendly chatbot you see on websites these days. It is much more than that. And its expansive capabilities make me wonder about the privacy implications that come with it.
DATA | INFORMATION | ASSET
To fully appreciate the privacy implications, let's first understand how data, any data, increases in utility. In short, utility increases in step functions as data gets processed and synthesized into contextual information. Furthermore, when different types of information are put together (an activity requiring some level of intelligence), the output is an asset. Financial statements, quarterly business reports, medical diagnosis reports, business cases, architecture documents, etc. are all examples of assets: they contain data and information put together in a context that makes the whole greater than the sum of its parts (the individual pieces of data and information).
We google for information on particular topics all the time. If I want to know the number of commercial airports in the US, that's something Google can help me with. However, if I want to prepare a letter to my congressman asking them to reconsider the location of an airport for civic reasons, all Google can help me with is perhaps similar letters written in the past. ChatGPT, on the other hand, can actually write the letter for me. An actual asset gets produced that can then be sent to the congressman. Put another way, while Google Search gives you all the ingredients to prepare your meal, ChatGPT gives you the meal. No wonder many startups are figuring out innovative ways to use ChatGPT to provide a service. The question, though, is this: when building a service on top of ChatGPT, are you responsible for the results?
Of course, the purist point of view on privacy may be that anything on the open internet is public by definition. Hence, a search engine or a gigantic language model using that data to present information should not be considered a violation of privacy. And from a purist point of view, that may be correct. However, and this applies fairly widely, it is not a resource itself but the usage of that resource that creates externalities. Case in point: fossil fuels are a resource. They sat inside the earth for millennia and had little impact on the environment. The moment we started using them (powering much of modern development), well, that's when the externalities started appearing. And the externalities are no fun.
“Hey ChatGPT, can you prepare an article on the externalities of using fossil fuels”
“Well, sorry, I am running out of capacity now. But let me google the externalities for you.”
Therein lies the stark difference between using Google Search and ChatGPT. One is sitting on top of a resource, and does its best to answer questions you pose. It is your grandmother — you go ask a question, and she goes, “well, how much time do you have to understand what you may or may not be asking”. The other is an overzealous consultant. Its objective is not just to give you information, but to provide you with an asset that you can directly use. It’s like saying “Alexa, create my quarterly business report using data from Salesforce.” Lo and behold, the quarterly report is here, ready to be sent to your boss.
ChatGPT (or similar avatars of Large Language Models) is that powerful. But wait, what about the pesky little thing called privacy? How do you consider the privacy implications of a few popular use cases of ChatGPT? Let’s look at two of the use cases:
Fireside Q&A with ChatGPT
ChatGPT Powering New Apps
FIRESIDE CHAT WITH ChatGPT: And The Pesky Privacy Predicament
Is there a chance that a conversational AI model, one that appears semi-sentient to the regular people interacting with it, starts trampling on privacy? Hmmm, let's consider a totally innocuous scenario. Imagine an organization's data has been breached, and as part of the breach, personal data belonging to millions of people got stolen. Actually, sorry, you don't have to imagine this at all: breaches happen at regular intervals. In fact, there are organizations that have exposed their customers' sensitive data not once or twice but almost as an annual ritual.
OK, that rant aside, for a second let's think about what happens to this breached data. It gets sold to as many bidders as possible. Just like software, every buyer gets a non-exclusive right to use that personal data. Inevitably, over a period of time, some or all of that data becomes available on the web. This is why many password managers nudge you to change your passwords with warnings like "this password may have been compromised as part of a data breach."
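As an aside, those "compromised password" warnings typically work without your password ever leaving your machine, via a k-anonymity scheme popularized by breach-check services such as Have I Been Pwned: the client hashes the password, sends only the first few characters of the hash, receives all breached hash suffixes sharing that prefix, and compares locally. A minimal sketch of the client-side split (function name and example are my own, not any particular vendor's API):

```python
import hashlib

def k_anonymity_parts(password: str) -> tuple[str, str]:
    """Split the SHA-1 hash of a password into the short prefix that
    would be sent to a breach-check service and the suffix that never
    leaves the client."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]

prefix, suffix = k_anonymity_parts("password123")
# Only `prefix` is transmitted; the service returns every breached
# hash suffix sharing that prefix, and the client looks for `suffix`
# in that list locally.
```

The service learns only that someone queried one of thousands of hashes starting with those five characters, which is what makes the check privacy-preserving.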
Now, our ChatGPT, being the knower of all knowledge there is, comes across this treasure trove of information and happily/greedily learns from it (hey, it's AI after all). The question is: what kind of controls does it have in place to sift through and remove any personal information from the content it is learning from? It is not a trivial question, because you do want the model to learn about famous people (Will Smith?) but not about regular individuals (Greg Smith?). Even for Will Smith, you probably don't want ChatGPT to learn his physical addresses. The powers at OpenAI most certainly probably have tweaked their algorithms to address this issue for sure most likely… It's just that AI models have this ability to self-learn (else they become stale very fast). And so if you do too good a job controlling exactly which sources an internet-connected AI model learns from, it might just defeat the whole purpose of a large language model.
So where does that leave us? Well, for one, you and I have no way of knowing the sources that AI-powered models are learning from. So we may have to assume that some of their output may contain information that is privately public ("your leaked SSN that you don't know about, and that shouldn't be public"). Which means you need a way to scan all ChatGPT output for PII/protected/sensitive information, either manually or using another AI model (cough, cough — www.lightbeam.ai — OK, this is my only plug, I promise).
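To make the idea concrete, here is a deliberately simple sketch of what the first line of such a scan might look like: pattern matching for a couple of common PII types. The patterns and function name are hypothetical illustrations; a production scanner (AI-powered or otherwise) would need far broader coverage, context awareness, and validation.

```python
import re

# Hypothetical patterns for two common PII types; a real scanner
# would cover many more (phone numbers, addresses, card numbers, ...).
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
}

def scan_for_pii(text: str) -> dict[str, list[str]]:
    """Return every match of each PII pattern found in `text`."""
    return {kind: pat.findall(text) for kind, pat in PII_PATTERNS.items()}

hits = scan_for_pii("Contact greg.smith@example.com, SSN 123-45-6789.")
# hits == {"ssn": ["123-45-6789"], "email": ["greg.smith@example.com"]}
```

Regex alone is famously leaky (it cannot tell a leaked SSN from a random nine-digit string), which is exactly why the heavier, model-based approaches mentioned above exist.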
ChatGPT POWERING NEW APPS: And The Pesky Privacy Predicament
The next question is the implication of using ChatGPT to build new services and applications. I must say that I am very excited about the possibilities ChatGPT might unleash in the entrepreneurial community for building innovative solutions. It could truly put ChatGPT at the level of Linux: a foundation that spawns millions of useful applications.
However, here too, remember that ChatGPT is not merely giving you answers or information. It is giving you the ready-to-consume asset. You don't use ChatGPT just to get a few examples of an email to send to your team about a great last quarter and the path forward. You literally get your email written, ready to be sent. As applications leverage ChatGPT to automate the mundane and aid the creative process, will they step back on every ChatGPT response and review it for private information that was never supposed to be public in the first place?
Your guess is as good as mine (which is basically that it might remain the wild west until the first flurry of lawsuits). After that, I suspect ChatGPT will have a strict privacy mode for its responses that would be less fun and less informative. The bravehearts can have the normal "I don't give a damn" mode, at least for private/research use (hopefully commercial use will always require the privacy mode).
Onward, privacy bravehearts. We have no idea where this ship is headed. All we can expect is an exciting journey (and hopefully less paperwork/form filling, thanks to some AI-powered automation).