Invisible Technologies Case Study - Full Video

DSPM on Google Cloud: Hypergrowth AI Company Ensures Data Security at Scale

Invisible Technologies, a fast-growing AI company, prioritized data security by implementing Lightbeam in their Google Cloud environment to gain real-time visibility into sensitive data. The solution improved accuracy in data classification and governance while streamlining privacy operations and enhancing AI model training controls.

Transcript

Hi, I'm Patrick McKinney. I'm the head of security at Invisible Technologies, and
we make AI work through a variety of
human-in-the-loop and AI platforms.

Yeah.
So as part of a full mature security program, you're going to want to implement
robust data security and privacy controls.
And coming into Invisible, we looked to upgrade across the board, and privacy and
data security were very manual. So we were looking at tooling to come in and
identify sensitive data and how we can automate remediations and privacy
controls across the board. And so Lightbeam fit the mold and checked a lot of those
boxes that we were looking for in our RFP, and they eventually won our business.

Before we implemented Lightbeam, we would get DSAR requests in manually through
different email streams that would come to me through forwarded emails.
And obviously an email inbox is not the best way to organize that.
So having Lightbeam and creating a centralized point of intake for
those DSARs and then allowing the data scanning and
classification engine that Lightbeam has to do the heavy
lifting for those DSARs to delete data as requested by the customers, that made it
way faster for us, way easier, and it took down the level of effort by about 80 to
90% for us.

Yeah, the importance of data security at Invisible is around
protecting IP. It's around ensuring that as we're using AI, we're using it
responsibly and making sure that data that doesn't need to go into the AI models or
be trained on is in fact not trained on and doesn't go into those models.
So being able to identify the types of data that we want to keep segregated from
those is very important, and Lightbeam gives us that ability to not only scan
the data that we have in different areas, but classify it and categorize it and
label it so we can ensure that that data doesn't go into areas it doesn't need
to. So it's a mix of protecting IP, but also ensuring that we're following
the right practices when it comes to AI and acting in a responsible way.

Some of the important areas that we were looking at when baking off different
products in this area were, one, making sure that our data didn't leave our
environment. And so the way that Lightbeam is structured and the way that it's
architected, our data stays in our environment while it's being scanned and
categorized and labeled. The other thing was ease of use.
I know these DSPM tools can be kind of clunky sometimes, and Lightbeam has a very
easy UX/UI, segregating out where the privacy ops portion is versus the
policies to help with proactively blocking potential data security
issues. And then integrations: Lightbeam out of the box met all the
integrations that we had and needed for our environment.
And the team is taking any feedback that we give them very responsibly.
So making sure that the company that we bring in to help with this job is not just
a tool, but it's also a partnership.
And so I think Lightbeam checked those off very easily without trying to pull teeth
out of a lion.

The data sources that we wanted were sources that most
companies use: Slack, a structured data store, and
Google Workspace, which is email and the Google Drive area.
Unstructured data is one of those things in the DSPM space
that's not easy to get right all the time.
With structured data in a database, it's very easy, it's very binary, it's very
small, very easy to scan. With unstructured data, PDFs, and all that sort of stuff,
the accuracy with which Lightbeam was interacting with Google Workspace and being
able to scan the plethora of data that we have in there, it was actually smooth,
and we were checking for accuracy, and the accuracy post-scan, post-labeling was
dead on. So yeah, it was very good.

Yeah.
So foundational data discovery and having that full catalog of
data is paramount to a privacy operations program becoming
successful. The access insights that Lightbeam adds to the data that
we're already discovering are also important, so that we can practice the principle of
least privilege across our entire environment, making sure that customer data is
only seen by those that need it and processed by those that need to have access to
it, rather than allowing everybody in the company just to have unfettered access to
things like Google Workspace, where those PDFs and those documents can be seen by
anybody.

Yeah, I think a lot of these AI companies are growing very fast, and the
ratio of growth to security is always one that you want to be right on the money
with. And I think you've seen some articles come out in the last couple of years
about companies not practicing proper data security, or maybe it isn't their top
priority, or it just slips their mind because there are a lot of priorities in a
security team. So making sure that we have the tools basically
telling us where all of our sensitive data is so we can lock it down and make sure
it stays within our boundaries, that's very important to us, so we don't have any
sort of reputational degradation when it comes to that.
Nobody wants to be in an article negatively, so not all press is good press.

Yeah.
So we had a very detailed RFP with a weighted scorecard, and we baked off four
different competitors. And what it came down to for us was what I call right fit,
which is a mix of how expensive the tool is versus the ROI that you get from it.
And I think Lightbeam not only came in at a reasonable cost, they showed us
ROI almost immediately. As soon as we started scanning data, they started showing
us where our at-risk data was, and then they started saying, "Well, while the rest
of the data stores are getting scanned up to par, we can implement privacy ops on
top of that, and we can start getting some of those wins around consent management,
cookie management, and privacy ops right away." So the ROI came very
quickly at a reasonable price.

So the way that we evaluate, whenever we're going to
add a new function to the security team, whether it be data security, whether it
be cloud security, is by understanding our threat landscape.
We have third-party firms come in and do assessments like every other company does,
and we also do our own internal assessments.
And then we look to see where we want to shore up via tooling, via hiring, all that
sort of stuff. And so when it comes to things like data security, we look at some
of the IBM reports every year on the cost of a data breach.
We look at things like IP theft and how prevalent it's become when people leave
companies, lawsuits of former employees from other companies, things like that.
And we decided that it was enough of a threat that we needed to do something about
it, and we wanted to make sure we did the right size fit for us.
So what we decided was it's probably faster to bring in a great tool like Lightbeam
to do a lot of the data scanning to give us that landscape, that visibility into
all of our data stores. And then from there, we'll start applying defense in depth
and different controls and areas for the different data stores as needed, making
sure we lock down access, making sure that we move
extra sensitive data to other data stores as needed, and only having
them in these publicly accessible stores or shareable data
stores as needed, that type of thing.

Yeah.
So in my work history, I've seen a lot of companies that grow and scale very
fast, and they don't take into account what the data subject access request process
looks like. And so if you're adding systems, and you're adding complexity to the
systems that consume customer data, when it comes time to delete it by request,
so that you can do business in various countries and various regions of the world,
various states as well, that can be very manual.
And when you have these manual processes, you're talking potentially like a day or
two per person to get this stuff deleted.
Unless you're doing batch deletions, even still, you're talking multiple days, and
that can add up in complexity and engineering hours, and it all just ends up being
cost-related. And yeah, you're talking about 2,000-plus dollars per
DSAR, and some companies are getting a hundred of those a
month. Some are getting hundreds a day, depending on the complexity of the company.

Yeah. So, deploying Lightbeam was very simple.
We had some engineers from Lightbeam with us just to make sure it was as easy as
advertised, and it absolutely was.
So the data plane is sitting in our environment, and we can scan our data as
fast or as slow as we want to, based on the resources we want to throw at it.
So if we need to, we can scan very quickly.
We throw a lot of compute clusters at it, and we can scan probably
terabytes in hours. Or if you want more of a slow-go approach to it, and you
don't need to scan everything in one day, you can keep your costs minimized and
only have a normal amount of compute.
And as you scan your data stores, and you're just kind of staying in maintenance
mode to scan all the net new data that's coming in, you can use less and less
resources to where you're not taking up that compute budget with the scanner.
It matches what we need from a cost perspective while still accomplishing the task.

So we operate in Google Cloud primarily.
We use AWS secondarily, and so most of our engineers are trained to use Google
Cloud better. Our resources are managed better there.
Our inventory of workloads is just better there.
All of our investment has been primarily into Google Cloud.
And deploying Lightbeam in Google Cloud was super simple.
It took less than an hour to get it all set up, and then we were scanning within
two hours of deployment, so yeah.

Yeah.
So to get the full use out of Lightbeam, we're still in our discovery phase, but
we're seeing a lot of positive results coming from it, with the risk ranking of our
data stores and telling us where all our PHI and PII is.
But next steps for us are really to, yeah, enforce data governance, enforce
all these policies around limiting access to the data, using it to
label our data and pushing those labels back to Google Drive so that we can
consume those in other ways. Having that labeling interface with Lightbeam is super
crucial. And really just using this as the single pane of glass and the
first lens of our look into our data as a whole.
There's a really great Partnerships Insights feature where we can see all the data we're
sharing with our customers now and make sure that we delete data that we don't
need as we go. So data retention is a very big thing for us, and using
Lightbeam as the source to say, "Hey, we do have data coming up that needs to be
deleted," it's a great first source for that.

Yeah.
So for now, like I said, we're looking at Slack and Gmail and Google Workspace,
and on the horizon, we're going to look to start scanning a lot of our structured
data stores, Databricks and Postgres and things like that.
So we've got a lot more data to classify and label, and once we get it all sorted,
we will absolutely have full-fledged visibility into our data, and then we can
start applying those data governance, data retention policies on top of that, so
it's going to be great.