Privacy in the Generative AI Era: redactd, Powered by Pangea Cloud

Introduction

Generative AI is one of the newest technologies. Progress is swift, and it is genuinely helpful for many jobs in many aspects of people's lives. It may be the most groundbreaking tech discovery since the spreadsheet :p. It feels like everyone uses it.

LLMs like ChatGPT are trained on data from the Internet. But what if they also learn from the prompts we give them? If that is true, sending sensitive data to them is risky, because that data could be exposed or exploited, which is something we don't want. Please don't risk it.

Maybe sending sensitive data to third parties is not really a concern for many people, whatever their background, because they have more pressing problems to take care of.

"We are on the Internet! No data is really private!"

But if we can reduce the risk a little, by ourselves, why not?

The simplest way to avoid giving sensitive data to LLMs is to hide that specific data. In other words, redact it.

Yes: redact the sensitive information in the prompt, then un-redact the LLM's response to get the actual response back.

Just like that. I think it is the simplest way, and the simplest way to build it is with Pangea Cloud.

What is Pangea Cloud?

A couple of weeks ago, I discovered Pangea Cloud. The description on their landing page says:

Security Services for Developers

Get the APIs you need to deliver a secure customer experience, all in one place!

Yes, it is security as a service, so we developers can deliver apps with high-quality security out of the box.

With that, we can build a simple web app that redacts prompts using the Redact API provided by Pangea Cloud.
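To make the idea concrete, here is a rough sketch of what a call to the Redact API could look like over plain HTTP. This is not the redactd source code. The endpoint path (`/v1/redact`), the `text` request field, the `redacted_text` result field, and the default domain are my understanding of Pangea's public documentation and should be treated as assumptions to verify against the current API reference. The function names `build_redact_request` and `redact_text` are hypothetical.

```python
import json
import urllib.request


def build_redact_request(text, token, domain="aws.us.pangea.cloud"):
    """Build (but do not send) a POST request for the Redact service.

    Assumptions: the URL layout, the "text" field, and Bearer-token auth
    follow Pangea's public docs; check them before relying on this.
    """
    url = f"https://redact.{domain}/v1/redact"
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def redact_text(text, token, domain="aws.us.pangea.cloud"):
    """Send the request and return the redacted text.

    Assumption: the JSON response carries the output under
    result -> redacted_text.
    """
    request = build_redact_request(text, token, domain)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return payload["result"]["redacted_text"]
```

Calling `redact_text("Budi Dena has $5000 in savings", token)` with a valid service token would send the prompt to the service and return the redacted version. Splitting the request building from the sending keeps the first part easy to inspect without touching the network.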

Introducing: redactd

redactd works in a really simple way:

  1. Redact the prompt using the Pangea Redact API

  2. Get the redacted prompt

  3. Copy-paste the redacted prompt into the LLM

  4. Get the response

    1. If the response contains redacted text, un-redact it first

    2. Get the actual response

    3. Done!

  5. Done!
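The round trip in the steps above can be sketched locally, without any external service. This is only a stand-in to show the idea, not how Pangea or redactd implements it: the list of sensitive terms is supplied by hand, and the `<PERSON_n>` placeholder format and the function names `redact` and `unredact` are hypothetical.

```python
def redact(text, sensitive_terms):
    """Replace each sensitive term with a numbered placeholder.

    Returns the redacted text and a placeholder -> original mapping,
    which is kept locally and never sent to the LLM.
    """
    mapping = {}
    redacted = text
    for index, term in enumerate(sensitive_terms, start=1):
        placeholder = f"<PERSON_{index}>"  # hypothetical placeholder format
        if term in redacted:
            redacted = redacted.replace(term, placeholder)
            mapping[placeholder] = term
    return redacted, mapping


def unredact(text, mapping):
    """Put the original terms back wherever a placeholder appears."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text


prompt = "Budi Dena has $5000 in a bank. Selene Sofiro has $3000 in index funds."
redacted_prompt, mapping = redact(prompt, ["Budi Dena", "Selene Sofiro"])
print(redacted_prompt)

# Pretend this came back from the LLM, which only ever saw placeholders.
llm_response = "After 10 years, <PERSON_2> would have more savings."
print(unredact(llm_response, mapping))
```

The important design point is that the mapping stays on our side: the LLM only sees the placeholders, and the real names are restored after the response comes back.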

Now let's try it!

The prompt:

Budi Dena have savings in banks $5000 with 3% yearly interest 

Selene Sofiro have savings on index funds $3000 with 10% interest yearly 

Within 10 years, who have more savings?

The prompt entered in redactd

It will redact the people's names:

The person's name is redacted

Now copy-paste the redacted prompt into ChatGPT:

As we can see, ChatGPT keeps using the redacted text in its answer. Now we want the un-redacted version of the response, so copy the response into the un-redact input field:

Voila, it works!

Using the formula for compound interest:

\[ \text{Future Value} = \text{Principal} \times \left(1 + \frac{\text{Interest Rate}}{100}\right)^{\text{Time}} \]

\[ \text{Future Value} = 5000 \times \left(1 + \frac{3}{100}\right)^{10} \]

<Selene Sofiro >:
Principal amount (initial savings): $3000
Interest rate: 10% per year
Time: 10 years

Using the same formula:

\[ \text{Future Value} = \text{Principal} \times \left(1 + \frac{\text{Interest Rate}}{100}\right)^{\text{Time}} \]

\[ \text{Future Value} = 3000 \times \left(1 + \frac{10}{100}\right)^{10} \]

Now, let's calculate these values to find out who has more savings after 10 years.

After calculating the above expressions, it is found that:

\[ \text{Future Value for <Budi Dena >} \approx \$6729.56 \]

\[ \text{Future Value for <Selene Sofiro >} \approx \$7912.85 \]

Therefore, after 10 years, <Selene Sofiro > would have more savings.
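As a quick sanity check on the arithmetic in the response, the same compound interest formula can be evaluated directly. This is a minimal sketch; the function name `future_value` is my own. The dollar figures ChatGPT quoted differ somewhat from what the formula gives, but the conclusion is the same: Selene Sofiro ends up with more savings.

```python
def future_value(principal, rate_percent, years):
    """Compound interest, compounded once per year."""
    return principal * (1 + rate_percent / 100) ** years


budi = future_value(5000, 3, 10)     # bank savings at 3% per year
selene = future_value(3000, 10, 10)  # index funds at 10% per year

print(f"Budi Dena: ${budi:.2f}")        # prints "Budi Dena: $6719.58"
print(f"Selene Sofiro: ${selene:.2f}")  # prints "Selene Sofiro: $7781.23"
print("Selene Sofiro has more" if selene > budi else "Budi Dena has more")
```

It is a good reminder that an LLM's numbers are worth double-checking, even when the overall answer is right.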

In Closing

Redacting text before sending it to third parties is one way to keep our data private. With redactd, we have a little more control over how our data is used on the Internet, specifically when using LLMs like ChatGPT.

Thank you for reading, and shout-out to Pangea Cloud and Hashnode for this hackathon!

Try it here: https://redactd.vercel.app/

GitHub: https://github.com/fandyaditya/redactd

#Pangea #PangeaSecurathon