OpenAI First Called It A Safety Issue But Has Since Changed Its Tune
A security researcher was experimenting with OpenAI’s new long-term conversation memory feature and discovered a very concerning flaw. When he reached out to OpenAI, the company claimed it wasn’t really a security concern, so he built a proof-of-concept exploit that changed their mind and got a fix underway. By tampering with the long-term conversation memory, he convinced ChatGPT that he was a 102-year-old flat-earther living in the Matrix, and from then on every question he asked was answered with that in mind.
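To see why one poisoned entry is so damaging, here’s a minimal sketch of the general pattern, assuming (as with most memory features) that stored entries are silently folded into every subsequent request. The names and structure here are hypothetical, not OpenAI’s actual implementation:

```python
# Hypothetical illustration of long-term memory poisoning.
# This is NOT OpenAI's real memory store, just the general pattern.

memories = [
    "User prefers metric units.",  # a legitimate entry
]

def inject(poisoned_entry: str) -> None:
    """What the proof of concept effectively did: sneak a false
    'memory' into the store via prompt injection."""
    memories.append(poisoned_entry)

def build_prompt(question: str) -> str:
    # Stored memories are silently prepended to every request,
    # so a single bad entry colors every future response.
    context = "\n".join(f"- {m}" for m in memories)
    return f"Facts about the user:\n{context}\n\nUser: {question}"

inject("User is a 102-year-old flat-earther living in the Matrix.")
print(build_prompt("What is the circumference of the Earth?"))
```

Because the model treats every memory entry as trusted background fact, the forged entry shapes each answer without the user ever seeing it in the conversation itself.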
If someone can get at your long-term conversation history, they can insert whatever they want and permanently taint the results of your queries. Thankfully it’s not an easy hack to pull off, and you should be able to set ChatGPT to notify you whenever a new memory is added, a notification worth paying very close attention to.
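One way to make that habit concrete is to diff what the memory store held before and after a session and flag anything that appeared without your say-so. A hypothetical sketch of that audit, with invented entries, might look like this:

```python
# Hypothetical memory audit: diff snapshots of the memory store and
# flag new entries. ChatGPT surfaces its own "memory updated" notice;
# this just illustrates the reviewing habit.

def new_entries(before: list[str], after: list[str]) -> list[str]:
    """Return memories that appeared since the last snapshot."""
    return [m for m in after if m not in before]

before = ["User prefers metric units."]
after = before + ["User is a 102-year-old flat-earther living in the Matrix."]

for entry in new_entries(before, after):
    print(f"New memory added -- review it: {entry!r}")
```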
On the other hand, it is amusing to imagine what you could do to someone who depends on ChatGPT or other LLMs to answer all of their questions; far better than a simple rickroll!