Extracting GPT’s Training Data
Schneier on Security
NOVEMBER 30, 2023
And in our strongest configuration, over five percent of the output ChatGPT emits is a direct verbatim 50-token-in-a-row copy from its training dataset. This happens rather often when running our attack. Lots of details at the link and in the paper.
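To make the "50-token-in-a-row" figure concrete, here is a minimal sketch of how such an overlap fraction could be computed on toy data. It is not the researchers' method: the function name `verbatim_fraction`, the pre-tokenized list inputs, and the in-memory set of corpus windows are all assumptions made for illustration, and a real training corpus would need a far more scalable index.

```python
def verbatim_fraction(output_tokens, corpus_tokens, window=50):
    """Return the fraction of output tokens that fall inside at least one
    `window`-token run that also appears verbatim in the corpus.

    Hypothetical helper for illustration only; both inputs are assumed to
    be lists of already-tokenized items (strings or integers).
    """
    if len(output_tokens) < window:
        return 0.0

    # Index every window-length run of the corpus. This is only practical
    # for small toy corpora, since it stores one tuple per position.
    corpus_windows = {
        tuple(corpus_tokens[i:i + window])
        for i in range(len(corpus_tokens) - window + 1)
    }

    # Mark each output token covered by a window that matches the corpus.
    covered = [False] * len(output_tokens)
    for i in range(len(output_tokens) - window + 1):
        if tuple(output_tokens[i:i + window]) in corpus_windows:
            for j in range(i, i + window):
                covered[j] = True

    return sum(covered) / len(output_tokens)


if __name__ == "__main__":
    # Toy data: 60 of the 100 output tokens are copied from the corpus.
    corpus = list(range(100))
    output = [-1] * 10 + list(range(20, 80)) + [-2] * 30
    print(verbatim_fraction(output, corpus))
```

Under this definition, a result above 0.05 would correspond to the "over five percent" figure quoted above, though the paper's exact measurement procedure may differ.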