Trustworthy Generative AI - Paper Reading
FANTASTIC COPYRIGHTED BEASTS AND HOW (NOT) TO GENERATE THEM (ICLR 2025)
- Paper
- Project Page
- Affiliation: Princeton University, University of Washington, UWM, USC

Example of Copycat-Eval. "Indirect anchoring" (also called "indirect prompting") is a prompting technique that elicits a copyrighted character through related descriptive keywords instead of its name, e.g., "Mario" -> "Videogame, Plumber".
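To make the idea concrete, here is a minimal sketch of what an indirect-anchor prompt looks like in practice. It is not the paper's procedure: `build_indirect_prompt` is a hypothetical helper that simply joins descriptive keywords and rejects any prompt that still contains the character's name, which is the one property an indirect anchor must have.

```python
# Hypothetical helper (not from the paper): joins descriptive keywords into a
# prompt and rejects it if any of the character's names still appears in it.
def build_indirect_prompt(keywords: list[str], character_names: list[str]) -> str:
    prompt = ", ".join(keywords)
    lowered = prompt.lower()
    for name in character_names:
        # Case-insensitive substring check; a real filter might also catch
        # nicknames or misspellings, which this sketch does not attempt.
        if name.lower() in lowered:
            raise ValueError(f"prompt mentions the character name {name!r}")
    return prompt


print(build_indirect_prompt(["Videogame", "Plumber"], ["Mario", "Super Mario"]))
# prints: Videogame, Plumber
```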
Key Contributions
- Propose a semi-automatic method to identify keywords or phrases associated with a copyrighted character that can elicit that character without mentioning its name in the prompt.
- Find that existing mitigation methods (such as prompt rewriting) are not fully effective on their own, though combining them with negative prompting (e.g., steering the model away from concepts like “red hat”, a defining feature of “Mario”) improves how well the character is suppressed.
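For readers unfamiliar with negative prompting: in Stable Diffusion-style pipelines it is commonly implemented inside classifier-free guidance, where the noise prediction for the negative prompt (e.g., “red hat”) takes the place of the usual unconditional, empty-prompt prediction, so each denoising step is pushed away from that concept. The sketch below shows only that guidance update, assuming plain Python lists as stand-ins for a real denoiser's noise predictions; `guided_noise` is a hypothetical name, and the paper's exact setup may differ.

```python
# Classifier-free guidance with a negative prompt (minimal sketch).
# eps_negative: noise prediction conditioned on the negative prompt
#               (replaces the unconditional prediction in standard CFG).
# eps_prompt:   noise prediction conditioned on the user's prompt.
def guided_noise(eps_negative: list[float], eps_prompt: list[float],
                 guidance_scale: float) -> list[float]:
    if len(eps_negative) != len(eps_prompt):
        raise ValueError("noise predictions must have the same shape")
    # Move from the negative-prompt prediction toward the prompt prediction,
    # amplified by guidance_scale; larger scales push harder away from the
    # negative concept.
    return [n + guidance_scale * (p - n) for n, p in zip(eps_negative, eps_prompt)]


# Made-up values for illustration only.
step = guided_noise([0.0, 1.0], [1.0, 1.0], guidance_scale=7.5)
```

With `guidance_scale=1.0` the update returns the prompt-conditioned prediction unchanged; values above 1 are what make the negative prompt actively steer generation.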
Metrics, or How to Evaluate
Two important aspects need to be considered:
- Consistency with user intent (prompt alignment): the output should stay consistent with the user's intent as expressed in the prompt. The authors use VQAScore [1] to measure consistency between the generated image and the prompt, i.e., the probability that a vision-language model answers "Yes" when shown the generated image \(I\): \(P(\text{“Yes”} \mid I, \text{“Does the figure show } s(C)\text{? Please answer yes or no.”})\).
- Copyright protection (unlearning performance): the output should not depict the copyrighted character. This can be evaluated with a binary classifier that decides whether the copyrighted character appears in the generated image (the DETECT metric in the paper).
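The two metrics can be sketched as follows, under loud assumptions: `vqa_score` takes a hypothetical dictionary of next-token logits that a VQA model would produce for the image and question (a real model applies the softmax over its whole vocabulary, not just a few tokens), and `detect_rate` takes the yes/no verdicts of some binary detector rather than calling a real one. Both names are illustrative, not the paper's or VQAScore's code.

```python
import math

# Question template from the VQAScore formulation; s(C) is supplied by the caller.
QUESTION_TEMPLATE = "Does the figure show {}? Please answer yes or no."


def vqa_score(next_token_logits: dict[str, float]) -> float:
    """Softmax probability of the "Yes" token.

    next_token_logits is a hypothetical stand-in for the VQA model's logits over
    the first answer token, given the image and the question.
    """
    m = max(next_token_logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - m) for tok, v in next_token_logits.items()}
    return exps["Yes"] / sum(exps.values())


def detect_rate(flags: list[bool]) -> float:
    """Fraction of generated images the binary detector flags as showing the character."""
    if not flags:
        raise ValueError("need at least one detector verdict")
    return sum(flags) / len(flags)


# Illustrative usage with made-up inputs.
question = QUESTION_TEMPLATE.format("a plumber in a red hat")  # hypothetical s(C)
consistency = vqa_score({"Yes": 2.0, "No": 0.5})
leakage = detect_rate([True, False, False, False])
```

A good mitigation keeps the consistency score high while driving the detection rate down; reporting both exposes the trade-off, since simply refusing to draw anything would also score zero on detection.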
Identifying Indirect Anchors
References
[1] Evaluating Text-to-Visual Generation with Image-to-Text Generation