LLM Activations
Type: mitigation
Description: A defense mechanism that allows to track changes in the LLM to track any indirect prompt injections.
Version: 0.1.0
Created At: 2025-06-19 08:13:23 -0400
Last Modified At: 2025-06-19 08:13:23 -0400
External References
Related Objects
- --> ChatGPT (platform): Evaluation of the above mitigation strategies leveraged GPT 3.5 and GPT 4.
- <-- LLM Prompt Injection (technique): By tracking LLM activations, the LLMs shift of attention to different tasks caused by indirect prompt injections can be tracked and mitigated.