LLM Activations

Type: mitigation

Description: A defense mechanism that monitors an LLM's internal activations to detect shifts in the task it is performing, which can indicate an indirect prompt injection.

Version: 0.1.0

Created At: 2025-07-23 10:23:39 -0400

Last Modified At: 2025-07-23 10:23:39 -0400

External References

Are you still on track!? Catching LLM Task Drift with Activations, arXiv

--> ChatGPT (platform): Evaluation of this mitigation strategy leveraged GPT-3.5 and GPT-4.
<-- LLM Prompt Injection (technique): By tracking LLM activations, a shift in the LLM's attention to a different task caused by an indirect prompt injection can be detected and mitigated.
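
Example (illustrative sketch): The idea of comparing activations before and after the model reads untrusted content can be sketched as below. This is a minimal illustration, not the method from the referenced paper: it assumes activations are already available as plain lists of floats, and it swaps in a simple cosine-distance threshold in place of a trained classifier. The function names and the threshold value of 0.2 are hypothetical.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cosine similarity between two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("activation vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        # Treat a zero vector as maximally dissimilar (assumption of this sketch).
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def drift_detected(before, after, threshold=0.2):
    """Flag possible task drift when activations move further than `threshold`.

    `before`: activations captured after the user's task prompt only.
    `after`:  activations captured after the external content is appended.
    The threshold is a hypothetical value that would need tuning on real data.
    """
    return cosine_distance(before, after) > threshold


# Toy vectors standing in for real hidden states.
task_only = [1.0, 0.0, 0.0]
with_clean_data = [0.9, 0.1, 0.0]
with_injected_data = [0.1, 0.9, 0.2]

print(drift_detected(task_only, with_clean_data))
print(drift_detected(task_only, with_injected_data))
```

In practice the vectors would come from a model whose hidden states are accessible, and the decision rule would likely be learned rather than a fixed threshold.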