🧩 Research (and Engineering) Outputs
This page highlights software systems, tools, benchmarks, and prototypes developed by us, including students and collaborators. These outputs reflect hands-on research, engineering effort, and real deliverables.
Security Benchmarking Framework & Empirical Analysis
🧑🎓 Team: Piyush Jaiswal, Aaditya Pratap, Shreyansh Saraswati, Harsh Kasyap, Somanath Tripathy
🛠 Output: Datasets / Benchmarking Framework
📌 Description: In this project, we systematically investigate prompt injection and jailbreak attacks against various open-source large language models. The research develops foundation vulnerability signatures across various model families and parameter scales, assesses light-weight damages at inference times, and studies the failures under realistic deployment factors. We collected adversarial prompts from community red-teaming efforts and worked manually on a few variants to study the robustness of our models.
To support open-source security research, we are releasing the following artifacts generated during our study. All links are public.
🔧 Technologies: Large Language Models Python PyTorch Hugging Face Transformers DeepSeek Phi Mistral LLAMA Qwen Gemma
🔗 View Dataset
Raw output logs and classification data from attacks. Useful for verifying results and analyzing model failure rates.
🔗 Open Prompt Collection
Best-performing jailbreak prompts across multiple LLMs. Can be used to benchmark alignment and safety.
🔗 View Prompts
Curated system instructions and meta-prompts for generating adversarial inputs and testing robustness.
📄 View Paper
This paper presents a comprehensive study of prompt injection and jailbreak attacks on large language models, including benchmarking methodologies, empirical analysis, and insights into model vulnerabilities under realistic deployment scenarios.