Harsh Kasyap

🧩 Research (and Engineering) Outputs

This page highlights software systems, tools, benchmarks, and prototypes developed by us, including students and collaborators. These outputs reflect hands-on research, engineering effort, and real deliverables.


Security Benchmarking Framework & Empirical Analysis

🧑‍🎓 Team: Piyush Jaiswal, Aaditya Pratap, Shreyansh Saraswati, Harsh Kasyap, Somanath Tripathy

🛠 Output: Datasets / Benchmarking Framework

📌 Description: In this project, we systematically investigate prompt injection and jailbreak attacks against various open-source large language models. The research develops foundation vulnerability signatures across various model families and parameter scales, assesses light-weight damages at inference times, and studies the failures under realistic deployment factors. We collected adversarial prompts from community red-teaming efforts and worked manually on a few variants to study the robustness of our models.

To support open-source security research, we are releasing the following artifacts generated during our study. All links are public.

🔧 Technologies: Large Language Models Python PyTorch Hugging Face Transformers DeepSeek Phi Mistral LLAMA Qwen Gemma

1. Injection Results
🔗 View Dataset

Raw output logs and classification data from attacks. Useful for verifying results and analyzing model failure rates.

2. Jailbreak Prompts
🔗 Open Prompt Collection

Best-performing jailbreak prompts across multiple LLMs. Can be used to benchmark alignment and safety.

3. Internal Prompts
🔗 View Prompts

Curated system instructions and meta-prompts for generating adversarial inputs and testing robustness.

4. Research Paper
📄 View Paper

This paper presents a comprehensive study of prompt injection and jailbreak attacks on large language models, including benchmarking methodologies, empirical analysis, and insights into model vulnerabilities under realistic deployment scenarios.