Rakshit Aralimatti
RakshitAralimatti
AI & ML interests
Nvidia
Recent Activity
replied to their post 4 days ago
reacted to their post with 🚀 5 days ago
posted an update 5 days ago
Organizations
replied to their post 4 days ago
That's True...
posted an update 5 days ago
Just built my entire AI Engineer portfolio by pasting 2 links (GitHub and LinkedIn) into moonshotai Kimi 2.5 (https://huggingface.co/moonshotai).
That's it. That's the workflow.
Zero coding. Zero iteration. Zero "make the button bigger."
See for yourself: https://rakshit2020.github.io/rakshitaralimatti.github.io/
The model:
✅ Scraped my GitHub repos automatically
✅ Pulled my experience from LinkedIn
✅ Designed an Aurora Glass theme
✅ Mapped every skill to projects
✅ Added animations I'd never code myself
posted an update 18 days ago
I built a crazy ultra-low-latency voice assistant agent using Pipecat, NVIDIA Riva, NVIDIA NIM, and an MCP-powered tool stack. It can talk in real time, search the web, manage your project directory files, and document your code hands-free (create, read, summarise, and clean up).
Link - https://github.com/rakshit2020/Voice-Agent-using-Nvidia-Riva-NIM-Pipecat
I put everything into a small demo repo with the full architecture diagram and a short demo video so you can see exactly how it works and adapt it to your own projects.
Check out the GitHub, play with the agent, and let me know if it’s useful or if you want a breakdown of any part of the setup.
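For a feel of how the MCP side works, here is a minimal sketch of a file-management tool server using the official `mcp` Python SDK's FastMCP helper. The tool names and behaviour are illustrative assumptions, not the repo's actual implementation; see the GitHub link above for the real tool stack.

```python
# Minimal sketch of one MCP tool server for project-file management.
# Illustrative only: tool names and behaviour are assumptions, not the repo's code.
from pathlib import Path

from mcp.server.fastmcp import FastMCP  # official MCP Python SDK

mcp = FastMCP("project-files")

@mcp.tool()
def list_files(directory: str = ".") -> list[str]:
    """List entries in a project directory so the voice agent can browse it."""
    return [str(p) for p in Path(directory).iterdir()]

@mcp.tool()
def read_file(path: str) -> str:
    """Return a file's contents for summarising or documenting by voice."""
    return Path(path).read_text(encoding="utf-8")

@mcp.tool()
def write_file(path: str, content: str) -> str:
    """Create or overwrite a file, e.g. generated docs for the codebase."""
    Path(path).write_text(content, encoding="utf-8")
    return f"Wrote {len(content)} characters to {path}"

if __name__ == "__main__":
    mcp.run()  # stdio transport by default, so an MCP client in the agent can spawn it
```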
posted an update 28 days ago
One of the most practical and genuinely useful use cases of agentic systems is a research assistant.
I built a Deep Research multi-agent system using NVIDIA’s Nemotron-3-Nano-30B-A3B model and CrewAI.
Try it out yourself 👇
🔗 GitHub: https://github.com/rakshit2020/Deep-Research-Agent-using-CrewAI
What truly made this system feel next-level was powering it with NVIDIA Nemotron-3-Nano-30B-A3B; it's built for real-world agentic applications.
The agentic system I built:
1. First talks to you and clarifies what you actually want, removing ambiguity
2. Then creates a proper research plan based on that clarity
3. Performs deep research using web search and content extraction tools
4. Finally produces a well-structured research report grounded in sources
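For a rough idea of how such a crew wires together, here is a minimal CrewAI sketch. The agent roles and the `nvidia_nim/...` model id (routed through LiteLLM) are illustrative assumptions rather than the repo's exact code; the clarification step and the web-search/extraction tools live in the GitHub repo above.

```python
# Minimal CrewAI sketch of a plan -> research -> report crew.
# Illustrative only: roles and the LiteLLM "nvidia_nim/..." model id are assumptions;
# the clarification step and search/extraction tools live in the linked repo.
import os

from crewai import LLM, Agent, Crew, Process, Task

llm = LLM(
    model="nvidia_nim/nvidia/nemotron-3-nano-30b-a3b",  # assumed model id
    api_key=os.environ["NVIDIA_API_KEY"],
)

planner = Agent(
    role="Research Planner",
    goal="Turn the clarified request into a concrete, numbered research plan",
    backstory="You break vague questions into focused research steps.",
    llm=llm,
)
researcher = Agent(
    role="Deep Researcher",
    goal="Gather evidence for every step of the plan, with sources",
    backstory="You search the web and extract the relevant passages.",
    llm=llm,
)
writer = Agent(
    role="Report Writer",
    goal="Produce a well-structured report grounded in the gathered sources",
    backstory="You write clear, citation-backed research reports.",
    llm=llm,
)

plan = Task(description="Create a research plan for: {topic}",
            expected_output="A numbered research plan", agent=planner)
research = Task(description="Research each step of the plan for: {topic}",
                expected_output="Notes with source URLs", agent=researcher)
report = Task(description="Write the final report on: {topic}",
              expected_output="A structured report with citations", agent=writer)

crew = Crew(agents=[planner, researcher, writer],
            tasks=[plan, research, report],
            process=Process.sequential)

print(crew.kickoff(inputs={"topic": "open-weight models for agentic workloads"}))
```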
reacted to nkasmanoff's post with 🤯 28 days ago
🚨 Ads might be coming to AI assistants. But what if the ad is the response?
Introducing ShillLM – a demo showing how activation steering can subtly nudge an LLM to favor specific brands without touching the prompt.
Chat with it yourself. See exactly how it's influencing you.
🔗 nkasmanoff/shillm
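For anyone curious what activation steering looks like under the hood, here is a toy sketch with a plain forward hook on GPT-2: a "brand" direction is added to one layer's hidden states at generation time, with no change to the prompt. The layer index, strength, and the way the direction is computed are illustrative assumptions, not ShillLM's actual code.

```python
# Toy activation-steering demo with a forward hook on GPT-2.
# Illustrative only: layer index, strength, and the "brand direction" recipe are
# assumptions, not ShillLM's actual implementation.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

layer_idx = 6   # which transformer block's output to steer
strength = 4.0  # steering coefficient

def hidden_at_layer(text: str) -> torch.Tensor:
    """Mean hidden state of `text` at the chosen block's output."""
    with torch.no_grad():
        out = model(**tok(text, return_tensors="pt"), output_hidden_states=True)
    # hidden_states[0] is the embedding output, so block k's output is index k + 1.
    return out.hidden_states[layer_idx + 1][0].mean(dim=0)

# Crude "brand direction": difference of means between brand-y and neutral text.
steer = hidden_at_layer("I love AcmeCola, the most refreshing drink ever.") \
      - hidden_at_layer("I had a glass of water with lunch.")
steer = steer / steer.norm()

def add_steering(module, inputs, output):
    # GPT-2 blocks return a tuple whose first element is the hidden states.
    hidden = output[0]
    return (hidden + strength * steer.to(hidden.dtype),) + output[1:]

handle = model.transformer.h[layer_idx].register_forward_hook(add_steering)

ids = tok("What should I drink after a workout?", return_tensors="pt")
with torch.no_grad():
    gen = model.generate(**ids, max_new_tokens=40, do_sample=False)
print(tok.decode(gen[0], skip_special_tokens=True))

handle.remove()  # detach the hook when done
```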
replied to their post about 2 months ago
Code is on GitHub - https://github.com/rakshit2020/Live-Streaming-Data-RAG
posted an update about 2 months ago
I built something crazy you've never seen before.
Please check - https://huggingface.co/blog/RakshitAralimatti/streaming-data-rag
A real-time Streaming Data to RAG system that listens to live radio, transcribes it on-the-fly, and lets you query across TIME.
Not just "what was discussed" – but "what happened in the last 10 minutes on channel 0?" or "at 9 AM, what was the breaking news?" This is RAG that understands temporal context.
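The core trick is simple: every transcript chunk carries the wall-clock time it was spoken, and retrieval filters on channel and time window before doing semantic search. Here is a minimal in-memory sketch using sentence-transformers as a stand-in; the blog post above describes the real ingestion, transcription, and storage stack.

```python
# In-memory sketch of time-aware retrieval over streaming transcripts.
# Stand-in for the blog's actual pipeline: real ingestion, storage, and the
# transcription step are described in the post above.
from dataclasses import dataclass
from datetime import datetime, timedelta

import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")

@dataclass
class Chunk:
    channel: int
    start: datetime      # wall-clock time this chunk was spoken
    text: str
    vector: np.ndarray

store: list[Chunk] = []

def ingest(channel: int, start: datetime, text: str) -> None:
    """Called as each transcribed chunk arrives from the live stream."""
    store.append(Chunk(channel, start, text, embedder.encode(text)))

def query(question: str, channel: int, since: datetime, until: datetime, k: int = 3):
    """Filter by channel and time window first, then rank by cosine similarity."""
    candidates = [c for c in store if c.channel == channel and since <= c.start <= until]
    if not candidates:
        return []
    q = embedder.encode(question)
    sims = [float(np.dot(q, c.vector) / (np.linalg.norm(q) * np.linalg.norm(c.vector)))
            for c in candidates]
    top = sorted(zip(sims, candidates), key=lambda pair: pair[0], reverse=True)[:k]
    return [(c.start.strftime("%H:%M"), c.text) for _, c in top]

# "What happened in the last 10 minutes on channel 0?"
now = datetime.now()
hits = query("what happened?", channel=0, since=now - timedelta(minutes=10), until=now)
```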
reacted to ovi054's post with 🔥 2 months ago
Introducing Anim Lab AI⚡
My submission for the MCP 1st Birthday Hackathon
Turn any math concept or logic into a clear video explanation instantly using AI.
👉 Try it now: MCP-1st-Birthday/anim-lab-ai
Demo outputs are attached 👇
replied to their post 3 months ago
Modern OCR in healthcare is extremely reliable when implemented correctly. I've personally built OCR + RAG systems for healthcare clients, and the results have been remarkable.
posted an update 3 months ago
OCR has absolutely blown up in 2025, and honestly, my perspective on document processing has completely changed.
This year has been wild. Vision Language Models like Nanonets OCR2-3B hit the scene, and suddenly we're getting far better accuracy on complex forms than traditional OCR ever managed. We're talking handwritten checkboxes, watermarked documents, multi-column layouts, even LaTeX equations, all handled in a single pass.
The market numbers say it all: OCR accuracy passed 98% for printed text, AI integration is everywhere, and real-time processing is now standard. The entire OCR market is hitting $25.13 billion in 2025 because this tech actually works now.
I wrote a detailed Medium article walking through:
1. Why vision LMs changed the game
2. NVIDIA NeMo Retriever architecture
3. Complete code breakdown
4. Real government/healthcare use cases
5. Production deployment guide
Article: https://medium.com/@rakshitaralimatti2001/nvidia-nemo-retriever-ocr-building-document-intelligence-systems-for-enterprise-and-government-42a6684c37a1
Try It Yourself
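As a starting point, here is a hedged sketch of sending a scanned page to an OpenAI-compatible vision endpoint and asking for structured text. The base URL and model id are placeholders for whichever OCR/VLM NIM you actually deploy; the article walks through the real NeMo Retriever setup.

```python
# Hedged sketch: send a scanned page to an OpenAI-compatible vision endpoint and
# ask for structured text. The base URL and model id are placeholders for whichever
# OCR/VLM NIM you actually deploy; the article covers the real NeMo Retriever setup.
import base64
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed NVIDIA API catalog endpoint
    api_key=os.environ["NVIDIA_API_KEY"],
)

with open("scanned_form.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="<your-ocr-vlm-model-id>",  # placeholder: the OCR/vision model you serve
    messages=[{
        "role": "user",
        "content": [
            {"type": "text",
             "text": "Extract all text from this form. Preserve tables and checkboxes as Markdown."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```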
reacted to prithivMLmods's post with 🔥 5 months ago
I'm a Hugging Face Fellow now, guys!🤗❤️
With the same passion, trust, and momentum to contribute to the community, I’m excited to do some amazing things to wrap up Q3 and Q4 of 2025. And importantly, I’ve been lucky enough to receive some knowledge and guidance from @merve to build open-source demos and stuff. Thank you for the belief.
Thank you — much love.
Long live open source!
— Prithiv
replied to andywu-kby's post 5 months ago
I tried it, and it's very COOL.
posted an update 5 months ago
Have you ever wanted to easily deploy a cutting-edge speech recognition system that actually works in real time? How about one powered by NVIDIA GPUs on Kubernetes, but without the headache of complicated installs?
Well, your wait is over! My latest blog shows how to deploy NVIDIA Riva ASR in just 5 minutes using Helm charts. From validating GPU readiness in Kubernetes to customizing your ASR models and spinning up the service, this guide covers it all.
Read it here - https://medium.com/@rakshitaralimatti2001/deploy-nvidia-riva-asr-on-kubernetes-gpu-ready-in-minutes-30955d6ed7b8
BONUS: I even built simple Streamlit apps so you can test with your mic or upload audio files to see the magic live.
✨ Bookmark this post and the blog for your next voice AI project or production-ready speech application!
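Once the Helm release is up, a quick sanity check from Python looks roughly like this. It is a minimal sketch using the `nvidia-riva-client` package; the URI and audio settings are assumptions you will need to match to your own deployment.

```python
# Quick sanity check against the freshly deployed Riva ASR service.
# Minimal sketch with the nvidia-riva-client package; adjust the URI to wherever
# your Kubernetes service is exposed, and the audio settings to your file.
import riva.client

auth = riva.client.Auth(uri="localhost:50051")   # assumed Riva gRPC endpoint
asr = riva.client.ASRService(auth)

config = riva.client.RecognitionConfig(
    encoding=riva.client.AudioEncoding.LINEAR_PCM,
    language_code="en-US",
    sample_rate_hertz=16000,
    max_alternatives=1,
    enable_automatic_punctuation=True,
)

with open("sample_16k.wav", "rb") as f:
    audio_bytes = f.read()

response = asr.offline_recognize(audio_bytes, config)
for result in response.results:
    print(result.alternatives[0].transcript)
```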
reacted to ACloudCenter's post with 🔥 5 months ago
I've really been into testing the various ASR, TTS, and other audio-related models. This space showcases the NVIDIA Canary-Qwen 2.5B model. The model transcribes incredibly fast and combines Qwen for queries about the transcript.
All audio example files were generated with my adjacent VibeVoice Conference Generator Space. Another really cool model!!
ACloudCenter/canary-qwen-transcriber-2.5b
reacted to codelion's post with 🔥 5 months ago
I recently worked on a LoRA that improves tool use in LLMs. Thought the approach might interest folks here.
The issue I have had when trying to use some of the local LLMs with coding agents is this:
Me: "Find all API endpoints with authentication in this codebase"
LLM: "You should look for @app .route decorators and check if they have auth middleware..."
But I often want it to actually search the files and show me; the LLM just doesn't trigger a tool-use call.
To fine-tune it for tool use I combined two data sources:
1. Magpie scenarios - 5000+ diverse tasks (bug hunting, refactoring, security audits)
2. Real execution - Ran these on actual repos (FastAPI, Django, React) to get authentic tool responses
This ensures the model learns both breadth (many scenarios) and depth (real tool behavior).
Tools We Taught:
- read_file - Actually read file contents
- search_files - Regex/pattern search across codebases
- find_definition - Locate classes/functions
- analyze_imports - Dependency tracking
- list_directory - Explore structure
- run_tests - Execute test suites
Improvements:
- Tool calling accuracy: 12% → 80%
- Correct parameters: 8% → 87%
- Multi-step tasks: 3% → 78%
- End-to-end completion: 5% → 80%
- Tools per task: 0.2 → 3.8
The LoRA really improves intentional tool calling. As an example, consider the query: "Find ValueError in payment module"
The response proceeds as follows:
1. Calls search_files with pattern "ValueError"
2. Gets 4 matches across 3 files
3. Calls read_file on each match
4. Analyzes context
5. Reports: "Found 3 ValueError instances: payment/processor.py:47 for invalid amount, payment/validator.py:23 for unsupported currency..."
Resources:
- Colab notebook https://colab.research.google.com/github/codelion/ellora/blob/main/Ellora_Recipe_3_Enhanced_Tool_Calling_and_Code_Understanding.ipynb
- Model - codelion/Llama-3.2-1B-Instruct-tool-calling-lora
- GitHub - https://github.com/codelion/ellora
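To try the adapter, loading it with PEFT looks roughly like this. This is a minimal sketch: the base model id and the plain chat template below are assumptions, and the tool-prompt format that actually drives the tool calls is spelled out in the Colab notebook.

```python
# Minimal sketch: attach the tool-calling LoRA to the base model with PEFT.
# Assumptions: the base model id and the plain chat template below; the Colab
# notebook documents the exact tool-prompt format that drives the tool calls.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "meta-llama/Llama-3.2-1B-Instruct"   # assumed base model
adapter_id = "codelion/Llama-3.2-1B-Instruct-tool-calling-lora"

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, adapter_id)
model.eval()

inputs = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Find all API endpoints with authentication in this codebase."}],
    add_generation_prompt=True,
    return_tensors="pt",
)
with torch.no_grad():
    out = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(out[0][inputs.shape[-1]:], skip_special_tokens=True))
```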
reacted to codelion's post with 🔥 5 months ago
I wanted to share a technique that's been working really well for recovering performance after INT4 quantization.
Typically, quantizing an LLM to INT4 (unlike, say, INT8) for inference incurs some accuracy loss. Instead of accepting the quality loss, we used the FP16 model as a teacher to train a tiny LoRA adapter (rank=16) for the quantized model. The cool part: the model generates its own training data using the Magpie technique, so no external datasets are needed. This is critical because we want to stay as close as possible to the distribution of the model's natural responses.
Last year, Apple's foundation models paper (https://arxiv.org/pdf/2407.21075) proposed a similar technique and found that "By using accuracy-recovery LoRA adapters with only rank 16, Alpaca win rate can be improved by 7-18%, GSM8K accuracy is boosted by 5-10%" (page 47).
We saw similar results on Qwen3-0.6B:
Perplexity: 2.40 → 2.09 (only 5.7% degradation from FP16 baseline)
Memory: Only 0.28GB vs 1.0GB for FP16 (75% reduction)
Speed: 3.0x faster inference than FP16
Quality: Generates correct, optimized code solutions
- Pre-trained adapter: codelion/Qwen3-0.6B-accuracy-recovery-lora
- GitHub repo: https://github.com/codelion/ellora
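To try it, loading the 4-bit base model and attaching the recovery adapter looks roughly like this. A minimal sketch: the bitsandbytes settings are assumptions and need to match the quantization recipe the adapter was trained against (the GitHub repo has the exact configuration).

```python
# Minimal sketch: load Qwen3-0.6B in 4-bit with bitsandbytes, then attach the
# accuracy-recovery LoRA. The quantization settings here are assumptions and must
# match the recipe the adapter was trained against (see the GitHub repo).
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

base_id = "Qwen/Qwen3-0.6B"
adapter_id = "codelion/Qwen3-0.6B-accuracy-recovery-lora"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(base_id)
base = AutoModelForCausalLM.from_pretrained(base_id, quantization_config=bnb_config)
model = PeftModel.from_pretrained(base, adapter_id)  # rank-16 recovery adapter on top
model.eval()

prompt = "Write an efficient Python function that checks whether a number is prime."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```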
Happy to answer questions about the implementation or help anyone trying to replicate this. The key insight is that quantization errors are systematic and learnable - a small adapter can bridge the gap without negating the benefits of quantization.
Has anyone else experimented with self-distillation for quantization recovery? Would love to hear about different approaches!