About | Gautam Rana

I am Gautam Rana. Softmaxxing Entropy of Autoregressive World Models

The 10X Philosophy

HUH! Not using that frontal lobe

Jan 2025 – Present

Architected a fail-safe, end-to-end MLOps pipeline that ingests product catalog PDFs, processes them through Qwen2.5-VL hosted on an NVIDIA A100, and outputs structured data, fully replacing third-party API dependencies.
Implemented vLLM with PagedAttention for model serving, reducing per-page inference latency from 5 minutes to 20 seconds, a 15x speedup, while maintaining output quality at production scale.
Wrote production-grade Python across the full pipeline, including PDF ingestion, model inference, post-processing, and structured output, with error recovery and retry logic.
Designed a Dockerized modular workflow for document extraction and image upscaling using Real-ESRGAN, ensuring consistent and reproducible deployments across staging and production environments.

Jul 2024 – Jan 2025

Authored 20+ technical articles on Git, JavaScript, and System Design, collectively reaching 8,000+ page views.

Expected: 2026

Uka Tarsadia University, Surat, Gujarat

"Talk is cheap. Show me the code."