Meta tags:
description= Personal website of Gokul Swamy.
Text of the page (random words):
Gokul Swamy

Hi there, I'm Gokul, a recent PhD graduate from Carnegie Mellon University's Robotics Institute, working on the algorithmic foundations and science of interactive decision making. I work on efficient interactive learning algorithms for training agents (e.g. robots, language models). More fundamentally, I am interested in techniques for learning to make good decisions efficiently, even when "good" is hard to specify. I value closing the theory-practice loop: my research proceeds in cycles of deeply understanding empirical phenomena, making algorithmic advancements, and deploying my ideas broadly.

I recently completed my PhD at CMU, where I worked with Drew Bagnell and Steven Wu. I completed my B.S. and M.S. at UC Berkeley, where I worked with Anca Dragan. I've spent summers working on ML at SpaceX, perception at NVIDIA, motion planning at Aurora, world models at Microsoft, and LLMs at Google.

I am currently on the job market.

Events and News

April 2026: I passed my thesis defense and am now officially a doctor. I couldn't have done it without the support of my advisors, labmates, friends, and family. Thank you all.

February 2026: Two new papers out on learning from noisy LLM judge feedback. SP3F provides LLM judges with privileged information for better feedback on low-resource language reasoning problems. PROSPER robustly learns from rubric-based LLM judges via ideas from Blackwell approachability.

January 2026: Particularly exciting: a paper accepted to ICLR 2026 on the real value of RL in fine-tuning / RLHF. I gave a talk at Cornell on the paper that might also be of interest.

November 2025: I'm incredibly grateful to be named a rising star in data science and robotics and to receive the inaugural CMU RI Outstanding Graduate Teaching Assistant Award.

June 2025: Two new papers out on learning to search. SAILOR outperforms diffusion policies trained on 10x as many demos on multi-stage visual manipulation tasks (spotlight, NeurIPS '25). FOREWARN allows real robots to avoid semantic failures via VLM verifiers (RSS '25; outstanding paper at an ICML '25 workshop).

November 2024: Drew, Steven, and I are co-teaching a course on the algorithmic foundations of interactive learning. If you'd like to understand the fundamental principles behind imitation (e.g. for robots) and RLHF (e.g. for LLMs), this is the course for you.

Research Highlights

A Smooth Sea Never Made a Skilled SAILOR: Robust Imitation via Learning to Search
We introduce SAILOR, a method for learning to search from expert demonstrations that outperforms diffusion policies trained on 5-10x as much data on multi-stage visual manipulation tasks.
Links: site, paper, podcast.

All Roads Lead to Likelihood
We explore how the value of RL in fine-tuning / RLHF seems to be fundamentally derived from generation-verification gaps.
Links: paper, talk.

SPO: Self-Play Preference Optimization
We derive a new fundamental algorithm for RLHF that robustly handles the complex, intransitive preferences that often result from aggregating a diversity of views.
Links: site, paper.

Inverse RL without RL
We derive exponentially faster algorithms for inverse RL by proving that local search is all you need for imitation.
Links: site, paper, podcast.

Of Moments and Matching
We provide a unifying, game-theoretic framework for imitation learning that explains when different algorithmic families can avoid compounding errors.
Links: site, paper, blog.

2026 Gokul Swamy. Real stupidity beats artificial intelligence every time.