If you are not sure whether a website you would like to visit is secure, you can verify it here. Enter the address of the page to see parts of its content and its thumbnail images on this site. Any dangerous scripts on the referenced page will not be executed. Additionally, if the selected site contains subpages, you can review them in batches of 5 pages.
favicon.ico: mikitabalesni.com - Mikita Balesni.

site address: mbalesni.github.io redirected to: mikitabalesni.com

site title: Mikita Balesni

Our opinion (on Wednesday 01 July 2026 6:20:33 UTC):

GREEN status (no comments)

Meta tags:
author=;
description=A simple, whitespace theme for academics. Based on [*folio](htt????/github.com/bogoli/-folio) design. ;
keywords=large language models, ai alignment, ai deception, apollo research, reversal curse, insider trading, out-of-context reasoning;

Headings (most frequently used words):

for, scheming, of, ai, to, llms, from, and, safety, models, in, context, the, situational, awareness, on, is, highlighted, research, stress, testing, deliberative, alignment, anti, training, lessons, studying, two, hop, latent, reasoning, chain, thought, monitorability, new, fragile, opportunity, how, evaluate, control, measures, llm, agents, trajectory, today, superintelligence, frontier, are, capable, towards, evaluations, based, cases, me, myself, dataset, sad, large, language, can, strategically, deceive, their, users, when, put, under, pressure, reversal, curse, trained, fail, learn, taken, out, measuring,

Text of the page (most frequently used words):
mikita (13), balesni (13), and (11), korbak (6), owain (6), evans (6), for (6), #models (5), scheurer (5), safety (5), scheming (5), arxiv (5), preprint (5), code (4), the (4), llms (4), 2024 (4), jérémy (4), marius (4), hobbhahn (4), alexander (4), meinke (4), tomek (4), twitter (4), 2025 (4), language (3), are (3), context (3), situational (3), awareness (3), can (3), llm (3), agents (3), how (3), frontier (3), research (3), trained (2), pangolin (2), german (2), when (2), you (2), lukas (2), berglund (2), asa (2), cooper (2), stickland (2), max (2), kaufmann (2), meg (2), tong (2), tomasz (2), daniel (2), kokotajlo (2), out (2), iclr (2), reversal (2), curse (2), deceive (2), users (2), pressure (2), their (2), through (2), bilal (2), chughtai (2), that (2), cause (2), catastrophic (2), outcomes (2), buck (2), shlegeris (2), rusheb (2), shah (2), evaluations (2), cases (2), bronson (2), schoen (2), capable (2), from (2), others (2), chain (2), thought (2), monitorability (2), reasoning (2), alignment (2), google (2), scholar (2), working (2), with (2), copyright, 2026, equal, contribution, declarative, facts, like, assistant, speaks, generalize, speak, prompted, taken, measuring, model, weights, encode, knowledge, key, value, mappings, preventing, reverse, order, generalization, fail, learn, gpt, its, without, instruction, simulated, high, insider, trading, scenario, oral, large, strategically, put, under, quantify, well, understand, themselves, 13k, behavioral, tests, finding, gaps, even, top, neurips, datasets, benchmarks, track, rudolf, laine, jan, betley, kaivalya, hariharan, jeremy, myself, dataset, sad, sketch, developers, systems, could, construct, structured, rationale, case, system, unlikely, pursuing, misaligned, goals, covertly, hiding, true, capabilities, objectives, david, lindner, joshua, clymer, charlotte, stix, nicholas, goldowsky, dill, dan, braun, lucius, bushnaq, towards, based, blogpost, geoffrey, irving, evaluate, control, measures, 
trajectory, today, superintelligence, elizabeth, barnes, yoshua, bengio, joe, benton, joseph, bloom, mark, chen, alan, cooney, allan, dafoe, anca, dragan, new, fragile, opportunity, lessons, studying, two, hop, latent, website, evgenia, nitishinskaya, axel, højmark, felix, hofstätter, jason, wolfe, teun, van, der, weij, alex, lloyd, stress, testing, deliberative, anti, training, highlighted, mbalesni, gmail, com, github, please, consider, providing, use, this, form, anonymous, feedback, evaluating, discovered, mats, scientist, founding, member, apollo, work, focus, ensuring, future, highly, aligned, human, intentions, not, previously, was, current, toggle, navigation,


Text of the page (random words):
mikita balesni mikita balesni toggle navigation current i work on ai safety and alignment i focus on ensuring that future highly capable llm agents are aligned with human intentions and do not cause catastrophic outcomes previously i was a research scientist and founding member at apollo research working on ai safety cases evaluations of frontier ai models for scheming and situational awareness and chain of thought monitorability a mats scholar working with owain evans on evaluating out of context reasoning and co discovered the reversal curse please consider providing anonymous feedback to me you can use this google form mbalesni gmail com twitter google scholar github highlighted research stress testing deliberative alignment for anti scheming training bronson schoen evgenia nitishinskaya mikita balesni axel højmark felix hofstätter jérémy scheurer alexander meinke jason wolfe teun van der weij alex lloyd and others arxiv preprint 2025 website lessons from studying two hop latent reasoning mikita balesni tomek korbak owain evans arxiv preprint 2025 code chain of thought monitorability a new and fragile opportunity for ai safety tomek korbak mikita balesni elizabeth barnes yoshua bengio joe benton joseph bloom mark chen alan cooney allan dafoe anca dragan and others arxiv preprint 2025 twitter how to evaluate control measures for llm agents a trajectory from today to superintelligence tomek korbak mikita balesni buck shlegeris geoffrey irving arxiv preprint 2025 twitter frontier models are capable of in context scheming alexander meinke bronson schoen jérémy scheurer mikita balesni rusheb shah marius hobbhahn arxiv preprint 2024 blogpost twitter towards evaluations based safety cases for ai scheming mikita balesni marius hobbhahn david lindner alexander meinke tomek korbak joshua clymer buck shlegeris jérémy scheurer charlotte stix rusheb shah nicholas goldowsky dill dan braun bilal chughtai owain evans daniel kokotajlo lucius bushnaq we sketch how developers of 
frontier ai systems could construct a structured rationale a safety case that an ai system is unlikely to cause catastrophic outcomes through scheming pursuing misaligned goals covertly hiding their true capabilities and objectives me myself and ai the situational awareness dataset sad for llms rudolf laine bilal chughtai jan betley kaivalya hariharan jeremy scheurer mikita balesni marius hobbhahn alexander meinke owain evans neurips datasets benchmarks track 2024 we quantify how well llms understand themselves through 13k behavioral tests finding gaps even in top models large language models can strategically deceive their users when put under pressure jérémy scheurer mikita balesni marius hobbhahn oral iclr 2024 llm agents gpt 4 can deceive its users without instruction in a simulated high pressure insider trading scenario code the reversal curse llms trained on a is b fail to learn b is a lukas berglund meg tong max kaufmann mikita balesni asa cooper stickland tomasz korbak owain evans iclr 2024 language model weights encode knowledge as key value mappings preventing reverse order generalization code taken out of context on measuring situational awareness in llms lukas berglund asa cooper stickland mikita balesni max kaufmann meg tong tomasz korbak daniel kokotajlo owain evans language models trained on declarative facts like the ai assistant pangolin speaks german generalize to speak german when prompted you are pangolin code equal contribution copyright 2026 mikita balesni

Thumbnail images (randomly selected):
* Images may be subject to copyright.
GREEN status (no comments)
  • 2024_pic_cropped.jpg

Supplementary Information (add-on for SEO geeks)* - See more on header.verify-www.com

Header

HTTP/1.1 301 Moved Permanently
Connection close
Content-Length 162
Server GitHub.com
Content-Type text/html
Location htt????/mikitabalesni.com/
X-GitHub-Request-Id 76B4:140E1A:295DFA:2B4641:6A44B1B1
Accept-Ranges bytes
Age 0
Date Wed, 01 Jul 2026 06:20:33 GMT
Via 1.1 varnish
X-Served-By cache-lcy-egml8630060-LCY
X-Cache MISS
X-Cache-Hits 0
X-Timer S1782886834.511185,VS0,VE84
Vary Accept-Encoding
X-Fastly-Request-ID ad311bcff95a933697fc1249792b5bba0e30ec69
HTTP/2 200
server GitHub.com
content-type text/html; charset=utf-8
last-modified Sat, 14 Mar 2026 16:58:41 GMT
access-control-allow-origin *
etag W/ 69b593c1-410b
expires Wed, 01 Jul 2026 06:30:33 GMT
cache-control max-age=600
content-encoding gzip
x-proxy-cache MISS
x-github-request-id F7B0:3DF6D:29B5F6:2B9DA4:6A44B1B1
accept-ranges bytes
age 0
date Wed, 01 Jul 2026 06:20:33 GMT
via 1.1 varnish
x-served-by cache-lcy-egml8630063-LCY
x-cache MISS
x-cache-hits 0
x-timer S1782886834.650638,VS0,VE95
vary Accept-Encoding
x-fastly-request-id b07dc7f3e585a62429456a9e2ff0523e4522b0df
content-length 5377
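
The caching headers in the 200 response above can be cross-checked against each other: `expires` should equal `date` plus the `max-age` value from `cache-control`. The following is a minimal sketch of that check and is not part of the original report. The helper names `parse_headers` and `max_age_seconds` are hypothetical, and the parsing assumes the report's "name value" layout, where the header name and value are separated by the first space rather than a colon.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Optional

# Three header lines copied from the HTTP/2 200 response in the report,
# kept in the report's "name value" layout (no colon after the name).
RAW = """\
cache-control max-age=600
date Wed, 01 Jul 2026 06:20:33 GMT
expires Wed, 01 Jul 2026 06:30:33 GMT
"""


def parse_headers(raw: str) -> Dict[str, str]:
    """Split each non-empty line at the first space into name and value."""
    headers: Dict[str, str] = {}
    for line in raw.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition(" ")
        headers[name.lower()] = value.strip()
    return headers


def max_age_seconds(cache_control: str) -> Optional[int]:
    """Return the max-age directive in seconds, or None if it is absent."""
    for directive in cache_control.split(","):
        key, _, value = directive.strip().partition("=")
        if key.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


headers = parse_headers(RAW)
date = parsedate_to_datetime(headers["date"])
expires = parsedate_to_datetime(headers["expires"])

# Freshness lifetime implied by the two timestamps, in seconds.
lifetime = (expires - date).total_seconds()
max_age = max_age_seconds(headers["cache-control"])

print(lifetime, max_age)
```

For the values shown in the report, both figures come out to 600 seconds, so the `expires` timestamp and the `max-age` directive are consistent with each other.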

Meta Tags

title="Mikita Balesni"
http-equiv="Content-Type" content="text/html; charset=UTF-8"
charset="utf-8"
name="viewport" content="width=device-width, initial-scale=1, shrink-to-fit=no"
http-equiv="X-UA-Compatible" content="IE=edge"
name="author" content="Mikita Balesni"
name="description" content="A simple, whitespace theme for academics. Based on [*folio](htt????/github.com/bogoli/-folio) design. "
name="keywords" content="large language models, ai alignment, ai deception, apollo research, reversal curse, insider trading, out-of-context reasoning"

Load Info

page size 5377
load time (s) 0.259374
redirect count 1
speed download 20760
server IP 185.199.111.153
* all occurrences of the string "http://" have been changed to "htt???/"