If you are not sure whether a website you would like to visit is secure, you can verify it here. Enter the address of the page to see parts of its content and thumbnail images from the site. Any dangerous scripts on the referenced page will not be executed. If the selected site contains subpages, you can also review them in batches of 5 pages.

site address: ryancarey.github.io redirected to: ryancarey.github.io

site title:

Our opinion (on Sunday 05 July 2026 1:30:36 UTC):

GREEN status (no comments)

Meta tags:

Headings (most frequently used words):

my, research, selected, publications, other, writing,

Text of the page (most frequently used words):
the (18), and (15), for (12), causal (11), incentives (9), ryan (7), carey (7), how (6), tom (6), everitt (6), that (6), also (5), value (5), agent (5), influence (5), one (5), system (5), human (5), graphical (4), control (4), aaai (4), can (4), systems (4), goal (4), framework (3), this (3), with (3), may (3), not (3), are (3), its (3), complete (3), information (3), unfairness (3), 2022 (3), algorithms (3), models (3), research (3), forum (2), prize (2), work (2), see (2), interesting (2), alignment (2), problems (2), corrigibility (2), study (2), learning (2), incorrigible (2), concepts (2), when (2), incentivized (2), even (2), labels (2), fair (2), criterion (2), diagrams (2), than (2), optimize (2), objective (2), but (2), user (2), games (2), modelling (2), reasoning (2), 2023 (2), about (2), causality (2), definitions (2), including (2), they (2), from (2), phd (2), best (2), specification (2), where (2), assigned (2), problem (2), identify (2), delicate (2), try (2), safe (2), follow (2), been (2), safety (2), working (2), group (2), firstname, lastname, jesus, coverage, business, insider, much, combinator, founders, earn, show, shaping, your, talent, direct, reply, interpreting, compute, trends, editor, handbook, addressing, three, counterfactual, bad, bets, defending, against, backstops, overconfidence, other, writing, method, cooperative, inverse, reinforcement, prevent, behaviour, aies, 2018, incorrigibility, cirl, largely, determined, context, paper, gives, sound, criteria, four, incentive, response, eric, langlois, pedro, ortega, shane, legg, 2021, perspective, perhaps, surprisingly, completely, carolyn, ashurst, silvia, chiappa, why, yield, unfair, predictions, conditions, introduced, presents, more, decision, node, along, homomorphisms, trees, chris, van, merwijk, soluble, you, tell, any, means, engagement, without, manipulating, sebastian, farquhar, path, specific, objectives, safer, introduces, structural, single, allows, both, game, theoretic, 
lewis, hammond, james, fox, alessandro, abate, michael, wooldridge, artificial, intelligence, journal, variants, assurances, offer, autonomy, used, obtain, them, uai, selected, publications, since, lot, these, analyses, benefit, using, studies, represent, marginalisation, conditionalisation, graphs, third, gaming, fulfills, extreme, version, rather, intended, proposed, remedy, sample, actions, performed, demonstrator, quantilisation, has, some, nice, properties, don, hold, all, kinds, mis, quantilise, second, shape, such, whether, compels, fairly, respond, sensitive, demographic, characterics, safely, parts, environment, sometimes, structure, alone, suffices, closely, related, issue, diagram, won, variable, fact, general, template, many, past, implicitly, modify, identifying, nonrequisite, edges, design, corrigibile, wants, manipulate, instructions, learn, goals, corrigible, behave, unsafely, whereas, shutdown, instructable, especially, interested, finding, tools, final, year, student, oxford, supervised, theory, involving, cofounder, which, uses, reason, previously, fellow, future, humanity, institute, intern, deepmind, openai, founder, robin, evans, twitter, scholar,


Text of the page (random words):
CV, Scholar, Twitter, Causal Incentives Working Group

I'm a final-year PhD student at Oxford, supervised by Robin Evans, where I work on theory involving causal models. I'm also a cofounder of the Causal Incentives Working Group, which uses causal models to reason about AI safety. Previously, I've been a research fellow at the Future of Humanity Institute, a research intern at DeepMind and OpenAI, and the founder of the EA Forum.

My research

I've been especially interested in finding concepts and tools for modelling AI safety problems. One interesting problem is how to design a corrigible system: one that wants to follow, and not manipulate, its instructions. Even systems that try to learn the human's goals may be incorrigible. Also, corrigible systems may behave unsafely, whereas shutdown-instructable systems are safe.

A second problem is how to identify and shape agents' incentives, such as whether an agent's goal compels it to (un)fairly respond to sensitive demographic characteristics, or (un)safely influence delicate parts of the environment. Sometimes, the causal structure alone suffices to identify the incentives (see also the closely related issue of identifying nonrequisite edges in an influence diagram). One can also modify an AI system so that it won't try to influence a delicate variable, and in fact, this is a general template that many past safe AI algorithms implicitly follow.

A third is specification gaming, where a system fulfills an extreme version of its assigned goal, rather than the intended goal. One proposed remedy is for the AI system to quantilise the assigned objective, i.e. to sample from the best n% of actions performed by a human demonstrator. Quantilisation has some nice properties, but they don't hold for all kinds of goal mis-specification.

Since a lot of these analyses benefit from using graphical causal models, my PhD studies causality, including how to best represent marginalisation and conditionalisation in causal graphs.

Selected publications

Human Control: Definitions and Algorithms. We study definitions of human control, including variants of corrigibility and alignment, the assurances they offer for human autonomy, and the algorithms that can be used to obtain them. Ryan Carey, Tom Everitt. UAI 2023.

Reasoning about Causality in Games. Introduces structural causal games, a single modelling framework that allows for both causal and game-theoretic reasoning. Lewis Hammond, James Fox, Tom Everitt, Ryan Carey, Alessandro Abate, Michael Wooldridge. Artificial Intelligence Journal, 2023.

Path-Specific Objectives for Safer Agent Incentives. How do you tell an ML system to optimize an objective, but not by any means? E.g. optimize user engagement without manipulating the user. Sebastian Farquhar, Ryan Carey, Tom Everitt. AAAI 2022.

A Complete Criterion for Value of Information in Soluble Influence Diagrams. Presents a complete graphical criterion for value of information in influence diagrams with more than one decision node, along with ID homomorphisms and trees of systems. Chris van Merwijk, Ryan Carey, Tom Everitt. AAAI 2022.

Why Fair Labels Can Yield Unfair Predictions: Graphical Conditions for Introduced Unfairness. When is unfairness incentivized? Perhaps surprisingly, unfairness can be incentivized even when labels are completely fair. Carolyn Ashurst, Ryan Carey, Silvia Chiappa, Tom Everitt. AAAI 2022.

Agent Incentives: A Causal Perspective. An agent's incentives are largely determined by its causal context. This paper gives sound and complete graphical criteria for four incentive concepts: value of information, value of control, response incentives, and control incentives. Tom Everitt, Ryan Carey, Eric Langlois, Pedro A. Ortega, Shane Legg. AAAI 2021.

Incorrigibility in the CIRL Framework. A study of how the value learning method cooperative inverse reinforcement learning may not prevent incorrigible behaviour. Ryan Carey. AIES 2018.

Other writing

  • Addressing three problems with counterfactual corrigibility: bad bets, defending against backstops, and overconfidence (AI Alignment Prize)
  • EA Handbook (editor)
  • Interpreting AI compute trends (see also this interesting reply)
  • A framework for shaping your talent for direct work (EA Forum Prize)
  • How much do Y Combinator founders earn? (Business Insider coverage)

firstname.lastname@jesus.ox.ac.uk
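
The page text above describes quantilisation as sampling from the best n% of actions performed by a human demonstrator. As a rough illustration only, that idea could be sketched as follows. The function name `quantilise`, its parameters, and the uniform sampling over the top fraction are assumptions made for this sketch; they are not taken from the page or from the author's papers.

```python
import math
import random
from typing import Callable, Optional, Sequence, TypeVar

A = TypeVar("A")


def quantilise(
    demo_actions: Sequence[A],
    utility: Callable[[A], float],
    top_fraction: float,
    rng: Optional[random.Random] = None,
) -> A:
    """Pick one action uniformly at random from the best `top_fraction`
    of the demonstrator's actions, ranked by `utility`.

    Hypothetical helper for illustration: `top_fraction` plays the role
    of the "n%" in the text (e.g. 0.1 for the best 10%).
    """
    if not demo_actions:
        raise ValueError("need at least one demonstrated action")
    if not 0 < top_fraction <= 1:
        raise ValueError("top_fraction must be in (0, 1]")
    rng = rng or random.Random()
    # Rank demonstrated actions from highest to lowest utility.
    ranked = sorted(demo_actions, key=utility, reverse=True)
    # Keep at least one action so the sample is always well defined.
    k = max(1, math.ceil(top_fraction * len(ranked)))
    return rng.choice(ranked[:k])


if __name__ == "__main__":
    actions = list(range(100))  # stand-in for demonstrated actions
    print(quantilise(actions, utility=lambda a: a, top_fraction=0.1))
```

With `top_fraction=1.0` the sketch simply imitates the demonstrator, and as `top_fraction` shrinks it approaches picking the single highest-utility demonstrated action.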
Thumbnail images (randomly selected):
* Images may be subject to copyright.
GREEN status (no comments)

Verified site has: 2 subpage(s).



Supplementary Information (add-on for SEO geeks)*

Header

HTTP/1.1 301 Moved Permanently
Connection: close
Content-Length: 162
Server: GitHub.com
Content-Type: text/html
Location: htt????/ryancarey.github.io/
X-GitHub-Request-Id: 9046:19F586:282B11:29C5F4:6A49B3BB
Accept-Ranges: bytes
Age: 0
Date: Sun, 05 Jul 2026 01:30:36 GMT
Via: 1.1 varnish
X-Served-By: cache-lcy-egml8630034-LCY
X-Cache: MISS
X-Cache-Hits: 0
X-Timer: S1783215036.919756,VS0,VE89
Vary: Accept-Encoding
X-Fastly-Request-ID: 0cf8736f46e9c7a624bc238e193b7240eeeb2116

HTTP/2 200
server: GitHub.com
content-type: text/html; charset=utf-8
last-modified: Tue, 15 Apr 2025 11:33:48 GMT
access-control-allow-origin: *
etag: W/"67fe441c-1ca0"
expires: Sun, 05 Jul 2026 01:40:36 GMT
cache-control: max-age=600
content-encoding: gzip
x-proxy-cache: MISS
x-github-request-id: 7526:429DD:28610D:29FC27:6A49B3BB
accept-ranges: bytes
age: 0
date: Sun, 05 Jul 2026 01:30:36 GMT
via: 1.1 varnish
x-served-by: cache-lcy-egml8630078-LCY
x-cache: MISS
x-cache-hits: 0
x-timer: S1783215036.033398,VS0,VE105
vary: Accept-Encoding
x-fastly-request-id: 4a45295c1689aa1441afde8653f995007e451150
content-length: 3181
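
In the 200 response above, the `expires` value is ten minutes after `date`, which matches `cache-control max-age=600`. A minimal sketch of how that consistency could be checked from a header dump like this one is given below. The helper names (`parse_header_dump`, `freshness_seconds`, `max_age`) are hypothetical and only for illustration; the parser assumes one header per line, with the name separated from the value by a space or a colon.

```python
from email.utils import parsedate_to_datetime
from typing import Dict, Optional

# Three headers copied from the 200 response in the dump above.
SAMPLE = """\
date Sun, 05 Jul 2026 01:30:36 GMT
expires Sun, 05 Jul 2026 01:40:36 GMT
cache-control max-age=600
"""


def parse_header_dump(dump: str) -> Dict[str, str]:
    """Split 'name value' (or 'name: value') lines into a lowercase-keyed dict."""
    headers: Dict[str, str] = {}
    for line in dump.splitlines():
        name, _, value = line.strip().partition(" ")
        if name and value:
            headers[name.rstrip(":").lower()] = value.strip()
    return headers


def freshness_seconds(headers: Dict[str, str]) -> int:
    """Seconds between the Date and Expires headers."""
    date = parsedate_to_datetime(headers["date"])
    expires = parsedate_to_datetime(headers["expires"])
    return int((expires - date).total_seconds())


def max_age(headers: Dict[str, str]) -> Optional[int]:
    """Return the max-age directive from Cache-Control, if present."""
    for directive in headers.get("cache-control", "").split(","):
        key, _, value = directive.strip().partition("=")
        if key.lower() == "max-age" and value.isdigit():
            return int(value)
    return None


if __name__ == "__main__":
    parsed = parse_header_dump(SAMPLE)
    print(freshness_seconds(parsed), max_age(parsed))
```

Both values come out as 600 seconds for the sample, so the two caching headers agree with each other.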

Meta Tags

title=""
name="viewport" content="initial-scale=1"

Load Info

page size: 3181
load time (s): 0.251255
redirect count: 1
speed download: 12673
server IP: 185.199.111.153
* all occurrences of the string "http://" have been changed to "htt???/"