...you fail, learn. Also great: have co-organizers with different social networks than you have, or you'll all only think of the same people.
- Start with an initial diverse list of potential speakers that's mostly (well more than half) women and/or minorities, covering different geographic regions, different universities, and different levels of seniority. You need to start with well more than half because (a) you should expect many to say no, and (b) many of your contributed papers are highly likely to be presented by abled white guys from privileged institutions in the US, so if the event is to be even remotely balanced, you need to compensate early.
- Scan through the proceedings of recent conferences for people who aren't immediately on your radar.
- If you can't come up with a long enough list that's also diverse, then maybe consider whether your topic is just you and your buddies, and perhaps think about whether you can expand your topic in an interesting direction to cast a broader net.
- If you can't come up with such a list, maybe your criteria for whom to invite are unrealistic or already very white-male biased. For instance, having a criterion like "I only want full profs from US universities" comes with a lot of historical social baggage.
- Ask everyone you know for suggested names.
- Check out existing resources like the WiML and Widening NLP directories, but also realize that there are many forms of diversity that may not be well covered in these.
- Once you have a long list of potential speakers with many women or minorities on it, ensure that you're not just inviting women to talk about "soft" topics and men about "technical" topics.

In the invitation process:
- In the invitation letter to speakers, offer to cover childcare for the speaker, regardless of who it is, either at the workshop or in their home city. Women in particular often take on the majority of child-rearing responsibilities, so this may help tip the scales, but it will also help everyone who has kids.
- In each invitation that you send out to men or people who are
not under-represented, ask them explicitly, in the initial invitation (i.e., not just when they decline), for suggestions of additional speakers who are not white men whom you could invite.
- Invite speakers from under-represented or historically excluded groups very early, before they become even more overcommitted, but also give them an easy out to say no.
- When you start sending invitations out, invite the abled white guys at privileged institutions slowly and later. That way, if you have trouble getting a diverse set of speakers, you're not already overcommitted.

Dealing with challenges:
- If the diversity of your event is being hurt by the fact that potential speakers cannot travel, consider allowing one or two people to speak remotely.
- If you do find yourself overcommitted to a non-diverse speaker group, it may be time to eat crow: apologize to a few of them, say directly that you were aiming for a diverse slate of speakers but you messed up, and that you would like to know if they would be willing to step down to allow room for someone else to speak in their stead.
- Go back to your goals. How are you doing? What can you do better next time?

Finally, if you're a guy or otherwise hold significant privilege, even if you're not organizing a workshop, try to help people who are. You should have a go-to list of alternate speakers that you can provide when colleagues ask you for ideas of whom to invite, or when you get invited yourself. You can have an inclusion rider for giving talks and being on panels, and perhaps also for putting your name on workshops as a co-organizer. I promise people will appreciate the help.

Posted by hal at 7/16/2018 02:26:00 PM (11 comments)

12 June 2018

Many opportunities for discrimination in deploying machine learning systems

A while ago, I created this image for thinking about how machine learning systems tend to get deployed. In this figure (for Chapter 2 of CIML), the left column shows a generic decision being made, and the right column shows an example of this decision in the case of
advertising placement on a search engine we've built. The purpose of the image at the time was basically to think about different types of approximation error, where we have some real-world goal (e.g., increase revenue) and design a machine learning system to achieve it. The point here, which echoes a lot of what Martin Zinkevich (who knows much more about this than I do) writes about in Rules of Machine Learning, is that it's important to recognize that there's a whole lot of stuff that goes around any machine learning system, and each piece puts an upper bound on what you can achieve.

A year or two ago, in talking to Suresh Venkatasubramanian, we realized that it's also perhaps an interesting way to think about different places where discrimination might come into a system. I'll avoid the term "bias" because it's overloaded here with the bias/variance tradeoff; by discrimination I simply mean a situation in which some subpopulation is disadvantaged. Below are some thoughts on how this might happen.

To make things more interesting (and navel-gaze-y), I'm going to switch from the example of ad display to paper recommendations in a hypothetical arXiv RSS-feed-based paper recommendation system. To be absolutely clear, this is very much a contrived, simplified thought example, and not meant to be reflective of any decisions anyone or any company has made. For the purposes of this example, I will assume all papers on arXiv are in English.

1. We start with a real-world goal: helping people filter newly uploaded papers on arXiv. In stating this goal, we are explicitly making a value judgment about what matters. In this case, one part of this value judgment is that it's only new papers that are interesting, potentially disadvantaging authors who have high-quality older work. It also advantages people who put their papers on arXiv, who are not a uniform slice of the research population.

2. We now need a real-world mechanism to achieve our goal: an iPhone app that shows extracted information from a paper that users can
thumbs up or thumbs down (or swipe left/right, as you prefer). By deciding to build an iPhone app, we have privileged iPhone users over users of other phones, which likely correlates with both the country of residence and the economic status of the user. By designing the mechanism such that extracted paper information is shown and a judgment is collected immediately, we are possibly advantaging papers (and thereby the authors of those papers) whose contributions can be judged quickly, or which seem flashy (click-bait-y). Similarly, since human flash judgments may focus on less relevant features, we may be biasing toward authors who are native English speakers, because things like second-language errors may disproportionately affect quick judgments.

3. Next, set up a learning problem: online prediction of thumbs up/down for papers displayed to the user (essentially a contextual bandit learning problem). I actually don't have anything to say on this one; open to ideas.

4. Next, we define a mechanism for collecting data: we will deploy a system and use epsilon-greedy exploration to collect data. There are obviously repercussions to this decision, but I'm not sure any are discriminatory. Had we chosen a more sophisticated exploration policy, this could possibly run into discrimination issues, because small populations might get explored on more, potentially disadvantaging them.

5. From this data collection procedure, we decide what data to log: paper title, authors, institution, abstract, and rating (thumbs up/down). By choosing to record author and institution, for instance, we open up the possibility of discrimination against certain authors or institutions; but because many techniques for addressing discrimination in machine learning assume that you have access to some notion of protected category, we also open up the possibility of remedying it. Similarly, by recording the abstract we open up a related but different possibility: discrimination by degree of English proficiency.

6. Given
this logged data, we have to choose a data representation. For this, we'll take text from the title, authors, institution and abstract, and then have features for the current user of the system (e.g., the same features from their own papers), plus a binary signal for thumbs up/down. A major source of potential discrimination here comes from the features we use for the current user. If the current user, for instance, only has a small number of papers from which we can learn about the topics they care about, then the system will plausibly work worse for them than for someone with lots of papers (and therefore a more robust user profile).

7. Next, we choose a model family. For simplicity, and because we have a cold-start problem, instead of going the full neural network route we'll just use a bag-of-embeddings representation for the paper being considered and a bag-of-embeddings representation for the user, and combine them with cross-product features into a linear model. This is a fairly easy representation to understand. Because we've chosen a bag of embeddings, this could plausibly underperform on topics/areas whose keywords are multi-word phrases separated by spaces. (E.g., I heard a story once that someone who works mostly on dependency parsing tends to get lots of papers on decision tree models suggested to them for review by TPMS, because of the overlap of the word "tree".) It's not clear to me that there are obvious discrimination issues here, but there could be.

8. Selecting the training data: in this case, the training data is simply coming online, so the discussion in (3) applies.

9. We now train the model and select hyperparameters. Again, this is an online setting, so there's really no train/test split, and this question doesn't really apply (though see the comment about exploration in (4)).

10. The model is then used to make predictions. Again, it's online, so this doesn't really apply.

11. Lastly, we evaluate error. Here, the natural choice is 0/1 loss on whether thumbs up/down was predicted correctly or not. In choosing to evaluate our system based only on
average 0/1 loss over the run, we are potentially missing the opportunity to even observe systematic bias. An alternative would be to evaluate 0/1 error as a function of various confounding variables, like institution prestige, author prolificity, some measure of nativism of the language, etc., and similarly to break the error down by features of the user, for similar reasons. Finally, considering not just error but separating out false positives and false negatives can often reveal discriminatory structures not otherwise obvious.

I don't think this analysis is perfect, and some things don't really apply, but I found it to be a useful thought exercise.

One very interesting thing about thinking about discrimination in this setting is that there are two opportunities to mess up: on the content provider (author) side and on the content consumer (reader) side. This comes up in other places too: should your music/movie/etc. recommender just recommend popular things to you, or should it be fair to content providers who are less well known? (Thanks to Fernando Diaz for this example.)

Posted by hal at 6/12/2018 12:22:00 PM (2 comments)
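Step 4 of the walkthrough above collects data with epsilon-greedy exploration. The sketch below is not code from the post; it is a minimal Python illustration of epsilon-greedy selection over candidate papers, assuming a hypothetical `score` function (the current model's estimate that a user will give a thumbs up) and assuming we also want to log the probability with which each displayed paper was chosen. Logging that propensity is common practice for later off-policy evaluation, but it is my assumption, not something the post specifies.

```python
import random
from typing import Callable, Sequence, Tuple


def epsilon_greedy_choice(
    candidates: Sequence[str],
    score: Callable[[str], float],
    epsilon: float,
    rng: random.Random,
) -> Tuple[str, float]:
    """Choose one paper to display; return it with the probability of choosing it.

    With probability `epsilon` we explore by picking a candidate uniformly at
    random; otherwise we exploit by picking the current top-scoring candidate.
    """
    if not candidates:
        raise ValueError("need at least one candidate paper")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    k = len(candidates)
    best = max(candidates, key=score)  # ties resolve to the first maximal candidate
    explore = rng.random() < epsilon
    choice = rng.choice(candidates) if explore else best
    # Every candidate gets epsilon / k from the uniform exploration branch;
    # the greedy candidate additionally gets the (1 - epsilon) exploit mass.
    propensity = epsilon / k + ((1.0 - epsilon) if choice == best else 0.0)
    return choice, propensity


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical model scores for three newly uploaded papers.
    scores = {"paper-a": 0.9, "paper-b": 0.4, "paper-c": 0.1}
    for _ in range(3):
        shown, p = epsilon_greedy_choice(list(scores), lambda pid: scores[pid], 0.1, rng)
        print(shown, round(p, 3))
```

In this sketch the exploration branch is uniform and ignores who the user is or which group a paper's authors belong to, which is one way to read the post's doubt that epsilon-greedy itself is discriminatory; an uncertainty-driven policy would instead concentrate exploration where data is scarce, such as on small populations.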
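Steps 6 and 7 of the walkthrough describe a bag-of-embeddings representation for the paper and for the user, combined with cross-product features in a linear model. The following pure-Python sketch makes several assumptions that are not in the post: the 3-dimensional embeddings are made up so that each dimension is readable (real embeddings would be learned and dense), a user is represented by averaging the vectors of their own papers, and the linear model is logistic regression updated online with one gradient step per thumbs up/down. All names here are hypothetical. The toy data reproduces the "tree" anecdote: a dependency-parsing user and a decision-tree paper share a nonzero cross feature only because both texts contain the word "tree".

```python
import math
from typing import Dict, List, Sequence

Vector = List[float]


def bag_of_embeddings(tokens: Sequence[str], emb: Dict[str, Vector], dim: int) -> Vector:
    """Average the embeddings of known tokens; word order and phrases are lost."""
    known = [emb[t] for t in tokens if t in emb]
    if not known:
        return [0.0] * dim  # nothing recognizable: an all-zero (cold-start) vector
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def user_profile(own_papers: Sequence[Sequence[str]], emb: Dict[str, Vector], dim: int) -> Vector:
    """Represent a user by the mean bag-of-embeddings of their own papers.

    A user with few (or no) papers gets a profile estimated from little data,
    which is the step-6 concern about the system working worse for them.
    """
    vecs = [bag_of_embeddings(p, emb, dim) for p in own_papers]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


def cross_features(user: Vector, paper: Vector) -> Vector:
    """Flattened outer product: feature i * len(paper) + j is user[i] * paper[j]."""
    return [u * p for u in user for p in paper]


class OnlineLogisticModel:
    """Linear model over cross features, trained one thumbs up/down at a time."""

    def __init__(self, n_features: int, lr: float = 0.5) -> None:
        self.w = [0.0] * n_features
        self.lr = lr

    def prob_up(self, x: Vector) -> float:
        z = sum(wi * xi for wi, xi in zip(self.w, x))
        # Numerically stable sigmoid.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        e = math.exp(z)
        return e / (1.0 + e)

    def update(self, x: Vector, thumbs_up: int) -> None:
        # One SGD step on the logistic (log) loss; thumbs_up is 1 or 0.
        g = self.prob_up(x) - thumbs_up
        self.w = [wi - self.lr * g * xi for wi, xi in zip(self.w, x)]


# Toy embeddings: dim 0 ~ parsing topic, dim 1 ~ the word "tree", dim 2 ~ decision-tree ML.
EMB = {
    "dependency": [1.0, 0.0, 0.0],
    "parsing": [1.0, 0.0, 0.0],
    "tree": [0.0, 1.0, 0.0],
    "decision": [0.0, 0.0, 1.0],
    "models": [0.0, 0.0, 1.0],
}

if __name__ == "__main__":
    parser_person = user_profile([["dependency", "tree", "parsing"]], EMB, 3)
    dt_paper = bag_of_embeddings(["decision", "tree", "models"], EMB, 3)
    x = cross_features(parser_person, dt_paper)
    # The (tree, tree) feature is nonzero only because both texts contain "tree".
    print(x[1 * 3 + 1])
```

Averaging throws away the fact that "dependency parsing" and "decision tree" are phrases, which is exactly the failure mode the anecdote describes; a phrase-aware tokenizer would be one possible mitigation, though the post does not propose one.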
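Step 11 of the walkthrough suggests breaking 0/1 error down by confounding variables and separating false positives from false negatives. Here is a minimal sketch of that disaggregated evaluation, assuming each logged prediction has been tagged with a hypothetical group label (for instance an institution-prestige bucket; the post names institution prestige as a candidate variable but does not define buckets). The data in the usage block is made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Tuple


def error_breakdown(
    records: Iterable[Tuple[Hashable, int, int]],
) -> Dict[Hashable, Dict[str, float]]:
    """Per-group sample count, 0/1 error, false-positive rate and false-negative rate.

    Each record is (group, true_label, predicted_label), labels in {0, 1} with
    1 meaning thumbs up. A rate with an empty denominator is reported as NaN.
    """
    counts: Dict[Hashable, Dict[str, int]] = defaultdict(
        lambda: {"n": 0, "err": 0, "pos": 0, "fn": 0, "neg": 0, "fp": 0}
    )
    for group, y, y_hat in records:
        c = counts[group]
        c["n"] += 1
        c["err"] += int(y != y_hat)
        if y == 1:
            c["pos"] += 1
            c["fn"] += int(y_hat == 0)  # missed a paper the user liked
        else:
            c["neg"] += 1
            c["fp"] += int(y_hat == 1)  # predicted a like that did not happen
    nan = float("nan")
    return {
        g: {
            "n": float(c["n"]),
            "error": c["err"] / c["n"],
            "fpr": c["fp"] / c["neg"] if c["neg"] else nan,
            "fnr": c["fn"] / c["pos"] if c["pos"] else nan,
        }
        for g, c in counts.items()
    }


if __name__ == "__main__":
    # Made-up predictions for two hypothetical institution buckets.
    data = [
        ("bucket-A", 1, 1), ("bucket-A", 1, 0), ("bucket-A", 0, 0), ("bucket-A", 0, 0),
        ("bucket-B", 1, 1), ("bucket-B", 1, 1), ("bucket-B", 0, 1), ("bucket-B", 0, 0),
    ]
    overall = sum(y != y_hat for _, y, y_hat in data) / len(data)
    print("overall 0/1 error:", overall)
    for group, stats in error_breakdown(data).items():
        print(group, stats)
```

On this made-up data both buckets have the same 0/1 error (0.25), so neither the average nor the per-group error alone shows a difference; only the split reveals that bucket A's errors are all false negatives (its liked papers get missed) while bucket B's are all false positives.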