Meta tags:
description= Blog by Rod Page on biodiversity informatics, taxonomy, systematics, phylogeny, knowledge graphs, and other topics.;
Headings (most frequently used words):
2026, to, references, 2025, dna, barcoding, using, wednesday, march, tuesday, the, competition, and, simplemappr, ai, gbif, iphylo, monday, may, 04, 18, 10, sunday, february, 15, november, 19, thursday, august, 07, issues, with, july, 08, about, me, pageviews, from, past, week, popular, posts, my, projects, twitter, blog, archive, labels, alpha, shapes, is, dead, long, live, revive, macos, app, preview, gis, files, understand, mystery, geocoder, find, places, on, map, model, context, protocol, mcp, triple, stores, natural, language, queries, for, knowledge, graphs, make, data, count, kaggle, how, many, times, are, datasets, cited,
Text of the page (most frequently used words):
the (235), and (118), that (67), this (62), for (59), data (57), with (35), elenchus (33), are (32), may (27), you (27), varleyi (27), doi (25), share (25), but (24), claude (24), code (23), https (23), have (23), from (23), what (22), #august (21), paper (20), june (20), has (20), org (19), march (19), how (19), page (19), which (19), was (19), dna (18), april (18), july (18), november (18), using (18), sequences (18), can (18), gbif (17), february (17), some (17), december (16), september (16), these (16), post (15), google (15), bold (15), january (15), october (15), about (15), could (15), life (14), taxonomic (14), there (14), project (13), get (13), all (13), map (12), tree (12), new (12), written (12), barcodes (12), not (12), make (12), species (11), citation (11), open (11), barcoding (11), more (11), such (11), example (11), see (11), mcp (11), case (10), text (10), papers (10), language (10), datasets (10), barcode (10), then (10), competition (10), will (10), out (10), web (9), host (9), facebook (9), been (9), roderic (9), also (9), used (9), need (9), graph (8), files (8), published (8), model (8), maps (8), idea (8), github (8), citations (8), context (8), now (8), something (8), where (8), view (8), pinterest (8), blogthis (8), email (8), posted (8), stackedit (8), app (8), work (8), those (8), server (8), would (8), coordinates (8), xml (7), version (7), specimens (7), list (7), names (7), sparql (7), literature (7), knowledge (7), distribution (7), david (7), 2024 (7), 2026 (7), blog (7), here (7), want (7), information (7), tools (7), people (7), being (7), original (7), triples (7), female (7), adult (7), easy (7), simplemappr (7), zenodo (6), use (6), science (6), central (6), publication (6), points (6), source (6), access (6), long (6), database (6), gis (6), core (6), alpha (6), notes (6), very (6), discussion (6), should (6), find (6), they (6), geographic (6), way (6), count (6), why (6), were (6), tool (6), query (6), into (6), look (6), 
old (6), geocoder (6), sogatella (6), kolophon (6), hemipteran (6), strepsipteran (6), bin (6), simple (5), social (5), shape (5), note (5), natural (5), queries (5), mac (5), help (5), guest (5), flickr (5), darwin (5), bit (5), biodiversity (5), bhl (5), australian (5), australia (5), 2025 (5), cite (5), 59350 (5), one (5), short (5), perhaps (5), inspired (5), scientific (5), set (5), only (5), done (5), many (5), cited (5), them (5), dois (5), dataset (5), created (5), training (5), maybe (5), think (5), either (5), extract (5), know (5), strepsiptera (5), hemiptera (5), hosts (5), labelled (5), interface (4), trees (4), taxonomy (4), databases (4), biology (4), specimen (4), possible (4), plos (4), platform (4), phylogeny (4), pdf (4), output (4), number (4), learning (4), links (4), iphylo (4), earth (4), document (4), layout (4), datacite (4), curation (4), bionames (4), author (4), articles (4), day (4), level (4), 2020 (4), rdmpage (4), projects (4), hjarding (4), quick (4), reading (4), part (4), process (4), reason (4), show (4), collected (4), another (4), interesting (4), corpus (4), like (4), described (4), create (4), rather (4), than (4), actual (4), kaggle (4), their (4), often (4), pdfs (4), your (4), chatgpt (4), going (4), well (4), its (4), ask (4), given (4), etc (4), back (4), got (4), hence (4), take (4), made (4), 1071 (4), it9890175 (4), arxiv (4), widespread (4), likely (4), asked (4), macropterous (4), parasitises (4), superparasitism (4), lot (4), site (4), india (4), wikipedia (3), wiki (3), user (3), uri (3), twitter (3), triple (3), plant (3), mining (3), name (3), tasmania (3), summary (3), world (3), semantic (3), search (3), commons (3), red (3), pubmed (3), prize (3), plazi (3), php (3), paywall (3), parasite (3), matching (3), markdown (3), linked (3), library (3), javascript (3), index (3), identifiers (3), score (3), glitch (3), georeferencing (3), geocoding (3), genbank (3), taxa (3), europe (3), entry (3), cloud (3), 
cladistics (3), africa (3), article (3), api (3), annotation (3), 2014 (3), service (3), 2021 (3), hugging (3), face (3), large (3), around (3), recently (3), really (3), write (3), response (3), below (3), posts (3), schindel (3), references (3), between (3), whether (3), might (3), potentially (3), publications (3), details (3), most (3), when (3), link (3), goal (3), runs (3), somewhat (3), essentially (3), scoring (3), useful (3), better (3), own (3), wrong (3), best (3), place (3), pretty (3), trying (3), problem (3), after (3), feature (3), still (3), giving (3), format (3), anything (3), tell (3), actually (3), much (3), add (3), makes (3), sequence (3), good (3), running (3), examples (3), working (3), figure (3), other (3), protocol (3), wrote (3), hosted (3), even (3), revive (3), tapani (3), hopkins (3), behind (3), chance (3), typically (3), 2411 (3), 00046 (3), become (3), full (3), dispersed_by (3), delphacidae (3), said (3), argument (3), male (3), without (3), quite (3), involved (3), point (3), means (3), identical (3), across (3), come (3), assigned (3), decided (3), down (3), features (3), his (3), repo (3), macos (3), records (3), country (3), zoom (2), services (2), machine (2), grid (2), users (2), url (2), shortening (2), university (2), type (2), store (2), treeview (2), treemap (2), edit (2), treatments (2), touch (2), rutilans (2), test (2), tdwg (2), taylor (2), taxacom (2), talk (2), tagging (2), table (2), systematics (2), systematic (2), string (2), state (2), plants (2), codes (2), mediawiki (2), schistosomiasis (2), release (2), programming (2), power (2), currents (2), planet (2), phyloinformatics (2), peter (2), parasites (2), ozymandias (2), projection (2), refine (2), ontology (2), self (2), node (2), nature (2), museum (2), collections (2), microbiome (2), metrics (2), metadata (2), mesibov (2), melbourne (2), mailing (2), lsids (2), lsid (2), common (2), llm (2), linking (2), license (2), json (2), jats (2), issn (2), ispecies 
(2), integration (2), insect (2), inaturalist (2), impact (2), images (2), icon (2), http (2), history (2), flow (2), scholar (2), docs (2), books (2), global (2), geojson (2), future (2), forking (2), fail (2), error (2), encoding (2), elife (2), lens (2), dryad (2), docker (2), digital (2), difference (2), shorthouse (2), quality (2), coupling (2), cleaning (2), archive (2), creative (2), cool (2), conversation (2), containers (2), computers (2), community (2), occurrence (2), cluster (2), chameleons (2), challenge (2), bmc (2), bioinformatics (2), biostor (2), biorxiv (2), bioguid (2), informatics (2), journal (2), billion (2), big (2), box (2), background (2), authorship (2), atom (2), art (2), apps (2), 2010 (2), 2008 (2), 2018 (2), 2023 (2), shapes (2), autotrain (2), focus (2), lists (2), keeps (2), coming (2), angelique (2), authors (2), urls (2), different (2), doing (2), time (2), analysis (2), african (2), past (2), zeng (2), credit (2), 101013 (2), young (2), enable (2), higher (2), assignments (2), acari (2), creating (2), virtuous (2), systems (2), 5281 (2), 15824274 (2), never (2), retrieve (2), interested (2), localities (2), cycle (2), started (2), lots (2), included (2), repository (2), cites (2), each (2), form (2), say (2), providing (2), things (2), explore (2), uploaded (2), any (2), times (2), tuesday (2), money (2), looks (2), poorly (2), hidden (2), ended (2), making (2), run (2), differ (2), others (2), issues (2), small (2), submit (2), ability (2), great (2), meant (2), whole (2), especially (2), wasn (2), midnight (2), coding (2), following (2), sharing (2), ideas (2), support (2), date (2), associated (2), able (2), question (2), combination (2), samples (2), scope (2), game (2), changer (2), results (2), found (2), let (2), insdc (2), mh493846 (2), supports (2), instance (2), mostly (2), gasp (2), asking (2), appeared (2), eventually (2), off (2), converts (2), result (2), right (2), finally (2), stores (2), graphs (2), instead (2), 
convert (2), based (2), wednesday (2), just (2), html (2), indeed (2), continue (2), style (2), suggestions (2), works (2), next (2), maboke (2), turned (2), though (2), image (2), looking (2), somebody (2), geocode (2), ago (2), province (2), places (2), descriptions (2), biological (2), elenchidae (2), curategpt (2), flexible (2), assisted (2), biocuration (2), 48550 (2), knows (2), doesn (2), seeing (2), suggested (2), further (2), fairly (2), enough (2), cosmopolitan (2), host_family (2), is_obligate_endoparasite_of (2), widely (2), little (2), supporting (2), wide (2), queensland (2), brisbane (2), both (2), contamination (2), neotenic (2), answer (2), gave (2), imagine (2), obstacle (2), approach (2), less (2), scale (2), picture (2), females (2), isn (2), researchgate (2), uploading (2), south (2), america (2), file (2), grouped (2), gmaea6199 (2), within (2), cases (2), identified (2), donald (2), thank (2), finder (2), dead (2), icons (2), preview (2), because (2), quicklook (2), changed (2), macs (2), two (2), various (2), live (2), geotagged (2), geographical (2), value (2), considered (2), opinions (2), theme, powered, blogger, zotero, zootaxa, zoomify, zookeys, zoobank, zika, virus, zfmk, zemanta, yahoo, xslt, xmp, xanadu, wow, worms, worm, worldcat, workshop, wordtrees, wordle, wired, wine, windows, wiley, wikispecies, wikisource, wikiometrics, wikidata, wikicite2017, wikicite, wikibase, white, whales, wellcome, webdot, webdav, hooks, weather, feel, fine, wardley, wallace, vouchers, vocabulary, vizbi, visulaisation, visualization, visualisation, vista, vision, vince, smith, video, vibrant, viaf, vertnet, control, veridium, venter, velcro, vast, uuid, utm, reference, utf8, usnm, usin, urn, unpaywall, alaska, uniprot, unicorn, uft, ubio, twittervision, tvwidget, tutorial, trust, trove, treebase, transitive, reduction, traitbank, screen, topological, sorting, top, tony, rees, tinyurl, timemap, timeline, tiles, threads, thesis, thamnomys, suite, ted, 
nelson, teaching, tbmap, francis, taxpub, taxonrank, taxonomists, intelligence, concept, taxon, concepts, taxobox, tapir, tags, tag, systax, synonymy, synonyms, symbiome, swine, flu, svg, supertree, sun, sucks, success, strumigenys, structured, stratigraphy, steve, jobs, stephen, colbert, startup, stamen, stained, glass, stackoverflow, squid, spy, spider, spelling, correction, speaking, sparklines, space, solr, software, sociology, media, soap, snakes, slideshare, skos, singapore, silos, sici, sherborn, serverless, seo, seals, scripting, screencast, scratchpads, scraping, scott, federhen, scispace, schema, sailfin, lizards, rwanda, ruby, rtree, rtfm, rss, ross, mounce, ror, roger, hyam, rimba, raya, rewrite, research, sprint, replication, rent, reliability, regular, expression, lionfish, rectangle, packing, reconciliation, recon16, recaptcha, readmill, readermeter, rdf, raymondionymus, raymondia, raxml, raorchestes, rant, ranking, rank, rabbitresearch, quora, quantum, pyramica, pygmybrowse, pubpeer, publishing, pteralopex, proxy, projectevomap, pro, ibiosphere, pristimantis, prism, prezi, press, presentation, preprint, predictions, precision, ppod, law, postphylogenetics, poly9, pollution, polar, podcast, pmid, hubs, platforms, plans, management, pit, stop, piracy, pinnotheridae, pinnixa, phytotaxa, phylows, phylowidget, phylota, phylogenetic, diversity, phylgeny, phthiraptera, photosynth, philautus, norvig, perl, perceptive, pixel, pensoft, penny, patent, parsing, parallels, parallax, paperid, pando, panbiogeography, paleobiodb, pagodabox, pagerank, ozcam, otu, orthographic, orcid, openurl, openstreetmap, openrefine, openref, openhandle, calais, acess, onezoom, odi, ocr, oclc, obsidian, obituary, obis, oauth, oai, nuytsia, nsf, nosql, nomenclature, nomenclators, nomenclator, zoologicus, nocode, nlm, dtd, nhm, ngs, ngram, nexus, category, nescent, neo4j, nde, ncbi, navigation, precedings, nasa, nanopublication, namestream, grams, mysql, museums, mus, msw, mpe, 
mount, mabu, mosquitoes, molossidae, mollusc, modelling, mod_rewrite, mockup, mobot, mobile, mit, millipedes, miller, column, microsoft, microservices, micropayment, microformat, microcitations, metagenomics, metacrap, metacafe, mesquite, merging, mendeley, memcached, mekong, river, meier, megascience, md5, matrix, material, examined, mashup, markup, markmail, mapping, manifesto, mammals, mammal, macroscope, macrobiome, macclade, osx, lucene, longest, substring, tail, logo, liverwort, linux, linkout, zitgist, ligature, lifemapper, lice, leptograpsus, legacy, leaflet, lazy, load, navigator, kml, kew, jstor, jsonp, jquerymobile, jquery, joy, journals, journalmap, jellyfish, japanese, jacc, stage, iucn, itunes, itis, ispiders, isni, irmng, ipni, iphone, ipad, ion, interview, internet, explorer, evolution, indirection, fungorum, factor, imagination, ifttt, ievobio, idigbio, ideology, identity, identiifers, identifier, identfiiers, ideator, iczn, ical, ibooks, ibook, ibol, hypothes, hyperbolic, hurlbert, human, trafficking, hs_alias, hoplocephalus, hoolock, homonym, homebrew, holotypes, holly, bik, hocr, hipponyx, hipponix, heroku, hendy, haystack, handles, handle, half, baked, hackathon, hack4knowledge, hack, h1n1, guids, gregg, paradox, greasemonkey, grbio, graphviz, graphql, grant, grandchallenge, gps, gpi, spreadsheets, charts, analytics, gondwana, god, gml, initiative, forest, watch, glassella, git, gist, gibbons, ggbn, georss, geophylogeny, geography, geocouch, genus, generous, gene, gbio, 103, gbic2012, gbic, gb21, gaps, games, gallica, fungi, fundref, frogs, freemium, freebase, frankenplace, fossil, fonts, font, folksonomy, fluidinfo, flipboard, fitzalania, fishbase, firefox, filtered, push, filesystem, figshare, fictional, fedora, family, false, positive, fair, failure, f1000, extraction, extensions, expert, exhaustion, evolutionary, evolution2010, evoldir, errors, equirectangular, epub, eol, environmental, entomologica, scandinavica, enhydris, punctata, 
encylcopedia, encode, elsevier, elasticsearch, edward, editing, edinburgh, ebio09, eav, biosphere, wilson, duplication, duplicates, dublin, dspace, drupal, dot, domain, dogfooding, djvu, disqus, displacement, disaambiguation, dinosaurs, dimly, lit, digitising, digitisation, humanities, digir, dggs, design, demo, demansia, defra, deepdyve, deep, deduplication, dechronization, dbpedia, remsen, datasette, preservation, grief, dashboard, riplet, dark, dag, d3js, cyberscience, cvs, css, csl, cryptocurrency, crowdsourcing, crossref, crf, crazy, crash, coverage, couchdb, cospeciation, coronavirus, copyright, cooliris, uris, contest, conservation, status, connotea, conference, compiling, collaboration, coins, clustering, clusterfuck, cloudant, close, bone, climate, clay, shirky, classification, citekey, citebank, needed, mutation, cisco, cinii, chromis, chresonym, chœrephon, choerephon, charles, sherbon, character, chærephon, chaerephon, cgi, cern, catalogue, cartodb, carmen, electra, career, suicide, cards, carbon, offsets, canvas, canonical, squares, business, bryozoa, browser, broad, institute, british, bowker, bouchout, declaration, bookmarklets, book, bob, bncod2008, blue, blr, blogs, blast, blackwell, bitcoin, birds, biostar, bioone, biomedical, biogeography, bio2rdf, bibliometrics, bibliography, bibliographies, bibliographic, bibjson, begonia, bbc, bats, avatol, botany, faunal, directory, auckland, atypon, atlas, living, asterophrys, leucopus, arthur, clarke, arctos, arcgis, aquamaps, applescript, apple, apache, antweb, ants, angelina, jolie, andy, mabbett, android, amnh, amber, amazon, altmetrics, altmetric, alignment, free, algorithm, ala, ajax, aggregation, agenames, afd, aedes, abbyy, word, sea, rock, pools, guy, kawasaki, wars, labels, 2005, 2006, 2007, 101, 2009, 2011, 2012, 2013, 2015, 2016, 2017, 2019, 2022, tweets, alec, 7p1n4, wdv84, myself, striking, pictures, tamara, munzner, treejuxtaposer, scalable, comparison, viewing, flared, again, touched, update, 
her, responded, fitness, mobilised, seven, percent, usable, thoughts, topic, sparked, shorteners, completely, playing, viewer, bill, piel, efforts, phylogenies, else, jeff, atwood, mixing, oil, water, lead, visualising, probably, bad, malte, ebach, death, throes, z574z, dcw92, analysi, tolley, assessments, east, popular, pageviews, week, complete, profile, subscribe, home, older, tong, longfeng, sarah, bratt, daniel, acuna, assigning, networks, informetrics, 1016, joi, dewaard, hebert, reports, 1038, s41598, 021, 95147, cycles, study, innovation, entrepreneurship, diplomacy, 1007, 978, 0716, 3581, 0_1, scratches, surface, individual, apart, contain, detailed, enrich, envisioned, tedious, did, manual, filtering, intended, contribution, three, binfl, 5flr, 10flr, plus, ake, displayed, once, assign, discover, organisations, funded, hope, topics, 5883, accompanies, revisited, wvwva, v7125, problems, t80g1, xys37, until, baffled, importasnt, secured, funding, serious, left, unattended, organisers, high, entries, somes, concern, optimised, replicate, erratic, accuarately, concerns, prove, unfounded, easily, retrained, leaderboard, listing, twice, official, 34740, dsv, 12667298, unfortunately, itself, shall, put, virtually, engagement, despite, repeated, entrants, explain, inexplicable, reasoning, incomplete, preprints, comes, variety, formats, weren, told, gold, standard, publisher, specific, parsers, provides, 500, simialr, analyse, against, tells, wins, prizes, competiton, accurately, reflects, skills, excellent, us100, 000, motivator, solve, tacking, python, fast, dabbled, before, thing, indespensible, explaining, message, became, horribly, addictive, solution, five, tiems, counter, resets, gmt, nights, submission, quota, rivals, substantial, participants, everyone, wants, win, hints, findings, lively, launched, developing, repositories, accession, numbers, protein, bank, classify, primary, secondary, reuse, existing, 6tap, several, major, currently, fourth, 
incorporates, pmc, additions, affiliation, thursday, eassier, assemble, ultimate, lacks, proper, literally, vert, exciting, combine, sequenced, curate, released, easier, dnabarcode, fascinated, identifications, location, exploring, feels, incredibly, crude, start, plain, english, perfect, integrative, reveals, gammarus, crustacea, amphipoda, surviving, previously, unknown, southeast, european, glacial, refugium, copilaș, ciocianu, zimta, petrusek, 1111, jzs, 12248, respond, basic, oxigraph, servers, speak, eric, zhu, since, program, followed, dance, confidently, declare, increasingly, exasperated, had, snarky, comments, helpful, messages, eases, enables, connect, acts, broker, talks, gets, rage, slow, kingsley, idehen, argued, models, llms, provide, friendly, simply, presenting, empty, formulate, didn, waiting, yin, yang, experiments, whereas, wanted, pages, static, hosting, fired, port, amazed, fun, requests, fire, chrome, check, key, reduce, inital, hurdle, changes, yourself, needs, initial, tedium, reworking, googling, questions, namely, feb, bsky, quest, searching, republic, hoped, save, recipe, embed, wonderful, rebuilding, bluesky, huge, geocoded, locality, already, match, mentioning, genomic, try, available, com, relaunched, toy, while, localitiies, cambodia, ratanakiri, latitude, longitude, refers, usually, defined, locating, allmaps, 7g6pt, 3mz06, sunday, kathirithamby, 1989, invertebrate, 175, 195, caufield, kroll, neil, reese, joachimiak, hegde, harris, krishnamurthy, mclaughlin, smedley, haendel, robinson, mungall, remains, mean, infected, originally, annotations, thrid, party, analyses, crucial, encouraged, plausible, explanation, pattern, always, although, suspect, beat, encompass, relevant, gives, incentive, statements, liklely, extracted, raises, express, vocabularies, mentioned, among, congenitally, averse, verbose, ontologies, prefer, light, weight, relations, dispense, perkins, 1907, observed, separated, countries, observation, nsw, canberra, 
northern, territory, type_locality, only_parasitises, pupa, extrudes_in, nymphs, adults, larvae, dispersal, mechanism, frequency, maximum_observed, six, parasitoids, per, exhibits, strengthens, extrudes_only_in, develops_as, endoparasitic, larva, spends_entire_life_in, body, life_strategy, larviform, tissue, intimacy, critical, is_majority_host_of, toya, drope, interest, generate, relationships, infer, cause, patterns, needing, paywalled, wouldn, course, advocating, murray, rust, curiousity, explored, alternative, scenario, facts, claims
Text of the page (random words):
good chance that somebody has already found them, so all we need to do is search GBIF for specimens with localities that match the place you are trying to geocode. I created a version of this tool in 2018, mentioning it in a blog post "GBIF at 1 billion - what's next?", and wrote it up in a short note in bioRxiv, "Geocoding genomic databases using GBIF". The original version was hosted on Glitch, a wonderful platform where people could create pretty much anything using HTML and JavaScript. Glitch is no more, so I've finally got around to rebuilding it, inspired by this post on Bluesky by Tapani Hopkins:

Next quest: figure out from this map where La Maboke was. Searching for "Maboke, Central African Republic" turned out not to work like I'd hoped for though. Perhaps I should save the recipe. (Tapani Hopkins, Bluesky, Feb 14, 2026 at 13:00)

The original project used Node.js, whereas I wanted something simple using just HTML and JavaScript so it could be hosted using GitHub Pages, or indeed on any other static hosting platform. I fired up Claude Code to help me with the port. I continue to be amazed at just how much fun this style of coding is, and at the power of the tools. I make requests and suggestions, and Claude will fire up an instance of Google Chrome to check that the code works. I think a key feature of this style of programming is that it can reduce that initial hurdle when you know you need to make changes, and may even have made notes to yourself about what needs to be done, but there will be the initial tedium of reworking old code to work with a new platform (i.e., Googling questions, re-reading GitHub docs, etc.). Instead I get to focus on what I want to do, namely revive an old tool that I think people may find useful.

Written with StackEdit. Posted by Roderic Page at 12:20 PM

Wednesday, November 19, 2025
Model Context Protocol (MCP) and triple stores: natural language queries for knowledge graphs

Some quick notes
based on experiments with Model Context Protocol (MCP) and Claude (https://claude.ai).

Model Context Protocol (MCP) is all the rage right now, and I've been slow to take a look. Kingsley Idehen recently wrote "The Semantic Web Project Didn't Fail, It Was Waiting for AI (The Yin of its Yang)", where he argued that large language models (LLMs) provide, finally, a user-friendly way to query triple stores (i.e., knowledge graphs). Instead of simply presenting users with an empty SPARQL query box, we can now formulate a query in natural language and have AI convert that into SPARQL. That eases the challenge of learning a new query language.

But it gets better. MCP enables us to connect an AI with another service. It acts a bit like a broker: you tell the AI what you want to do, the AI talks to the MCP server to figure out how to do what you want, gets the results, then converts them into a natural language (or other format) result that you can use. Hence you can have a conversation with a knowledge graph.

There are examples of MCP servers that speak SPARQL, such as mcp-server-sparql by Eric Zhu. Since I mostly program in PHP (gasp), version 7 (gasp), I ended up asking ChatGPT to help write a simple MCP server. There then followed a dance between ChatGPT and Claude, where ChatGPT would very confidently declare that the code was done, and Claude would get increasingly exasperated that I appeared to be trying to do something that wasn't working. I eventually had to tell Claude to back the f**k off with its snarky comments, and maybe be more helpful in its messages.

Eventually I got a simple server up and running. The code (php-mcp-server) is very basic, but it supports SPARQL queries running on an instance of Oxigraph that runs on my Mac. For example, I can ask "what publication cites the sequence https://identifiers.org/insdc/MH493846?" and Claude will respond:

I'll query for publications that cite that sequence. Good, I found a publication. Let me get more details about it. Perfect! The sequence https://identifiers.org/insdc/MH493846 is cited
by publication: Integrative taxonomy reveals a new Gammarus species (Crustacea, Amphipoda) surviving in a previously unknown southeast European glacial refugium. Authors: Copilaș-Ciocianu, D., Zimta, A. A., and Petrusek, A. DOI: https://doi.org/10.1111/jzs.12248

It will also show me the SPARQL queries it makes to find this information.

This feels like a game changer. The MCP server I've written is incredibly crude, but I can now start to query a knowledge graph about DNA barcodes and associated literature in plain English and get back useful results. What I really want to do is combine this with details on the actual papers (for example, lists of specimens sequenced, whether they are type specimens, where the samples were collected, etc.) as a way to help curate databases such as BOLD.

I recently released BOLD View (see blog post "BOLD View: exploring DNA barcodes") to make it easier to explore DNA barcode data, and I'm fascinated by how much scope there is for curation: adding taxonomic identifications, geographic location, etc. To make this curation easier I've started to assemble a knowledge graph linking barcodes, GenBank sequences, and taxonomic names to the associated scientific literature, with the ultimate goal of being able to ask "given this barcode that lacks a proper scientific name, is there anything in the published literature that can tell me what it actually is?" The idea of being able to literally ask that question using a combination of an AI and an MCP server is very exciting.

Written with StackEdit. Posted by Roderic Page at 12:24 PM

Thursday, August 07, 2025
Make Data Count Kaggle Competition

I've written several times here about the Make Data Count project and its major output to date, the Data Citation Corpus (currently at version 4, see "The fourth release of the Data Citation Corpus incorporates data citations from Europe PMC and additions to affiliation metadata"). In June Make Data Count launched a Kaggle competition with
the goal of developing a tool that will process articles in either PDF or XML format, extract data citations (e.g., DOIs for datasets in repositories such as Dryad, or accession numbers such as 6TAP in the Protein Data Bank), and classify these citations as either "primary" (data published in that paper) or "secondary" (reuse of existing data).

I think the competition is an excellent idea, and the US$100,000 is a great motivator to get people trying to solve this problem. I'm taking part in the competition, which has meant learning Python very fast. I've dabbled a bit before, but this was a whole new thing. ChatGPT has been indispensable, especially in explaining why something I was doing wasn't going to work, and what an error message really meant.

The whole process became horribly addictive. You can submit a solution five times a day, and the counter resets at midnight GMT, so there were nights I was up well after midnight coding and using up the following day's submission quota.

Another interesting feature is the lively discussion between people who are rivals for substantial prize money. Participants are sharing code and ideas (often not their best-scoring ideas; after all, everyone wants to win) but still giving hints and support, and sharing findings.

The competition provides a small set of training data, about 500 PDFs and a similar number of XML files. The idea is that you write code to analyse those files and output a list of data citations. You then submit your entry to Kaggle, which runs your code against a hidden set of PDFs and XML files and tells you your score. The best score wins prizes. My place in this competition pretty accurately reflects my skills and ability.

Issues with the competition

Unfortunately, the competition itself has been (how shall I put this) poorly run. There has been virtually no engagement from DataCite in their own competition, despite repeated queries from the entrants to explain the often inexplicable reasoning for the scoring in the training data, or why some of the
PDFs are wrong or incomplete. Some PDFs are preprints, not the actual papers, and may differ in whether they cite data or not. The XML comes in a variety of formats, which we weren't told about. Some XML was gold-standard JATS XML as used by PubMed Central; others were publisher-specific, or the output of PDF parsers or annotation tools.

I ended up making my own training data (https://doi.org/10.34740/kaggle/dsv/12667298), listing what I think are the actual data citations (about twice as many as are in the official training data).

There are some high-scoring entries (see the leaderboard), so it looks like Make Data Count will get some useful tools from this competition. My only concern is that these tools may be optimised to replicate the somewhat erratic and poorly described annotation process that DataCite used to create the training and hidden test data, rather than to accurately retrieve the actual data citations. Perhaps my concerns will prove unfounded, or maybe the tools can be easily retrained with better data. But I am somewhat baffled that such an important project, for which Make Data Count have secured funding for serious prize money, has been essentially left unattended by the organisers.

The competition runs until 3 September.

References
Page, R. (2024). Problems with the DataCite Data Citation Corpus. https://doi.org/10.59350/t80g1-xys37
Page, R. (2024). The Data Citation Corpus revisited. https://doi.org/10.59350/wvwva-v7125

Written with StackEdit. Posted by Roderic Page at 5:55 PM

Tuesday, July 08, 2025
How many times are DNA barcoding datasets cited?

This note accompanies a dataset that I uploaded to Zenodo: https://doi.org/10.5281/zenodo.15824274. My goal in creating this dataset is to link data created on the Barcode of Life Data Systems to the DOIs for those datasets, and then to link those data DOIs to DOIs for the papers (if any) that created those datasets and/or cited them.

For example, the paper DNA barcodes enable higher
taxonomic assignments in the Acari (Young et al., 2021) cites three barcode datasets: DS-BINFL, DS-5FLR, and DS-10FLR. Each of these datasets has a DOI of the form https://doi.org/10.5883/ plus the "DS" number.

One reason I want to make these links is so that when the dataset is displayed (say, in my BOLD View app) I could also show the papers that created or cited the dataset, providing some context to the data (e.g., why was the data collected?). Another reason is that once we link data to papers we can do some interesting things, such as assign credit (Zeng et al., 2020) or discover what organisations funded the work. I hope to explore these topics in the future.

Matching datasets to publications was a tedious process; there are more details on the GitHub repository. I started with a Google Scholar search, then did lots of manual filtering and cleaning. Most of the articles have DOIs, and only these articles are included in the Zenodo dataset, which is intended as a contribution to Make Data Count.

This only scratches the surface of what could be done. There are many datasets that I could not find in the literature; they may never have been cited. I also want to retrieve links between individual DNA barcodes and the papers that published them. Apart from context and metrics, I'm also interested in whether these papers might contain more detailed information about the sequences, such as geographic localities. In this way we could potentially enrich the BOLD database, as part of the "virtuous cycle" envisioned by David Schindel (Schindel and Page, 2024).

References
Page, R. (2025). Citations of datasets published by Barcode of Life Data Systems (BOLD) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.15824274
Schindel, D. E., & Page, R. M. P. (2024). Creating virtuous cycles for DNA barcoding: a case study in science innovation, entrepreneurship, and diplomacy. DNA Barcoding, 7-32. https://doi.org/10.1007/978-1-0716-3581-0_1
Young, M. R., deWaard, J. R., & Hebert, P. D. N. (2021). DNA barcodes enable higher taxonomic assignments in the Acari. Scientific Reports, 11(1).
https doi org 10 1038 s41598 021 95147 8 zeng tong longfeng wu sarah bratt and daniel e acuna assigning credit to scientific datasets using article citation networks journal of informetrics 14 no 2 1 may 2020 101013 https doi org 10 1016 j joi 2020 101013 written with stackedit posted by roderic page at 12 04 pm email this blogthis share to x share to facebook share to pinterest older posts home subscribe to posts atom about me roderic page view my complete profile pageviews from the past week popular posts guest post response to the discussion on red list assessments of east african chameleons this is guest post by angelique hjarding in response to discussion on this blog about the paper below hjarding a tolley k a b document layout analysis how to cite page r 2023 document layout analysis https doi org 10 59350 z574z dcw92 some notes to self on document layout analysi death throes of cladistics i m in the us on uk time so this is probably a bad idea to write this but the paper by malte ebach et al o cladistics where a visualising edit history of a wikipedia page quick post really should be doing something else reading jeff atwood s post mixing oil and water authorship in a wiki world lead me google earth phylogenies now for something completely different i ve been playing with google earth as a phylogeny viewer inspired by bill piel s efforts short urls short urls have been a topic of discussion recently perhaps sparked by the article url shorteners which shortening service should you use seven percent of gbif data is usable quick thoughts on hjarding et al 2014 update angelique hjarding and her co authors have responded in a guest post on iphylo the quality and fitness for use of gbif mobilised da lsids http uri linked data and bioguid the lsid discussion has flared up again on the tdwg mailing lists this discussion keeps coming around i ve touched on it here and viewing very large trees one of the striking pictures in tamara munzner et al s paper treejuxtaposer 