Meta description: "Life, the Universe and R"
df <- rbloggersFBXScraper()

Depending on your internet connection, this could take quite some time to complete because it has to crawl the R-bloggers website for extra information about links posted since September 2011. To save you some time, I've saved all the data which I have scraped into a single .csv file. Here's how to use it:

```r
library(RCurl)

# download the pre-scraped data set from GitHub (over https)
csv.location <- "https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/rbloggersFBXScraper/data.csv"
txt <- getURL(csv.location,
              cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))
df <- read.table(header = TRUE, text = txt, sep = ",", stringsAsFactors = FALSE)
```

It's then a simple case of subsetting to find posts by a specific author:

```r
find_posts <- function(df, my.name) subset(df, author == my.name)
df2 <- find_posts(df, "tony breyal")
t(df2[2, ])
```

```
timestamp          Wednesday, December 14, 2011 at 10:29pm
num.likes          6 people like this
num.comments       At least 1 comment
num.shares         0
posted.by          R bloggers
message            I love these things: http://www.r-bloggers.com/unshorten-any-url-with-r/
embeded.link       http://www.r-bloggers.com/unshorten-any-url-with-r/
embeded.link.text  Introduction: I was asked by a friend how to find the full, final
                   address of an URL which had been shortened via a shortening service
                   (e.g. Twitter's t.co, Google's goo.gl, Facebook's fb.me, dft.ba,
                   bit.ly, TinyURL, tr.im, ow.ly, etc.). I replied that I had no idea
                   and maybe he should have a look over on ...
sample.comments    Kai Feng Chew: "Yes! It's really cool! I changed a little bit to
                   make it 2 lines to use the shorten function: load('unshort.Rdata')
                   unshort('any_shorten_url') Example:
                   http://cloudst.at/index.php?do=%2Fkafechew%2Fblog%2Funshorten-url-function%2F"
                   Wednesday, December 14, 2011 at 10:34pm. Like/Unlike. 1 Reply.
                   Tony Breyal: "@Kai - you might want to use the code from the updated
                   version on my blog, because it now handles both http and https. It
                   won't work with http://1.cloudst.at/myeg however, because that one
                   requires the user to be registered, and I'll admit I had not thought
                   of that use case." Thursday, December 15, 2011 at 12:03am.
                   Like/Unlike. 1 Reply.
```
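The download-and-subset recipe shown above can be sketched in Python too. This is a minimal, illustrative version only: the inline miniature data set and the Python `find_posts` are my own inventions for this sketch (the real data.csv would first be fetched, e.g. with `urllib.request.urlopen`, rather than inlined):

```python
import csv
import io

# Hypothetical miniature of the scraped R-bloggers data set; the real file
# lives at the data.csv URL given above and would be downloaded first.
RAW = """author\ttitle\tnum_likes
Tony Breyal\tUnshorten any URL with R\t6
Someone Else\tAnother post\t2
Tony Breyal\tOutersect()\t3
"""

def find_posts(text, my_name):
    """Return the rows whose 'author' column matches my_name,
    mirroring the subset(df, author == my.name) call in the R code."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    return [row for row in reader if row["author"] == my_name]

posts = find_posts(RAW, "Tony Breyal")
print(len(posts))          # 2
print(posts[0]["title"])   # Unshorten any URL with R
```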
```
rbloggers.link     http://www.r-bloggers.com/unshorten-any-url-with-r/
title              Unshorten any URL with R
first.published    December 13, 2011
author             Tony Breyal
blog.name          Consistently Infrequent » R
blog.link          https://tonybreyal.wordpress.com/2011/12/13/unshorten-any-url-created-using-url-shortening-services-decode_shortened_url/
tags               dft.ba, R, RCurl, rstats, tinyurl, url
```

So this tells me that my post entitled "Unshorten any URL with R" got six likes and at least one comment on Facebook. Nice! The sample.comments field shows what was commented and that I posted a reply; based on that person's comment I was able to improve the code and realise that it wouldn't work with a shortened link which requires a user to be logged in first. Awesome stuff!

Final Thoughts

Now that I have this data, I am not quite sure what to do with it. I could do a sorted bar chart with each blog entry on the x-axis and the number of Facebook likes on the y-axis. I was also thinking of doing some sentiment analysis on the sampled comments (I could only scrape visible comments, not the ones you have to press a button to load), but I don't have the time to read up on that type of analysis. Maybe in the future.

R code: https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/rbloggersFBXScraper/rbloggersFBXScraper.R
.csv file: https://raw.github.com/tonybreyal/Blog-Reference-Functions/master/R/rbloggersFBXScraper/data.csv

Comments (5)

January 4, 2012

Plotting Doctor Who Ratings (1963-2011) with R

Filed under: R. Tags: Doctor Who, ggplot2, ratings. @ 1:52 am

Introduction

First day back to work after the New Year celebrations, and my brain doesn't really want to think too much, so I went out for lunch and had a nice walk in the park. I still had 15 minutes to kill before my lunch break was over, and so decided to pass the time with a quick web-scraping exercise in R.

Objective

Download the last 49 years of British TV ratings data for the programme Doctor Who - the longest-running science fiction television show in the world, and also the most successful
science fiction series of all time in terms of its overall broadcast ratings, DVD and book sales, and iTunes traffic - and make a simple plot of it.

Method

Ratings are available from doctorwhonews.net as a series of page-separated tables. This means that we can use the RCurl and XML packages to download the first (seed) webpage, extract the table of ratings, and use XPath to get the weblink to the next page of ratings. Due to time constraints I'm not going to optimise any of this, though given the small data set it probably doesn't need optimisation anyway.

Solution

```r
get_doctor_who_ratings <- function() {
  # load packages
  require(RCurl)
  require(XML)

  # return title, date and rating
  format_df <- function(df) {
    data.frame(Date = as.POSIXlt(df$Date, format = "%a %d %b %Y"),
               Title = df$Title,
               Rating = as.numeric(gsub("([0-9.]+).*", "\\1", df$Rating)),
               stringsAsFactors = FALSE)
  }

  # scrape data from web, following the NEXT link from page to page
  get_ratings <- function(u) {
    df.list <- list()
    i <- 1
    while(!is.null(u)) {
      html <- getURL(u)
      doc <- htmlParse(html)
      df.list[[i]] <- readHTMLTable(doc, header = TRUE, which = 1, stringsAsFactors = FALSE)
      u.next <- as.vector(xpathSApply(doc, "//div[@class='nav']/a[text()='NEXT']/@href"))
      if(is.null(u.next)) return(df.list)
      u <- sub("info.*", u.next, u)
      i <- i + 1
    }
    return(df.list)
  }

  ### main function code ###
  # Step 1: get tables of ratings for each page that is available
  u <- "http://guide.doctorwhonews.net/info.php?detail=ratings"
  df.list <- get_ratings(u)

  # Step 2: format ratings into a single data frame
  df <- do.call("rbind", df.list)
  df <- format_df(df)

  # Step 3: return data frame
  return(df)
}
```

Using the above, we can pull the ratings into a single data frame as follows:

```r
# get ratings database
ratings.df <- get_doctor_who_ratings()
head(ratings.df)
```

```
        Date                            Title Rating
1 1979-10-20         City of Death: Episode 4   16.1
2 1979-10-13         City of Death: Episode 3   15.4
3 1979-09-22 Destiny of the Daleks: Episode 4   14.4
4 1979-10-06         City of Death: Episode 2   14.1
5 1979-09-15 Destiny of the Daleks: Episode 3   13.8
6 1975-02-01      The Ark in Space: Episode 2   13.6
```

Plot

We can plot this data very easily using Hadley Wickham's ggplot2 package:

```r
# do a raw plot
require(ggplot2)
ggplot(ratings.df, aes(x = Date, y = Rating)) +
  geom_point() +
  xlab("Date") +
  ylab("Ratings (millions)") +
  opts(title = "Doctor Who Ratings (1963 - Present) without context")
```
The gap in the data is due to the show having been put on permanent hiatus between 1989 and 2005, with the exception of the American episode in 1996.

Caution

This was just a fun coding exercise to quickly pass some time. The chart above should not be interpreted without the proper context, as it would be very misleading to suggest that the show was more popular in earlier years than in later years. Bear in mind that TV habits have changed dramatically over the past 50-odd years (I myself barely watch TV live any more, and instead make use of catch-up services like BBC iPlayer, which the ratings above do not account for), that there were fewer channels back in 1963 in Britain, that the way BARB collect ratings has changed, and that the prestige of the show has changed over time - once an embarrassment for the BBC, with all of its criminally low budgets and wobbly sets, it is now one of its top flagship shows.

A Final Note

Although I was part of the generation during which Doctor Who was taken off the air, I do vaguely remember some episodes from my childhood where the Doctor was played by Sylvester McCoy, who to this day is still "my Doctor", as the saying goes, and I would put him right up there with Tennant and Smith as one of the greats. Best show ever. You can find a quick review of series six (i.e. the sixth series of episodes since the show's return in 2005) right here, and because I love the trailer so much I'll embed it below.

Comments (11)

December 19, 2011

Python: Hello World

Filed under: Python. @ 9:57 pm

Introduction

Stanford is running a series of open online courses this January. One of these courses is about text mining, a.k.a. Natural Language Processing (NLP for short). There is a pre-requisite for this course of being able to programme in either Java or Python. I was going to spend my Christmas break re-learning C, but as I really want to try this course out, I'm instead going to try to learn
Python by following this online Google class, because it's a language I often hear about from other R users.

Having done the first two modules of that Google course, I thought I should code a quick "hello world" programme on my blog, for the sake of geekery if nothing else.

Objective

Write some Python code which will print out "hello world".

Solution

Ubuntu Linux already comes with Python pre-installed by the looks of it, so I didn't need to do anything special. I downloaded the Spyder IDE because it's the closest thing to RStudio (which I now use when coding in R) that I could see, and it comes highly recommended based on the various websites I visited. Anyway, here's the code I entered into the script window of the Spyder IDE. To run it I pressed F5, which prompted me to save the file, after which "hello world" was printed to the integrated console:

```python
def main():
    print "hello world"

if __name__ == "__main__":
    main()
```

Line 1 tells us that we have defined (def) a function called main, and its body starts after the colon. Line 2 is indented to show that it belongs to main. This is very important because, unlike some other programming languages, Python does not have curly braces "{" and "}" to tell us where a function starts and ends, but instead uses the indentation to mark the boundaries - so this formatting is not optional. I'm not sold on this concept yet, though I suppose it does save a bit on having to type in the curly braces explicitly, because I would normally indent my code anyway. Lines 4 and 5 tell us that this file (lines 1-5) can be used either as a module for import into another Python module, or as a stand-alone programme. This seems to be required in every Python file, and so I guess I had better get used to it. When I run this file it is recognised as a stand-alone programme and starts off by calling the main function, which is done on line 5.

Comments (4)

December 13, 2011

Unshorten (almost) any URL with R

Filed under: R. Tags: dft.ba, R, RCurl, rstats, tinyurl, url. @ 6:57 pm

Introduction

I was asked by a friend how to find
the full, final address of an URL which had been shortened via a shortening service (e.g. Twitter's t.co, Google's goo.gl, Facebook's fb.me, dft.ba, bit.ly, TinyURL, tr.im, ow.ly, etc.). I replied that I had no idea and maybe he should have a look over on stackoverflow.com or possibly the R-help list, and if that didn't turn up anything, to try an online unshortening service like http://unshort.me. Two minutes later he came back with this solution from Stack Overflow which, surprisingly to me, contained an answer I had provided about 1.5 years ago! This has always been my problem with programming: I learn something useful and then completely forget it. I'm kind of hoping that by having this blog it will aid me in remembering these sorts of things.

The Objective

I want to decode a shortened URL to reveal its full, final web address.

The Solution

The basic idea is to use the getURL function from the RCurl package, telling it to retrieve only the header of the webpage it's connecting to, and to extract the URL location from there:

```r
decode_short_url <- function(url, ...) {
  # PACKAGES #
  require(RCurl)

  # LOCAL FUNCTIONS #
  decode <- function(u) {
    Sys.sleep(0.5)
    # fetch the header only; do not follow the redirect
    x <- try(getURL(u, header = TRUE, nobody = TRUE, followlocation = FALSE,
                    cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl")))
    if(inherits(x, "try-error") | length(grep("[Ll]ocation: ", x)) < 1) {
      return(u)
    } else {
      # extract the Location field from the header
      return(gsub(".*[Ll]ocation: (\\S+).*", "\\1", x))
    }
  }

  # MAIN #
  gc()
  # return decoded URLs
  urls <- c(url, ...)
  l <- vector(mode = "list", length = length(urls))
  l <- lapply(urls, decode)
  names(l) <- urls
  return(l)
}
```

And here's how we use it:

```r
# EXAMPLE #
decode_short_url("http://tinyurl.com/adcd",
                 "http://www.google.com")
# $`http://tinyurl.com/adcd`
# [1] "http://www.r-project.org"
#
# $`http://www.google.com`
# [1] "http://www.google.co.uk"
```

You can always find the latest version of this function here: https://github.com/tonybreyal/Blog-Reference-Functions/blob/master/R/decode_shortened_url/decode_shortened_url.R

Limitations

A comment on the R-bloggers Facebook page for this blog post made me realise that this doesn't work with every shortened URL, such as when you need to be logged in for a service, e.g. http://1.cloudst.at/myeg:

```r
decode_short_url("http://tinyurl.com/adcd",
                 "http://www.google.com",
                 "http://1.cloudst.at/myeg")
# $`http://tinyurl.com/adcd`
# [1] "http://www.r-project.org"
#
# $`http://www.google.com`
# [1] "http://www.google.co.uk"
#
# $`http://1.cloudst.at/myeg`
# [1] "http://1.cloudst.at/myeg"
```

I still don't know why this might be a useful thing to do, but hopefully it's useful to someone out there.

Comments (27)

December 8, 2011

Code Optimization: One R Problem, Thirteen Solutions - Now Sixteen!

Filed under: R. Tags: optimisation, Rcpp, rstats. @ 1:41 pm

Introduction

The old R wiki optimisation challenge describes a string generation problem which I have blogged about previously, both here and here.

The Objective

To code the most efficient algorithm, using R, to produce a sequence of strings based on a single integer input, e.g.:

```
n = 4
# [1] "i001.002" "i001.003" "i001.004" "i002.003" "i002.004" "i003.004"

n = 5
# [1] "i001.002" "i001.003" "i001.004" "i001.005" "i002.003" "i002.004" "i002.005" "i003.004"
# [9] "i003.005" "i004.005"

n = 6
# [1] "i001.002" "i001.003" "i001.004" "i001.005" "i001.006" "i002.003" "i002.004" "i002.005"
# [9] "i002.006" "i003.004" "i003.005" "i003.006" "i004.005" "i004.006" "i005.006"
```

Solutions One Through Thirteen

A variety of different approaches are illustrated on the R wiki page, which show the performance benefits of things like vectorisation, variable initialisation, linking through to a compiled programming language, reducing a problem to its component parts, etc.

The Fourteenth Solution

The main speed improvement here comes from replacing the function paste with file.path. This use of file.path (with parameter fsep) only works correctly here because there is never a character vector of length zero for it to deal with. I only learned about this approach when I happened to see a tweet on Twitter (with hashtag #rstats) and read the associated help file, where it says that it is faster than paste:

```r
generateIndex14 <- function(n) {
  # initialise vectors
  s <- vector(mode = "character", length = n)

  # set up n unique strings
  s <- sprintf("%03d", seq_len(n))

  # paste strings together
  unlist(lapply(1:(n - 1),
                function(i) file.path(paste("i", s[i], sep = ""), s[(i + 1):n], fsep = ".")),
         use.names = FALSE)
}
```

Timings:
```
                 test  elapsed    n replications
   generateIndex14(n) 27.27500 2000           50
   generateIndex13(n) 33.09300 2000           50
   generateIndex12(n) 35.31344 2000           50
   generateIndex11(n) 36.32900 2000           50
```

The Fifteenth Solution (Rcpp)

This solution comes from Romain Francois and is based on the tenth solution, but implemented in C++ using the R package Rcpp (see his blog for the implementation). This is the sort of thing I would love to learn to do myself, but I just need to find the time to re-learn C, though I doubt that'll happen any time soon as I'm hoping to start my MSc in Statistics next year. This is a great solution though. Timings:

```
                 test  elapsed    n replications
   generateIndex15(n) 23.30100 2000           50
   generateIndex14(n) 27.27500 2000           50
   generateIndex13(n) 33.09300 2000           50
   generateIndex12(n) 35.31344 2000           50
   generateIndex11(n) 36.32900 2000           50
```

The Sixteenth Solution

When I was writing up this post I thought up a sixteenth solution (as seems to be the pattern with me on this blog). This solution gets its speed-up by generating the largest set of strings which…
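For anyone who wants to play with the challenge outside of R, here is a plain, unoptimised Python sketch of the target output (the function name `generate_index` is my own; it simply enumerates every pair (i, j) with i < j, which is what all sixteen R solutions compute):

```python
def generate_index(n):
    """Produce the 'i001.002'-style sequence described above:
    one string for every pair (i, j) with 1 <= i < j <= n."""
    # zero-padded labels "001" .. "n", mirroring sprintf("%03d", seq_len(n))
    s = ["%03d" % k for k in range(1, n + 1)]
    return ["i%s.%s" % (s[i], s[j])
            for i in range(n - 1)
            for j in range(i + 1, n)]

print(generate_index(4))
# ['i001.002', 'i001.003', 'i001.004', 'i002.003', 'i002.004', 'i003.004']
```

There are n*(n-1)/2 such pairs, so generate_index(6) returns 15 strings, matching the n = 6 example above.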