Meta tags:
author= Blaze Developers;
The Blaze Ecosystem

The Blaze ecosystem is a set of libraries that help users store, describe, query, and process data. It is composed of the following core projects:

Blaze: an interface to query data on different storage systems
Dask: parallel computing through task scheduling and blocked algorithms
DataShape: a data description language
DyND: a C++ library for dynamic, multidimensional arrays
Odo: data migration between different storage systems

Recent blog posts

Wed 17 February 2016
Introducing Dask distributed, by Matthew Rocklin
We analyze GitHub data on a cluster using Dask.
Tags: dask, distributed, computing

Fri 13 November 2015
Distributed Array Experiment, by Matthew Rocklin
Distributed arrays: we use dask.array, a small cluster on EC2, and distributed.
Tags: dask, distributed

Wed 28 October 2015
PyData on HDFS without Java, by Matthew Rocklin
We use snakebite and distributed to run Pandas on CSV data in HDFS.
Tags: hdfs, snakebite, distributed, pandas

Tue 27 October 2015
Ad-hoc Distributed Computation, by Matthew Rocklin
Ad hoc distributed computations with a concurrent.futures interface.
Tags: distributed

Mon 19 October 2015
Pipelines and Reuse with dask, by Matthew Rocklin
tl;dr: We use dask to accelerate parameter searches over machine learning pipelines by naming consistently.
Tags: dask, sklearn, dasklearn

Wed 16 September 2015
Analyzing 1.7 Billion Reddit Comments with Blaze and Impala, by Daniel Rodriguez and Kristopher Overholt
Blaze is a Python library and interface to query data on different storage systems. Blaze works by translating a subset of modified NumPy and Pandas-like syntax to databases and other computing systems. Blaze gives Python users a familiar interface to query data living in other data storage systems such as SQL databases, NoSQL data stores, Spark, Hive, Impala, and raw data files such as CSV, JSON, and HDF5. Hive ...
Tags: blaze, impala, hive, reddit

Tue 08 September 2015
Analyzing Reddit Comments with Dask and Castra, by Jim Crist
The scientific Python ecosystem is great for doing data analysis. Packages like NumPy and Pandas provide an excellent interface to doing complicated computations on datasets. With only a few lines of code one can load some data into a Pandas DataFrame, run some analysis, and generate a plot of the results. However, this workflow starts to falter when working with data that's larger than the RAM on your computer. At this point people often move their workflow from a Python-based one into some other larger system like Spark or Hadoop. These are great at what they do, but for small problems are a bit overkill.
Tags: dask, castra, reddit

Talks and tutorials

Scale your data, not your process: Welcome to the Blaze ecosystem (EuroPython 2015, Christine Doig)
Going parallel and larger than memory with graphs (PyGotham 2015, Blake Griffith)
Dask: out-of-core NumPy and Pandas through task scheduling (SciPy 2015, James Crist)
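
The page mentions blocked algorithms, task scheduling, and a concurrent.futures interface without showing what those look like. The sketch below illustrates the general idea using only Python's standard library: a sequence is cut into blocks, each block is summed as its own task, and the partial results are combined. It is not how Dask or distributed implement this. The names split_into_blocks and blocked_sum are hypothetical, and a local thread pool stands in for a cluster.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(values, block_size):
    """Cut a sequence into consecutive blocks of at most block_size items."""
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


def blocked_sum(values, block_size=1000, max_workers=4):
    """Sum a sequence by summing each block as its own task, then combining."""
    blocks = split_into_blocks(values, block_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # submit() schedules one task per block and returns a Future for it.
        futures = [pool.submit(sum, block) for block in blocks]
        # result() waits until the corresponding task has finished.
        partial_sums = [future.result() for future in futures]
    # The final step combines the per-block results into one answer.
    return sum(partial_sums)


if __name__ == "__main__":
    data = list(range(10_000))
    print(blocked_sum(data, block_size=2_500))
```

Because each block is processed independently, only the per-block results need to be held together at the end. That is the property that lets blocked approaches work on data larger than memory when the blocks are loaded lazily, which this in-memory sketch does not attempt.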