site address: d-central.tech/data/ai-inference-accelerators.csv redirected to: d-central.tech/data/ai-inference-accelerators.csv

site title:

Our opinion (on Thursday 02 July 2026 17:06:13 UTC):

- no comments

Meta tags:

Headings (most frequently used words):

-

Text of the page (most frequently used words):
npu (33), the (31), https (24), llm (21), com (18), tops (17), not (16), amd (15), 2026 (15), www (15), for (13), edge (13), tenstorrent (13), hailo (13), ryzen (12), int8 (12), models (12), nvidia (12), gpu (12), llama (11), memory (10), card (10), vision (10), max (9), unified (9), igpu (9), via (9), laptop (9), jetson (9), orin (9), core (9), local (9), accelerator (8), soc (8), only (8), apple (8), cores (7), cpp (7), sram (7), 395 (6), device (6), inference (6), ollama (6), 300 (6), series (6), cpu (6), small (6), software (6), hardware (6), open (6), source (5), xdna (5), runs (5), sdk (5), openai (5), html (5), large (5), llms (5), arm (5), good (5), qualcomm (5), intel (5), openvino (5), with (5), blackhole (5), google (5), tpu (5), groq (5), strix (4), halo (4), lpddr5x (4), 70b (4), single (4), path (4), lemonade (4), compatible (4), products (4), emerging (4), envelope (4), and (4), agx (4), cuda (4), stack (4), snapdragon (4), elite (4), genai (4), model (4), docs (4), tensix (4), dram (4), wormhole (4), gen (4), 10h (4), coral (4), lpu (4), power (3), zen (3), 128 (3), pool (3), general (3), studio (3), mini (3), class (3), pro (3), dual (3), capacity (3), ampere (3), full (3), language (3), nano (3), limited (3), hexagon (3), engine (3), efficient (3), lunar (3), lake (3), pcie (3), risc (3), vllm (3), metal (3), 576 (3), neural (3), interface (3), api (3), weights (3), hold (3), chip (3), datacenter (3), database (3), ane (3), plus (2), rdna (2), excellent (2), used (2), paths (2), server (2), 120 (2), box (2), host (2), pcs (2), dev (2), platform (2), 2025 (2), point (2), x86 (2), system (2), 256 (2), bit (2), part (2), ram (2), onnx (2), same (2), story (2), technical (2), product (2), 64gb (2), lpddr5 (2), 13b (2), quantized (2), tensorrt (2), mlc (2), credit (2), module (2), content (2), review (2), package (2), super (2), ecosystem (2), capable (2), int4 (2), run (2), ultra (2), throughput (2), generative (2), p150a (2), gddr6 (2), backed (2), built (2), metalium (2), cards (2), aibs (2), specifications (2), n300d (2), asic (2), direct (2), ddr (2), die (2), cap (2), raspberry (2), hat (2), dataflow (2), compiler (2), hailort (2), vendor (2), but (2), accelerators (2), suitable (2), cnn (2), cannot (2), designed (2), tensorflow (2), lite (2), groqcard (2), architecture (2), 230 (2), lpus (2), cloud (2), groqcloud (2), rack (2), scale (2), silicon (2), pointer (2), here (2), mlx (2), sibling (2), dataset (2), 128gb (2), data (2), manufacturer, type, compute, memory_model, local_llm_suitability, runtime_support, notable, last_verified, apu, cus, 8000, assignable, vram, variable, graphics, note, largely, unused, mid, actual, specific, accelerated, vulkan, rocm, configurable, standout, consumer, shipping, gmktec, evo, beelink, gtr9, processors, blogs, processor, breakthrough, shared, channel, typically, wide, phi, practical, mainstream, tier, narrower, than, runtime, distinguish, from, far, less, bandwidth, much, weaker, guru3d, overview, pages, dla, 2048, tensor, 275, sparse, 170, dense, fp32, tflops, 204, strong, unquantized, larger, differentiator, over, smaller, jetsons, presets, maxn, sized, popular, self, hosted, always, robotics, node, dam, solutions, gtcf21, brief, pdf, storagereview, revisiting, tiny, 8gb, also, variant, ceiling, budget, entry, cheapest, tinkering, autonomous, machines, embedded, systems, oryon, armv9, adreno, commonly, 8533, optimized, fp16, bf16, maturity, gating, factor, maturing, qnn, very, fraction, windows, community, testing, confirms, usable, laptops, tweaktown, news, 97822, xda, developers, these, locally, surprisingly, series2, xe2, arc, combined, constraint, moderate, nf4, supported, tok, combining, draws, best, efficiency, peak, prompts, 1024, tokens, may, need, plugin, windowsml, directml, webnn, strongest, case, battery, assistant, loops, workflow, support, articles, 000099574, thurrott, 303493, 140, big, 210, 512, purpose, qwen, mistral, mixtral, falcon, fork, qsfp, ports, allow, multi, pooling, access, down, active, cooled, 399, fully, aligns, decentralization, sovereignty, narrative, credible, non, asics, 192, prior, 449, predecessor, iterative, lineage, 2nd, lpddr4, first, that, can, vlms, diffusion, lifts, tested, typical, genuinely, cnx, all, external, hard, size, networks, included, correct, common, misconception, pipelines, detection, segmentation, classification, low, digit, market, leading, above, successor, usb, board, coprocessor, convolutional, era, mobilenet, 400, fps, never, frequently, mis, asked, about, answer, newer, coralboard, transformer, separate, distinct, edgetpu, benchmarks, blackscarab, insights, guide, processing, unit, deterministic, even, needed, serve, accessed, token, listed, clarify, home, endpoint, service, 20k, useless, isolation, belongs, economics, cryptoslate, 20000, breaks, performance, records, see, row, duplicated, reference, standard, tasks, decode, specs, live, avoid, duplication, desktop, cross, link, rows, already, present, purely, disambiguate, keep, two, datasets, explicitly, connected, central, tech, internal,

Text of the page (random words):
id accelerator manufacturer type compute memory_model local_llm_suitability runtime_support power notable source last_verified amd ryzen ai max plus 395 amd ryzen ai max 395 strix halo amd unified memory apu soc 16 zen 5 cores 40 rdna 3 5 igpu cus xdna 2 npu 50 tops int8 up to 128 gb lpddr5x 8000 unified up to 96 gb assignable as vram amd variable graphics memory excellent runs llama 70b q8 on a single device via the igpu unified memory pool note the 50 tops npu is largely unused for general llm inference as of mid 2026 the igpu is the actual llm path npu used only for specific accelerated paths via lemonade sdk llama cpp vulkan rocm lm studio ollama amd lemonade sdk openai compatible via ollama lm studio server configurable 45 120 w mini pc laptop standout consumer single box 70b class llm host shipping in mini pcs gmktec evo x2 beelink gtr9 pro amd ryzen ai halo dev platform https www amd com en products processors laptop ryzen ai 300 series amd ryzen ai max plus 395 html https www amd com en blogs 2025 amd ryzen ai max 395 processor breakthrough ai html 2026 06 amd ryzen ai 300 strix point npu amd ryzen ai 300 strix point xdna 2 npu amd npu in x86 laptop soc xdna 2 npu 50 tops int8 zen 5 cpu rdna 3 5 igpu shared system lpddr5x dual channel typically 16 32 gb not the 256 bit wide pool of the max halo part emerging small models llama 3 1 8b phi 3 5 mini via ryzen ai software lemonade on the npu igpu ram is the practical llm path mainstream laptop tier narrower memory than strix halo amd ryzen ai software lemonade sdk onnx runtime llama cpp on igpu 15 54 w laptop envelope distinguish from max 395 same npu class far less memory bandwidth capacity much weaker for large llms https www guru3d com story ryzen ai 395 technical overview zen 5 cores and xdna 2 ai npu xdna 2 npu tops amd ryzen ai 300 series product pages 2026 06 nvidia jetson agx orin 64gb nvidia jetson agx orin 64 gb nvidia edge soc ampere gpu arm cpu dual dla 2048 core ampere gpu 64 tensor cores 12 core arm up to 275 tops sparse int8 170 dense int8 tops 5 3 fp32 tflops 64 gb 256 bit lpddr5 unified 204 8 gb s strong runs 13b unquantized and larger quantized models via cuda large unified pool is the differentiator over smaller jetsons llama cpp ollama nvidia tensorrt llm mlc llm full cuda stack 15 w 30 w 50 w presets up to 60 w maxn credit card sized edge module popular self hosted always on llm robotics node https www nvidia com content dam en zz solutions gtcf21 jetson orin nvidia jetson agx orin technical brief pdf https www storagereview com review revisiting the nvidia jetson agx orin tiny package large language models 2026 06 nvidia jetson orin nano super 8gb nvidia jetson orin nano super 8 gb nvidia edge soc ampere gpu arm cpu up to 67 tops int8 8 gb lpddr5 unified also 4 gb variant limited small 3b 8b quantized models only 8 gb ceiling budget entry to the jetson cuda llm ecosystem llama cpp ollama mlc llm tensorrt llm 7 25 w cheapest cuda capable on device llm box good for tinkering not large models https www nvidia com en us autonomous machines embedded systems jetson orin orin nano series up to 67 tops 7 25 w 4 8 gb 2026 06 qualcomm snapdragon x elite hexagon npu qualcomm snapdragon x elite hexagon npu qualcomm npu in arm laptop soc hexagon npu 45 tops int8 12 core oryon armv9 cpu adreno gpu unified lpddr5x up to 64 gb commonly 16 gb at 8533 mt s good emerging runs 8b 13b class models on device npu optimized for int4 int8 not fp16 bf16 software maturity is the gating factor llama cpp ollama cpu npu paths maturing qualcomm ai engine qnn very efficient laptop envelope npu is fraction of igpu power windows on arm ai pcs community testing confirms usable power efficient local llms on the npu https www qualcomm com laptops products snapdragon x elite https www tweaktown com news 97822 45 tops npu https www xda developers com these llms run locally snapdragon x elite npu surprisingly good 2026 06 intel core ultra series2 lunar lake npu intel core ultra series 2 lunar lake npu intel npu npu 4 in x86 soc npu up to 48 tops int8 p e cores xe2 arc igpu combined platform 120 tops on package lpddr5x 16 gb or 32 gb on lunar lake capacity is the constraint emerging moderate 7b 8b models via openvino genai on npu nf4 supported 8 tok s combining cpu npu gpu npu draws 2 3 w vs 15 25 w igpu best for efficiency not peak throughput prompts 1024 tokens with 7b models may need 16 gb ram intel openvino openvino genai npu plugin windowsml directml onnx rt webnn laptop envelope npu 2 3 w strongest case is battery efficient assistant loops not large model throughput https docs openvino ai 2025 openvino workflow generative inference with genai inference with genai on npu html https www intel com content www us en support articles 000099574 https www thurrott com hardware 303493 2026 06 tenstorrent blackhole p150a tenstorrent blackhole p150a tenstorrent ai accelerator pcie card risc v tensix 140 tensix cores 16 big risc v cores 210 mb sram 32 gb gddr6 512 gb s dram backed good purpose built local llm inference card runs llama qwen mistral mixtral falcon via tenstorrent s open source vllm fork qsfp dd ports allow multi card memory pooling tenstorrent tt metalium tt vllm open source openai compatible server access down to the metal up to 300 w active cooled 1 399 fully open source software stack aligns with the decentralization sovereignty narrative a credible non nvidia local llm card https tenstorrent com en hardware cards https docs tenstorrent com aibs blackhole specifications html 2026 06 tenstorrent wormhole n300d tenstorrent wormhole n300d tenstorrent ai accelerator pcie card risc v tensix dual asic 2x wormhole asics 128 tensix cores 192 mb sram 24 gb gddr6 576 gb s dram backed good prior gen open stack local llm card same tt software ecosystem as blackhole tt metalium tt vllm open source openai compatible up to 300 w 1 449 predecessor to blackhole credit tenstorrent s iterative open hardware lineage https docs tenstorrent com aibs wormhole specifications html https tenstorrent com en hardware cards 2026 06 hailo 10h hailo 10h hailo edge generative ai accelerator m 2 40 tops int4 20 tops int8 2nd gen neural core direct ddr interface to on module lpddr4 4x 4 gb or 8 gb limited emerging first hailo part that can run small llms vlms diffusion at the edge the direct ddr interface lifts the on die sram cap capacity limited to small models tested on raspberry pi ai hat 2 hailo dataflow compiler hailort sdk vendor stack 2 5 w typical genuinely runs gen ai at the edge but is a small model vendor sdk device not a general openai api llm host https hailo ai products ai accelerators hailo 10h ai accelerator https www cnx software com 2026 01 20 raspberry pi ai hat 2 review 2026 06 hailo 8 hailo 8 hailo edge vision accelerator m 2 pcie up to 26 tops int8 all weights on die sram no external memory interface hard cap on model size not suitable vision cnn only no dram path cannot hold llm weights designed for vision networks included to correct a common misconception hailort vision pipelines detection segmentation classification low single digit w market leading edge vision accelerator not an llm device the hailo 10h above is the gen ai successor https hailo ai products ai accelerators hailo 8 ai accelerator 2026 06 google coral edge tpu google coral edge tpu usb m 2 dev board google edge tpu vision coprocessor 4 tops int8 2 tops w 8 mb on chip sram tensorflow lite int8 models only not suitable vision cnn only built for the convolutional vision era e g mobilenet v2 400 fps never designed for language models no memory to hold llm weights tensorflow lite edge tpu compiler vision models 2 w frequently mis asked about for llms the answer is no google s newer coralboard with a transformer capable npu is a separate distinct product https www coral ai docs edgetpu benchmarks https www blackscarab ai insights google coral edge tpu guide 2026 06 groq lpu groqcard groq lpu groqcard groq datacenter inference asic language processing unit deterministic dataflow architecture 230 mb on chip sram 80 tb s no dram a single chip cannot hold even a small model 576 lpus needed to serve llama 2 70b not local cloud datacenter only accessed via groqcloud token api rack scale only listed to clarify it is not home local hardware groqcloud api openai compatible endpoint service not local device datacenter card rack scale 20k card and useless in isolation belongs to the cloud inference economics story not local hardware https groq com lpu architecture https cryptoslate com groq 20000 lpu card breaks ai performance records 230 mb sram 576 lpus for 70b 2026 06 apple silicon m series pointer apple silicon m series m3 m4 max pro see ai gpu database apple unified memory soc pointer row not duplicated here up to 40 core gpu 16 core apple neural engine ane up to 128 gb unified memory the reference standard for local llm on a soc excellent but the llm path is the gpu via metal mlx llama cpp not the apple neural engine ane is used for core ml vision system tasks not general llm decode full specs live in the sibling gpu dataset to avoid duplication mlx llama cpp metal ollama lm studio laptop desktop envelope cross link only full rows apple m4 max 128gb apple m3 max 128gb apple m4 pro 64gb already in data ai gpu database present here purely to disambiguate ane vs gpu and keep the two datasets explicitly connected https d central tech data ai gpu database internal sibling dataset 2026 06

Thumbnail images (randomly selected): * Images may be subject to copyright.

No Images

site address: d-central.tech/data/ai-inference-accelerators.csv redirected to: d-central.tech/data/ai-inference-accelerators.csv

site title:

-

Header

Load Info