Benchmarking — Topic Summaries
AI-powered summaries of 18 videos about Benchmarking.
Is Elon’s Grok 3 the new AI king?
Grok 3 is being positioned as a top-tier AI model, potentially the “AI king,” after it surged to the No. 1 spot on the LMArena leaderboard and posted...
Only 40 lines of code
A small change in OpenJDK—switching how thread “user time” is retrieved—wiped out a long-standing 400x performance gap, cutting the cost of the...
OpenAI o3 and o3-mini—12 Days of OpenAI: Day 12
OpenAI is announcing two new reasoning models—o3 and o3-mini—positioned as a step-change in performance on coding, math, and general reasoning...
Orca: The Model Few Saw Coming
Orca, a 13 billion-parameter language model developed at Microsoft, is outperforming leading open-source chatbots on reasoning-heavy benchmarks—at...
ChatGPT o1 - In-Depth Analysis and Reaction (o1-preview)
OpenAI’s o1-preview is being treated as a step-change in reasoning performance—driven less by “more training data” and more by a new way of scaling...
Gemini 1.5 and The Biggest Night in AI
Gemini 1.5 Pro is being positioned as a step-change in long-context AI—able to retrieve and reason over information buried in massive inputs—while...
Llama 2: Full Breakdown
Meta’s Llama 2 lands as a more capable open-weight successor to Llama 1, with the biggest gains coming from a larger training run, a longer context...
o1 Pro Mode – ChatGPT Pro Full Analysis (plus o1 paper highlights)
OpenAI’s new o1 and o1 Pro mode arrive with a clear tradeoff: higher reliability on math and coding comes with mixed results on broader reasoning,...
Open Reasoning vs OpenAI
OpenAI’s “o1” reasoning models may not keep their edge for long: within roughly two to two and a half months, multiple open-weights labs released...
Learn What the Process Classification Framework (PCF) Is
The Process Classification Framework (PCF) is a hierarchical, standardized list of business processes—organized from broad categories down to...
Applying The Process Classification Framework (PCF)
APQC’s Process Classification Framework (PCF) is being used as a shared “Rosetta Stone” that lets organizations compare, organize, and govern...
Intro to APQC’s Process Classification Framework (PCF)®
APQC’s Process Classification Framework (PCF)® is a standardized taxonomy of business processes built to help organizations benchmark and map work...
How Organizations Use the Process Classification Framework (PCF)
Organizations use the Process Classification Framework (PCF) to standardize how work is described—then that shared language powers everything from...
OpenAI DevDay 2024 | Community Spotlight | Sierra
Sierra’s TAU-bench reframes how AI agents are evaluated by combining realistic user conversations with tool-using task execution—and, crucially, by...
Phi 2: Small Language Model Better Than 7B LLMs? | Google Colab Tutorial with Python
Microsoft’s Phi-2 (2.7B parameters) is positioned as a test of whether “small” language models can match the useful behavior of much larger 7B–13B...
Processes: What They Mean for Organizations
APQC’s core message is that organizations improve performance and benchmark more effectively when they define their work in terms of organizational...
Claude 3.7: Anthropic's Strategy, ChatGPT's Strategy, plus the need for real world evals
Claude 3.7’s launch is being treated as a warning sign for AI evaluation: today’s widely published benchmarks are increasingly poor proxies for real,...
Accelerating the Value From a Process Framework
A process framework only delivers real business value when it’s governed, connected to performance, and made usable across the organization—APQC’s...