By Alexy Khrabrov — May 17, 2026

At the Gemma 4 Launch, Under the Pyramid

On April 17, I went to the Gemma 4 launch: SF Edition at the Transamerica Pyramid in San Francisco. It was one of those Bay Area AI evenings where the setting does half the storytelling before anyone says a word: the sharp geometry of the Pyramid above us, the city outside, and inside, a room full of people building what the next turn of open models will make possible.

Gemma 4 Launch at the Transamerica Pyramid

I shot the event as I usually do, moving between the formal moments and the in-between ones: people arriving, conversations forming in small clusters, speakers getting ready, the audience leaning in. Launch events are nominally about a model, a product, or an announcement, but the real story is usually the community gathered around it. That was true here. Gemma 4 was the occasion; the room was the signal.

The evening was MC'ed by Paige Bailey, whom I ran into again in the registration line, right behind me, patiently waiting her turn like everyone else. It was a small thing, but it captured the texture of the event: senior people, builders, researchers, product leads, and community folks all moving through the same line, heading into the same room, ready to compare notes.

Audience and conversations at the Gemma 4 launch

Olivier Lacombe, the Product Manager for Gemma, opened with an overview of what Gemma 4 is trying to be: a family of open models built from the same research lineage as Google's frontier Gemini work, but packaged for developers who want to run, adapt, and deploy models in their own environments. The technical story was not just "bigger model, better scores." It was about intelligence per parameter, long-context use, multimodality, reasoning, agentic workflows, and a simpler open licensing story under Apache 2.0.

What stood out to me was the continuity between research, developer tooling, and the open source ecosystem. Gemma has always mattered because it sits in that practical middle ground: serious enough for builders, open enough to invite experimentation, and accessible enough that students, startups, researchers, and independent developers can actually try things without waiting for permission.

Gemma 4 felt like part of a broader pattern I keep seeing across AI communities: the center of gravity is moving from demos to workflows. People no longer ask only whether a model is impressive in isolation. They ask where it can run, how it can be adapted, how it behaves inside real applications, what it costs, and whether developers can trust it enough to build with it.

That is where the vLLM part of the evening was especially useful. The vLLM team described how they brought up Gemma 4 on vLLM, walking through the model architecture and how it is decomposed inside the vLLM runtime. This is the kind of detail that matters for practitioners: not just that a model exists, but how it maps onto serving infrastructure (memory, batching, attention, scheduling) and the practical mechanics of making inference fast and reliable enough to use.

Technically, the Gemma 4 story is about a few converging innovations. The family spans small edge-oriented models and larger workstation/cloud models, with long context windows and native multimodal support for text, images, and video. The models are aimed at more advanced reasoning and agentic workflows, where the system has to plan, call tools, and carry state across longer tasks. The larger variants include mixture-of-experts style efficiency, while the edge models are designed to make serious local inference plausible on developer hardware and mobile-class devices. And because the release is Apache 2.0, the deployment story becomes much simpler for companies and open source projects that need clear reuse rights.

The Transamerica Pyramid was a fitting place for that conversation. It is one of San Francisco's most recognizable structures, a building from an earlier era of ambition and abstraction. Standing inside it for an AI launch, I couldn't help feeling the strange layering of the city: finance, software, open source, research labs, startups, meetups, old infrastructure, new infrastructure.

I also kept thinking about the social role of open models. The point is not just that more people can download weights. The point is that more people can inspect, adapt, teach, fine-tune, benchmark, deploy, criticize, and improve the systems that are becoming part of everyday computing. Open models create more edges where communities can attach themselves.

That is why I like photographing these events. Photos catch the human layer around technical work: the raised hand, the hallway exchange, the demo watched over someone's shoulder, the quick introduction that turns into a project six months later. The technology matters, obviously. But the community is how the technology becomes real.

For me, the Gemma 4 launch was a reminder that open AI is not an abstract policy position. It is a practice. It happens when people show up, compare notes, ask hard questions, and take the tools back into their own labs, companies, classrooms, and side projects.

I left the Pyramid thinking that this is still the most interesting part of the Bay Area AI scene: not the hype cycle, but the density of people who are willing to build in public, learn in public, and argue about the future before it is settled.

More photos from the event are in my gallery: Gemma 4 Launch.
