Engineering Engineering: A Playbook for High-Performance Culture

Omni Labs

Jan 15, 2025 • 10 min read

The following is an excerpt from Omni Labs CTO Tyler Tarsi's personal blog on building a culture of engineering excellence.

Over the past 3 years, I've journeyed from being an individual contributor to leading an engineering team as CTO of Omni Labs. Back when we were just two scrappy developers, we’d manually SSH into our single validator, fiddle with systemd commands, and cross our fingers for smooth network upgrades. Monitoring was non-existent. Fast forward to today, and we’re running a distributed system of distributed systems in production at scale: Omni Mainnet.

This guide distills our key learnings about building high-performance engineering teams. It weaves together insights from various sources, all unified by an obsession with rapid feedback loops, fast iteration, and users. We’ve drawn from:

DORA’s research on engineering excellence
Extreme Ownership: How U.S. Navy SEALs Lead and Win by Jocko Willink and Leif Babin
MrBeast’s production principles (yes, really!)
Experiences from team members at previous startups, including scaling Luno from 10 to 1000.
And most importantly, our own successes and failures at Omni Labs

I'm sharing our playbook in hopes that others might find value in our experiences. This is a living document that continues to evolve with our team, but the core principles have brought us a long way.

A couple of us surfing at a team offsite in Cape Town. Proof that we have fun too!

What is your goal here?

We fundamentally believe that crypto is good for the world, and our mandate is to create a world-class developer experience. Read our Vision, Mission, Values, and Operating Principles. Every word in that document is there for a reason. If you’re pushing towards those objectives, you’re probably on the right track.

In fact, if you’re doing that, but breaking the “rules” in this guidebook, you’re almost certainly still on the right track. Remember that any guidebook (including this one) is just that - a guide. What matters most is delivering value and improving your product. Context is that which is scarce, and every situation requires its own judgment calls.

Be an A-Player.

There are A-Players, B-Players, and C-Players. There is only room at Omni for A-Players. A-Players are obsessive, hungry, coachable, and unafraid to make mistakes. They learn from those mistakes, don’t make excuses, believe in Omni and in crypto, and want to have an impact. B-Players are new or junior people who need to be trained into A-Players. C-Players are just average employees. They don’t suck, they just exist and do a job, and aren’t really interested in having an impact on the world or becoming great at what they do. C-Players are not a good fit for a high-performing team and should be transitioned to another company.

High-performing teams need A-Players. This doesn't mean everyone starts as an A-Player, but it means everyone has the drive and potential to become one.

Your Career

We don't do traditional "promotions" - instead, we focus on expanding ownership and impact. Make yourself invaluable by taking ownership of crucial areas and delivering results. To grow to the scale that we want, we need more leaders in the company: people who make decisions, move things forward, and take risks.

If you want to be a leader in some dimension – tell me or someone who is currently responsible for that thing that you want to step up. You probably aren’t ready for that today. But if you tell me that’s what you want to do, then I will tell you the areas you need to get better in to become the DRI (Directly Responsible Individual) for that thing. I will give you a list of all the things you need to do to be able to own that thing here. We have high expectations, and it’s not going to be easy. If you are ambitious and want to own something, the only bottleneck is how quickly you can respond to feedback, learn, and prepare yourself for it. It may take a year; it may take 2 weeks.

In return for becoming so valuable to this company we hope to provide you with frontier challenges, inspiring teammates, insane personal growth, and of course financial reward.

The “Kerplunk” Principle

"Kerplunk" is our way of establishing clear ownership and accountability. Just as a coin makes a distinct sound when it drops into a vending machine, we use "kerplunk" to indicate when someone takes ownership of a task or acknowledges a request.

If you kerplunk (acknowledge ownership of something), it becomes your responsibility to complete it. Others shouldn't need to check in on your progress - this builds a high-trust environment where everyone knows things will get done once they're kerplunked.

High Agency

This is the order of types of messages I like to receive, from least to most preferred:

❌ “What should I do about this?” Because now I need to learn all the context you already have.
✅ “I want to do X, do you approve?” Gets a quick yes/no from me.
🚀 “I already did this.” Make decisions and move on without bottlenecking on me.

Read this thread from @punk6529.

As was brilliantly explained in the thread, if you find yourself wanting to go ask someone to make a decision for you, or asking someone for help in general – before raising the issue, ask yourself:

What is the decision?
What data is needed for the decision?
Do you have the data? (if not, go get it)
What decision would YOU make with that data?
Do you still need to ask for a decision or did you just make it?

The hierarchy of communication efficiency:

Solving it yourself > DM
DM > Email
Email > Zoom
Zoom > IRL Meeting

TL;DR: Just make the decision yourself, OR make a recommendation AND communicate it in one compact, organized shot.

Definition of Done

We don’t have a strict definition of done. Instead, we rely on A-Players to take full ownership and ship high-quality, production-grade software.

Production quality basically comes down to “test test test in dev” and “monitor monitor monitor in prod”.
Testing comes in many forms: unit tests, smoke tests and e2e tests.
Monitoring is primarily done via Prometheus and Grafana (and to a lesser extent logging). Some features, however, might not be on the “hot path”; this introduces risk. In these cases, figure out a way to trigger this code regularly in prod/staging.

Remember, a task is not done when the PR is merged. You have to ensure it is thoroughly tested and actively monitored. Most bugs are due to a lack of testing, and the impact of most production issues is mitigated by monitoring. #qualityoverquantity
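
One way to handle code that sits outside the hot path is a small periodic job that exercises it and records when it last succeeded, so a stale timestamp can be surfaced on a dashboard or in an alert. Below is a minimal sketch using only the Python standard library. The names `ColdPathProbe`, `run_once`, `is_stale`, and `rebuild_index` are hypothetical, and a real setup would likely export the timestamp as a metric instead of keeping it in memory.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ColdPathProbe:
    """Exercises a rarely used code path and remembers the outcome.

    Hypothetical helper for illustration; not part of any real library.
    """
    name: str
    action: Callable[[], None]
    clock: Callable[[], float] = time.time  # injectable so tests can fake time
    last_success: Optional[float] = None
    failures: int = 0

    def run_once(self) -> bool:
        """Run the action once; record a success timestamp or count a failure."""
        try:
            self.action()
        except Exception:
            self.failures += 1
            return False
        self.last_success = self.clock()
        return True

    def is_stale(self, max_age_seconds: float) -> bool:
        """True if the path has never succeeded, or not recently enough."""
        if self.last_success is None:
            return True
        return self.clock() - self.last_success > max_age_seconds


def rebuild_index() -> None:
    """Stand-in for a feature that real traffic rarely triggers."""


probe = ColdPathProbe(name="rebuild_index", action=rebuild_index)
probe.run_once()  # in practice, call this from a scheduler in prod/staging
```

Alerting on `is_stale` rather than on individual failures keeps the signal simple: either the path has worked recently or it has not.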

Using Slack Effectively

Use Slack threads. Always.

Top-level messages should be concise topic statements. Here are some taken from our Slack channels today:

“Draft of minimal shareable logic with Symbiotic 🧵”
“Sync on solving 🧵”
“Request for update on Solvernet rollout timelines 🧵”
“Trying Slack AI for 1 month 🧵”

All discussion happens in threads below these topic messages with relevant notes, links, and pull requests in the comments. This approach has several benefits:

Anyone can quickly scan a channel to find relevant topics
Discussions stay organized and focused
Team members can easily opt in/out of conversations relevant to them
Historical context remains clear and searchable
Reduces noise while maintaining transparency

Additional Slack practices:

Use public channels by default - this promotes transparency and knowledge sharing
Set clear status indicators during deep work
Use reactions (like ✅) to acknowledge messages without cluttering threads
Remember that messages are temporary - use permanent storage (like Notion) for important information

Trunk-Based Development

We follow trunk-based development practices, which research from DORA has shown to be a key driver of high-performing engineering teams. In trunk-based dev, developers frequently integrate their changes into a shared trunk, usually the main branch.

Key principles:

Keep branches short-lived (less than 24 hours)
Make PRs small and focused (if a PR has more than 10 comments, it's a red flag)
Merge to trunk at least daily
Use feature flags for incomplete work
Keep the build green at all times

Why this matters:

Reduces merge conflicts and integration headaches
Enables continuous integration and faster feedback loops
Makes code reviews more manageable and effective
Decreases risk by making changes smaller and more atomic
Allows for faster iteration and experimentation
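
The "use feature flags for incomplete work" principle is what makes daily merges practical: unfinished code lands on trunk but stays dark until the flag is switched on. Here is a minimal sketch, assuming flags are read from environment variables. The `FEATURE_` prefix, `flag_enabled`, `quote`, and the `new_quote_engine` flag are all made up for illustration.

```python
import os
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}


def flag_enabled(name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if FEATURE_<NAME> is set to a truthy value.

    Flags default to off, so merging unfinished work does not change behavior.
    """
    source = os.environ if env is None else env
    value = source.get(f"FEATURE_{name.upper()}", "")
    return value.strip().lower() in TRUTHY


def quote(amount: int, env: Optional[Mapping[str, str]] = None) -> int:
    """Existing behavior stays the default; the new path is opt-in."""
    if flag_enabled("new_quote_engine", env):
        return amount * 2  # placeholder for the in-progress implementation
    return amount  # current, released behavior
```

Once the new path is complete and verified in production, the flag and the old branch should be deleted so that flags do not pile up.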

Apply the Elon Algorithm

At any given production meeting for Tesla and SpaceX, Elon Musk is known for invoking “the algorithm”. It was responsible for solving the “production hell” surges in factories and has become part of every one of his companies. There are 5 commandments:

Question every requirement. Each requirement should have a name attached to it. Question requirements from smart people especially - they're the most dangerous.
Delete any part or process you can. You may need to add some back later. If you don't add back at least 10%, you didn't delete enough.
Simplify and optimize. But only after deletion. Don't optimize processes that shouldn't exist.
Accelerate cycle time. Every process can be sped up. But only after the first three steps.
Automate. This comes last for a reason. Don't automate processes that haven't been questioned, simplified, and optimized.

The algorithm was sometimes accompanied by a few corollaries:

The only rules are the ones dictated by the laws of physics. Everything else is a recommendation.
When hiring, look for the people with the right attitude. Skills can be taught. Attitude changes require a brain transplant.
A maniacal sense of urgency is our operating principle.
It’s OK to be wrong. Just don’t be confident and wrong.

Code Reviews are for Learning

While code reviews are crucial for maintaining code quality, their real power lies in knowledge sharing. They're one of the best ways to learn about different parts of the system and different approaches to problem-solving.

For reviewers:

When onboarding, review as many PRs as you can, even if they're not in your immediate area
Ask questions about approaches you don't understand
Identify patterns in the codebase
Don't just check correctness - try to understand the why behind changes

For authors:

Explain the context and reasoning in your PR description
Break changes into logical commits that tell a story (trunk-based!)
Highlight interesting patterns or approaches you're introducing
Welcome questions as opportunities to share knowledge

The public nature of our Slack threads isn't just for transparency - it's a learning tool. When you see someone asking a question about the codebase, you learn. When you see someone proposing a solution, you learn. When you see someone critiquing a proposed solution, you learn.

Key practices:

Ask questions publicly - if you're wondering something, others probably are too
Never apologize for asking questions - they're a sign of learning, not weakness
Share your learning journey - document what you discover in public threads
Take time to read threads that aren't directly related to your work
If you learned something new, share it - even if it seems obvious to you

A team where nobody asks questions is either perfect (impossible) or not learning (dangerous).

Making Technical Decisions

For significant technical decisions, we follow a lightweight but effective process:

Write up initial thoughts in Slack threads and collect ideas and research
Draft an Architecture Decision Record (ADR) that outlines the proposal
Get async feedback on the ADR from the team
Hold a final synchronous design review to resolve any open questions

This approach front-loads discussion into async channels, making sync meetings short and focused. It also creates a clear record of why decisions were made, which is invaluable as the team grows.

Ruthless Prioritization

With limited capacity, ruthless prioritization is essential. This often means important things get pruned - that's okay. Being focused means saying no to good ideas to say yes to great ones.

We like to frame conversations with:

“What problem are you trying to solve?”
“How does this make the product better?”

These questions should be asked frequently in engineering discussions. They help avoid ambiguity in understanding why we're doing something and remove ego and personal preferences from discussions. Sometimes as engineers we get into the weeds very quickly, but if we can identify the problem we're solving, we often recognize that some tasks can be deprioritized to focus on higher-impact work.

If the result of asking is realizing that we can toss work out, that’s a fantastic result. Move on to the next thing.

Learning from Failure

When anything goes wrong we default to having a blameless post mortem unless we have good reasons not to. The DRI of whatever component went wrong should create a doc (template here) heading into the post mortem meeting that summarizes the issue and kicks off the discussion. Post mortems are a great learning opportunity for the wider team – to understand how different components of the system fit together, to learn how we address issues with better monitoring, etc. When things go wrong, it’s a good opportunity to learn how to improve. We should always turn post mortems into concrete action items and issues – don’t let the improvements just float into the abyss.

Signal to Noise in Alerts

A crucial aspect of maintaining a high-performing system is having high-quality alerts. The principle is simple: if we get an alert, it must be important. If we get an alert that's not important, we shouldn't be getting it at all.

Key principles for alert hygiene:

If an alert fires, it should require human attention
If it doesn't require attention, it shouldn't be an alert
Every alert must be acknowledged and resolved
Get to the root cause, don’t dismiss things as “just some logs we can ignore”
Regularly review and prune or tune alert thresholds (we have a weekly meeting for this)
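
For the regular review, one simple signal is how often each alert actually led to human action. Below is a minimal sketch, assuming alert history can be exported as pairs of alert name and whether action was required. The `AlertEvent` type, the helper functions, the sample alert names, and the 50% threshold are all assumptions for illustration, not a description of any particular tooling.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class AlertEvent:
    name: str
    required_action: bool  # did a human have to do something?


def actionable_ratio(events: Iterable[AlertEvent]) -> Dict[str, float]:
    """Fraction of firings per alert that needed human attention."""
    fired: Dict[str, int] = defaultdict(int)
    acted: Dict[str, int] = defaultdict(int)
    for event in events:
        fired[event.name] += 1
        if event.required_action:
            acted[event.name] += 1
    return {name: acted[name] / count for name, count in fired.items()}


def noisy_alerts(events: Iterable[AlertEvent], min_ratio: float = 0.5) -> List[str]:
    """Alerts whose actionable ratio is below min_ratio: candidates to prune or tune."""
    ratios = actionable_ratio(events)
    return sorted(name for name, ratio in ratios.items() if ratio < min_ratio)


# Made-up history: one alert always needs action, the other rarely does.
history = [
    AlertEvent("validator_down", True),
    AlertEvent("validator_down", True),
    AlertEvent("disk_usage_warning", False),
    AlertEvent("disk_usage_warning", False),
    AlertEvent("disk_usage_warning", True),
]
print(noisy_alerts(history))  # prints ['disk_usage_warning']
```

A low ratio does not automatically mean "delete": it may mean the threshold needs tuning so the alert only fires when someone really has to act.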

Less is More

In tandem with the Elon algorithm, always opt for simplicity. Use fewer tools. Do less so that you have more time to do higher leverage things. Simple solutions require less mental overhead.

Identify and Copy Patterns

Having consistent patterns in a codebase has proven effective for teams long-term. In your first few months, focus on recognizing and reusing existing patterns. This creates consistency and reduces cognitive load for the entire team.

Final Thoughts

Our team’s culture is built for high performers who want to have an impact. It draws on lessons from our own careers and from decades of tech company history. The practices in this guide illustrate principles that we care about:

Extreme ownership (Kerplunk)
Learning in public (Slack threads)
High agency (Make decisions)
Simplicity over complexity
Quality over quantity

Let’s build a great product together.

Thanks for reading! This is a snapshot in time – what really matters is that our culture is constantly evolving and solving problems. If any of these principles resonate, feel free to adopt or adapt them for your own team, and shoot me a DM if they lead to an A-ha moment.

If this culture speaks to you—and you want to help us build the future of crypto—reach out on X. We’re hiring driven individuals who thrive on personal growth, iterating often, and delivering impact for users.