Jul 16, 2018

Anatomy and mindset of the data army at Gojek

At Gojek, we work every day to improve people’s lives. We are building a hyper-local network that helps millions of people commute, shop, eat and pay, daily. This cornerstone has helped us evolve from a motorcycle ride-hailing service into an on-demand mobile platform, providing a range of services that include transportation, logistics, food delivery, and payments.

A total of 1,000,000 driver-partners collectively cover an average distance of 16.5 million kilometers each day, making Gojek Southeast Asia’s de facto transportation partner. Gojek is a verb. Gojek is a way of life.

On this mission, products fueled by data and machine learning can be a powerful way to solve users’ needs. The Data Engineering team is responsible for creating a reliable data platform to power analytical and ML solutions and make data accessible to everyone at Gojek.

Who we are

We are a team of full-stack, opinionated, polyglot engineers and passionate human beings who believe the core of a developer’s job is not just writing code. We own the entire product cycle and are responsible for adoption and experience end to end. Product ownership gives everyone a responsibility to put in their best and gives meaning to our daily work.

Why we're here

The Data Platform team is dedicated to the empowerment of Gojek. It aims to give Gojek a disproportionately large advantage over its competitors by making data available, accessible, reliable, and actionable at scale. The data platform allows internal teams to build workflows on our products and develop innovative solutions through in-depth collaboration and cross-pollination, creating new opportunities for our customers and expanding what’s possible. As a team, we want to make an immense impact by shaping the data with which the company makes innumerable business decisions.

What we do

As data engineers, we solve problems with data to improve the product that we offer to our users. We build tools, infrastructure, frameworks, and services and, along the way, develop new skills, new ways of doing things, and new tools, and turn our backs on traditional methods. At Gojek, data engineering is much closer to software engineering. We understand the tradeoffs between different open source Big Data technologies and aren’t afraid to build our own when needed.

How we do it

Some informal philosophies and approaches we follow while working on the data platform:

Product thinking

We operate as an internal product organization. We measure success with business metrics like user adoption, retention, revenue, or cost savings generated per feature. Our customers are Product Managers, Developers, Data Scientists, and Analysts. The following are the significant areas we focus on when building our products.

Customer Focussed

As a team, we ensure that every facet of the platform treats customer satisfaction as the primary concern. Our team culture is dedicated to understanding users, improving experiences, and building solid relationships by increasing lifetime value for our customers.

Self Service

We enable platform users to directly interact with data instead of relying on data engineers for their day-to-day needs. We aim to make the entire process self-service while ensuring the security and privacy of data are handled carefully.

Build frameworks, not pipelines

We believe teams know their data and business best. Following the traditional practice where data engineers write every pipeline will not scale and will create a bottleneck for those teams. We empower users to create their own data workflows with ease by building reliable frameworks as part of the data platform.


The second important pillar of the data platform team's culture is efficiency. Efficiency for us means figuring out ways to optimize performance, increase work pace, and improve job satisfaction.


Working at such a large scale makes it necessary to automate everything from deployment to infrastructure. This way, we can push features faster without causing chaos and disruption to the production environment.


We know our users have specific needs, and no one solution fits all. We build modular components that allow users to build their data workflows as per their needs.


Data platform products are built with cost considerations at the core. We empower our users to have complete visibility and optimize their spending on data workflows.


We do not believe in shipping junk to our users. As a platform team, our product adoption relies on the trust our customers have in us.

Observability is part of the product.

Visibility into how our systems are doing is not a nice-to-have. It is a crucial part of the product. Data products should allow anyone to check their quality and performance at any given time.

Idempotent data workflows

Idempotence is the property of an operation that has no additional effect when it is applied more than once to the same input. By making data workflows idempotent, we can safely re-run jobs and reproduce their results, which makes the development cycle faster and more efficient.
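To make the idea concrete, here is a minimal sketch in Python. It is an illustration and not Gojek's actual implementation: the event shape, the `run_daily_job` function, and the in-memory `sink` dictionary are all hypothetical stand-ins for a real job and a partitioned table. The job recomputes one date partition from its input and overwrites that partition instead of appending to it, so a retry or backfill leaves the output unchanged.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical event shape for illustration: (date, service, count).
Event = Tuple[str, str, int]

# Hypothetical sink: maps a date partition to per-service totals.
Sink = Dict[str, Dict[str, int]]


def run_daily_job(events: Iterable[Event], partition_date: str, sink: Sink) -> None:
    """Recompute totals for one date and overwrite that partition in the sink."""
    totals: Dict[str, int] = defaultdict(int)
    for date, service, count in events:
        if date == partition_date:
            totals[service] += count
    # Overwrite the whole partition rather than appending to it.
    # This is what makes a re-run safe: the same input always
    # produces the same stored output, however often the job runs.
    sink[partition_date] = dict(totals)


if __name__ == "__main__":
    events = [
        ("2018-07-16", "ride", 3),
        ("2018-07-16", "food", 2),
        ("2018-07-16", "ride", 4),
        ("2018-07-15", "ride", 9),
    ]
    sink: Sink = {}
    run_daily_job(events, "2018-07-16", sink)
    run_daily_job(events, "2018-07-16", sink)  # a retry changes nothing
    print(sink)
```

A job that appended its totals on every run would double-count after a retry. Scoping each run to a partition and replacing that partition wholesale is one common way to avoid this.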


Data at Gojek doesn’t grow linearly with the business but exponentially, as people keep building new products and logging new activities on top of that growth. We currently see more than 6 billion events daily, and the number is rising.

Where we're going

We want to make Gojek a data-first company by fostering collaboration between product and business leaders and data teams. We strive to lead the business through data-driven decisions and build the world’s best data platform. We aim to federate data ownership among domain data owners, provide their data as products to solve core user needs, fuel the business, and create a lasting competitive advantage.