By Erik Gfesser in case study — Jan 12, 2021

Mentoring a Data Engineering Team while Running

A top-10 US grocer contacted SPR for help with their data lake. While we typically implement such solutions from scratch for our clients, the non-US division of this client had already put a solution in place, and the US division we worked with depended on that same infrastructure.

In summary, the company was looking for the following from us:

Recommendations
Mentorship & training
Build-out of the first iterations of their data lake

One significant constraint was the need to use the CDH 5.15.x (Cloudera Distribution Including Apache Hadoop) clusters already in use by the non-US division.

Additionally, not only did US data need to be kept separate from non-US data, but data pipelines and DevOps pipelines also needed to be kept separate, with little visibility provided across the two. To help enable cross-team collaboration, we also needed to stay largely within the subset of Hadoop ecosystem components used by the non-US team, with recommendations of alternative tooling, technologies, and processes kept as separate sets of deliverables.

While our client had expressed interest in recommendations, mentorship, and training, they also had a short-term tactical need to start building out their data lake. Since the data engineering team needed working code, we recommended choosing use cases that addressed real business needs, so that mentorship and the data lake build-out could be tackled at the same time.
