How I got to Data Engineering

Photo by Jukan Tateisi / Unsplash

Presented in DataMasters Episode 9: Data Journeys Unplugged

How it started vs how it’s going

I started my career way back in September 2012 as an Application Developer at IBM. I applied to hundreds of jobs through JobStreet and Monster. LinkedIn was not the go-to job platform it is today, so we had to apply through these job sites or go to career expos.

Fun fact, I was not supposed to be doing Data Engineering now because I was trained by IBM to become a Java Developer. 

But the stars aligned and the Java project got cancelled. I almost applied to other companies, but IBM contacted me and told me that I had been assigned to a Data Warehousing project. I had no clue what a data warehouse was.

Fast forward to 2018, I started to think about working outside of the Philippines. Around the same time, my housemates got the opportunity to move to Singapore and Australia. So I did what software engineers do… I copy-pasted what they did. It didn’t work out. I applied to hundreds of jobs in Australia and Singapore and got zero interviews. Looking back and understanding my motivation, I realized Singapore and Australia weren’t really my first choice. I just picked them because of my housemates. So I adjusted my search and applied to anything with visa sponsorship. That’s when things suddenly clicked. I got around 5 interview requests, and one of them (trivago) led to my journey in Germany.

Now, I’ve moved to a new and exciting company as a Data Platform Manager. In the sections below, I hope to share some tips and learnings from my journey.

Tip #1: Everyone starts at level 1 

I often see people who are early in their careers get overwhelmed by all the data tools out there. One of my favorite memes represents this problem.

My suggested approach is to focus on the basics and building blocks of your career. In my experience, learning the 2 skills below gives the highest return on investment for your time.


Python: One of the most popular languages for Data jobs! (Data Science, Data Engineering)

SQL: It will always be there even with different technologies

Other skills to learn: basic databases (MySQL or PostgreSQL), crontab, Docker

However, a lot of people fall into the trap of stopping once they have learned a new concept through reading. As programmers, we learn best by doing.

It doesn’t matter if it’s a simple hello world or a tutorial on writing an advanced data pipeline. You need to start somewhere and follow what Nike said: Just do it. ✔️

Time to level up. You have already mastered the basics. Now, it’s time to explore around 2-3 tools in Data Engineering, ideally ones that leverage Python and SQL. This is where you can start building an end-to-end data pipeline and try out the skills you have learned.

Suggestions:

Readings:

  

Tip #2: Be curious and learn from others

While it’s definitely doable to learn at your own pace, I’ve learned a lot of valuable lessons from others. One great example is from my time at Globe (Hello Sir Myk!), on a project to migrate from Teradata to Hadoop/Hive. Again, I had no experience with Hadoop, but I had the motivation to learn. One of our Data Architects, who is an expert in Hadoop, started teaching us. I made sure I showed that motivation and wrote a lot of code to prove I was willing to learn.

I remember enjoying a 2-hour bus ride from Calamba to BGC googling these questions:

  • What is Hadoop? (This is where I learned the MapReduce concept. Highly recommended to learn this.)
  • What is a partition? (This is where I realized why file organization is important)
  • How to make my Hive jobs faster? (This is where I learned about bucketing and clustering)
  • What is Parquet? (This is where I learned about columnar formats)
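
If the MapReduce idea from the first question feels abstract, a toy word count can make it concrete. The sketch below is plain Python running on one machine, not Hadoop itself; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. It only shows the three steps: map each line to (word, 1) pairs, group the pairs by word, then reduce each group to a total.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of text."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in order on a list of text lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big pipelines", "data pipelines"]))
```

In a real cluster, the map and reduce steps run in parallel on many machines and the shuffle moves data across the network, which is roughly why partitioning and file organization matter so much for speed.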

PS: While Hadoop is still being used, there are better technologies out there that are more user-friendly and easier to maintain.

Learning from others should not be limited to technical skills. One of the most valuable lessons I learned was from my manager at Globe. She said something along the lines of: “You should learn how you can help the business. Understand their actual problem rather than focusing on the tool or solution.”

That lesson stuck with me, and to this day I always ask my stakeholders “why”. If someone says, “I want a Tableau dashboard”, I will ask:

  • Is there any reason why you need a new dashboard? Can we extend an existing one?
  • Is there any business problem that you are solving with this dashboard?

Suggestions:

(PS: Don’t follow them blindly; use them as inspiration. Big companies have problems of a different scale and complexity.)

Tip #3: Find ways to simplify complex things

Being able to break down something complex into simple, easy-to-follow steps is important, and you will use this skill many times in your career.

The way I have used the most is analogies.

1. Data Engineers and Architects are actually like Engineers and Architects building your new home.

Architects create the blueprint. Engineers implement it based on the design. Data Architects design the data warehouse infrastructure and Data Engineers build the data pipelines.

2. ETL is just like being in a kitchen

Extract - Get your ingredients

Transform - Clean, chop and dice your ingredients

Load - Put them into bowls, ready for use
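
To connect the kitchen analogy to actual code, here is a minimal sketch of the three steps using only Python and SQL (via the built-in sqlite3 module). Everything in it is an assumption for illustration: the RAW_CSV sample data, the orders table, and the function names are invented, and a real pipeline would read from files or APIs rather than a string.

```python
import csv
import io
import sqlite3

# Hypothetical raw "ingredients": messy names and one missing amount.
RAW_CSV = """order_id,customer,amount
1, Ana ,100.50
2,ben,
3,Cara,75
"""


def extract(raw_text):
    """Extract: get the ingredients (read raw rows as dictionaries)."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def transform(rows):
    """Transform: clean and chop (trim names, drop rows without an amount)."""
    cleaned = []
    for row in rows:
        amount = row["amount"].strip()
        if not amount:
            continue  # skip orders with no amount
        customer = row["customer"].strip().title()
        cleaned.append((int(row["order_id"]), customer, float(amount)))
    return cleaned


def load(rows, conn):
    """Load: put the prepared rows into the bowl (a database table)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run extract, transform and load, then read the result back."""
    load(transform(extract(RAW_CSV)), conn)
    query = "SELECT customer, amount FROM orders ORDER BY order_id"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(run_pipeline(sqlite3.connect(":memory:")))
```

Keeping each step as its own function mirrors the kitchen stations: you can change where the ingredients come from or how they are chopped without touching the other steps.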

3. Think of databases/warehouse like a big library

Why is a library organized the way it is?

  • Genre like SciFi, Drama, History
  • After genre, it’s organized according to the author’s last name

A library needs to be organized and optimized for retrieving a book and for the librarian to put it back. Without this system, the books would be in chaos. Our database or warehouse needs the same kind of organization. That’s where indexes, clustering and partitions become important.

If analogies are not your thing, then this is where writing things down could work. As you move up in your career and manage a team, your work shifts from coding to writing. You need to be able to communicate with your teammates asynchronously.

There’s a reason why a “blueprint” is important before building a house. Start by writing things down. Here’s a good starting point:
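
The library analogy can be tried out in a few lines. The sketch below uses Python’s built-in sqlite3 module with a made-up books table and an index named idx_books_genre_author (both are assumptions for illustration). It asks SQLite for its query plan before and after creating an index on genre and author last name, which is the same ordering the library shelves use.

```python
import sqlite3


def query_plan(conn, sql, params):
    """Return SQLite's plan for a query as one readable string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # last column is the detail text


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE books (title TEXT, genre TEXT, author_last TEXT)")
    conn.executemany(
        "INSERT INTO books VALUES (?, ?, ?)",
        [
            ("Dune", "SciFi", "Herbert"),
            ("Foundation", "SciFi", "Asimov"),
            ("Hamlet", "Drama", "Shakespeare"),
            ("SPQR", "History", "Beard"),
        ],
    )
    sql = "SELECT title FROM books WHERE genre = ? AND author_last = ?"
    params = ("SciFi", "Herbert")

    # Without an index, the "librarian" walks past every book.
    before = query_plan(conn, sql, params)

    # Organize the shelves by genre, then by author last name.
    conn.execute(
        "CREATE INDEX idx_books_genre_author ON books (genre, author_last)"
    )
    after = query_plan(conn, sql, params)

    found = conn.execute(sql, params).fetchall()
    return before, after, found


if __name__ == "__main__":
    before, after, found = demo()
    print("before:", before)
    print("after: ", after)
    print("found: ", found)
```

The exact wording of the plan differs between SQLite versions, but the first plan should describe a scan of the whole table, while the second should mention the index. With four rows the difference is invisible; with millions of rows it is the difference between walking every aisle and going straight to the right shelf.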

  • What is the current state of things?
  • What are the current problems? 
  • What are the opportunities or new requirements coming in?
  • What’s your proposed solution?
  • What things did you try out?

Suggestions:

Bonus tip #4: Be Yourself. Be a Filipino.

In one of the Data Masters episodes, Kristine said something along the lines of: “Tech skills get you in, soft skills make you stay.”

This is very true! Filipinos are known to be cheerful, hospitable and hard-working. Combine this with the tech skills you learned from Data Engineering Pilipinas, and you become a very valuable teammate.

People often ask me, “How do you learn soft skills?”  

My answer? Play a team sport or join an activity where a group of people has to work in sync:

  • Basketball, volleyball or even esports force you to coordinate with your team to win. If you lose? Talk to your teammates and analyse why you lost and what you could have done better.
  • Sing or play in an orchestra or choir
  • Act in a theater play