My journey into data

Posted on July 07, 2020 in Blog

I studied Computer Science at Ateneo and got a specialization in data science and analytics. There I learned how to code and how to manage software projects. I got a data analyst internship with RideWave, a startup that provides shuttles to and from work. I did my thesis on the behaviour patterns of Twitter users and graduated. I joined the First Circle Data Team, and that's my journey. Well, I lied; it really wasn't as straightforward as that. The journey was far from linear.

Three years ago, if you had asked me what data work was like, I'd probably have given you some generic answer off Google or someone else's Medium article, or maybe even a blank stare. A bit more than two years ago, I wasn't even sure I wanted to pursue a career in data. I never would've imagined that I'd be here now, talking about my journey into data work to people who are probably a few years my senior. I think the best way to do that is to tell you about the things I learned and how I learned them.

So let's go back to what I would call the beginning of my journey into data. I was at the end of my second year in computer science, which is usually when most people finally decide whether to stay in the program or part ways with it and pursue another degree. By then, I had already decided to stay, but I still had to figure out whether I was going to specialize and, if so, in what. The only specialization that interested me was Data Science and Analytics.

In retrospect, this set off a chain of events, because it opened up a whole range of possibilities for someone like me who didn't particularly enjoy software engineering or web development, the paths CS students usually take. I began taking classes on databases, computer simulations, social computing, and agent-based modelling. Meanwhile, I also began joining hackathons centred on the use of data and experimenting with Kaggle datasets, trying to figure out whether data science was for me and to learn as much as I could about it. Kaggle and the hackathons were what taught me about machine learning algorithms, or at least exposed me to what they did and how to implement them.

This was me in class. To be honest, I was definitely NOT one of the best students.

Key Takeaway: School and online courses are great for learning the foundations. Learning is structured, and I was exposed to a bunch of different things, albeit only at a surface level. School exposed me to the basics of networking, containerization, general programming practices, and software development practices.

Fast forward to the beginning of my fourth year: I took an internship with RideWave, an early-stage startup that aims to give people a way to get to and from work every day.

Initially, we were storing and working with our data in Google Sheets, but it severely lacked the power we needed. Each sheet took minutes to load, when Google Sheets didn't just conk out altogether. So my co-intern and I started making the case for a dedicated database. It took us a long time, because our manager had so many questions.

I also started learning how to work with APIs, using the Twitter API and the Google geolocation API to collect data on sentiment towards ride-sharing apps like Grab and Uber, and trying to map that sentiment to locations to understand where we could expand.

Key Takeaway: One key thing I learned here was that skills are more important than tools. Understanding core concepts in data storage and data processing gives you skills that transfer between tools. The R-versus-Python debate is always fun, but in reality, if you can achieve the same things with both, the choice of tool matters far less than the skill behind it. Knowing tools is great, but that knowledge only counts once it's put into action.

After that, I went back to school and did my thesis, and as graduation approached, I began looking for places to work. I applied to a bunch of different companies with varying levels of data team maturity. Some teams were clearly established, while others were still in their infancy. Some companies were even looking for the first hire on their team. I got some offers, but many more rejections, and some companies never got back to me at all; I probably had at least twice as many rejections as offers. In retrospect, many of these rejections were probably for the better, even the one from what was, at the time, my dream company. I found myself interviewing at companies whose work didn't remotely interest me. Other times, I showed up to interviews for roles I wasn't sold on.

Key Takeaway: The really important thing I want to call out here is that you have to figure out what you want to do. Do you want to be a data engineer, a data analyst, or a data scientist? If you can, talk about it with your prospective managers. Data work is a full spectrum, and it's hard to be all three at once, so you have to understand what you want and check whether what the company is offering aligns with it.

At some point during my job search, I stumbled across a post in a Facebook group about a data team that was hiring. The company was a startup that aims to empower SMEs by providing financial services. I took a look at the job and applied, just to see where it could go. I talked to the team's manager, found that we were aligned on what the role would be, took their test, and went in for an interview.

To be honest, by the first call with the manager, I was already quite sold on the team. He managed expectations well and was frank about what the role would entail. One thing I still remember him flagging during the interviews was that the initial work would probably lean more towards engineering, which I was completely fine with.

I began working at First Circle in July last year and have worked on various projects that have exposed me to different sides of the business and to people from different backgrounds.

The first project I worked on in the team was building a graph database. If you had asked me what a graph DB was before my thesis, I'd have had no clue what you were talking about, and when we pitched it, neither did the business. We could have done a better job of framing for the business what it was and the possibilities that came with it. Immediate stakeholders should have been involved from the beginning; that would have built alignment on what we could do with the new database.

We were trying to build a better understanding of real-world relationships and map out network effects. Sounds cool, right? It was, but the business has to guide what we do and what we make. Don't get me wrong, I still believe the project will lead to much more amazing work in the future, but had we tied it to a more specific business goal, the impact would have been much more immediate.

On the other hand, I worked on a much smaller piece of work that the business used immediately after testing. It was just an implementation of a solution for credit limit growth, only a few lines of code. It was adopted right away because the immediate stakeholders were heavily involved from day 1. In fact, they were the ones who brought the problem up. Together, we tried to figure out what was actually happening and why. We then brainstormed possible solutions together. When the time for implementation came, the stakeholders were already aligned on what the solution needed to be.

Key Takeaway: Business impact is the priority of any project. People in the business won't understand the possibilities unless you build with them, so you have to communicate and understand where they're coming from. A data person should be deeply involved in the business and understand what it is and how it works.

Overall, data work, or tech, or maybe even life itself if we're being philosophical, is definitely about constant learning. There will always be new tools and novel algorithms to learn about and study, but the biggest thing everyone needs to figure out early on is how to learn.