Advice on transitioning to Machine Learning

Dustin Zubke
9 min read · Apr 13, 2021

These are general thoughts that I share with people transitioning into machine learning.

In this article, I will talk directly about my experience working in machine learning, though many of these points apply similarly to data science.

Patience and Practice

Be patient with the process. Doing machine learning well requires a lot of different skills, many of which are best learned by doing.

Practice, practice, practice. ML is a new skill and the best way to learn is to practice. Feed yourself positive energy as you progress and be patient with the process. Like any skill, it takes a lot of time and focus to become proficient.

The blog post Teach Yourself Programming in 10 Years captures this sentiment. Try to accept and embrace the long-term commitment to learning machine learning as a skillset. There is a lot to learn and the field is continually advancing. Framing this as a lifelong learning endeavor is exciting and gratifying, while also intimidating, so be patient. You don’t need to learn it all today, or yesterday, or tomorrow. Just relax, trust yourself, and it will come.

While there is no way of getting around practice, try to think critically about the skills you should acquire now and those that can wait until later. Find job descriptions for roles or sectors that interest you, and look at the requirements. Do you consistently see skills across multiple applications that you haven’t learned yet? Try learning those skills first.

Listen to yourself. Learn what you like to do and try to do more of that. Fully developing the skillset will take a while and making sure you enjoy the practice will allow you to persist.

A random tip from Andrew Ng: stop copying and pasting code from Stack Overflow or other sources; instead, type the code out yourself. You will learn little formatting details by typing things out, and you will build your muscle memory.

Look into expanding the ways you learn:

  • Like to listen to podcasts? Find a podcast on a technology that interests you. Here is one I’m currently listening to.
  • Sick of looking at a screen all day? Consider reading a book on a subject, technology, or programming language. You will do a lot of reading code anyway, so reading code in a book is valuable. Try to make reading fun. Read in a coffee shop, park, or, my favorite, in the bathtub. It’s all good practice.
  • Can’t find a specific book on a topic? Consider just printing out the documentation. I’ve printed out the documentation for Python Pandas and Apache Beam and read through it.
  • What are other media you could learn from? Youtube videos? Online courses?

It will pay off, just keep practicing!

Skills

As mentioned above, look at job descriptions that interest you to help prioritize the skills you learn. Do courses in those prioritized skills and put them on your LinkedIn. You may find that recruiters in that area will start reaching out to you. A few weeks after I put Google’s IoT course on my Linkedin, someone with IoT-related jobs reached out to me.

Here are a few skills that I find are common for machine learning and/or data science roles:

  • Python
  • SQL
  • Git
  • Cloud computing
  • Algorithms and data structures: these often won’t be listed in the job description, but you may still need a basic understanding to pass coding quizzes
  • ML libraries like PyTorch or TensorFlow: this is more common for ML engineering roles, less common for data science roles

I previously had the “Linux command line” in the list above, but it isn’t a commonly listed skill in job descriptions. I personally live in the Linux command line and find it a really valuable skill, but you may only need a basic understanding of the command line to get a job in ML.

Workera

Check out Workera. They may not have a ton of job postings yet, but more importantly, the site provides really excellent feedback on your abilities in a variety of areas relevant to machine learning.

  • It will also give you suggestions on which specific roles you should look for, like “Software Engineer” vs. “Software Engineer–Machine Learning” vs. “Data Engineer” vs. “Deep Learning Engineer,” etc.
  • The feedback, quizzes, and instruction are excellent. A great resource.

The Batch

Another gem from Andrew Ng’s DeepLearning.ai: The Batch is a weekly newsletter with really great ML-related content.

Interviews

Interviews for machine learning roles will often involve some combination of:

  1. A coding quiz
  2. Initial interview (often 30 minutes)
  3. Takehome assignment (~5 hours over 2–3 days)
  4. A few technical interviews
  5. A personal or behavioral interview (sometimes framed to assess “cultural” fit)

Coding quizzes

Unfortunately, you will often have to take coding quizzes for interviews; I have failed many coding quizzes. I found coding quizzes frustrating as someone trying to break into ML because there was so much to learn on the ML side, and, THEN, you want me to know software engineering?? Give me a break!

But, sadly, there is just no getting around it. Certain roles like “Data Scientist” may be less likely to have coding quizzes, but “Machine Learning Engineer” roles, or any role with “Engineer” in the title, will often require you to pass one.

  • There are a variety of platforms out there, like HackerRank, LeetCode, and others, that provide practice quizzes and instruction for coding interviews.
  • The approach I eventually took, because I had the time, was the Stanford Algorithms specialization on Coursera. Coding quizzes are often just questions on data structures and algorithms, which is exactly what the specialization covers. It’s a really great course, and once you’ve completed it, coding interviews are much less intimidating because the whole course is basically one long coding interview. And as a bonus, you will start seriously thinking like a computer scientist.
  • Another great resource is the book Cracking the Coding Interview. I used it a moderate amount, and reading it will definitely prepare you for coding interviews.
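To give a feel for what these quizzes ask, here is a sketch of a classic quiz-style problem (my illustrative example, not one from any specific platform): given a list of numbers and a target, return the indices of two numbers that sum to the target. The hash-map approach is the standard data-structures insight interviewers look for.

```python
# Classic quiz-style problem (illustrative example): find indices of two
# numbers that sum to a target. A hash map gives O(n) time instead of the
# O(n^2) brute-force double loop.
def two_sum(nums, target):
    seen = {}  # value -> index where we first saw it
    for i, x in enumerate(nums):
        if target - x in seen:
            return seen[target - x], i
        seen[x] = i
    return None

print(two_sum([2, 7, 11, 15], 9))  # (0, 1), since 2 + 7 == 9
```

The brute-force version also passes most quizzes functionally, but explaining the time-complexity trade-off is usually part of the answer.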

Getting hired

There are some companies, like Triplebyte or Workera, that will match you with employers once you pass their assessment. Others, like HackerRank, may also be doing this.

LinkedIn is your friend

I find LinkedIn to be a valuable resource for getting hired. Think of LinkedIn as a big machine learning algorithm, and try to improve your personal feature vector to optimize your opportunities.

Set the status on your profile to “Open to Work,” and more recruiters will contact you with jobs because they can see you are open to new roles.

Respond to recruiters, even if your real job search is a few months away. I have found that once I stop responding to recruiters, I get fewer messages. Once I respond again, I start getting more messages.

Do the quizzes in the “Skills” section like for Python, Git, ML, etc. You can only take the quizzes once every 3 months, so be somewhat thoughtful with your attempts. I find the quizzes are not super rigorous, and passing them will prioritize you for certain job applications.

Relatedly, LinkedIn uses the “Skills” section to match candidates, so make sure you add all relevant skills.

Connect with the people you know, and connect with new people you meet to expand your network. A bigger network makes it easier for people to contact you, and more people will.

Consider using writing as a tool

Consider starting a blog or Medium page and writing about the things you’re learning. Or try interviewing people for your blog or for an online publication. Many established Medium publications are likely open to new writers. Contact them.

Writing can be a tool to explore new topics, cement your learnings, and expand your network. I have had a blog for many years. I used to find posting public thoughts on my blog stressful, but very few people actually read it, so I don’t really need to be anxious about it! It’s more of a tool to help me develop the things I’m thinking about.

Actually doing machine learning

Doing courses is really helpful for getting a broad understanding of machine learning, though courses don’t fully prepare you for what developing machine learning models is actually like. That’s because the models in a course’s Jupyter notebooks are all designed to run without errors. And I know personally how annoying and frustrating it is when even a course notebook model isn’t working.

Unfortunately, most of the actual work in machine learning development, in my experience, is dealing with models that either don’t work, or don’t work well enough. Developing a production machine learning model can be fairly difficult, and this article entitled “Why is Machine Learning hard” provides some interesting perspectives on this point.

This comment on the article’s hacker news post provides an approach to machine learning that I follow. Below are my personal thoughts on the approach in that comment:

  1. Establish a baseline you can fall back on. Try to find an existing implementation identical or similar to your problem; there are a lot of open-source models out there. You will want to jump right in and train models, but spending some extra time researching existing models is time well spent.
  2. Re-create any existing results of that baseline model, even if the baseline doesn’t match your use case exactly. Re-creating results helps solidify your baseline by confirming that everything is (most likely) working as expected.
  3. Try to understand and analyze your data. Plot distributions of metadata, like recording count per speaker or audio duration if you’re using speech data. Review data samples yourself, like looking at images or listening to audio. If you can’t make sense of the data, then an ML model likely can’t either, and you may need to improve the data quality.
  4. If you don’t find anything similar to what you’re doing, you will need to start from scratch. In that case, start with a simpler, smaller model and gradually scale up. But, really, is there NOTHING out there similar to what you’re doing? There likely is, so try to start there. Don’t start from scratch unless you really have to.
  5. Don’t spend too much time tuning hyperparameters initially, because once you change your model, the optimal hyperparameter values will change. Hyperparameters usually don’t make that big of a difference, except the learning rate. I once spent several months beating my head against the wall because I added momentum, which made my effective learning rate too high, and the model’s performance tanked.
  6. Add complexity gradually. Once you have established a solid baseline, then you can start adding fancier ideas.
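To see why adding momentum can wreck a previously good learning rate (step 5), here is a minimal sketch, not from the article: with classic (heavy-ball) momentum coefficient beta, the velocity for a roughly constant gradient accumulates toward 1/(1 − beta) times the raw gradient, so the effective step size grows by that factor.

```python
# Minimal sketch (illustrative, not from the article): with momentum
# coefficient beta, the velocity for a constant gradient g accumulates
# toward g / (1 - beta), scaling the effective step size by 1 / (1 - beta).
def final_update(lr, beta, grad=1.0, steps=100):
    v = 0.0
    for _ in range(steps):
        v = beta * v + grad  # classic momentum accumulation
    return lr * v  # magnitude of the last parameter update

plain = final_update(lr=0.1, beta=0.0)  # 0.1: just lr * grad
heavy = final_update(lr=0.1, beta=0.9)  # ~1.0: roughly 10x larger steps
```

So adding momentum with beta = 0.9 while keeping the same learning rate makes your steps roughly 10x larger, which is why the learning rate usually needs to be turned down at the same time.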

Running experiments

Given the high dimensionality outlined in the ML-hard article above, doing iterative search is often an effective (and sometimes the only) way to make progress. By iterative search, I mean using a reference model, configuration, and performance to compare against a series of experiments where you modify a single parameter. One example would be running a variety of experiments with different learning rates to tune that hyperparameter. Another example would be trying a few different approaches to feature engineering and seeing which of the approaches yield the best results.

To effectively compare the results, I have learned the hard way that you only want to modify a single parameter between different runs. If instead you modify two parameters, you may find the model now gives you garbage performance. Which of the two changes broke the model?? You don’t know, so you now have to go back and test each parameter individually.

You can explore multiple parameters in a given set of experiments, but each trial should differ from the reference in only one parameter. Say you want to understand how the learning rate (step size, alpha) and the batch size affect performance, and you already have a reference run. You can vary both the step size and the batch size, but you need to vary them independently. Your trials could be:

  • reference: step_size: 1e-3, batch_size: 20
  • trial 1: step_size: 1e-4, batch_size: 20
  • trial 2: step_size: 1e-3, batch_size: 30

You can run trial 1 and trial 2 at the same time, since they each differ from the reference in only one dimension, but running an experiment like:

  • reference: step_size: 1e-3, batch_size: 20
  • trial 1: step_size: 1e-4, batch_size: 30

is not a good idea.
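The one-factor-at-a-time rule can be made mechanical with a small helper (a hypothetical sketch, not from the article): build each trial config by copying the reference and overriding exactly one parameter, then check that no trial drifts further than that.

```python
# Sketch of one-factor-at-a-time experiment configs (hypothetical helper,
# not from the article). Each trial copies the reference and overrides
# exactly one parameter, so any performance change is attributable to it.
reference = {"step_size": 1e-3, "batch_size": 20}

def make_trial(ref, param, value):
    trial = dict(ref)       # copy the reference config
    trial[param] = value    # override exactly one parameter
    return trial

def changed_params(ref, trial):
    return sorted(k for k in trial if trial[k] != ref[k])

trial_1 = make_trial(reference, "step_size", 1e-4)
trial_2 = make_trial(reference, "batch_size", 30)

# Both valid: each differs from the reference in exactly one parameter.
assert changed_params(reference, trial_1) == ["step_size"]
assert changed_params(reference, trial_2) == ["batch_size"]

# Bad: two parameters changed at once, so results would be ambiguous.
bad = {"step_size": 1e-4, "batch_size": 30}
assert len(changed_params(reference, bad)) == 2
```

Trial 1 and trial 2 can run in parallel; the "bad" config is the one to reject before spending GPU hours on it.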

It seems crazy and slow to vary only one parameter at a time, but it’s really the best way to do it. Your energy should instead go into reducing the time you spend in this experiment iteration loop, by increasing training speed and finding better ways to analyze your data.


Dustin Zubke

Off-Grid Correspondent for PV Magazine. Currently in Uganda.