· Pietro Abruzzo (Founder of DuckSoft)
1. Introduction:
Becoming a data scientist at 11, or at any point during your teenage years, is not easy. Claims that data science can be learned without mathematics or formal foundations are largely misleading. Even in non-research roles, data science requires a solid understanding of the mathematical ideas that underlie the algorithms being implemented and trained. While industry positions often emphasize programming and model development over theory, mathematics remains essential for reasoning about model behavior, limitations, and failure modes. The depth of mathematical knowledge required depends strongly on the kind of data science one pursues, ranging from applied analytics to research-driven modeling. This article explores these distinctions, the learning path I followed, and the trade-offs involved in starting early. If you are interested in what it actually takes to grow into data science at a young age, read on.
2. Differentiation:
First, machine learning engineering and data science are not the same. In ML engineering, the focus is on designing and maintaining systems for training, deploying, and running models in production. In data science, the emphasis is on developing models, training procedures, and extracting insight from data.
Second, data analysis and data science are not the same either. While data analysis is widely used within data science, it is better understood as a sub-discipline focused on exploratory analysis, statistics, and interpretation of data. It is generally not concerned with designing or training machine learning models.
Third, machine learning researchers and data scientists are also distinct roles. ML researchers tend to operate closer to mathematics and theory, developing new algorithms, model architectures, or advanced optimization methods. In industry, data scientists and ML engineers often work with existing models, adapting and optimizing them to fit specific applications and constraints.
As mentioned on the About page, my work does not fall cleanly into a single category. I combine elements of research and applied data science, focusing on turning complex ideas into real-world systems. Exploring this hybrid role in detail would be too specific for a general audience, so in this article I will address both perspectives.
3. Data science roadmap:
I won’t discuss specific concepts here; for that I would need to write a book. Instead, I will provide a detailed roadmap, one that I partially followed myself and then refined based on what I regret not doing.
3.1. Concepts to learn:
3.1.1. Computer science
It is common to assume that data science requires only some mathematics and minimal programming. This is not true. In applied data science—unlike in pure research—the emphasis shifts toward programming, systems, and implementation rather than mathematics alone. As a result, a solid foundation in computer science is essential. At a minimum, this includes:
Python and SQL
Core data types and data structures
Basic algorithms (e.g., sorting, searching)
Algorithmic thinking
More advanced data structures and algorithms
Fundamental software architecture patterns
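To make the "basic algorithms" item above more concrete, here is a minimal sketch of binary search in Python. The function name `binary_search` and the sample list `scores` are illustrative assumptions rather than material from any particular course. It is meant only to show the kind of algorithmic thinking the list refers to.

```python
from typing import Optional, Sequence


def binary_search(items: Sequence[int], target: int) -> Optional[int]:
    """Return the index of target in a sorted sequence, or None if absent."""
    low, high = 0, len(items) - 1
    while low <= high:
        mid = (low + high) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            low = mid + 1   # target can only be in the right half
        else:
            high = mid - 1  # target can only be in the left half
    return None


# Hypothetical sample data; the list must already be sorted.
scores = [3, 8, 15, 23, 42, 57]
print(binary_search(scores, 23))  # prints 3
print(binary_search(scores, 5))   # prints None
```

Each loop iteration halves the remaining search range, which is why the approach scales well compared with scanning every element.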
In practice, many data scientists in industry do not have deep training in computer science, especially beyond the basic-to-intermediate level. This is not necessarily a criticism; most roles do not demand full CS depth on a daily basis. However, focusing narrowly on a single skill set is rarely a path to long-term success. A working understanding of adjacent fields significantly improves problem-solving ability and system-level reasoning. While much of this computer science knowledge may not be used explicitly in every project, it becomes crucial when systems scale, fail, or behave unexpectedly. At that point, intuition alone is insufficient.
Once these concepts are learned, the next step is unavoidable: practice. Mastery comes from repeated application, not passive exposure.
For Python specifically, I strongly recommend the freeCodeCamp Python certification. It is well-structured, covers both syntax and foundational computer science concepts, and emphasizes practical application. As a bonus, it is entirely free.
For a broader computer science foundation, Harvard’s CS50 on edX is an excellent option. While the verified certificate costs close to 300 USD, the course content itself can be audited for free. If credentials matter for your résumé, the certificate may be worth the investment; if not, the knowledge alone is already valuable.
3.1.2. Mathematics:
As mentioned earlier, mathematics is essential for understanding machine learning. The core areas are:
Linear algebra
Basic calculus
Probability
Statistics
More advanced mathematics is not used explicitly in most day-to-day data science tasks or personal projects. However, a conceptual understanding becomes important if you want to go beyond surface-level usage, especially if you aim to combine applied work with research, as I do. From a theoretical perspective, much of machine learning can be framed in terms of probability and statistics; the remaining components concern optimization and implementation. To truly understand how algorithms behave, you must be prepared to encounter mathematical expressions and derivations from time to time.
The amount of mathematics required varies significantly by domain and role. For example, some finance-related positions may rely more on established models with limited mathematical novelty, whereas large-scale advertising or recommender systems often involve more complex optimization and probabilistic modeling.
Neural networks provide a clear example of where mathematics becomes unavoidable. Beyond the complexity of the models themselves, optimization methods such as gradient descent play a central role. While the basic form of gradient descent relies on gradients and partial derivatives, more advanced variants are commonly used in practice. Similar optimization techniques also appear when fitting classical supervised learning models such as linear and logistic regression.
For a general audience, this overview captures the essential role mathematics plays in data science. Deeper exploration depends largely on your goals and the type of work you intend to pursue.
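As a rough illustration of how gradients and partial derivatives appear in practice, the sketch below fits a straight line with the basic form of gradient descent on a mean squared error loss. The function name `fit_line`, the learning rate, the step count, and the toy data are all assumptions chosen for this example. Real projects would normally rely on a library and a more advanced optimizer.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y ~ w * x + b by minimizing mean squared error with gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Partial derivatives of MSE = (1/n) * sum((w*x + b - y)^2)
        grad_w = (2 / n) * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = (2 / n) * sum((w * x + b - y) for x, y in zip(xs, ys))
        # Step against the gradient to reduce the loss
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Toy data (hypothetical) generated from y = 2x + 1 without noise.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))  # should be close to 2.0 and 1.0
```

The two `grad_` lines are exactly the partial derivatives of the loss with respect to `w` and `b`; choosing a learning rate that is too large would make the updates diverge instead of converge.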
3.1.3. The rest:
The previous sections focused on skills directly tied to data science and machine learning. However, there are additional areas that are not strictly part of ML or data science, yet are still highly valuable. One example is understanding how computers work at a basic level. Knowledge of hardware, operating systems, and even elementary electronics becomes particularly relevant if you later work with robotics or embedded systems. Another important area is physics. While it may not appear directly connected to data science, physics develops strong intuition for systems, modeling, and abstraction. We interact with physical systems constantly, often without appreciating their underlying complexity. This section broadly covers general knowledge that strengthens reasoning skills across disciplines. You may not know in advance which of these skills you will need, but they often become unexpectedly useful, including in data science and machine learning.
3.2. The roadmap:
This roadmap describes not only the order in which concepts should be learned, but also how and where to practice them. Many existing data science roadmaps share a common limitation: they teach skills sequentially, without encouraging parallel development. This roadmap takes a different approach. It is designed to be more general and adaptable, allowing multiple skill sets to develop in parallel rather than isolating topics into rigid phases. The goal is not just to cover the basics of data science, but to build a more robust and transferable foundation.
3.2.1. Weekly practice layout:
As discussed earlier, practice is essential for mastery, but it must be structured effectively. If time and motivation allow, a reasonable daily schedule could look like this:
30 minutes learning new concepts or reviewing material from the previous session
A break of a few hours
30 minutes of focused practice using exercises, books, or tools described later
30 minutes of light review in the evening, revisiting topics that were unclear or caused errors
This cycle can be repeated throughout the week. On Sundays, it is useful to consolidate everything learned during the week by working on a small one-day project that tests your understanding. Before following any structured plan, discipline is essential. Progress requires consistency, patience, and the ability to continue even when concepts are difficult. Planning your days in advance can help, though the level of structure should fit your lifestyle. Extremely rigid scheduling may be effective for some, but it is neither necessary nor realistic for everyone. For practice, I will later discuss tools and platforms in more detail. One example is Duck PA, an application I am developing to support structured practice. More information about it will be covered in a separate article.
3.2.2. Computer science:
Before implementing models or building applications, it is important to learn how to program properly. A strong computer science foundation makes later work significantly easier.
One recommended starting point is Brilliant’s computer science material (this is not sponsored). Although the premium version is paid, it can be a worthwhile investment. The CS Fundamentals course provides solid coverage of algorithms and data structures, and also introduces higher-level concepts such as abstraction and pipelining. Completing one lesson per day is a reasonable pace.
In parallel, the freeCodeCamp Python certification is highly recommended. It teaches Python effectively, includes practical exercises and labs, and provides a certificate that can be useful for a résumé. Certifications in general can be valuable, both for learning and for demonstrating commitment. A list of recommended certifications will be provided later.
Before moving on, consider Harvard’s CS50 course on edX, another excellent option. While the verified certificate costs close to 300 USD, the content itself can be accessed for free.
After establishing a foundation in programming and computer science, it is beneficial to learn SQL and basic web development. Only then should you move on to more advanced topics and larger projects.
3.2.3. Mathematics:
The next phase focuses on mathematics. Since this article is aimed at younger learners, this section assumes the background knowledge of an average 11-year-old. Brilliant again offers high-quality courses in algebra, calculus, and related topics. A good starting point is basic algebra, followed by progressively more advanced material. However, online courses alone are not sufficient. Books and extensive problem-solving are essential for building real mathematical understanding. Regardless of which resources you choose, consistent practice is critical. The same weekly practice structure described earlier applies here as well. Mathematics, more than most subjects, rewards persistence and repetition.
3.2.4. Data Science
At this stage, I assume you have built a solid foundation in computer science and mathematics and are ready to start learning machine learning algorithms more seriously. If you are willing and able to invest money, the IBM Data Science courses on Coursera are a strong option. I have completed many of these courses and professional certificates, and they provide a good balance between theory and practice. Taken seriously, they are sufficient to build the skills needed for entry-level industry roles. At this point, it is also important to begin specializing—but not in the sense of ignoring everything else. In data science, different areas are tightly connected, and you cannot work effectively in one without understanding the others. Specialization here means placing greater emphasis on a particular area, such as supervised learning, deep learning, or applied modeling, while still maintaining a broad base. Once you choose a focus area, you can deepen your knowledge, work on targeted projects, and communicate this specialization clearly (for example, on your résumé or LinkedIn). In parallel, you must also learn data analysis and visualization, as these skills are essential for interpreting results and communicating findings.
3.2.5. Tools and Practical Skills
One area that is often overlooked is tooling. By tools, I do not mean programming languages, but rather the broader ecosystem: code editors, terminals, operating systems, and the computer itself. A highly recommended resource here is MIT’s The Missing Semester, a free course consisting of lectures and exercises focused on practical computing skills. It covers topics such as the command line, version control, debugging, and automation—skills that are indispensable in real-world work. In addition, workplace tools such as Excel (or Google Sheets), Word (or Google Docs), and PowerPoint (or Google Slides) are unavoidable in most jobs. These are workflow tools rather than technical ones, but they are still essential for collaboration and communication. Finally, language skills matter. If English is not your native language, reaching at least a B1 level is important. Much of the documentation, research, and professional communication in data science happens in English.
4. ML Research
Machine learning research varies widely depending on the field. Research on reinforcement learning for robotics, for example, is very different from research on large language models. Because of this breadth, it is not practical to cover ML research in detail here. In general, ML research builds on many of the same foundations as data science, but with a stronger emphasis on mathematics, theory, and experimentation, and slightly less focus on software engineering. Researchers often develop new algorithms, analyze their theoretical properties, or explore novel model architectures. If you enjoy working deeply with mathematical concepts, proofs, and long-term theoretical questions, ML research may be the right path. It is a role that rewards patience, rigor, and comfort with abstraction.
5. Conclusion
This roadmap is based on my own experience, my mistakes, and advice from more experienced practitioners. I do not expect anyone to follow it perfectly or to complete it quickly. Instead, it is meant to provide a realistic picture of what is required to become a strong data scientist over time. I hope this guide helps clarify the learning process and sets realistic expectations. If it encourages you to approach learning more deliberately and patiently, then it has done its job.