DATA SCIENCE, WHERE DO I START FROM?

Data analytics, data mining, Big Data, machine learning, data wrangling: all these trending terms might be confusing at first, but let’s focus on understanding what Data Science means in general. Eyes on the prize, guys!

For sanity’s sake, it is very important to focus on one thing at a time, or at least to multitask intelligently. The good thing about Data Science is that it already encompasses many of the popular terms mentioned above.

Not to bore you with definitions, but Data Science employs traditional, statistical, mathematical, and algorithmic techniques for analysis and prediction.

I used to think Data Science was competitive, and I mostly still do, because of the number of people I have personally seen try to take up a career in the field. But the good thing is that the sky is large enough to accommodate all the birds of the air. By the numbers, there was a 344% increase in demand for Data Scientists between 2013 and 2018. LinkedIn also reported in August 2018 a shortage of 151,717 people with Data Science skills in the USA. The general idea is that as long as the generation and storage of data keeps increasing (which it usually does), Data Scientists will be in high demand for a long time.

I believe you’re convinced that you can be relevant in this field, so what next?!

  • Get a laptop: Laptops are a lot more comfortable to work on. Plus, some Data Science software doesn’t have a mobile version.
  • Programming Language: Choose a programming language you intend to learn. I am limiting the options for you: choose between Python and R, the two most popular languages in Data Science. Whichever you choose, try to master it as it applies to Data Science. You don’t have to learn every aspect of the language, and I say this especially about Python. Learning Python for Data Science has to be intentional so that you’re up and practicing in no time; the other things can be picked up while you’re already practicing Data Science. I recommend you take up online courses straight away, ones that are practical and make you code as you go. For example, Dataquest offers amazing step-by-step courses in Python. So choose a course and begin to code. Take up others until you feel comfortable with the major concepts that are relevant to Data Science. This takes us to the next point.
  • Databases and SQL: Learning how to extract data from databases is very important, basically because data won’t always be handed to you on a platter of gold… in csv and xlsx formats, I mean. SQL (Structured Query Language) is the language popularly used to query databases, so you should enroll in SQL courses as well. You can begin with the SQL courses from Progate; they are very basic and let you practice as you learn. After that, you can learn from other courses and videos, but always try to practice what you learn. Learn as much as you can and make sure you feel comfortable performing queries on databases before proceeding.
  • Tools - Code Editors and Integrated Development Environments (IDEs): PyCharm, Spyder, PyDev, Sublime Text, Atom, etc. for Python, and RStudio, Vim, Eclipse, Emacs, etc. for R. Any editor is okay for Python, and it is fine to begin with something like Spyder or PyCharm, or anything with a simple interface for you to code in, but I would highly recommend RStudio for R. For SQL, there are Microsoft SQL Server Management Studio, MySQL Workbench, Oracle SQL Developer, TablePlus, Toad for SQL Server, etc. Download your selected tools and let your laptop be your gold mine.
  • Diving Deeper: Also take courses that get you acquainted with Data Science processes.

    The idea is that Data Science has about six steps (these vary from book to book): framing the problem, collecting the raw data needed for your problem, preprocessing the data for analysis, exploring the data, performing in-depth analysis, and communicating the results of the analysis. One of the steps I think takes a lot of time is preprocessing the data, and practicing this step will make you smoother with the entire Data Science process. The best way to do this is to practice with different datasets. Work on dealing with missing data, converting data to suitable formats, filtering out the data you need, grouping data, and so on. Another step that is important to master is performing in-depth analysis. This is where you develop your model for prediction, or in other words apply statistical and mathematical knowledge to the data. Knowing the different techniques and the situations each one is best suited for is important. But having one go-to technique of your own will help you get things done faster and more easily.
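
    To make the preprocessing practice above a bit more concrete, here is a minimal sketch in Python. It assumes you use the pandas library (my choice for the illustration, any similar tool would do), and the dataset, its column names and its values are all made up. It walks through the four things mentioned: missing data, converting formats, filtering and grouping.

```python
import pandas as pd

# Hypothetical raw records with the usual problems: a missing value,
# numbers stored as text, and dates stored as plain strings.
raw = pd.DataFrame({
    "city": ["Lagos", "Abuja", "Lagos", "Abuja", "Lagos"],
    "date": ["2020-01-05", "2020-01-06", "2020-01-07", "2020-02-01", "2020-02-03"],
    "amount": ["120", "80", None, "60", "45"],
})

clean = raw.copy()

# Converting formats: text to numbers (None becomes NaN) and strings to dates.
clean["amount"] = pd.to_numeric(clean["amount"])
clean["date"] = pd.to_datetime(clean["date"])

# Missing data: fill the gap with the median amount (one option among many).
clean["amount"] = clean["amount"].fillna(clean["amount"].median())

# Filtering: keep only the January records.
january = clean[clean["date"].dt.month == 1]

# Grouping: total amount per city.
per_city = january.groupby("city")["amount"].sum()
print(per_city)
```

    Filling with the median is only one way of handling a missing value; dropping the row or using a different estimate can be the better call depending on the dataset, which is exactly why practicing on different datasets helps.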

  • Tools - Visual Tools: Visual tools are best used after you have learned how to write code in R and/or Python. But since they are not difficult, they can be learned at the same time, with a structured learning timetable of course (because I am very particular about not overwhelming you). Data Scientists need visual tools to present their findings. If you’re conversant with using Excel to create graphs and charts, then mastering other tools won’t be difficult. This step is pretty much my favorite, because that is when I begin to feel proud of all the work I have done. Download visual tools like Tableau, Excel, Power BI, Plotly, Dash, etc. PowerPoint is usually part of this category for me too because it helps with presentations.
  • Tools - Statistical Tools: I categorize these separately, and third, because they are very useful and make work faster. Some statistical tools can be used without necessarily writing code. It is important to have them, but only after you are grounded in writing code for Data Science processes. I simply feel it is important to grasp coding for Data Science because some of these statistical tools are limiting. Examples of statistical tools are RapidMiner, WEKA, SPSS, MATLAB, etc.
  • Internships and Training: There are virtual internships you can be part of these days. Join one to grasp industrial and organizational concepts. Do physical internships too if you have the chance. Embrace the tasks and practical work knowing they are real-life problems. Also, since data is made available to you, brainstorm and use it to gather beautiful insights and to make predictions and prescriptions. Do things with the data that you would be proud of. Make groundbreaking discoveries, or just discoveries, with the data. The idea is to practice as much as possible.
  • Create a portfolio: This should actually be done as soon as you start taking up projects with datasets. You could create a website to share your projects. Your projects can also be displayed on GitHub. GitHub is popular among techies, although it is still something I am trying to grasp. I can say it is pretty relevant, because some employers ask for your GitHub account to look at your portfolio. So take your time, learn how to use GitHub, and share your portfolio through it.
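
    Going back to the Databases and SQL point in the list above, here is a rough sketch of what extracting data with a query can look like. It uses Python’s built-in sqlite3 module so that it runs without installing a database server. The sales table, its columns and its rows are invented for the illustration; on a real project you would connect to a database that already exists.

```python
import sqlite3

# Hypothetical sample rows; in practice the table would already be in the database.
rows = [
    ("Ada", "Lagos", 120.0),
    ("Ben", "Abuja", 80.5),
    ("Chi", "Lagos", 45.0),
    ("Dan", "Abuja", 60.0),
]

# A throwaway in-memory database, created only for this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (customer TEXT, city TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)

# The SQL itself: total sales per city, largest total first.
query = """
    SELECT city, SUM(amount) AS total
    FROM sales
    GROUP BY city
    ORDER BY total DESC
"""
totals = conn.execute(query).fetchall()
conn.close()

for city, total in totals:
    print(f"{city}: {total}")
```

    The same SELECT, GROUP BY and ORDER BY statement can be typed into any of the SQL tools listed earlier, which is why getting comfortable with queries pays off whichever editor you end up using.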

    Okay, I believe you now have somewhere to begin on the way to becoming a Data Scientist. The thing is, once you can do all the things mentioned above, you can get a job as a Data Scientist, and everything else can be learned on the job.

    Best wishes to you all!!!

References

towardsdatascience.com/my-top-5-visualizati..

kdnuggets.com/2016/03/data-science-process...

financesonline.com/20-best-sql-editor-tools