Introduction to Understanding Probability for Data Science

Probability is a branch of mathematics that measures how likely an event is to happen. It is expressed as a fraction, decimal or percentage, and it has practical uses in many different areas. Probability isn't one of those math concepts you learn and then wonder where you'd ever apply it. As a matter of fact, people make decisions every day by weighing probabilities, directly or indirectly.

For example, suppose you want to pay a friend a surprise visit and wonder how likely you are to find him at home. To reason that out, you might consider what day of the week it is, what time you plan to visit, how often you found him at home on past surprise visits, whether he has a roommate he hates to be around, or whether he spends some of his time in a hotel (odd questions, but it depends on the person you're dealing with). All of these details help you estimate the probability of meeting him at home, and so help you decide whether to visit now or some other day.

These considerations lead us to the two types of probability: theoretical and experimental (empirical) probability.

Experimental Probability

Experimental probability does not depend on reasoning about the details of a situation. It is not worked out from what we expect to happen; it is simply measured by running experiments. For example, the experimental probability of getting a head in 20 tosses of a fair coin could be 6/20, depending entirely on the results of the actual tosses.

The formula for the experimental probability of an event E is given below:

P(E) = Number of times event E occurred / Number of times the experiment was done

From this formula, we can see that the experiment has to be performed a number of times, and each occurrence of the event of interest has to be recorded.

For example, if you were asked to find the probability of getting a head from 20 tosses of a fair coin, you would literally have to get a coin, flip it 20 times and record the number of heads. If you got 17 heads, the probability of getting a head would be (recall the formula):

P(Head) = Number of times we got a head / Number of times we tossed the fair coin = 17/20
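The same calculation can be sketched in a few lines of Python. This is only an illustration: the function name `experimental_probability` and the list of recorded tosses are made up for this example, and the 17 heads out of 20 simply mirror the numbers used above.

```python
from fractions import Fraction


def experimental_probability(outcomes, event):
    """Return (times the event occurred) / (times the experiment was done)."""
    if not outcomes:
        raise ValueError("at least one trial is required")
    occurrences = sum(1 for outcome in outcomes if outcome == event)
    return Fraction(occurrences, len(outcomes))


# Hypothetical record of 20 tosses: 17 heads ("H") and 3 tails ("T").
tosses = ["H"] * 17 + ["T"] * 3

p_head = experimental_probability(tosses, "H")
print(p_head, float(p_head))  # 17/20 0.85
```

Using `Fraction` keeps the result in the same form as the formula (17/20) instead of only showing a rounded decimal.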

Experimental probability is valuable because it is based on observed facts. On the flip side, it can lead to incorrect conclusions, especially when the number of trials is small. For example, if the experimental probability of getting a head from four (4) tosses of a coin is 4/4 = 1, this probability is questionable. It is not a good representation of the probability of getting a head in a coin toss, because it would mean that a head is guaranteed on every toss. That conclusion is misleading, and the main cause of this bad result is the small number of trials.

Theoretical Probability

Theoretical probability is based on expectation: the probability is worked out in advance from what is known about the situation, rather than from running experiments. For example, to find the probability of choosing a blue ball from a sack containing 12 balls (4 blue and 8 red), we calculate from the information given (4 blue balls and 8 red balls). If every ball is equally likely to be picked, the probability is 4/12 = 1/3.

The formula for Theoretical Probability is:

P(E) = Number of successful outcomes / Total number of possible outcomes

For example, to find the probability of rolling an odd number with a fair die, we need the number of odd faces on the die (the number of successful outcomes), which are 1, 3 and 5, and the number of values the die can show (the total number of possible outcomes), which are 1, 2, 3, 4, 5 and 6.

P(Odd) = P(1, 3 or 5) = 3/6 = 1/2
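As a rough sketch, the theoretical formula can be written as a small Python function. The name `theoretical_probability` and the way the balls are labeled are assumptions made for this illustration, and the calculation is only valid when every outcome is equally likely.

```python
from fractions import Fraction


def theoretical_probability(successful, possible):
    """Return (successful outcomes) / (possible outcomes).

    Assumes every possible outcome is equally likely.
    """
    successful = set(successful)
    possible = set(possible)
    if not possible:
        raise ValueError("there must be at least one possible outcome")
    if not successful <= possible:
        raise ValueError("successful outcomes must be possible outcomes")
    return Fraction(len(successful), len(possible))


# Fair die: odd faces out of the six faces.
die = range(1, 7)
p_odd = theoretical_probability([face for face in die if face % 2 == 1], die)
print(p_odd)  # 1/2

# Sack of 12 balls: each ball gets a label so the 4 blue ones stay distinct.
balls = [("blue", i) for i in range(4)] + [("red", i) for i in range(8)]
p_blue = theoretical_probability([b for b in balls if b[0] == "blue"], balls)
print(p_blue)  # 1/3
```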

Theoretical probability, just like experimental probability, has its disadvantages too. It doesn't apply directly to every real-life situation, because the formula assumes that all possible outcomes are equally likely. For example, the probability of a person acquiring an STD cannot just be 1/2 all the time simply because there are two outcomes (the person has it or not). The probability depends on many things, such as whether the person is sexually active, has multiple sexual partners, uses protection, or exchanges body fluids in some other way. For instance, if the person has multiple sexual partners and doesn't use protection during sexual intercourse, the probability of having an STD could be far higher (say, around 80% or more), yet a naive theoretical calculation would automatically place it at 1/2.

Hence, for theoretical probability to be used properly, it requires careful logical reasoning and close attention to the facts and circumstances surrounding an event.

When experimental probability is measured properly (using many trials), its value gets closer to the theoretical probability. In other words, the more trials are performed, the closer the experimental probability tends to be to the theoretical probability. This is the idea behind the law of large numbers.
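A quick simulation can make this concrete. The sketch below uses Python's built-in `random` module to stand in for a fair coin; the function name `estimate_head_probability`, the seed and the chosen numbers of tosses are all arbitrary choices made for this illustration.

```python
import random


def estimate_head_probability(num_tosses, rng):
    """Simulate fair coin tosses and return the experimental P(Head)."""
    if num_tosses <= 0:
        raise ValueError("num_tosses must be positive")
    # rng.random() is uniform on [0, 1), so "< 0.5" happens half the time.
    heads = sum(1 for _ in range(num_tosses) if rng.random() < 0.5)
    return heads / num_tosses


rng = random.Random(42)  # fixed seed so the run can be repeated
for num_tosses in (4, 20, 1_000, 100_000):
    estimate = estimate_head_probability(num_tosses, rng)
    gap = abs(estimate - 0.5)
    print(f"{num_tosses:>7} tosses: P(Head) = {estimate:.4f}, gap from 1/2 = {gap:.4f}")
```

The gap does not have to shrink at every single step, but with a large number of tosses the estimate typically lands very close to the theoretical value of 1/2, while with only 4 tosses it can easily be far off.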

Probability is definitely useful:

It helps doctors narrow down the diagnosis of a patient's illness.
It helps network providers create new services for their customers.
It helps students select the right answer in multiple-choice questions.
It helps employers narrow down their search for candidates who can start in a new role and cope with it.