For this programming assignment you will implement the Naive Bayes algorithm from scratch, along with the functions to evaluate it using k-fold cross-validation (also from scratch). You can use the code in the following tutorial to get started and to get ideas for your implementation of the Naive Bayes algorithm, but please enhance it as much as you can (there are many ways to do so, such as those mentioned at the end of the tutorial):

Answer:

Bayes’ Theorem provides a way to calculate the probability that a piece of data belongs to a given class, given our prior knowledge.

P(class|data) = (P(data|class) * P(class)) / P(data)

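Where P(class|data) is the probability of the class given the provided data, P(data|class) is the probability of the data given the class, P(class) is the prior probability of the class, and P(data) is the overall probability of the data.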
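As a quick illustration of the formula above, here is a minimal sketch in Python. The function name `posterior` and the three input numbers are made up for this example and do not come from the tutorial.

```python
def posterior(likelihood, prior, evidence):
    """Bayes' Theorem: P(class|data) = P(data|class) * P(class) / P(data)."""
    return likelihood * prior / evidence


# Hypothetical numbers, chosen only to show the arithmetic.
p_data_given_class = 0.8  # P(data|class)
p_class = 0.3             # P(class)
p_data = 0.5              # P(data)

p_class_given_data = posterior(p_data_given_class, p_class, p_data)
print(f"P(class|data) = {p_class_given_data:.2f}")
```

With these inputs the numerator is 0.8 * 0.3 = 0.24, and dividing by 0.5 gives a posterior of 0.48.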

Explanation:

  • Naive Bayes is a classification algorithm for binary and multiclass classification problems.
  • It is called Naive Bayes or idiot Bayes because the probability calculations for each class are simplified to make them tractable: each input value is assumed to be independent of the others given the class.

This Naive Bayes tutorial is broken down into 5 parts:

Step 1: Separate By Class: Separate our training data by class. This lets us calculate the probability of each class, the so-called base rate, which is the fraction of training rows that belong to that class.
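Step 1 could be sketched as follows. This is only one possible layout: it assumes each row is a list whose last element is the class label, and the helper names `separate_by_class` and `class_priors` are chosen for this example.

```python
from collections import defaultdict


def separate_by_class(dataset):
    """Group rows by class label (assumed to be the last column).

    Returns a dict mapping each label to its rows with the label removed.
    """
    separated = defaultdict(list)
    for row in dataset:
        separated[row[-1]].append(row[:-1])
    return dict(separated)


def class_priors(separated):
    """Base rate of each class: its share of all training rows."""
    total = sum(len(rows) for rows in separated.values())
    return {label: len(rows) / total for label, rows in separated.items()}


# Tiny made-up dataset: two features followed by the class label.
dataset = [
    [3.4, 2.3, 0],
    [3.1, 1.8, 0],
    [7.6, 2.8, 1],
    [8.7, 0.2, 1],
]
separated = separate_by_class(dataset)
priors = class_priors(separated)
print(separated)
print(priors)
```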

Step 2: Summarize Dataset: The two statistics we require from a given dataset are the mean and the standard deviation of each column.

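The mean is the average value and can be calculated using:

mean = sum(x) / count(x)

where x is the list of values (one column) being summarized.

The sample standard deviation describes the spread of the values around the mean and can be calculated using:

standard deviation = sqrt(sum((x_i - mean)^2) / (count(x) - 1))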
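The two statistics from Step 2 might be written like this. It is a sketch that assumes the sample standard deviation (dividing by n - 1), so it needs at least two values per column; the function names are illustrative.

```python
from math import sqrt


def mean(values):
    """Average of a list of numbers: sum(x) / count(x)."""
    return sum(values) / len(values)


def stdev(values):
    """Sample standard deviation (divides by n - 1, so needs n >= 2)."""
    avg = mean(values)
    variance = sum((x - avg) ** 2 for x in values) / (len(values) - 1)
    return sqrt(variance)


column = [1.0, 2.0, 3.0, 4.0, 5.0]
print(mean(column), stdev(column))
```

For this column the mean is 3.0 and the variance is 10 / 4 = 2.5, so the standard deviation is the square root of 2.5.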

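Step 3: Summarize Data By Class: We require the statistics from our training dataset organized by class, that is, the mean and standard deviation of each column calculated separately for the rows of each class.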
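One way to sketch Step 3 is shown below. To keep the block self-contained it uses `mean` and `stdev` from Python's standard `statistics` module rather than hand-written versions, and it assumes the class label is the last element of each row. The name `summarize_by_class` and the `(mean, stdev, count)` tuple layout are choices made for this example.

```python
from statistics import mean, stdev


def summarize_by_class(dataset):
    """Return {label: [(mean, stdev, count), ...]} with one tuple per feature.

    Assumes the class label is the last column and that every class has
    at least two rows (the sample standard deviation needs n >= 2).
    """
    separated = {}
    for row in dataset:
        separated.setdefault(row[-1], []).append(row[:-1])

    summaries = {}
    for label, rows in separated.items():
        # zip(*rows) turns the list of rows into a sequence of columns.
        summaries[label] = [
            (mean(col), stdev(col), len(col)) for col in zip(*rows)
        ]
    return summaries


# Made-up data: two features followed by the class label.
dataset = [
    [1.0, 10.0, 0],
    [3.0, 14.0, 0],
    [7.0, 2.0, 1],
    [9.0, 4.0, 1],
]
summaries = summarize_by_class(dataset)
for label, stats in summaries.items():
    print(label, stats)
```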

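Step 4: Gaussian Probability Density Function: We need to estimate the probability or likelihood of observing a given real value. One way we can do this is to assume that the values are drawn from a distribution, such as a bell curve or Gaussian distribution, which is fully described by its mean and standard deviation (sigma):

f(x) = (1 / (sqrt(2 * PI) * sigma)) * exp(-((x - mean)^2 / (2 * sigma^2)))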
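The Gaussian density from Step 4 can be sketched as a small function. The name `gaussian_pdf` is illustrative, and the sketch assumes a standard deviation greater than zero (a column with identical values would need special handling).

```python
from math import exp, pi, sqrt


def gaussian_pdf(x, mean, stdev):
    """Gaussian probability density of x for the given mean and stdev.

    Assumes stdev > 0; a zero stdev would cause a division by zero.
    """
    exponent = exp(-((x - mean) ** 2) / (2 * stdev ** 2))
    return exponent / (sqrt(2 * pi) * stdev)


# The density is highest at the mean and symmetric around it.
print(gaussian_pdf(1.0, 1.0, 1.0))
print(gaussian_pdf(2.0, 1.0, 1.0))
print(gaussian_pdf(0.0, 1.0, 1.0))
```

Note that the result is a density, not a probability, so it can exceed 1 when the standard deviation is small.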

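Step 5: Class Probabilities: We now use the statistics calculated from our training data to calculate probabilities for new data. Probabilities are calculated separately for each class. This means that we first calculate the probability that a new piece of data belongs to the first class, then the second class, and so on for all the classes. For each class, the score is P(class) multiplied by the Gaussian likelihood of each input value; the division by P(data) is dropped because it is the same for every class, and the class with the largest score is the prediction.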
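Step 5 might look like the sketch below. As one possible enhancement it sums log probabilities instead of multiplying raw probabilities, which avoids numerical underflow when there are many features; this is a swapped-in variant, not necessarily what the tutorial does. It assumes the per-class summaries are stored as `{label: [(mean, stdev, count), ...]}`, a layout chosen for this example, and all function names are illustrative.

```python
from math import log, pi


def log_gaussian_pdf(x, mean, stdev):
    """Natural log of the Gaussian density (assumes stdev > 0)."""
    return -0.5 * log(2 * pi * stdev ** 2) - ((x - mean) ** 2) / (2 * stdev ** 2)


def class_log_scores(summaries, row):
    """Unnormalized log score per class: log P(class) + sum of log P(x_i|class).

    `summaries` maps label -> [(mean, stdev, count), ...], one tuple per feature.
    """
    # The row count of a class is stored in each of its feature tuples.
    total_rows = sum(stats[0][2] for stats in summaries.values())
    scores = {}
    for label, stats in summaries.items():
        score = log(stats[0][2] / total_rows)  # log prior (base rate)
        for x, (mean, stdev, _count) in zip(row, stats):
            score += log_gaussian_pdf(x, mean, stdev)
        scores[label] = score
    return scores


def predict(summaries, row):
    """Return the label with the highest log score."""
    scores = class_log_scores(summaries, row)
    return max(scores, key=scores.get)


# Hypothetical summaries for two classes and two features.
summaries = {
    0: [(1.0, 1.0, 5), (2.0, 1.0, 5)],
    1: [(8.0, 1.0, 5), (9.0, 1.0, 5)],
}
print(predict(summaries, [1.2, 2.1]))
print(predict(summaries, [7.5, 9.3]))
```

Because the scores are unnormalized logs, they are only meaningful for comparing classes against each other for the same row.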