The Naive Bayes algorithm is a simple yet surprisingly effective supervised machine learning algorithm, particularly popular for text classification tasks. It is based on Bayes' Theorem, a fundamental concept in probability theory.
Core Idea: Naive Bayes classifiers are a family of probabilistic algorithms that utilize Bayes' Theorem to predict the class of a given data point.
"Naive" Assumption: The key assumption behind Naive Bayes is the "naive" assumption of independence between features. This means the algorithm assumes that the presence or absence of one feature is unrelated to the presence or absence of any other feature, given the class label. While this assumption is often violated in real-world scenarios, Naive Bayes still performs remarkably well in many practical applications.
Bayes' Theorem describes the probability of an event occurring, given that another related event has already occurred.
P(A|B) = (P(B|A) * P(A)) / P(B)
where:
- P(A|B) is the posterior probability: the probability of A given that B has occurred.
- P(B|A) is the likelihood: the probability of B given that A has occurred.
- P(A) is the prior probability of A.
- P(B) is the probability of B (the evidence).
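As a quick worked example (all numbers below are invented for illustration): suppose 40% of emails are spam, the word "free" appears in 20% of spam and 5% of ham (non-spam) emails, and the word "prize" appears in 10% of spam and 1% of ham emails. The sketch below first applies Bayes' Theorem to a single word, then uses the naive independence assumption described above to combine two words.

# Worked example of Bayes' Theorem and the naive independence assumption
# (all probabilities are invented for illustration)
p_spam = 0.4                  # P(A): prior probability that an email is spam
p_ham = 1 - p_spam            # prior probability of ham (non-spam)

p_free_given_spam = 0.20      # P("free" | spam)
p_free_given_ham = 0.05       # P("free" | ham)
p_prize_given_spam = 0.10     # P("prize" | spam)
p_prize_given_ham = 0.01      # P("prize" | ham)

# Bayes' Theorem for a single word:
# P(spam | "free") = P("free" | spam) * P(spam) / P("free")
p_free = p_free_given_spam * p_spam + p_free_given_ham * p_ham   # the evidence P(B)
p_spam_given_free = p_free_given_spam * p_spam / p_free
print(round(p_spam_given_free, 3))   # ~0.727

# Naive assumption for two words:
# P("free", "prize" | spam) is approximated by P("free" | spam) * P("prize" | spam)
spam_score = p_free_given_spam * p_prize_given_spam * p_spam   # proportional to P(spam | "free", "prize")
ham_score = p_free_given_ham * p_prize_given_ham * p_ham       # proportional to P(ham | "free", "prize")
p_spam_given_both = spam_score / (spam_score + ham_score)
print(round(p_spam_given_both, 3))   # ~0.964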
Common applications include:
Text Classification: spam filtering, sentiment analysis, and topic categorization, where word counts or frequencies serve as features.
Image Classification: simple recognition tasks where pixel or color statistics are treated as roughly independent features.
Medical Diagnosis: estimating the likelihood of a condition from symptoms or test results; continuous measurements are usually handled with the Gaussian variant (see the sketch below).
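For continuous, numeric features (as in the medical-diagnosis case), scikit-learn also provides GaussianNB, which models each feature within each class as normally distributed. The tiny dataset below is invented purely for illustration.

from sklearn.naive_bayes import GaussianNB

# Invented toy data: [age, systolic blood pressure] per patient
X = [[25, 120], [30, 125], [60, 160], [65, 170], [70, 165]]
y = ['healthy', 'healthy', 'at_risk', 'at_risk', 'at_risk']

# GaussianNB fits a normal distribution to each feature within each class
gnb = GaussianNB()
gnb.fit(X, y)
print(gnb.predict([[62, 158]]))  # likely ['at_risk'] for this toy data

The main example below returns to text classification: building a small spam filter with CountVectorizer features and a Multinomial Naive Bayes classifier.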
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Sample dataset (simplified)
messages = ['This is a spam message',
            'Urgent! Win a free prize!',
            'Hello, how are you?',
            'Order now and save!',
            'Meeting tomorrow at 10 AM']
labels = ['spam', 'spam', 'ham', 'spam', 'ham']
# Create a CountVectorizer to convert text into numerical features
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(messages)
# Split data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, labels, test_size=0.2, random_state=42)
# Create a Multinomial Naive Bayes classifier
clf = MultinomialNB()
# Train the model
clf.fit(X_train, y_train)
# Make predictions
y_pred = clf.predict(X_test)
# Evaluate accuracy
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", accuracy)
Import necessary libraries:
- MultinomialNB: the Multinomial Naive Bayes classifier from scikit-learn.
- CountVectorizer: converts text data into numerical features (e.g., word frequencies).
- train_test_split: splits the data into training and testing sets.
- accuracy_score: calculates the accuracy of the model.
Prepare the data: CountVectorizer converts the text messages into a numerical representation (a matrix of word frequencies), and train_test_split divides it into training and testing sets.
Create and train the model: a MultinomialNB classifier is created and trained with the fit() method on the training data.
Make predictions and evaluate: predict() is called on the test data and the results are compared with the true labels using accuracy_score; a follow-up example for classifying a brand-new message is shown below.
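Building on the trained clf and the fitted vectorizer from the example above, a new, unseen message can be classified by transforming it with the same vectorizer (not re-fitting it) and calling predict(); the message text below is just an illustrative input.

# Classify a brand-new message with the classifier and vectorizer trained above
new_message = ['Win a free prize now!']
new_features = vectorizer.transform(new_message)  # transform only; do not re-fit on new data
print(clf.predict(new_features))  # likely ['spam'] given the spam-heavy wording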
Naive Bayes is a valuable algorithm in the machine learning toolkit, particularly for text classification tasks. Its simplicity, speed, and effectiveness make it a popular choice for various real-world applications.