Dev Duniya
Mar 19, 2025
In machine learning, understanding measures of central tendency is fundamental. Each measure summarizes a dataset with a single representative value, which helps when exploring data, spotting skew, and preparing features for a model. Let's explore three key measures: Mean, Median, and Mode.
The mean, often referred to as the average, is calculated by summing all values in a dataset and then dividing the sum by the total number of values.
Mean = (Sum of all values) / (Number of values)
import numpy as np
num_list = [54, 3, 12, 65, 34, 23, 16, 9, 3, 8, 12, 3]
mean = np.mean(num_list)
print("Mean =", mean)
Mean = 20.166666666666668
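To connect the formula to the result above, here is a minimal sketch that computes the mean by hand with plain Python. The variable names `total`, `count`, and `manual_mean` are chosen only for illustration.

```python
num_list = [54, 3, 12, 65, 34, 23, 16, 9, 3, 8, 12, 3]

total = sum(num_list)        # sum of all values
count = len(num_list)        # number of values
manual_mean = total / count  # Mean = (Sum of all values) / (Number of values)

print("Sum =", total)      # Sum = 242
print("Count =", count)    # Count = 12
print("Mean =", manual_mean)
```

The result should agree with `np.mean(num_list)` up to floating-point rounding.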
The median is the middle value in a dataset when the data is arranged in ascending or descending order. If the dataset has an even number of values, the median is the average of the two middle values.
import numpy as np
num_list = [54, 3, 12, 65, 34, 23, 16, 9, 3, 8, 12, 3]
median = np.median(num_list)
print("Median =", median)
Median = 12.0
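The even-length rule can be made explicit with a small hand-written version. This is only a sketch: `manual_median` is a hypothetical helper written for this example, not part of NumPy.

```python
def manual_median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value
        return float(ordered[mid])
    # Even count: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

num_list = [54, 3, 12, 65, 34, 23, 16, 9, 3, 8, 12, 3]
print(sorted(num_list))  # [3, 3, 3, 8, 9, 12, 12, 16, 23, 34, 54, 65]
print("Median =", manual_median(num_list))  # Median = 12.0
```

The list has 12 values, so the two middle values are the 6th and 7th in sorted order (12 and 12), and their average is 12.0.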
The mode is the value that appears most frequently in a dataset. A dataset can have multiple modes (multimodal) or no mode if all values occur with equal frequency.
from scipy import stats
num_list = [54, 3, 12, 65, 34, 23, 16, 9, 3, 8, 12, 3]
mode = stats.mode(num_list)
print("Mode =", mode)
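Mode = ModeResult(mode=array([3]), count=array([3]))
This is the output from older SciPy releases. From SciPy 1.11 onward, stats.mode returns scalars by default for one-dimensional input, so the same code reports a mode of 3 with a count of 3 without the array wrappers.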
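`stats.mode` reports a single value even when several values tie for the highest frequency. One way to see every mode is `statistics.multimode` from the Python standard library (available since Python 3.8). The lists `bimodal` and `all_unique` below are made-up examples.

```python
from statistics import multimode

num_list = [54, 3, 12, 65, 34, 23, 16, 9, 3, 8, 12, 3]
print(multimode(num_list))    # [3]

# Two values tie for the highest frequency: a multimodal dataset
bimodal = [1, 1, 2, 2, 3]
print(multimode(bimodal))     # [1, 2]

# Every value occurs once, so multimode returns all of them;
# this is the case usually described as having no mode
all_unique = [4, 5, 6]
print(multimode(all_unique))  # [4, 5, 6]
```

Checking whether the returned list is as long as the set of distinct values is a simple way to detect the "no mode" case.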
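The right measure depends on the characteristics of the dataset and the goals of the analysis. The mean uses every value, so it suits roughly symmetric numeric data but is pulled toward extreme values: in the example list, the large values 54 and 65 lift the mean (about 20.17) well above the median (12). The median is more robust for skewed data or data with outliers, and the mode is the natural choice for categorical data, where values cannot be meaningfully averaged.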
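One characteristic that often drives this choice is the presence of extreme values. The sketch below appends a single hypothetical outlier (1000, invented for illustration) to the example list and compares the two measures before and after. It uses the standard-library `statistics` module so that it runs without extra dependencies; `np.mean` and `np.median` would behave the same way here.

```python
from statistics import mean, median

num_list = [54, 3, 12, 65, 34, 23, 16, 9, 3, 8, 12, 3]
with_outlier = num_list + [1000]  # hypothetical extreme value

print("Mean before:", mean(num_list))
print("Mean after: ", mean(with_outlier))

print("Median before:", median(num_list))
print("Median after: ", median(with_outlier))
```

The mean jumps from roughly 20 to roughly 95, while the median stays at 12, which is why the median is usually preferred for skewed data.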
By understanding these key measures and their appropriate use cases, you can gain valuable insights into your data and make more informed decisions in your machine learning projects.