Python Program to Calculate Standard Deviation

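In this article, we will learn how to implement a Python program that calculates the standard deviation of a dataset.

Consider a set of values (called the population) plotted along an axis. The standard deviation of these values measures how far they are spread out from their mean. If the standard deviation is low, the values lie close to the mean; if it is high, they are scattered farther away from it.

It is defined as the square root of the variance of the dataset. There are two types of standard deviation: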

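The population standard deviation is computed from every data value in the population, so it is a fixed value. It is defined by the formula: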

$\mathrm{SD} = \sqrt{\frac{\sum (X_i - X_m)^2}{n}}$

where:

X_m is the mean of the dataset.
X_i is an element of the dataset.
n is the number of elements in the dataset.

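The sample standard deviation, by contrast, is a statistic computed from only some of the data values of the population, so its value depends on the sample chosen. Dividing by n - 1 instead of n (Bessel's correction) compensates for the fact that the mean is itself estimated from the sample. It is defined by the formula: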

$\mathrm{SD} = \sqrt{\frac{\sum (X_i - X_m)^2}{n - 1}}$

where:

X_m is the mean of the sample.
X_i is an element of the sample.
n is the number of elements in the sample.
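
As a worked illustration of both formulas, take the dataset [2, 3, 4, 1, 2, 5] that is used in the examples of this article (n = 6). The arithmetic below is only a sketch meant to make the difference between the two denominators concrete; the subscripts "population" and "sample" are labels added here for readability:

$X_m = \frac{2 + 3 + 4 + 1 + 2 + 5}{6} = \frac{17}{6} \approx 2.8333$

$\sum (X_i - X_m)^2 = \frac{65}{6} \approx 10.8333$

$\mathrm{SD}_{\text{population}} = \sqrt{\frac{65/6}{6}} = \sqrt{\frac{65}{36}} \approx 1.3437$

$\mathrm{SD}_{\text{sample}} = \sqrt{\frac{65/6}{5}} = \sqrt{\frac{13}{6}} \approx 1.4720$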

Input-Output Scenarios

Let us now look at some input-output scenarios for different datasets:

Suppose the dataset contains only positive integers:

Input: [2, 3, 4, 1, 2, 5]
Result: Population Standard Deviation: 1.3437096247164249
Sample Standard Deviation: 1.4719601443879744

Suppose the dataset contains only negative integers:

Input: [-2, -3, -4, -1, -2, -5]
Result: Population Standard Deviation: 1.3437096247164249
Sample Standard Deviation: 1.4719601443879744

Suppose the dataset contains both positive and negative integers:

Input: [-2, -3, -4, 1, 2, 5]
Result: Population Standard Deviation: 3.131382371342656
Sample Standard Deviation: approximately 3.4303
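
The first two scenarios give identical results because negating every value also negates the mean, which leaves each squared deviation unchanged. The short check below is a sketch of that observation using the standard statistics module; the variable names are chosen here purely for illustration.

```python
import math
import statistics

positives = [2, 3, 4, 1, 2, 5]
negatives = [-x for x in positives]  # mirror image of the first dataset

# Negating every value negates the mean too, so each squared deviation
# (x - mean) ** 2 is unchanged and both datasets share the same spread.
same_population_sd = math.isclose(
    statistics.pstdev(positives), statistics.pstdev(negatives)
)
same_sample_sd = math.isclose(
    statistics.stdev(positives), statistics.stdev(negatives)
)

print(same_population_sd, same_sample_sd)
```

Both comparisons should report True, since the standard deviation depends only on distances from the mean and not on the sign of the values.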

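Using the Mathematical Formula

We have already seen the formulas for the standard deviation in this article; now let us look at a Python program that applies them to a dataset.

Example

In the following example, we import the math library and compute the standard deviation of the dataset by applying the built-in sqrt() function to its variance.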

import math

# Declare the dataset list
dataset = [2, 3, 4, 1, 2, 5]

# Find the mean of the dataset
sm = 0
for value in dataset:
    sm += value
mean = sm / len(dataset)

# Sum the squared deviations from the mean
deviation_sum = 0
for value in dataset:
    deviation_sum += (value - mean) ** 2

# Population standard deviation: divide by n
psd = math.sqrt(deviation_sum / len(dataset))

# Sample standard deviation: divide by n - 1
ssd = math.sqrt(deviation_sum / (len(dataset) - 1))

# Display the output
print("Population standard deviation of the dataset is", psd)
print("Sample standard deviation of the dataset is", ssd)

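Output

The standard deviations obtained as output are as follows (the trailing digits of the sample value may vary slightly with floating-point rounding):

Population standard deviation of the dataset is 1.3437096247164249
Sample standard deviation of the dataset is 1.4719601443879744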
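
The loop-based program can also be packaged as a reusable function. The following is a minimal sketch rather than a definitive implementation: the function name standard_deviation and its ddof parameter (delta degrees of freedom, a name borrowed from numpy) are assumptions introduced here for illustration.

```python
import math


def standard_deviation(values, ddof=0):
    """Return the standard deviation of values (hypothetical helper).

    ddof=0 divides by n (population); ddof=1 divides by n - 1 (sample).
    """
    n = len(values)
    if n - ddof <= 0:
        raise ValueError("need more data points than ddof")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))


dataset = [2, 3, 4, 1, 2, 5]
population_sd = standard_deviation(dataset)       # divides by n
sample_sd = standard_deviation(dataset, ddof=1)   # divides by n - 1

print("Population standard deviation:", population_sd)
print("Sample standard deviation:", sample_sd)
```

Keeping the denominator behind a single parameter makes it harder to misplace the parentheses around n - 1, which is an easy mistake when both formulas are written out by hand.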

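Using the std() Function in the numpy Module

In this approach, we import the numpy module and use the numpy.std() function to compute the standard deviation of the elements of a numpy array. By default (ddof=0), numpy.std() returns the population standard deviation; passing ddof=1 returns the sample standard deviation.

Example

The following Python program computes the standard deviation of the elements of a numpy array: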

import numpy as np

# Declare the dataset as a numpy array
dataset = np.array([2, 3, 4, 1, 2, 5])

# Calculate the population standard deviation (default ddof=0)
sd = np.std(dataset)

# Display the output
print("Population standard deviation of the dataset is", sd)

Output

The standard deviation is displayed as the following output:

Population standard deviation of the dataset is 1.3437096247164249
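
Since numpy.std() divides by n unless told otherwise, the sample standard deviation requires the ddof argument. The sketch below assumes numpy is installed and reuses the same dataset; the variable names are illustrative only.

```python
import numpy as np

dataset = np.array([2, 3, 4, 1, 2, 5])

population_sd = np.std(dataset)        # default ddof=0, divides by n
sample_sd = np.std(dataset, ddof=1)    # ddof=1, divides by n - 1

print("Population standard deviation:", population_sd)
print("Sample standard deviation:", sample_sd)
```

The sample value should come out close to 1.4720, matching the result of the n - 1 formula for this dataset.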

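Using the stdev() and pstdev() Functions in the statistics Module

The statistics module in Python provides the stdev() and pstdev() functions to compute the standard deviation of a dataset. The stdev() function computes only the sample standard deviation, whereas the pstdev() function computes the population standard deviation.

Both functions take the data as their first argument, accept an optional precomputed mean (named xbar in stdev() and mu in pstdev()), and return the standard deviation as a number.

Example 1: Using the stdev() Function

The following Python program demonstrates how to use the stdev() function to find the sample standard deviation of a dataset: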

import statistics as st

# Declare the dataset list
dataset = [2, 3, 4, 1, 2, 5]

# Calculate the sample standard deviation of the dataset
sd = st.stdev(dataset)

# Display the output
print("Standard Deviation of the dataset is", sd)

Output

The sample standard deviation of the dataset obtained as output is as follows:

Standard Deviation of the dataset is 1.4719601443879744

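Example 2: Using the pstdev() Function

The following Python program demonstrates how to use the pstdev() function to find the population standard deviation of a dataset: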

import statistics as st

# Declare the dataset list
dataset = [2, 3, 4, 1, 2, 5]

# Calculate the population standard deviation of the dataset
sd = st.pstdev(dataset)

# Display the output
print("Standard Deviation of the dataset is", sd)

Output

The population standard deviation of the dataset obtained as output is as follows:

Standard Deviation of the dataset is 1.3437096247164249
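
To tie the two functions back to the definition, the sketch below checks that each standard deviation is the square root of the matching variance function, and shows how the two behave on a single-value dataset. The variable names are illustrative; only documented statistics functions are used.

```python
import math
import statistics as st

dataset = [2, 3, 4, 1, 2, 5]

# The standard deviation is the square root of the matching variance.
sample_matches = math.isclose(
    st.stdev(dataset), math.sqrt(st.variance(dataset))
)
population_matches = math.isclose(
    st.pstdev(dataset), math.sqrt(st.pvariance(dataset))
)

# With a single value the population spread is zero, while the sample
# formula would divide by n - 1 = 0, so stdev() raises StatisticsError.
single_population_sd = st.pstdev([7])
try:
    st.stdev([7])
    single_sample_failed = False
except st.StatisticsError:
    single_sample_failed = True

print(sample_matches, population_matches)
print(single_population_sd, single_sample_failed)
```

The single-value case is a practical reason to choose deliberately between the two functions: stdev() needs at least two data points, whereas pstdev() needs only one.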

Alekhya Nagulavancha

Updated on: 26 October 2022
