Implementing K-Means Clustering on the Diabetes Dataset Using the SciPy Library


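The Pima Indians Diabetes dataset used here originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. Based on the following diagnostic measurements, the dataset can be used to place each patient in either a diabetic or a non-diabetic cluster: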

  • Pregnancies

  • Glucose

  • Blood pressure

  • Skin thickness

  • Insulin

  • BMI

  • Diabetes pedigree function

  • Age

This dataset is available in .CSV format on the Kaggle website.
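Depending on where the file is downloaded from, the CSV may or may not begin with a header row, and `np.loadtxt` cannot parse non-numeric text unless that row is skipped. The following is a minimal sketch of reading such a file; the column names and the three data rows are made up for illustration and are not taken from the real dataset.

```python
import io
import numpy as np

# Made-up stand-in for a CSV file that starts with a header row.
# The column names and values are assumptions for illustration only.
csv_text = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
2,120,70,25,80,28.5,0.45,35,0
4,160,82,30,130,34.1,0.60,48,1
1,95,64,20,60,24.3,0.30,22,0
"""

# skiprows=1 tells loadtxt to ignore the header line.
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

features = data[:, 0:8]  # the eight diagnostic measurements
outcome = data[:, 8]     # the label column, which is left out of the clustering

print(features.shape)
print(outcome)
```

If the file has no header row, the `skiprows` argument can simply be omitted.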

Example

The following example uses the SciPy library to create two clusters, diabetic and non-diabetic, from the Pima Indians Diabetes dataset.
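As a warm-up, the three `scipy.cluster.vq` functions that the example relies on (`whiten`, `kmeans`, and `vq`) can be tried on a handful of observations first. This is only a sketch: the six data points are made up, chosen so that the two groups are obvious and the two features sit on very different scales.

```python
import numpy as np
from scipy.cluster.vq import whiten, kmeans, vq

# Made-up observations: two features on very different scales,
# forming two obvious groups (rows 0-2 and rows 3-5).
obs = np.array([
    [1.0, 100.0],
    [1.2, 110.0],
    [0.8,  90.0],
    [5.0, 300.0],
    [5.2, 310.0],
    [4.8, 290.0],
])

# whiten() divides each column by its standard deviation so that
# no single feature dominates the distance computation.
scaled = whiten(obs)

# kmeans() returns the code book (one centroid per row) and the
# mean distance between the observations and their nearest centroid.
codebook, distortion = kmeans(scaled, 2)

# vq() assigns every observation the index of its nearest centroid.
labels, distances = vq(scaled, codebook)

print("Std of scaled columns:", scaled.std(axis=0))
print("Code book:\n", codebook)
print("Labels:", labels)
```

Which group receives index 0 and which receives index 1 can differ between runs, because `kmeans` starts from randomly chosen observations. The full example on the real dataset follows.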

# Importing the required Python libraries
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.vq import whiten, kmeans, vq

# Loading the dataset (replace {your path} with the folder that holds the file;
# if the file starts with a header row, also pass skiprows=1)
dataset = np.loadtxt(r"{your path}\pima-indians-diabetes.csv", delimiter=",")

# Printing the data after excluding the outcome column
dataset = dataset[:, 0:8]
print("Data :\n", dataset, "\n")

# Normalizing the data (each column is divided by its standard deviation)
dataset = whiten(dataset)

# Generating the code book by computing K-Means with K = 2
# (2 clusters, i.e., diabetic and non-diabetic)
centroids, mean_dist = kmeans(dataset, 2)
print("Code book :\n", centroids, "\n")

# Assigning every observation to its nearest centroid
clusters, dist = vq(dataset, centroids)
print("Clusters :\n", clusters, "\n")

# Counting the members of each cluster.
# Note: k-means numbers its clusters arbitrarily, so treating index 0 as
# non-diabetic and index 1 as diabetic is an assumption of this example.
non_diabetic = list(clusters).count(0)
diabetic = list(clusters).count(1)

# Plotting a pie chart of the two cluster sizes
x_axis = [diabetic, non_diabetic]
colors = ['red', 'green']
print("Total number of diabetic patients : " + str(x_axis[0]) +
      "\nTotal number non-diabetic patients : " + str(x_axis[1]))
y = ['diabetic', 'non-diabetic']
plt.pie(x_axis, labels=y, colors=colors, shadow=False)
plt.show()

Output

Data :
[[ 6. 148. 72. ... 33.6 0.627 50. ]
[ 1. 85. 66. ... 26.6 0.351 31. ]
[ 8. 183. 64. ... 23.3 0.672 32. ]
...
[ 5. 121. 72. ... 26.2 0.245 30. ]
[ 1. 126. 60. ... 30.1 0.349 47. ]
[ 1. 93. 70. ... 30.4 0.315 23. ]]

Code book :
[[2.08198148 4.17698255 3.96280983 1.04984582 0.56968574 4.13266474
1.40143319 3.86427413]
[0.6114727 3.56175537 3.35245694 1.42268776 0.76239717 4.01974705
1.43848683 2.24399453]]

Clusters :
[0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 1 0 0 1 0
0 1 0 1 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 1 0 1 1 0 1 1 0 1 1 0 1 1 1 1 0 1
1 1 0 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
0 1 1 0 0 0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 0 1 0 1 1 1 1 1 1 0 0 1 0 1 1 0 1
0 1 1 1 0 1 0 0 1 1 1 0 0 0 1 1 1 0 1 1 1 1 0 1 1 1 1 0 0 1 0 0 0 1 1 1 0
0 0 1 0 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1 1 1 0 1 0
1 0 1 1 1 1 1 1 1 0 1 1 1 0 0 1 0 1 1 1 1 1 1 0 0 1 0 1 0 1 1 1 0 1 1 1 1
0 0 1 1 0 1 0 1 1 1 1 0 1 0 1 0 1 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1
1 1 0 0 1 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1 0 1 0 1 1 1 0 1 1 1 0 1 0 0 1 1
0 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1 1 1 0 1 1 1 0 1 0 0 1 1 0 0 0 1 1 0 1 1 0
1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1 0 1 1 1 1 1 1 0 0 0 0 1 0
1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0 1 1 0
1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 0 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1 0 1 0 0 0 1
1 1 1 1 1 1 0 1 0 1 1 1 0 1 0 1 1 0 0 1 1 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 0
0 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 1 0 1 1 0 0 1 0 0 1 1 0 1 1
0 1 0 0 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1 1 0 0 0 1 0 1 0 1 0 1
0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 0 1 1 1 1 1 0
1 0 1 1 1 0 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 0 1
0 0 0 0 0 1 0 1 0 0 0 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 1 0 1 0 1 1 1 1 1 0 0
1 1 1 1 1 0 1 1 0 0 1 1 0 1 0 1 0 1 1 1 0 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 1
0 1 1 0 0 0 1 1 0 0 1 1 1 1 0 1 0 0 1 0 1 0 0 0 1 1 0 1]

Total number of diabetic patients : 492
Total number non-diabetic patients : 276
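
Because k-means numbers its clusters arbitrarily, index 1 is not guaranteed to correspond to the diabetic group, and the two totals above can swap from one run to the next. One way to check, sketched below, is to compare the cluster labels with the dataset's outcome column under both possible mappings. The helper name `best_label_agreement` and the sample arrays are hypothetical and serve only as an illustration.

```python
import numpy as np

def best_label_agreement(clusters, outcome):
    """Return (agreement, flipped) for two-cluster labels vs. 0/1 outcomes.

    agreement is the fraction of matching rows under the better of the two
    possible mappings; flipped is True when cluster 0 lines up with outcome 1.
    """
    clusters = np.asarray(clusters)
    outcome = np.asarray(outcome)
    direct = np.mean(clusters == outcome)         # cluster 1 <-> outcome 1
    swapped = np.mean((1 - clusters) == outcome)  # cluster 0 <-> outcome 1
    if swapped > direct:
        return float(swapped), True
    return float(direct), False

# Made-up labels for illustration only.
clusters = [0, 0, 1, 1, 0, 1]
outcome = [1, 1, 0, 0, 1, 1]

agreement, flipped = best_label_agreement(clusters, outcome)
print(agreement, flipped)
```

In the example above, `outcome` would be column 8 of the array returned by `np.loadtxt`, saved before the `dataset[:, 0:8]` slice discards it. A low agreement under both mappings would suggest that the two clusters do not line up well with the diagnosis.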

Updated on: 2021-12-14
