K-Means Clustering of the Diabetes Dataset Using the SciPy Library
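The Pima Indians Diabetes dataset used here originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. Using the following diagnostic factors, we can group the patients into two clusters, which we interpret as diabetic and non-diabetic: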
Pregnancies
Glucose
Blood pressure
Skin thickness
Insulin
BMI
Diabetes pedigree function
Age
You can download this dataset in CSV format from the Kaggle website.
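The example below reads the file with `np.loadtxt`, which cannot parse a header line of column names. As a minimal sketch, assuming the CSV has nine numeric columns with the outcome (0 or 1) in the last column, the hypothetical helper `load_pima` below skips a header line if one is present and returns the eight feature columns and the outcome column separately:

```python
import numpy as np


def load_pima(path):
    """Hypothetical helper: return (features, outcome) from a Pima CSV file.

    Assumes 9 comma-separated numeric columns with the outcome last.
    A header line, if present, is detected and skipped.
    """
    with open(path) as f:
        first_line = f.readline()
    # If the first field of the first line is not a number, treat it as a header.
    try:
        float(first_line.split(",")[0])
        skip = 0
    except ValueError:
        skip = 1
    # ndmin=2 keeps a 2-D shape even if the file holds a single row.
    data = np.loadtxt(path, delimiter=",", skiprows=skip, ndmin=2)
    features = data[:, :8]
    outcome = data[:, 8].astype(int)
    return features, outcome
```

Keeping the outcome column, rather than discarding it, makes it possible to compare the clusters with the recorded diagnoses afterwards.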
Example
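The following example uses the SciPy library to split the patients in the Pima Indians Diabetes dataset into two clusters, diabetic and non-diabetic. K-means does not look at the outcome column, so which of the two clusters is called "diabetic" is a choice made after clustering.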
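To show what the three SciPy calls in the example do, here is a small sketch on synthetic two-dimensional data (the points and the choice of starting centroids are made up for illustration). `whiten` divides each column by its standard deviation, `kmeans` refines a set of centroids, and `vq` assigns each observation to its nearest centroid. Passing an array of initial centroids to `kmeans`, instead of the integer 2 used in the example, makes the run deterministic:

```python
import numpy as np
from scipy.cluster.vq import whiten, kmeans, vq

# Synthetic data (illustrative only): two well-separated groups of points.
points = np.array([
    [1.0, 2.0], [1.5, 1.8], [1.2, 2.2],
    [8.0, 9.0], [8.5, 9.5], [7.8, 8.8],
])

# whiten() rescales every column to unit standard deviation.
scaled = whiten(points)
assert np.allclose(scaled, points / points.std(axis=0))

# When the second argument is an array, kmeans() uses it as the initial centroids.
initial = scaled[[0, 3]]
centroids, distortion = kmeans(scaled, initial)

# vq() returns, for each observation, the index of its nearest centroid
# and the distance to that centroid.
labels, dists = vq(scaled, centroids)
print(labels)
```

With an integer `k`, `kmeans` instead picks random starting centroids, which is why the cluster numbering in the example can change from run to run.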
#importing the required Python libraries:
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.vq import whiten, kmeans, vq
#Loading the dataset (if your copy of the CSV starts with a header row, add skiprows=1):
dataset = np.loadtxt(r"{your path}\pima-indians-diabetes.csv", delimiter=",")
# Keeping only the 8 feature columns (excluding the outcome column) and printing them
dataset = dataset[:, 0:8]
print("Data :\n", dataset, "\n")
#Normalizing the data: whiten() divides each feature column by its standard deviation
dataset = whiten(dataset)
# Generating the code book by running K-Means with K = 2 (2 clusters, i.e., diabetic and non-diabetic)
# The initial centroids are chosen at random, so the results can differ between runs
centroids, mean_dist = kmeans(dataset, 2)
print("Code book :\n", centroids, "\n")
# Assigning each patient to the nearest centroid (cluster 0 or cluster 1)
clusters, dist = vq(dataset, centroids)
print("Clusters :\n", clusters, "\n")
# K-Means numbers the clusters arbitrarily; this example assumes that
# cluster 0 holds the non-diabetic patients and cluster 1 the diabetic patients
non_diabetic = list(clusters).count(0)
diabetic = list(clusters).count(1)
#Plotting the pie chart of the two clusters
x_axis = [diabetic, non_diabetic]
colors = ['red', 'green']
print("Total number of diabetic patients : " + str(x_axis[0]) + "\nTotal number non-diabetic patients : " + str(x_axis[1]))
y = ['diabetic', 'non-diabetic']
# shadow must be a boolean; the string 'false' would be truthy and draw a shadow
plt.pie(x_axis, labels=y, colors=colors, shadow=False)
plt.show()
Output
Data :
 [[ 6. 148. 72. ... 33.6 0.627 50. ]
 [ 1. 85. 66. ... 26.6 0.351 31. ]
 [ 8. 183. 64. ... 23.3 0.672 32. ]
 ...
 [ 5. 121. 72. ... 26.2 0.245 30. ]
 [ 1. 126. 60. ... 30.1 0.349 47. ]
 [ 1. 93. 70. ... 30.4 0.315 23. ]]

Code book :
 [[2.08198148 4.17698255 3.96280983 1.04984582 0.56968574 4.13266474 1.40143319 3.86427413]
 [0.6114727 3.56175537 3.35245694 1.42268776 0.76239717 4.01974705 1.43848683 2.24399453]]

Clusters :
 [0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 1 0 0 1 0 0 1 0 1 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 1 0 1 1 0 1 1 0 1 1 0 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 0 1 0 1 1 1 1 1 1 0 0 1 0 1 1 0 1 0 1 1 1 0 1 0 0 1 1 1 0 0 0 1 1 1 0 1 1 1 1 0 1 1 1 1 0 0 1 0 0 0 1 1 1 0 0 0 1 0 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1 1 1 0 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 0 0 1 0 1 1 1 1 1 1 0 0 1 0 1 0 1 1 1 0 1 1 1 1 0 0 1 1 0 1 0 1 1 1 1 0 1 0 1 0 1 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1 1 1 0 0 1 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1 0 1 0 1 1 1 0 1 1 1 0 1 0 0 1 1 0 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1 1 1 0 1 1 1 0 1 0 0 1 1 0 0 0 1 1 0 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1 0 1 1 1 1 1 1 0 0 0 0 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 0 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1 0 1 0 0 0 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 0 1 1 0 0 1 1 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 0 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 1 0 1 1 0 0 1 0 0 1 1 0 1 1 0 1 0 0 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1 1 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 0 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 1 0 1 0 1 1 1 1 1 0 0 1 1 1 1 1 0 1 1 0 0 1 1 0 1 0 1 0 1 1 1 0 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 0 0 1 1 0 0 1 1 1 1 0 1 0 0 1 0 1 0 0 0 1 1 0 1]

Total number of diabetic patients : 492
Total number non-diabetic patients : 276
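K-means numbers its clusters arbitrarily, so cluster 1 is not guaranteed to be the diabetic one, and the numbering can swap between runs. In the run shown above, for instance, centroid 0 has the higher whitened glucose value (about 4.18 versus 3.56). A minimal sketch of one way to name the clusters, assuming (as a heuristic that is not part of the original example) that the cluster whose centroid has the higher glucose value is the diabetic one, and then measuring how often the clusters agree with the dataset's outcome column; both helper names are hypothetical:

```python
import numpy as np


def name_clusters(centroids, clusters, glucose_col=1):
    """Hypothetical helper: map k-means labels to 1 (diabetic) / 0 (non-diabetic).

    Heuristic assumption: the cluster whose centroid has the larger
    (whitened) glucose value is treated as the diabetic cluster.
    Glucose is column 1 when the columns follow the order listed above.
    """
    diabetic_label = int(np.argmax(centroids[:, glucose_col]))
    return (clusters == diabetic_label).astype(int)


def agreement(predicted, outcome):
    """Fraction of patients whose cluster name matches the recorded outcome."""
    return float(np.mean(predicted == outcome))
```

To use these with the example, keep the outcome column with `outcome = dataset[:, 8]` before the line `dataset = dataset[:, 0:8]`, then call `agreement(name_clusters(centroids, clusters), outcome)`. Under the heuristic, the run shown above would label cluster 0, not cluster 1, as diabetic.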
