K-Means Clustering on the Diabetes Dataset Using the SciPy Library
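The Pima Indians Diabetes dataset used here originally comes from the National Institute of Diabetes and Digestive and Kidney Diseases. Based on the following diagnostic factors, the dataset can be used to place patients in either a diabetic cluster or a non-diabetic cluster: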
Pregnancies
Glucose
Blood pressure
Skin thickness
Insulin
BMI
Diabetes pedigree function
Age
The dataset is available in CSV format on the Kaggle website.
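The slicing `dataset[:, 0:8]` in the example relies on the file layout: the eight feature columns in the order listed above, followed by an outcome column (1 = diabetic, 0 = non-diabetic). The following minimal sketch shows that layout using two illustrative rows and an in-memory CSV instead of the downloaded file. The header line and its column names are an assumption modelled on copies of the file that include column names; for a file without a header row, drop `skiprows=1`.

```python
import io

import numpy as np

# Two illustrative rows in the assumed 9-column layout:
# eight diagnostic features followed by the outcome (1 = diabetic, 0 = not).
# The header line is an assumption; some copies of the file have no header.
csv_text = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
"""

# skiprows=1 skips the header line; remove it for a header-less file
data = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)

features = data[:, 0:8]  # the eight factors that K-Means clusters on
outcome = data[:, 8]     # the recorded diagnosis, which K-Means never sees

print(features.shape)
print(outcome)
```

Keeping the outcome column in a separate array, rather than discarding it, makes it possible to compare the clusters with the recorded diagnoses afterwards.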
Example
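The following example uses the SciPy library to split the patients in the Pima Indians Diabetes dataset into two clusters, labeled diabetic and non-diabetic. K-means is unsupervised: it groups patients using only the eight features and never sees the outcome column, so the two labels are attached to the clusters afterwards.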
```python
# Import the required Python libraries
import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.vq import whiten, kmeans, vq

# Load the dataset (replace {your path} with the folder that holds the CSV file).
# If your copy of the file begins with a header row, also pass skiprows=1.
dataset = np.loadtxt(r"{your path}\pima-indians-diabetes.csv", delimiter=",")

# Keep the eight feature columns (drop the outcome column) and print the data
dataset = dataset[:, 0:8]
print("Data :\n", dataset, "\n")

# Normalize the data: scale every feature to unit variance
dataset = whiten(dataset)

# Generate the code book by running K-Means with K = 2
# (two clusters, intended as diabetic and non-diabetic)
centroids, mean_dist = kmeans(dataset, 2)
print("Code book :\n", centroids, "\n")

# Assign every patient to the nearest centroid
clusters, dist = vq(dataset, centroids)
print("Clusters :\n", clusters, "\n")

# Count the members of each cluster. Treating index 0 as non-diabetic and
# index 1 as diabetic is a convention of this example: K-Means numbers
# clusters arbitrarily, and the numbering can change between runs.
non_diabetic = list(clusters).count(0)
diabetic = list(clusters).count(1)

# Plot the two cluster sizes as a pie chart
x_axis = [diabetic, non_diabetic]
colors = ['red', 'green']
print("Total number of diabetic patients : " + str(x_axis[0]) +
      "\nTotal number non-diabetic patients : " + str(x_axis[1]))
y = ['diabetic', 'non-diabetic']
plt.pie(x_axis, labels=y, colors=colors, shadow=False)
plt.show()
```
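To make the three SciPy steps more concrete, here is a NumPy-only sketch of what they compute. Whitening divides each column by its standard deviation; K-Means alternates between assigning every point to its nearest centroid and moving each centroid to the mean of its points (a basic Lloyd-style iteration); `vq` performs the final nearest-centroid assignment. The helper names `whiten_sketch`, `kmeans_sketch` and `vq_sketch` and the synthetic two-group data are hypothetical. SciPy's `kmeans`, when given an integer K, picks random starting observations and repeats the run several times, keeping the code book with the lowest distortion; this sketch instead takes fixed starting centroids and stops when they stop moving.

```python
import numpy as np

def whiten_sketch(obs):
    """Scale each column to unit variance (what scipy.cluster.vq.whiten
    does for columns whose standard deviation is non-zero)."""
    return obs / obs.std(axis=0)

def vq_sketch(obs, code_book):
    """Assign each observation to its nearest centroid (Euclidean distance)."""
    # Distance from every observation to every centroid, shape (n, k)
    dists = np.linalg.norm(obs[:, None, :] - code_book[None, :, :], axis=2)
    codes = dists.argmin(axis=1)
    # Return the code and the distance to the chosen centroid
    return codes, dists[np.arange(len(obs)), codes]

def kmeans_sketch(obs, init, n_iter=100):
    """Lloyd-style iterations starting from the given initial centroids."""
    centroids = init.astype(float)
    for _ in range(n_iter):
        codes, _ = vq_sketch(obs, centroids)
        # Move each centroid to the mean of its assigned points;
        # keep the old position if a cluster ends up empty.
        new = np.array([
            obs[codes == j].mean(axis=0) if np.any(codes == j) else centroids[j]
            for j in range(len(centroids))
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    _, dist = vq_sketch(obs, centroids)
    # Mean (non-squared) distance to the nearest centroid, i.e. the distortion
    return centroids, dist.mean()

# Hypothetical data: two well-separated groups of 50 points each
rng = np.random.default_rng(0)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.5, size=(50, 2))
obs = whiten_sketch(np.vstack([group_a, group_b]))

# Start from one point of each group so the sketch is deterministic
centroids, distortion = kmeans_sketch(obs, obs[[0, -1]])
codes, _ = vq_sketch(obs, centroids)
print(np.bincount(codes))  # number of points in each cluster
```

Whitening matters here because K-Means uses plain Euclidean distance: without it, a feature measured on a large scale (such as glucose or insulin) would dominate a small-scale one (such as the diabetes pedigree function).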
Output
```
Data :
 [[ 6. 148. 72. ... 33.6 0.627 50. ]
 [ 1. 85. 66. ... 26.6 0.351 31. ]
 [ 8. 183. 64. ... 23.3 0.672 32. ]
 ...
 [ 5. 121. 72. ... 26.2 0.245 30. ]
 [ 1. 126. 60. ... 30.1 0.349 47. ]
 [ 1. 93. 70. ... 30.4 0.315 23. ]]

Code book :
 [[2.08198148 4.17698255 3.96280983 1.04984582 0.56968574 4.13266474 1.40143319 3.86427413]
 [0.6114727 3.56175537 3.35245694 1.42268776 0.76239717 4.01974705 1.43848683 2.24399453]]

Clusters :
 [0 1 0 1 1 1 1 1 0 0 0 0 0 0 0 1 1 0 1 1 1 0 0 0 0 0 0 1 0 0 0 1 1 0 0 1 0 0 1 0 1 0 0 0 0 1 1 1 1 1 1 1 1 0 0 1 0 1 0 1 1 0 1 1 0 1 1 0 1 1 1 1 0 1 1 1 0 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 0 0 0 1 1 1 1 1 1 0 1 1 1 1 1 0 0 0 1 0 1 1 1 1 1 1 0 0 1 0 1 1 0 1 0 1 1 1 0 1 0 0 1 1 1 0 0 0 1 1 1 0 1 1 1 1 0 1 1 1 1 0 0 1 0 0 0 1 1 1 0 0 0 1 0 1 1 0 0 0 0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 1 1 0 1 0 0 1 1 1 0 1 0 1 0 1 1 1 1 1 1 1 0 1 1 1 0 0 1 0 1 1 1 1 1 1 0 0 1 0 1 0 1 1 1 0 1 1 1 1 0 0 1 1 0 1 0 1 1 1 1 0 1 0 1 0 1 1 1 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1 1 0 1 1 1 0 0 1 1 1 0 0 1 0 1 1 1 0 1 1 1 0 1 1 0 1 0 1 1 1 0 1 1 1 0 1 0 0 1 1 0 1 1 1 0 0 0 1 1 1 0 0 0 1 1 1 1 1 0 1 1 1 0 1 0 0 1 1 0 0 0 1 1 0 1 1 0 1 1 1 1 1 0 1 1 0 1 1 1 1 1 1 1 1 0 0 1 1 0 1 1 0 1 1 1 1 1 1 0 0 0 0 1 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 0 0 1 0 0 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 0 1 0 0 0 1 0 1 0 1 1 1 0 1 1 1 1 0 1 0 1 0 0 0 1 1 1 1 1 1 1 0 1 0 1 1 1 0 1 0 1 1 0 0 1 1 1 0 1 0 1 1 1 0 0 1 0 1 1 1 0 0 0 0 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 0 0 1 1 0 1 0 1 1 0 0 1 0 0 1 1 0 1 1 0 1 0 0 0 0 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1 1 0 0 1 1 0 0 0 1 0 1 0 1 0 1 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 1 0 1 0 1 0 1 0 1 1 1 0 1 1 1 1 1 0 1 0 1 1 1 0 0 0 1 1 1 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0 1 0 0 0 1 0 0 0 0 0 1 0 1 0 0 0 1 1 1 1 1 1 1 0 1 1 1 1 0 0 0 1 0 1 0 1 1 1 1 1 0 0 1 1 1 1 1 0 1 1 0 0 1 1 0 1 0 1 0 1 1 1 0 0 1 1 1 1 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0 0 0 1 1 0 0 1 1 1 1 0 1 0 0 1 0 1 0 0 0 1 1 0 1]

Total number of diabetic patients : 492
Total number non-diabetic patients : 276
```
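Because K-Means never sees the outcome column, the cluster that the example calls "diabetic" is only a naming convention, and a rerun can swap the indices. For reference, the dataset's outcome column marks 268 of its 768 patients as diabetic, so the 492/276 split above should not be read directly as a count of diagnoses. One way to check the naming is to cross-tabulate the cluster indices against the recorded outcomes. The sketch below does this with hypothetical arrays for eight patients; with the real data, `clusters` would come from `vq` and `outcome` from the ninth column (`dataset[:, 8]`, taken before the example slices it away). Naming each cluster by majority vote is an assumption of this sketch, not part of the original example.

```python
import numpy as np

# Hypothetical cluster indices (as returned by vq) and recorded outcomes
# (1 = diabetic, 0 = non-diabetic) for eight patients
clusters = np.array([0, 1, 0, 1, 1, 1, 0, 1])
outcome = np.array([1, 0, 1, 0, 0, 1, 1, 0])

# Cluster sizes: a vectorized equivalent of list(clusters).count(i)
sizes = np.bincount(clusters, minlength=2)
print(sizes)  # [3 5]

# Contingency table: rows = cluster index, columns = outcome (0, 1)
table = np.zeros((2, 2), dtype=int)
np.add.at(table, (clusters, outcome.astype(int)), 1)
print(table)

# Name each cluster after the outcome that is most common inside it
majority = table.argmax(axis=1)
names = {0: "non-diabetic", 1: "diabetic"}
for c in range(2):
    print(f"cluster {c}: mostly {names[majority[c]]}")
```

In this toy data, cluster 0 turns out to be mostly diabetic, even though the example's convention would label it non-diabetic.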