How to get dictionary-like objects from a dataset using Python Scikit-learn?
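The dataset loaders in the Scikit-learn Python library return a dictionary-like object (a Bunch) whose entries can be read either by key or as attributes. Some of its most useful attributes are:
data - the data to learn from (the feature matrix).
target - the target values; for the dataset used here, this is the regression target.
DESCR - the full description of the dataset.
target_names - the names of the dataset's target(s).
feature_names - the names of the dataset's features.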
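To make "dictionary-like" concrete, the sketch below shows how one object can be read both by key and by attribute. It is a simplified stand-in written for illustration only: the class name `MiniBunch` and all values in it are hypothetical placeholders, not scikit-learn's actual implementation or real data.

```python
class MiniBunch(dict):
    """Hypothetical, simplified dictionary-like container with attribute access."""

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails,
        # so built-in dict methods such as keys() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Setting an attribute stores the value as a dictionary entry.
        self[name] = value


# Placeholder values, made up for illustration only.
dataset = MiniBunch(
    data=[[1.0, 2.0], [3.0, 4.0]],
    target=[0.5, 1.5],
    feature_names=["feature_a", "feature_b"],
)

print(dataset.keys())                       # the available entries
print(dataset["target"] is dataset.target)  # key and attribute reach the same object
print(dataset.feature_names)
```

Both access styles return the very same object, which is why the examples in this article can write housing.data and housing.keys() on one result.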
Example 1
In the example below, we load the California housing dataset and list the keys of its dictionary-like object.
# Import the dataset loader
from sklearn.datasets import fetch_california_housing

# Load the California housing dataset (downloaded on first use)
housing = fetch_california_housing()

# Print the keys of the dictionary-like object
print(housing.keys())
Output
It will produce the following output:
dict_keys(['data', 'target', 'frame', 'target_names', 'feature_names', 'DESCR'])
Example 2
We can also inspect the individual entries of the dictionary-like object in more detail, as follows:
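# Import the dataset loader
from sklearn.datasets import fetch_california_housing

# Load the California housing dataset
housing = fetch_california_housing()

# Shape of the feature matrix and of the target vector
print(housing.data.shape)
print('\n')
print(housing.target.shape)
print('\n')
# Names of the features and of the target
print(housing.feature_names)
print('\n')
print(housing.target_names)
print('\n')
# Full text description of the dataset
print(housing.DESCR)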
Output
It will produce the following output:
(20640, 8)
(20640,)
['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude']
['MedHouseVal']
.. _california_housing_dataset:
California Housing dataset
--------------------------
**Data Set Characteristics:**
:Number of Instances: 20640
:Number of Attributes: 8 numeric, predictive attributes and the target
:Attribute Information:
- MedInc median income in block group
- HouseAge median house age in block group
- AveRooms average number of rooms per household
- AveBedrms average number of bedrooms per household
- Population block group population
- AveOccup average number of household members
- Latitude block group latitude
- Longitude block group longitude
:Missing Attribute Values: None
Omitted due to length of the output…
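Example 3
When the dataset is loaded with as_frame=True, the frame attribute holds a pandas DataFrame that contains the feature columns together with the target column:
# Import the dataset loader (as_frame=True requires pandas to be installed)
from sklearn.datasets import fetch_california_housing

# Load the California housing dataset as a pandas DataFrame
housing = fetch_california_housing(as_frame=True)

# info() prints its summary directly and returns None, so no print() is needed
housing.frame.info()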
Output
It will produce the following output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype
---  ------       --------------  -----
 0   MedInc       20640 non-null  float64
 1   HouseAge     20640 non-null  float64
 2   AveRooms     20640 non-null  float64
 3   AveBedrms    20640 non-null  float64
 4   Population   20640 non-null  float64
 5   AveOccup     20640 non-null  float64
 6   Latitude     20640 non-null  float64
 7   Longitude    20640 non-null  float64
 8   MedHouseVal  20640 non-null  float64
dtypes: float64(9)
memory usage: 1.4 MB
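The frame has nine columns because it places the eight feature columns and the single target column side by side. The plain-Python sketch below illustrates that pairing without pandas. `build_records` is a hypothetical helper introduced for illustration, only two of the eight feature names are used, and the numbers are made-up placeholders rather than values from the California housing dataset.

```python
def build_records(data, target, feature_names, target_name):
    """Combine feature rows and target values into one dict per sample."""
    if len(data) != len(target):
        raise ValueError("data and target must have the same number of samples")
    records = []
    for row, y in zip(data, target):
        if len(row) != len(feature_names):
            raise ValueError("each row needs exactly one value per feature name")
        record = dict(zip(feature_names, row))  # feature columns first
        record[target_name] = y                 # target column last
        records.append(record)
    return records


# Two of the eight feature names, with made-up placeholder values.
feature_names = ["MedInc", "HouseAge"]
data = [[1.0, 10.0], [2.0, 20.0]]
target = [0.1, 0.2]

records = build_records(data, target, feature_names, "MedHouseVal")
print(list(records[0].keys()))
```

Each record lists the feature columns followed by the target column, which mirrors the column order in the info() summary.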