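How to get dictionary-like objects from a dataset using Python Scikit-learn?


With the Scikit-learn Python library, we can load a dataset as a dictionary-like object (a Bunch). Some of the most useful attributes of this object are:

  • data - The data to learn from, that is, the feature matrix.

  • target - The regression target.

  • DESCR - The full description of the dataset.

  • target_names - The names of the target variable(s).

  • feature_names - The names of the features.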
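"Dictionary-like" means that each of these entries can be read either as a dictionary key or as an attribute. The minimal sketch below illustrates this. As an assumption made only for this illustration, it uses load_diabetes, a small regression dataset bundled with scikit-learn, instead of the California housing data, so that it runs without a download.

```python
from sklearn.datasets import load_diabetes

# Stand-in dataset (an assumption for this sketch): bundled with
# scikit-learn, so nothing has to be downloaded.
dataset = load_diabetes()

# The same entry can be reached as a dictionary key or as an attribute.
data_by_key = dataset["data"]
data_by_attr = dataset.data

# Both forms return the very same array object.
same_object = data_by_key is data_by_attr
print(same_object)

# The returned object is a subclass of the built-in dict.
print(isinstance(dataset, dict))
print(data_by_attr.shape)
```

Attribute access is convenient for interactive work, while key access is handy when the entry name is held in a variable.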

Example 1

In the example below, we load the California housing dataset and list the keys of its dictionary-like object.

# Import the dataset loader
from sklearn.datasets import fetch_california_housing

# Load the California housing dataset (downloaded and cached on first use)
housing = fetch_california_housing()

# Print the keys of the dictionary-like object
print(housing.keys())

Output

It will produce the following output:

dict_keys(['data', 'target', 'frame', 'target_names', 'feature_names', 'DESCR'])
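Because the object behaves like a dictionary, its entries can also be iterated with items(). The sketch below summarizes the type of each entry. It again assumes the bundled load_diabetes dataset as a stand-in to avoid a download, so the set of keys differs slightly from the California housing keys listed above.

```python
from sklearn.datasets import load_diabetes

# Stand-in dataset (an assumption for this sketch), bundled with scikit-learn.
dataset = load_diabetes()

# Map every key to the type name of the value stored under it.
summary = {key: type(value).__name__ for key, value in dataset.items()}

for key, type_name in summary.items():
    print(f"{key}: {type_name}")
```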

Example 2

We can also inspect the individual entries of the dictionary-like object in more detail, as follows:

# Import the dataset loader
from sklearn.datasets import fetch_california_housing

# Load the California housing dataset
housing = fetch_california_housing()

# Shapes of the feature matrix and the target vector
print(housing.data.shape)
print('\n')
print(housing.target.shape)
print('\n')

# Feature names, target names and the full description
print(housing.feature_names)
print('\n')
print(housing.target_names)
print('\n')
print(housing.DESCR)

Output

It will produce the following output:

(20640, 8)
(20640,)
['MedInc', 'HouseAge', 'AveRooms', 'AveBedrms', 'Population', 'AveOccup', 'Latitude', 'Longitude']
['MedHouseVal']
.. _california_housing_dataset:
California Housing dataset
--------------------------
**Data Set Characteristics:**
   :Number of Instances: 20640
   :Number of Attributes: 8 numeric, predictive attributes and the target
   :Attribute Information:
      - MedInc median income in block group
      - HouseAge median house age in block group
      - AveRooms average number of rooms per household
      - AveBedrms average number of bedrooms per household
      - Population block group population
      - AveOccup average number of household members
      - Latitude block group latitude
      - Longitude block group longitude
   :Missing Attribute Values: None
Omitted due to length of the output…

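Example 3

Passing as_frame=True returns the data as pandas objects. The frame entry then holds the features and the target together in a single DataFrame.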

# Import the dataset loader
from sklearn.datasets import fetch_california_housing

# Load the California housing dataset as pandas objects (requires pandas)
housing = fetch_california_housing(as_frame=True)

# info() prints its summary directly, so it is not wrapped in print()
housing.frame.info()

Output

It will produce the following output:

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 9 columns):
#    Column       Non-Null Count    Dtype
---  ------       --------------    -----
 0   MedInc       20640 non-null   float64
 1   HouseAge     20640 non-null   float64
 2   AveRooms     20640 non-null   float64
 3   AveBedrms    20640 non-null   float64
 4   Population   20640 non-null   float64
 5   AveOccup     20640 non-null   float64
 6   Latitude     20640 non-null   float64
 7   Longitude    20640 non-null   float64
 8   MedHouseVal  20640 non-null   float64
dtypes: float64(9)
memory usage: 1.4 MB
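With as_frame=True, the data and target entries become pandas objects as well, which makes it easy to separate features from the target. The sketch below shows this under one assumption: it uses the bundled load_diabetes dataset as a stand-in so that it runs without a download. The same attributes are available on the object returned by fetch_california_housing(as_frame=True).

```python
from sklearn.datasets import load_diabetes

# Stand-in dataset (an assumption for this sketch); requires pandas.
dataset = load_diabetes(as_frame=True)

X = dataset.data      # pandas DataFrame with one column per feature
y = dataset.target    # pandas Series holding the regression target
frame = dataset.frame # DataFrame combining the features and the target

print(X.shape, y.shape, frame.shape)

# The combined frame is the feature columns followed by the target column.
columns_match = list(frame.columns) == list(X.columns) + [y.name]
print(columns_match)
```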

Updated on: October 4, 2022
