NumPy Data Types

NumPy supports a much wider variety of numeric types than Python does. The table below lists the scalar data types defined in NumPy.

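No.	Data type	Description
1	bool_	Boolean (True or False), stored as one byte
2	int_	Default integer type (historically the same as C long; usually int64 or int32)
3	intc	Same as C int (usually int32 or int64)
4	intp	Integer used for indexing (same as C ssize_t; usually int32 or int64)
5	int8	Byte (-128 to 127)
6	int16	Integer (-32768 to 32767)
7	int32	Integer (-2147483648 to 2147483647)
8	int64	Integer (-9223372036854775808 to 9223372036854775807)
9	uint8	Unsigned integer (0 to 255)
10	uint16	Unsigned integer (0 to 65535)
11	uint32	Unsigned integer (0 to 4294967295)
12	uint64	Unsigned integer (0 to 18446744073709551615)
13	float_	Shorthand for float64 (this alias was removed in NumPy 2.0; use float64)
14	float16	Half-precision float: sign bit, 5-bit exponent, 10-bit mantissa
15	float32	Single-precision float: sign bit, 8-bit exponent, 23-bit mantissa
16	float64	Double-precision float: sign bit, 11-bit exponent, 52-bit mantissa
17	complex_	Shorthand for complex128 (this alias was removed in NumPy 2.0; use complex128)
18	complex64	Complex number represented by two 32-bit floats (real and imaginary parts)
19	complex128	Complex number represented by two 64-bit floats (real and imaginary parts)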

NumPy numeric types are instances of dtype (data type) objects, each with its own characteristics. They are available as np.bool_, np.float32, and so on.
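
The integer ranges and floating-point bit layouts listed in the table can be checked at runtime. The following is a minimal sketch using np.iinfo and np.finfo; the variable names are chosen only for illustration.

```python
import numpy as np

# Integer limits for two of the types from the table
int8_info = np.iinfo(np.int8)
uint16_info = np.iinfo(np.uint16)
print(int8_info.min, int8_info.max)      # -128 127
print(uint16_info.min, uint16_info.max)  # 0 65535

# Floating-point layout: total bits, exponent bits, mantissa bits
f32_info = np.finfo(np.float32)
print(f32_info.bits, f32_info.nexp, f32_info.nmant)  # 32 8 23
```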

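Data Type Objects (dtype)

A data type object describes how the fixed block of memory corresponding to an array is interpreted. It covers the following aspects:

Type of the data (integer, float, or Python object)
Size of the data
Byte order (little-endian or big-endian)
For a structured type, the names of the fields, the data type of each field, and the part of the memory block taken by each field.
If the data type is a subarray, its shape and data type.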
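
Each of these aspects is exposed as an attribute of the dtype object. The sketch below inspects a small structured dtype; the field names 'id' and 'xyz' are made up for illustration.

```python
import numpy as np

# A structured dtype with a scalar field and a subarray field
dt = np.dtype([('id', '<i4'), ('xyz', '<f8', (3,))])

print(dt.names)      # field names: ('id', 'xyz')
print(dt.itemsize)   # total size in bytes: 4 + 3 * 8 = 28

# Kind and size of a single field
print(dt['id'].kind, dt['id'].itemsize)  # i 4

# A subarray field carries its own shape and base data type
print(dt['xyz'].shape)  # (3,)
print(dt['xyz'].base)   # float64
```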

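Byte order is set by prefixing the data type with "<" or ">". "<" means little-endian (the least significant byte is stored at the smallest address). ">" means big-endian (the most significant byte is stored at the smallest address).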
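
The effect of the byte-order prefix is easiest to see in the raw bytes. This small sketch stores the value 1 as a 4-byte integer in both orders and prints the underlying bytes.

```python
import numpy as np

little = np.array([1], dtype='<i4')  # little-endian 32-bit integer
big = np.array([1], dtype='>i4')     # big-endian 32-bit integer

# Same value, opposite byte layout in memory
print(little.tobytes())  # b'\x01\x00\x00\x00'
print(big.tobytes())     # b'\x00\x00\x00\x01'
print(little[0] == big[0])  # True
```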

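A dtype object is constructed with the following syntax:

numpy.dtype(object, align, copy)

The parameters are:

object - The object to be converted to a data type object.
align - If True, adds padding to the fields so the layout resembles a C struct.
copy - Makes a new copy of the dtype object. If False, the result is a reference to the built-in data type object.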
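
To illustrate the align parameter, the sketch below builds the same two-field structure with and without alignment. The field names are hypothetical, and the offsets shown assume the usual 4-byte alignment of a 32-bit integer.

```python
import numpy as np

fields = [('flag', 'i1'), ('value', 'i4')]

packed = np.dtype(fields)               # no padding between fields
aligned = np.dtype(fields, align=True)  # padded like a C struct

print(packed.itemsize)   # 5
print(aligned.itemsize)  # 8

# Byte offset of the 'value' field in each layout
print(packed.fields['value'][1])   # 1
print(aligned.fields['value'][1])  # 4
```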

Example: Using an array-scalar type

import numpy as np
dt = np.dtype(np.int32)
print(dt)

The output is:

int32

Example: Using the equivalent string for a data type

import numpy as np
dt = np.dtype('i4')
print(dt)

This produces the following result:

int32

Example: Using byte-order notation

import numpy as np
dt = np.dtype('>i4')
print(dt)

The output of the code above is:

>i4

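Example: Creating a structured data type

The following example shows how a structured data type is used. Each field name is declared together with its scalar data type:

import numpy as np
dt = np.dtype([('age', np.int8)])
print(dt)

The output is: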

[('age', 'i1')]

Example: Applying a structured data type to an ndarray

import numpy as np
dt = np.dtype([('age', np.int8)])
a = np.array([(10,), (20,), (30,)], dtype=dt)
print(a)

Running the code above gives the following output:

[(10,) (20,) (30,)]

Example: Accessing the contents of a structured field

import numpy as np
dt = np.dtype([('age', np.int8)])
a = np.array([(10,), (20,), (30,)], dtype=dt)
print(a['age'])

The result is:

[10 20 30]

Example: Defining a more complex structured data type

The following example defines a structured data type called student with a string field 'name', an integer field 'age', and a float field 'marks':

import numpy as np
student = np.dtype([('name', 'S20'), ('age', 'i1'), ('marks', 'f4')])
print(student)

The output is:

[('name', 'S20'), ('age', 'i1'), ('marks', '<f4')]

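Example: Applying the complex structured data type to an ndarray

import numpy as np
student = np.dtype([('name', 'S20'), ('age', 'i1'), ('marks', 'f4')])
a = np.array([('abc', 21, 50), ('xyz', 18, 75)], dtype=student)
print(a)

The output is as follows (under Python 3 the 'S20' field holds bytes, so the names print with a b prefix):

[(b'abc', 21, 50.) (b'xyz', 18, 75.)]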
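
Once a structured array exists, individual records and whole fields can be read by index and by field name. The following sketch reuses the student dtype from the example above; the filtering step is an illustration only.

```python
import numpy as np

student = np.dtype([('name', 'S20'), ('age', 'i1'), ('marks', 'f4')])
a = np.array([('abc', 21, 50), ('xyz', 18, 75)], dtype=student)

# A single record, then one field of that record (bytes, so decode for text)
first = a[0]
print(first['name'].decode('ascii'))  # abc

# A whole field behaves like an ordinary array
print(a['marks'].mean())  # 62.5

# Boolean masks built from one field select whole records
older = a[a['age'] > 20]
print(older['name'])  # [b'abc']
```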

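Each built-in data type has a character code that identifies its general kind (available as the kind attribute of a dtype):

'b' - boolean
'i' - (signed) integer
'u' - unsigned integer
'f' - floating-point
'c' - complex floating-point
'm' - timedelta
'M' - datetime
'O' - (Python) object
'S', 'a' - (byte) string
'U' - Unicode
'V' - raw data (void)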
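
These codes can be read back from any dtype through its kind attribute. The sketch below collects the kind of a handful of dtypes; the dictionary keys are labels chosen for illustration.

```python
import numpy as np

# Map a descriptive label to the kind code NumPy reports for that dtype
kinds = {
    'bool': np.dtype(np.bool_).kind,
    'int32': np.dtype(np.int32).kind,
    'uint8': np.dtype(np.uint8).kind,
    'float64': np.dtype(np.float64).kind,
    'complex128': np.dtype(np.complex128).kind,
    'timedelta': np.dtype('timedelta64[s]').kind,
    'datetime': np.dtype('datetime64[D]').kind,
    'object': np.dtype(object).kind,
    'bytes': np.dtype('S5').kind,
    'unicode': np.dtype('U5').kind,
    'void': np.dtype('V4').kind,
}

for label, code in kinds.items():
    print(label, code)
```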

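Checking the Data Type of an Array

You can check the data type of an array with its dtype attribute. It returns a dtype object describing the type of the array's elements: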

import numpy as np
a = np.array([1, 2, 3])
print(a.dtype)

The output is shown below (int64 on most 64-bit platforms; the default integer width is platform-dependent):

int64

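Creating Arrays with a Defined Data Type

In NumPy, you can explicitly specify the data type (dtype) of the elements when creating an array.

Array-creation functions such as np.array(), np.zeros(), and np.ones() accept a dtype parameter that defines the element type. By default, NumPy infers the data type from the input data.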
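
The difference between inferred and explicit types can be sketched as follows. The exact default integer width depends on the platform, so the example only checks the kind of the inferred integer type.

```python
import numpy as np

# Inferred from the input: all integers give an integer array,
# a single float in the list promotes the whole array to float64
ints = np.array([1, 2, 3])
floats = np.array([1, 2.5, 3])
print(ints.dtype.kind)  # i
print(floats.dtype)     # float64

# np.zeros defaults to float64 unless dtype is given
zeros_default = np.zeros(3)
zeros_int16 = np.zeros(3, dtype=np.int16)
print(zeros_default.dtype)  # float64
print(zeros_int16.dtype)    # int16
```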

Example: Creating an integer array

In this example, we create an array named a with element type int32, meaning each element is a 32-bit integer:

import numpy as np

# Creating an array of integers with a specified dtype
a = np.array([1, 2, 3], dtype=np.int32)
print("Array:", a)
print("Data type:", a.dtype)

This produces the following result:

Array: [1 2 3]
Data type: int32

Example: Creating a complex array

Here, we create an array named c with element type complex64, a 64-bit complex number (32-bit real part and 32-bit imaginary part):

import numpy as np

# Creating an array of complex numbers with a specified dtype
c = np.array([1+2j, 3+4j, 5+6j], dtype=np.complex64)
print("Array:", c)
print("Data type:", c.dtype)

The output of the code above is:

Array: [1.+2.j 3.+4.j 5.+6.j]
Data type: complex64

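Converting the Data Type of a NumPy Array

NumPy provides several ways to convert the data type of an array, changing how the data is stored and processed. Values are preserved when the target type can represent them; otherwise the conversion may lose precision or truncate:

The astype() method - the most commonly used way to convert types.
NumPy type functions such as numpy.int32() - calling a scalar type on an array converts it to that type. (The older numpy.cast mapping was removed in NumPy 2.0.)
Specifying the type at creation - setting dtype directly when the array is created, so no later conversion is needed.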

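Example: Using the "astype" method

The astype method creates a copy of the array converted to the specified type. It is the most common way to change an array's data type.

Here, we use astype() to convert an integer array to a floating-point type: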

import numpy as np

# Creating an array of integers
a = np.array([1, 2, 3, 4, 5])
print("Original array:", a)
print("Original dtype:", a.dtype)

# Converting to float
a_float = a.astype(np.float32)
print("Converted array:", a_float)
print("Converted dtype:", a_float.dtype)

The output is:

Original array: [1 2 3 4 5]
Original dtype: int64
Converted array: [1. 2. 3. 4. 5.]
Converted dtype: float32
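
Because astype returns a copy, changes to the converted array do not affect the original. The sketch below demonstrates this, and also shows the copy=False option, which returns the input array itself when no conversion is needed.

```python
import numpy as np

a = np.array([1, 2, 3], dtype=np.int64)

# astype allocates new memory, so the original is left untouched
b = a.astype(np.float32)
b[0] = 99
print(a)  # [1 2 3]
print(np.shares_memory(a, b))  # False

# With copy=False and an unchanged dtype, the same array object comes back
same = a.astype(np.int64, copy=False)
print(same is a)  # True
```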

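Example: Using a NumPy type function

NumPy scalar types can also be called as functions to convert an array to that type. This is less common than astype(), but can be useful in some situations.

In this example, we create an array of floats and convert it to integers with numpy.int32():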

import numpy as np

# Creating an array of floats
d = np.array([1.1, 2.2, 3.3, 4.4, 5.5])
print("Original array:", d)
print("Original dtype:", d.dtype)

# Converting to integer using numpy.int32
d_int = np.int32(d)
print("Converted array:", d_int)
print("Converted dtype:", d_int.dtype)

Running the code above gives the following output:

Original array: [1.1 2.2 3.3 4.4 5.5]
Original dtype: float64
Converted array: [1 2 3 4 5]
Converted dtype: int32
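
Note that the float-to-integer conversion above drops the fractional part rather than rounding. If rounding is what you want, one option is to round first and then cast, as in this sketch.

```python
import numpy as np

d = np.array([-1.7, -0.2, 0.7, 2.5])

# Casting truncates toward zero
truncated = d.astype(np.int32)
print(truncated)  # [-1  0  0  2]

# np.rint rounds to the nearest integer (ties go to the even value)
rounded = np.rint(d).astype(np.int32)
print(rounded)  # [-2  0  1  2]
```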

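Example: Specifying the type at creation

You can also specify the data type when creating the array, which avoids a later conversion.

Here, we pass dtype=np.float32 to build a floating-point array from integer values:

import numpy as np

# Creating an array from integer values with a specified float dtype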
e = np.array([1, 2, 3, 4, 5], dtype=np.float32)
print("Array:", e)
print("Data type:", e.dtype)

The result is:

Array: [1. 2. 3. 4. 5.]
Data type: float32

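What If a Value Cannot Be Converted?

When converting data types in NumPy, you may encounter values that cannot be converted to the target type. This usually raises an error or leads to unexpected behavior.

Let's look at several scenarios where values cannot be converted, and how to handle them:

Scenario 1: Converting non-numeric strings to numbers

If you try to convert a non-numeric string to an integer or float, NumPy raises a ValueError: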

import numpy as np

# Creating an array with non-numeric strings
a = np.array(['1', '2', 'three', '4', '5'])
print("Original array:", a)
print("Original dtype:", a.dtype)

try:
   # Attempting to convert to integer
   a_int = a.astype(np.int32)
   print("Converted array:", a_int)
   print("Converted dtype:", a_int.dtype)
except ValueError as e:
   print("Error:", e)

In this case the string "three" cannot be converted to an integer, which results in a ValueError:

Original array: ['1' '2' 'three' '4' '5']
Original dtype: <U5
Error: invalid literal for int() with base 10: 'three'

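Scenario 2: Converting out-of-range numbers

If you convert numbers that are out of range for the target data type, astype() does not raise an exception. For a float-to-integer cast the result is undefined and platform-dependent, and recent NumPy versions emit a RuntimeWarning:

import numpy as np

# Creating an array with large float values
b = np.array([1.1e10, 2.2e10, 3.3e10])
print("Original array:", b)
print("Original dtype:", b.dtype)

# Converting to int32; the values do not fit, so the result is not meaningful
b_int = b.astype(np.int32)
print("Converted array:", b_int)
print("Converted dtype:", b_int.dtype)

Here the large float values cannot be represented as int32. On a typical x86-64 machine the output looks like this (the converted values may differ on other platforms and NumPy versions):

Original array: [1.1e+10 2.2e+10 3.3e+10]
Original dtype: float64
RuntimeWarning: invalid value encountered in cast
Converted array: [-2147483648 -2147483648 -2147483648]
Converted dtype: int32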
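
One way to guarantee an exception for out-of-range values, regardless of how a given NumPy version or platform reacts to the cast itself, is to check the range explicitly before converting. The helper checked_astype below is hypothetical (it is not part of NumPy) and is only a sketch for integer target types.

```python
import numpy as np

def checked_astype(arr, target_type):
    """Hypothetical helper: cast to an integer type, raising on bad values."""
    info = np.iinfo(target_type)
    # Reject NaN/inf as well as anything outside the representable range
    if not np.all(np.isfinite(arr)):
        raise OverflowError("array contains NaN or infinite values")
    if np.any(arr < info.min) or np.any(arr > info.max):
        raise OverflowError(
            f"values outside [{info.min}, {info.max}] "
            f"for {np.dtype(target_type).name}"
        )
    return arr.astype(target_type)

b = np.array([1.1e10, 2.2e10, 3.3e10])
try:
    checked_astype(b, np.int32)
except OverflowError as e:
    print("Error:", e)

# In-range values are converted as usual (truncated toward zero)
small = checked_astype(np.array([1.9, -2.9]), np.int32)
print(small)  # [ 1 -2]
```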

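Scenario 3: Converting complex numbers to real numbers

When complex numbers are converted to a real type, NumPy discards the imaginary part and issues a ComplexWarning (a warning, not an exception):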

import numpy as np

# Creating an array with complex numbers
c = np.array([1+2j, 3+4j, 5+6j])
print("Original array:", c)
print("Original dtype:", c.dtype)

# Converting to float, discarding imaginary part
c_float = c.astype(np.float32)
print("Converted array:", c_float)
print("Converted dtype:", c_float.dtype)

In this case, NumPy issues a ComplexWarning and discards the imaginary part during the conversion:

Original array: [1.+2.j 3.+4.j 5.+6.j]
Original dtype: complex128
ComplexWarning: Casting complex values to real discards the imaginary part
  c_float = c.astype(np.float32)
Converted array: [1. 3. 5.]
Converted dtype: float32
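
If dropping the imaginary part is intentional, taking the real attribute first makes that explicit and avoids the warning. The sketch below turns warnings into errors to show that this path stays silent.

```python
import warnings
import numpy as np

c = np.array([1+2j, 3+4j, 5+6j])

# Check whether any information would actually be lost
has_imag = np.any(c.imag != 0)
print(has_imag)  # True

with warnings.catch_warnings():
    warnings.simplefilter("error")  # any warning would raise here
    # .real is already a real-valued array, so this cast emits no warning
    c_real = c.real.astype(np.float32)

print(c_real)        # [1. 3. 5.]
print(c_real.dtype)  # float32
```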

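Scenario 4: Handling conversion errors

To handle conversion errors, you can use error-handling techniques such as a try-except block to catch and deal with the exception.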

import numpy as np

# Creating an array with mixed data
d = np.array(['1', '2', 'three', '4', '5'])
print("Original array:", d)
print("Original dtype:", d.dtype)

def safe_convert(arr, target_type):
   try:
      return arr.astype(target_type)
   except ValueError as e:
      print("Conversion error:", e)
      return None

# Attempting to convert to integer
d_int = safe_convert(d, np.int32)
if d_int is not None:
   print("Converted array:", d_int)
   print("Converted dtype:", d_int.dtype)
else:
   print("Conversion failed.")

In this example, the safe_convert() function catches the ValueError and handles it by printing an error message and returning None:

Original array: ['1' '2' 'three' '4' '5']
Original dtype: <U5
Conversion error: invalid literal for int() with base 10: 'three'
Conversion failed.

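Scenario 5: Using "np.nan" for invalid conversions

For numeric conversions, you can use np.nan (not a number) to stand in for invalid values. This approach is useful when dealing with missing or corrupted data.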

import numpy as np

# Creating an array with strings, including an invalid entry
e = np.array(['1.1', '2.2', 'three', '4.4', '5.5'])
print("Original array:", e)
print("Original dtype:", e.dtype)

def convert_with_nan(arr):
   result = []
   for item in arr:
      try:
         result.append(float(item))
      except ValueError:
         result.append(np.nan)
   return np.array(result)

# Converting to float with np.nan for invalid entries
e_float = convert_with_nan(e)
print("Converted array:", e_float)
print("Converted dtype:", e_float.dtype)

Here, the invalid entry is replaced with np.nan:

Original array: ['1.1' '2.2' 'three' '4.4' '5.5']
Original dtype: <U5
Converted array: [1.1 2.2 nan 4.4 5.5]
Converted dtype: float64
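
After invalid entries have been replaced by np.nan, they can be located with np.isnan and skipped in calculations. The following sketch starts from an array like the converted one above.

```python
import numpy as np

e_float = np.array([1.1, 2.2, np.nan, 4.4, 5.5])

# Boolean mask marking the entries that failed to convert
invalid = np.isnan(e_float)
print(invalid.sum())  # 1

# Keep only the valid values
valid = e_float[~invalid]
print(valid)  # [1.1 2.2 4.4 5.5]

# NaN-aware reductions ignore the missing entries
mean_valid = np.nanmean(e_float)
print(mean_valid)
```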

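Reinterpreting the Data Type of an Existing Array

You can also use the view() method on an existing array to change how its data is interpreted without changing the underlying bytes.

Example

Here, the data is reinterpreted as "float32". Because the underlying bytes stay the same, the resulting values are unexpected: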

import numpy as np

# Creating an array of integers
g = np.array([1, 2, 3, 4], dtype=np.int32)
print("Original array:", g)
print("Original dtype:", g.dtype)

# Viewing the array as float32
g_view = g.view(np.float32)
print("Viewed array:", g_view)
print("Viewed dtype:", g_view.dtype)

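The output of the code above is shown below (the number of digits printed for these tiny values depends on the NumPy version):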

Original array: [1 2 3 4]
Original dtype: int32
Viewed array: [1.4012985e-45 2.8025969e-45 4.2038954e-45 5.6051939e-45]
Viewed dtype: float32
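
Unlike astype(), a view shares memory with the original array, so writing through the view changes the original. The sketch below fixes the byte order to little-endian ('<i4') so the byte layout is the same on every machine.

```python
import numpy as np

g = np.array([1, 2, 3, 4], dtype='<i4')

# View the same 16 bytes as unsigned 8-bit integers
as_bytes = g.view(np.uint8)
print(as_bytes.shape)  # (16,)
print(as_bytes[:4])    # [1 0 0 0]

# Writing through the view modifies the original array
as_bytes[0] = 255
print(g[0])  # 255

# A view shares memory; astype makes an independent copy
converted = g.astype(np.float32)
print(np.shares_memory(g, as_bytes))   # True
print(np.shares_memory(g, converted))  # False
```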
