- MySQL 基础
- MySQL - 首页
- MySQL - 简介
- MySQL - 特性
- MySQL - 版本
- MySQL - 变量
- MySQL - 安装
- MySQL - 管理
- MySQL - PHP 语法
- MySQL - Node.js 语法
- MySQL - Java 语法
- MySQL - Python 语法
- MySQL - 连接
- MySQL - Workbench
- MySQL 数据库
- MySQL - 创建数据库
- MySQL - 删除数据库
- MySQL - 选择数据库
- MySQL - 显示数据库
- MySQL - 复制数据库
- MySQL - 数据库导出
- MySQL - 数据库导入
- MySQL - 数据库信息
- MySQL 用户
- MySQL - 创建用户
- MySQL - 删除用户
- MySQL - 显示用户
- MySQL - 修改密码
- MySQL - 授予权限
- MySQL - 显示权限
- MySQL - 收回权限
- MySQL - 锁定用户帐户
- MySQL - 解锁用户帐户
- MySQL 表
- MySQL - 创建表
- MySQL - 显示表
- MySQL - 修改表
- MySQL - 重命名表
- MySQL - 克隆表
- MySQL - 截断表
- MySQL - 临时表
- MySQL - 修复表
- MySQL - 描述表
- MySQL - 添加/删除列
- MySQL - 显示列
- MySQL - 重命名列
- MySQL - 表锁定
- MySQL - 删除表
- MySQL - 派生表
- MySQL 查询
- MySQL - 查询
- MySQL - 约束
- MySQL - INSERT 查询
- MySQL - SELECT 查询
- MySQL - UPDATE 查询
- MySQL - DELETE 查询
- MySQL - REPLACE 查询
- MySQL - INSERT IGNORE
- MySQL - INSERT on Duplicate Key Update
- MySQL - INSERT INTO SELECT
- MySQL 运算符和子句
- MySQL - WHERE 子句
- MySQL - LIMIT 子句
- MySQL - DISTINCT 子句
- MySQL - ORDER BY 子句
- MySQL - GROUP BY 子句
- MySQL - HAVING 子句
- MySQL - AND 运算符
- MySQL - OR 运算符
- MySQL - LIKE 运算符
- MySQL - IN 运算符
- MySQL - ANY 运算符
- MySQL - EXISTS 运算符
- MySQL - NOT 运算符
- MySQL - 不等于运算符 (NOT EQUAL)
- MySQL - IS NULL 运算符
- MySQL - IS NOT NULL 运算符
- MySQL - BETWEEN 运算符
- MySQL - UNION 运算符
- MySQL - UNION vs UNION ALL
- MySQL - MINUS 运算符
- MySQL - INTERSECT 运算符
- MySQL - INTERVAL 运算符
- MySQL 连接
- MySQL - 使用连接
- MySQL - INNER JOIN
- MySQL - LEFT JOIN
- MySQL - RIGHT JOIN
- MySQL - CROSS JOIN
- MySQL - FULL JOIN
- MySQL - 自连接 (SELF JOIN)
- MySQL - DELETE JOIN
- MySQL - UPDATE JOIN
- MySQL - UNION vs JOIN
- MySQL 密钥
- MySQL - 唯一键 (UNIQUE KEY)
- MySQL - 主键 (PRIMARY KEY)
- MySQL - 外键 (FOREIGN KEY)
- MySQL - 组合键 (COMPOSITE KEY)
- MySQL - 备用键 (ALTERNATE KEY)
- MySQL 触发器
- MySQL - 触发器
- MySQL - 创建触发器
- MySQL - 显示触发器
- MySQL - 删除触发器
- MySQL - BEFORE INSERT 触发器
- MySQL - AFTER INSERT 触发器
- MySQL - BEFORE UPDATE 触发器
- MySQL - AFTER UPDATE 触发器
- MySQL - BEFORE DELETE 触发器
- MySQL - AFTER DELETE 触发器
- MySQL 数据类型
- MySQL - 数据类型
- MySQL - VARCHAR
- MySQL - BOOLEAN
- MySQL - ENUM
- MySQL - DECIMAL
- MySQL - INT
- MySQL - FLOAT
- MySQL - BIT
- MySQL - TINYINT
- MySQL - BLOB
- MySQL - SET
- MySQL 正则表达式
- MySQL - 正则表达式
- MySQL - RLIKE 运算符
- MySQL - NOT LIKE 运算符
- MySQL - NOT REGEXP 运算符
- MySQL - regexp_instr() 函数
- MySQL - regexp_like() 函数
- MySQL - regexp_replace() 函数
- MySQL - regexp_substr() 函数
- MySQL 函数 & 运算符
- MySQL - 日期和时间函数
- MySQL - 算术运算符
- MySQL - 数值函数
- MySQL - 字符串函数
- MySQL - 聚合函数
- MySQL 其他概念
- MySQL - NULL 值
- MySQL - 事务
- MySQL - 使用序列
- MySQL - 处理重复数据
- MySQL - SQL 注入
- MySQL - 子查询
- MySQL - 注释
- MySQL - 检查约束
- MySQL - 存储引擎
- MySQL - 将表导出到 CSV 文件
- MySQL - 将 CSV 文件导入数据库
- MySQL - UUID
- MySQL - 公共表表达式 (CTE)
- MySQL - ON DELETE CASCADE
- MySQL - Upsert
- MySQL - 水平分区
- MySQL - 垂直分区
- MySQL - 游标
- MySQL - 存储函数
- MySQL - SIGNAL
- MySQL - RESIGNAL
- MySQL - 字符集
- MySQL - 排序规则
- MySQL - 通配符
- MySQL - 别名
- MySQL - ROLLUP
- MySQL - 当天日期
- MySQL - 字面量
- MySQL - 存储过程
- MySQL - EXPLAIN
- MySQL - JSON
- MySQL - 标准差
- MySQL - 查找重复记录
- MySQL - 删除重复记录
- MySQL - 选择随机记录
- MySQL - SHOW PROCESSLIST
- MySQL - 修改列类型
- MySQL - 重置自动递增
- MySQL - COALESCE() 函数
- MySQL 有用资源
- MySQL - 有用函数
- MySQL - 语句参考
- MySQL - 快速指南
- MySQL - 有用资源
- MySQL - 讨论
MySQL - 处理重复数据
数据库中的表或结果集通常包含重复记录。虽然通常允许重复,但在某些情况下需要防止重复。在这种情况下,必须从数据库表中识别并删除重复记录。
处理 MySQL 重复数据的重要性
处理数据库中重复数据的原因有很多。主要原因之一是,组织数据库中存在重复数据会导致逻辑错误。此外,我们需要处理冗余数据以防止以下后果:
- 重复数据占用存储空间,降低数据库使用效率并增加存储成本。
- 处理重复记录会消耗额外的资源,从而增加维护数据库的总成本。
- 数据库中的重复数据会导致数据逻辑错误,影响存储信息的完整性和可靠性。
防止重复条目
您可以使用表中具有相应字段的主键 (PRIMARY KEY) 或唯一 (UNIQUE) 索引来防止将重复记录条目插入表中。
示例
下表不包含此类索引或主键,因此它允许first_name 和last_name 的重复记录。
CREATE TABLE CUSTOMERS ( first_name CHAR(20), last_name CHAR(20), sex CHAR(10) );
为防止在此表中创建具有相同 first_name 和 last_name 值的多个记录,请向其定义中添加主键 (PRIMARY KEY)。执行此操作时,还需要将索引列声明为NOT NULL,因为主键 (PRIMARY KEY) 不允许NULL 值:
CREATE TABLE CUSTOMERS ( first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name) );
使用 INSERT IGNORE 查询:
表中存在唯一索引通常会在尝试插入与索引列中现有记录重复的记录时导致错误。
为了处理这种情况而不产生错误,您可以使用“INSERT IGNORE”命令。当记录不是重复记录时,MySQL 会照常插入它。但是,如果记录是重复的,“IGNORE”关键字会指示 MySQL 丢弃它,而不会产生错误。
提供的示例不会导致错误,并且它还确保不会插入重复记录:
INSERT IGNORE INTO CUSTOMERS (LAST_NAME, FIRST_NAME) VALUES ('Jay', 'Thomas'), ('Jay', 'Thomas');
我们将获得以下输出:
Query OK, 1 row affected, 1 warning (0.01 sec) Records: 2 Duplicates: 1 Warnings: 1
使用 REPLACE 查询:
考虑使用 REPLACE 命令代替 INSERT 命令。处理新记录时,它会像 INSERT 一样插入。但是,如果它是重复的,新记录将替换旧记录。
REPLACE INTO CUSTOMERS (LAST_NAME, FIRST_NAME) VALUES ( 'Ajay', 'Kumar'), ( 'Ajay', 'Kumar');
以下是上述代码的输出:
Query OK, 2 rows affected (0.01 sec) Records: 2 Duplicates: 0 Warnings: 0
您在 INSERT IGNORE 和 REPLACE 命令之间的选择应取决于您希望实现的特定重复处理行为。INSERT IGNORE 命令保留第一组重复记录并丢弃其余记录。另一方面,REPLACE 命令保留最后一组重复项并删除任何较早的实例。
使用 UNIQUE 约束:
强制唯一性的另一种方法是向表添加唯一 (UNIQUE) 索引,而不是主键 (PRIMARY KEY):
CREATE TABLE CUSTOMERS ( first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), UNIQUE (last_name, first_name) );
计数和识别重复项
您可以使用 COUNT 函数和 GROUP BY 子句根据特定列计数和识别重复记录。
示例
以下是计算表中 first_name 和 last_name 的重复记录的查询:
SELECT COUNT(*) as repetitions, last_name, first_name FROM CUSTOMERS GROUP BY last_name, first_name HAVING repetitions > 1;
此查询将返回 CUSTOMERS 表中所有重复记录的列表。一般来说,要识别重复的值集,请遵循以下步骤。
确定哪些列可能包含重复值。
将这些列包含在列选择列表中,以及 COUNT(*)。
还在 GROUP BY 子句中列出这些列。
应用 HAVING 子句以通过要求组计数大于一来过滤唯一值。
从查询结果中消除重复项
您可以将 DISTINCT 命令与 SELECT 语句一起使用来查找表中可用的唯一记录。
SELECT DISTINCT last_name, first_name FROM CUSTOMERS ORDER BY last_name;
除了DISTINCT命令,还可以添加GROUP BY子句,指定要选择的列。这种方法可以消除重复项,只检索指定列中唯一的值组合。
SELECT last_name, first_name FROM CUSTOMERS GROUP BY (last_name, first_name);
使用表替换删除重复项
如果表中存在重复记录,并且想要删除表中的所有重复记录,请按照以下步骤操作:
CREATE TABLE tmp AS SELECT DISTINCT last_name, first_name, sex FROM CUSTOMERS; DROP TABLE CUSTOMERS; ALTER TABLE tmp RENAME TO CUSTOMERS;
使用客户端程序处理重复项
我们也可以使用客户端程序来处理重复数据。
语法
要通过PHP程序处理重复值,需要使用mysqli函数query()执行“INSERT IGNORE”语句,如下所示:
$sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; $mysqli->query($sql);
要通过JavaScript程序处理重复值,需要使用mysql2库的query()函数执行“INSERT IGNORE”语句,如下所示:
sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; con.query(sql);
要通过Java程序处理重复值,需要使用JDBC函数execute()执行“INSERT IGNORE”语句,如下所示:
String sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; statement.execute(sql);
要通过Python程序处理重复值,需要使用MySQL Connector/Python的execute()函数执行“INSERT IGNORE”语句,如下所示:
sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')" cursorObj.execute(sql)
示例
以下是程序示例:
$dbhost = 'localhost'; $dbuser = 'root'; $dbpass = 'password'; $db = 'TUTORIALS'; $mysqli = new mysqli($dbhost, $dbuser, $dbpass, $db); if ($mysqli->connect_errno) { printf("Connect failed: %s
", $mysqli->connect_error); exit(); } //printf('Connected successfully.
'); $sql = "CREATE TABLE person_tbl (first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name))"; if($mysqli->query($sql)){ printf("Table created successfully...!\n"); } //let's insert some records $sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; if($mysqli->query($sql)){ printf("First record inserted successfully...!\n"); } $sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')"; if($mysqli->query($sql)){ printf("Second record inserted successfully...!\n"); } //now lets insert duplicate record with IGNORE keyword $sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; if($mysqli->query($sql)){ printf("Duplicate record inserted successfully using IGNORE keyword...!\n"); } $sql = "SELECT * from person_tbl"; if($result = $mysqli->query($sql)){ printf("Table records: \n"); while($row = mysqli_fetch_array($result)){ printf("First Name: %s, Last name: %s, Sex: %s", $row['first_name'], $row['last_name'], $row['sex']); printf("\n"); } } //lets insert a duplicate record $sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')"; if(!$mysqli->query($sql)){ printf("You can't insert any duplicate records...!\n"); } if($mysqli->error){ printf("Error message: ", $mysqli->error); } $mysqli->close();
输出
获得的输出结果如下所示:
Table created successfully...! First record inserted successfully...! Second record inserted successfully...! Duplicate record inserted successfully using IGNORE keyword...! Table records: First Name: Thomas, Last name: Jay, Sex: First Name: Smith, Last name: John, Sex: PHP Fatal error: Uncaught mysqli_sql_exception: Duplicate entry 'John-Smith' for key 'person_tbl.PRIMARY' in D:\test\handlingduplicates.php:48
var mysql = require('mysql2'); var con = mysql.createConnection({ host:"localhost", user:"root", password:"password" }); //Connecting to MySQL con.connect(function(err) { if (err) throw err; // console.log("Connected successfully...!"); // console.log("--------------------------"); sql = "USE TUTORIALS"; con.query(sql); //create table sql = "CREATE TABLE person_tbl (first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name))"; con.query(sql, function(err, result){ if (err) throw err; console.log("Table created successfully....!"); }); //now let's insert some records sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; con.query(sql, function(err, result){ if (err) throw err; console.log("First record inserted successfully...!"); }); sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')"; con.query(sql, function(err, result){ if (err) throw err; console.log("Second record inserted successfully...!"); }); //now lets insert duplicate record with IGNORE keyword sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; con.query(sql, function(err, result){ console.log("Insert duplicate record with IGNORE keyword") if (err) throw err; console.log("Duplicate record inserted successfully with the help of IGNORE keyword"); }); //lets insert a duplicate record sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')"; con.query(sql, function(err, result){ console.log("Insert duplicate record"); if (err) throw err; console.log("You can't insert the duplicate record because columns are primary key"); }); sql = "SELECT * FROM person_tbl"; con.query(sql, function(err, result){ console.log("Table records(with ID auto_increment sequence)."); if (err) throw err; console.log(result); }); });
输出
获得的输出结果如下所示:
Table created successfully....! First record inserted successfully...! Second record inserted successfully...! Insert duplicate record with IGNORE keyword Duplicate record inserted successfully with the help of IGNORE keyword Insert duplicate record Insert duplicate record D:\test1\duplicate.js:43 if (err) throw err; ^ Error: Duplicate entry 'John-Smith' for key 'person_tbl.PRIMARY'
import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.Statement; public class HandleDuplicates { public static void main(String[] args) { String url = "jdbc:mysql://127.0.0.1:3306/TUTORIALS"; String user = "root"; String password = "password"; ResultSet rs; try { Class.forName("com.mysql.cj.jdbc.Driver"); Connection con = DriverManager.getConnection(url, user, password); Statement st = con.createStatement(); //System.out.println("Database connected successfully...!"); //create table String sql = "CREATE TABLE person_tbl ( first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name))"; st.execute(sql); System.out.println("Table created successfully....!"); //let's insert some records String sql1 = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; st.execute(sql1); System.out.println("Record inserted successfully...!"); String sql2 = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; st.execute(sql2); System.out.println("Duplicate record inserted successfully...!"); //lets print the table records String sql3 = "SELECT * FROM PERSON_TBL"; rs = st.executeQuery(sql3); System.out.println("Table records: "); while(rs.next()) { String fname = rs.getString("first_name"); String lname = rs.getString("last_name"); System.out.println("First name: " + fname + ", Last name: " + lname); } }catch(Exception e) { e.printStackTrace(); } } }
输出
获得的输出结果如下所示:
Table created successfully....! Record inserted successfully...! Duplicate record inserted successfully...! Table records: First name: Thomas, Last name: Jay
import mysql.connector # Connecting to MySQL con = mysql.connector.connect( host="localhost", user="root", password="password", database="tut" ) # Creating a cursor object cursorObj = con.cursor() # Creating the table create_table_query = """ CREATE TABLE person_tbl ( first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name) ) """ cursorObj.execute(create_table_query) print("Table 'person_tbl' is created successfully!") # Inserting some records first_record = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')" print("First record inserted successfully!") cursorObj.execute(first_record) second_record = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')" print("Second record inserted successfully!") cursorObj.execute(second_record) # Insert duplicate record with IGNORE keyword sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')" print("Duplicate record inserted successfully with the help of IGNORE keyword") cursorObj.execute(sql) # Insert a duplicate record (this will throw an error) try: cursorObj.execute("INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')") print("Insert duplicate record") except mysql.connector.Error as err: print("Insert duplicate record error:", err) con.commit() # Retrieving records cursorObj.execute("SELECT * FROM person_tbl") records = cursorObj.fetchall() # Printing the records print("Table records.") for record in records: print(record) # Closing the connection cursorObj.close() con.close()
输出
获得的输出结果如下所示:
First record inserted successfully! Second record inserted successfully! Duplicate record inserted successfully with the help of IGNORE keyword Insert duplicate record error: 1062 (23000): Duplicate entry 'John-Smith' for key 'person_tbl.PRIMARY' Table records. ('Thomas', 'Jay', None) ('Smith', 'John', None)