MySQL - 处理重复数据



数据库中的表或结果集通常包含重复记录。虽然通常允许重复,但在某些情况下需要防止重复。在这种情况下,必须从数据库表中识别并删除重复记录。

处理 MySQL 重复数据的重要性

处理数据库中重复数据的原因有很多。主要原因之一是,组织数据库中存在重复数据会导致逻辑错误。此外,我们需要处理冗余数据以防止以下后果:

  • 重复数据占用存储空间,降低数据库使用效率并增加存储成本。
  • 处理重复记录会消耗额外的资源,从而增加维护数据库的总成本。
  • 数据库中的重复数据会导致数据逻辑错误,影响存储信息的完整性和可靠性。

防止重复条目

您可以使用表中具有相应字段的主键 (PRIMARY KEY)唯一 (UNIQUE) 索引来防止将重复记录条目插入表中。

示例

下表不包含此类索引或主键,因此它允许first_namelast_name 的重复记录。

CREATE TABLE CUSTOMERS (
   first_name CHAR(20),
   last_name CHAR(20),
   sex CHAR(10)
);

为防止在此表中创建具有相同 first_name 和 last_name 值的多个记录,请向其定义中添加主键 (PRIMARY KEY)。执行此操作时,还需要将索引列声明为NOT NULL,因为主键 (PRIMARY KEY) 不允许NULL 值:

CREATE TABLE CUSTOMERS (
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   PRIMARY KEY (last_name, first_name)
);

使用 INSERT IGNORE 查询:

表中存在唯一索引通常会在尝试插入与索引列中现有记录重复的记录时导致错误。

为了处理这种情况而不产生错误,您可以使用“INSERT IGNORE”命令。当记录不是重复记录时,MySQL 会照常插入它。但是,如果记录是重复的,“IGNORE”关键字会指示 MySQL 丢弃它,而不会产生错误。

提供的示例不会导致错误,并且它还确保不会插入重复记录:

INSERT IGNORE INTO CUSTOMERS (LAST_NAME, FIRST_NAME) 
VALUES ('Jay', 'Thomas'), ('Jay', 'Thomas');

我们将获得以下输出:

Query OK, 1 row affected, 1 warning (0.01 sec)
Records: 2  Duplicates: 1  Warnings: 1

使用 REPLACE 查询:

考虑使用 REPLACE 命令代替 INSERT 命令。处理新记录时,它会像 INSERT 一样插入。但是,如果它是重复的,新记录将替换旧记录。

REPLACE INTO CUSTOMERS (LAST_NAME, FIRST_NAME) 
VALUES ( 'Ajay', 'Kumar'), ( 'Ajay', 'Kumar');

以下是上述代码的输出:

Query OK, 2 rows affected (0.01 sec)
Records: 2  Duplicates: 0  Warnings: 0

您在 INSERT IGNORE 和 REPLACE 命令之间的选择应取决于您希望实现的特定重复处理行为。INSERT IGNORE 命令保留第一组重复记录并丢弃其余记录。另一方面,REPLACE 命令保留最后一组重复项并删除任何较早的实例。

使用 UNIQUE 约束:

强制唯一性的另一种方法是向表添加唯一 (UNIQUE) 索引,而不是主键 (PRIMARY KEY):

CREATE TABLE CUSTOMERS (
   first_name CHAR(20) NOT NULL,
   last_name CHAR(20) NOT NULL,
   sex CHAR(10),
   UNIQUE (last_name, first_name)
);

计数和识别重复项

您可以使用 COUNT 函数和 GROUP BY 子句根据特定列计数和识别重复记录。

示例

以下是计算表中 first_name 和 last_name 的重复记录的查询:

SELECT COUNT(*) as repetitions, last_name, first_name
FROM CUSTOMERS
GROUP BY last_name, first_name
HAVING repetitions > 1;

此查询将返回 CUSTOMERS 表中所有重复记录的列表。一般来说,要识别重复的值集,请遵循以下步骤。

  • 确定哪些列可能包含重复值。

  • 将这些列包含在列选择列表中,以及 COUNT(*)。

  • 还在 GROUP BY 子句中列出这些列。

  • 应用 HAVING 子句以通过要求组计数大于一来过滤唯一值。

从查询结果中消除重复项

您可以将 DISTINCT 命令与 SELECT 语句一起使用来查找表中可用的唯一记录。

SELECT DISTINCT last_name, first_name
FROM CUSTOMERS
ORDER BY last_name;

除了DISTINCT命令,还可以添加GROUP BY子句,指定要选择的列。这种方法可以消除重复项,只检索指定列中唯一的值组合。

SELECT last_name, first_name
FROM CUSTOMERS
GROUP BY (last_name, first_name);

使用表替换删除重复项

如果表中存在重复记录,并且想要删除表中的所有重复记录,请按照以下步骤操作:

CREATE TABLE tmp AS 
SELECT DISTINCT last_name, first_name, sex
FROM CUSTOMERS;

DROP TABLE CUSTOMERS;
ALTER TABLE tmp RENAME TO CUSTOMERS;

使用客户端程序处理重复项

我们也可以使用客户端程序来处理重复数据。

语法

要通过PHP程序处理重复值,需要使用mysqli函数query()执行“INSERT IGNORE”语句,如下所示:

$sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
$mysqli->query($sql);

要通过JavaScript程序处理重复值,需要使用mysql2库的query()函数执行“INSERT IGNORE”语句,如下所示:

sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
con.query(sql);

要通过Java程序处理重复值,需要使用JDBC函数execute()执行“INSERT IGNORE”语句,如下所示:

String sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
statement.execute(sql);

要通过Python程序处理重复值,需要使用MySQL Connector/Pythonexecute()函数执行“INSERT IGNORE”语句,如下所示:

sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"
cursorObj.execute(sql)

示例

以下是程序示例:

$dbhost = 'localhost';
$dbuser = 'root';
$dbpass = 'password';
$db = 'TUTORIALS';
$mysqli = new mysqli($dbhost, $dbuser, $dbpass, $db);
if ($mysqli->connect_errno) {
    printf("Connect failed: %s
", $mysqli->connect_error); exit(); } //printf('Connected successfully.
'); $sql = "CREATE TABLE person_tbl (first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name))"; if($mysqli->query($sql)){ printf("Table created successfully...!\n"); } //let's insert some records $sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; if($mysqli->query($sql)){ printf("First record inserted successfully...!\n"); } $sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')"; if($mysqli->query($sql)){ printf("Second record inserted successfully...!\n"); } //now lets insert duplicate record with IGNORE keyword $sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"; if($mysqli->query($sql)){ printf("Duplicate record inserted successfully using IGNORE keyword...!\n"); } $sql = "SELECT * from person_tbl"; if($result = $mysqli->query($sql)){ printf("Table records: \n"); while($row = mysqli_fetch_array($result)){ printf("First Name: %s, Last name: %s, Sex: %s", $row['first_name'], $row['last_name'], $row['sex']); printf("\n"); } } //lets insert a duplicate record $sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')"; if(!$mysqli->query($sql)){ printf("You can't insert any duplicate records...!\n"); } if($mysqli->error){ printf("Error message: ", $mysqli->error); } $mysqli->close();

输出

获得的输出结果如下所示:

Table created successfully...!
First record inserted successfully...!
Second record inserted successfully...!
Duplicate record inserted successfully using IGNORE keyword...!
Table records:
First Name: Thomas, Last name: Jay, Sex:
First Name: Smith, Last name: John, Sex:
PHP Fatal error:  Uncaught mysqli_sql_exception: Duplicate entry 'John-Smith' for key 'person_tbl.PRIMARY' in D:\test\handlingduplicates.php:48    
var mysql = require('mysql2');
var con = mysql.createConnection({
host:"localhost",
user:"root",
password:"password"
});

 //Connecting to MySQL
 con.connect(function(err) {
 if (err) throw err;
//   console.log("Connected successfully...!");
//   console.log("--------------------------");
 sql = "USE TUTORIALS";
 con.query(sql);
 //create table
 sql = "CREATE TABLE person_tbl (first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name))";
 con.query(sql, function(err, result){
    if (err) throw err;
    console.log("Table created successfully....!");
    });
//now let's insert some records
sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
con.query(sql, function(err, result){
    if (err) throw err;
    console.log("First record inserted successfully...!");
    });
sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')";
con.query(sql, function(err, result){
    if (err) throw err;
    console.log("Second record inserted successfully...!");
    });
//now lets insert duplicate record with IGNORE keyword
sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
    con.query(sql, function(err, result){
        console.log("Insert duplicate record with IGNORE keyword")
    if (err) throw err;
    console.log("Duplicate record inserted successfully with the help of IGNORE keyword");
    });
//lets insert a duplicate record
sql = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')";
    con.query(sql, function(err, result){
        console.log("Insert duplicate record");
    if (err) throw err;
    console.log("You can't insert the duplicate record because columns are primary key");
    });
sql = "SELECT * FROM person_tbl";
con.query(sql, function(err, result){
    console.log("Table records(with ID auto_increment sequence).");
    if (err) throw err;
    console.log(result);
    });
});   

输出

获得的输出结果如下所示:

Table created successfully....!
First record inserted successfully...!
Second record inserted successfully...!
Insert duplicate record with IGNORE keyword
Duplicate record inserted successfully with the help of IGNORE keyword
Insert duplicate record
Insert duplicate record
D:\test1\duplicate.js:43
    if (err) throw err;              ^

Error: Duplicate entry 'John-Smith' for key 'person_tbl.PRIMARY'
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
public class HandleDuplicates {
   public static void main(String[] args) {
      String url = "jdbc:mysql://127.0.0.1:3306/TUTORIALS";
      String user = "root";
      String password = "password";
      ResultSet rs;
      try {
         Class.forName("com.mysql.cj.jdbc.Driver");
            Connection con = DriverManager.getConnection(url, user, password);
            Statement st = con.createStatement();
            //System.out.println("Database connected successfully...!");
            //create table
            String sql = "CREATE TABLE person_tbl ( first_name CHAR(20) NOT NULL, last_name CHAR(20) NOT NULL, sex CHAR(10), PRIMARY KEY (last_name, first_name))";
            st.execute(sql);
            System.out.println("Table created successfully....!");
            //let's insert some records
            String sql1 = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
            st.execute(sql1);
            System.out.println("Record inserted successfully...!");
            String sql2 = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')";
            st.execute(sql2);
            System.out.println("Duplicate record inserted successfully...!");
            //lets print the table records
            String sql3 = "SELECT * FROM PERSON_TBL";
            rs = st.executeQuery(sql3);
            System.out.println("Table records: ");
            while(rs.next()) {
               String fname = rs.getString("first_name");
               String lname = rs.getString("last_name");
               System.out.println("First name: " + fname + ", Last name: " + lname);
            }
      }catch(Exception e) {
         e.printStackTrace();
      }
   }
}   

输出

获得的输出结果如下所示:

Table created successfully....!
Record inserted successfully...!
Duplicate record inserted successfully...!
Table records: 
First name: Thomas, Last name: Jay
import mysql.connector
# Connecting to MySQL
con = mysql.connector.connect(
    host="localhost",
    user="root",
    password="password",
    database="tut"
)
# Creating a cursor object
cursorObj = con.cursor()
# Creating the table
create_table_query = """
CREATE TABLE person_tbl (
    first_name CHAR(20) NOT NULL,
    last_name CHAR(20) NOT NULL,
    sex CHAR(10),
    PRIMARY KEY (last_name, first_name)
)
"""
cursorObj.execute(create_table_query)
print("Table 'person_tbl' is created successfully!")
# Inserting some records
first_record = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"
print("First record inserted successfully!")
cursorObj.execute(first_record)

second_record = "INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')"
print("Second record inserted successfully!")
cursorObj.execute(second_record)
# Insert duplicate record with IGNORE keyword
sql = "INSERT IGNORE INTO person_tbl (last_name, first_name) VALUES( 'Jay', 'Thomas')"
print("Duplicate record inserted successfully with the help of IGNORE keyword")
cursorObj.execute(sql)
# Insert a duplicate record (this will throw an error)
try:
    cursorObj.execute("INSERT INTO person_tbl (last_name, first_name) VALUES( 'John', 'Smith')")
    print("Insert duplicate record")
except mysql.connector.Error as err:
    print("Insert duplicate record error:", err)
con.commit()
# Retrieving records
cursorObj.execute("SELECT * FROM person_tbl")
records = cursorObj.fetchall()
# Printing the records
print("Table records.")
for record in records:
    print(record)
# Closing the connection
cursorObj.close()
con.close()   

输出

获得的输出结果如下所示:

First record inserted successfully!
Second record inserted successfully!
Duplicate record inserted successfully with the help of IGNORE keyword
Insert duplicate record error: 1062 (23000): Duplicate entry 'John-Smith' for key 'person_tbl.PRIMARY'
Table records.
('Thomas', 'Jay', None)
('Smith', 'John', None)
广告