MySQL - 删除重复记录



MySQL 删除重复记录

数据库(包括 MySQL)中的重复记录非常常见。MySQL 数据库以包含行和列的表的形式存储数据。现在,当数据库表中的两行或多行具有相同的值时,该记录被认为是重复的。

这种冗余可能由于各种原因而发生:

  • 该行可能被插入两次。
  • 从外部来源导入原始数据时。
  • 数据库应用程序中可能存在错误。

无论原因是什么,删除这种冗余对于提高数据准确性、减少错误或提高数据库性能效率都非常重要。

查找重复值

在删除重复记录之前,我们必须找出它们是否存在于表中。可以使用以下方法:

  • GROUP BY 子句

  • COUNT() 方法

示例

让我们首先创建一个名为“CUSTOMERS”的表,其中包含重复值:

Open Compiler
CREATE TABLE CUSTOMERS( ID int, NAME varchar(100) );

使用以下 INSERT 查询,将一些记录插入到“CUSTOMERS”表中。在这里,我们添加了“John”作为重复记录 3 次:

Open Compiler
INSERT INTO CUSTOMERS VALUES (1,'John'), (2,'Johnson'), (3,'John'), (4,'John');

获得的 CUSTOMERS 表如下所示:

id name
1 John
2 Johnson
3 John
4 John

现在,我们使用 COUNT() 方法和 GROUP BY 子句检索表中重复的记录,如下面的查询所示:

Open Compiler
SELECT NAME, COUNT(NAME) FROM CUSTOMERS GROUP BY NAME HAVING COUNT(NAME) > 1;

输出

获得的输出如下所示:

NAME COUNT(NAME)
John 3

Learn MySQL in-depth with real-world projects through our MySQL certification course. Enroll and become a certified expert to boost your career.

删除重复记录

要从数据库表中删除重复记录,我们可以使用 DELETE 命令。但是,此 DELETE 命令可以使用两种方法从表中删除重复项:

  • 使用 DELETE... JOIN

  • 使用 ROW_NUMBER() 函数

使用 DELETE... JOIN

为了使用 DELETE... JOIN 命令从表中删除重复记录,我们对其自身执行内部连接。这适用于并非完全相同的案例。

例如,假设客户记录中存在客户详细信息的重复,但序列号不断递增。在这里,即使 ID 不相同,记录也是重复的。

示例

在下面的查询中,我们使用前面创建的 CUSTOMERS 表来使用 DELETE... JOIN 命令删除重复记录:

DELETE t1 FROM CUSTOMERS t1 INNER JOIN CUSTOMERS t2 WHERE t1.id < t2.id AND t1.name = t2.name;

输出

获得的输出如下所示:

Query OK, 2 rows affected (0.01 sec)

验证

我们可以使用以下 SELECT 语句验证是否已删除重复记录:

SELECT * FROM CUSTOMERS;

我们可以从获得的表中看到,该查询删除了重复项,并在表中保留了不同的记录:

ID NAME
2 Johnson
4 John

使用 ROW_NUMBER() 函数

MySQL 中的 ROW_NUMBER() 函数用于为从查询获得的结果集中的每一行分配一个从 1 开始的顺序号。

使用此函数,MySQL 允许您检测重复行,可以使用 DELETE 语句将其删除。

示例

在这里,我们将 ROW_NUMBER() 函数应用于在“NAME”列中具有重复值的 CUSTOMERS 表。我们将使用以下查询基于“NAME”列在分区内分配行号:

Open Compiler
SELECT id, ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS row_num FROM CUSTOMERS;

获得的输出如下所示:

id row_num
1 1
3 2
4 3
2 1

现在,使用以下语句删除重复行(行号大于 1 的行):

Open Compiler
DELETE FROM CUSTOMERS WHERE id IN( SELECT id FROM (SELECT id, ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS row_num FROM CUSTOMERS) AS temp_table WHERE row_num>1 );

我们得到如下所示的输出:

Query OK, 2 rows affected (0.00 sec)

要验证是否已删除重复记录,请使用以下 SELECT 查询:

Open Compiler
SELECT * FROM CUSTOMERS;

产生的结果如下所示:

ID NAME
1 John
2 Johnson

使用客户端程序删除重复记录

我们还可以使用客户端程序删除重复记录。

语法

要通过PHP程序删除重复记录,需要使用**mysqli**函数**query()**执行包含“DELETE”命令的内连接,如下所示:

$sql = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name"; $mysqli->query($sql);

要通过JavaScript程序删除重复记录,需要使用**mysql2**库的**query()**函数执行包含“DELETE”命令的内连接,如下所示:

sql = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name"; con.query(sql)

要通过Java程序删除重复记录,需要使用**JDBC**函数**execute()**执行包含“DELETE”命令的内连接,如下所示:

String sql = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name"; statement.execute(sql);

要通过Python程序删除重复记录,需要使用**MySQL Connector/Python**的**execute()**函数执行包含“DELETE”命令的内连接,如下所示:

delete_query = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name" cursorObj.execute(delete_query)

示例

以下是程序示例:

$dbhost = 'localhost'; $dbuser = 'root'; $dbpass = 'password'; $db = 'TUTORIALS'; $mysqli = new mysqli($dbhost, $dbuser, $dbpass, $db); if ($mysqli->connect_errno) { printf("Connect failed: %s", $mysqli->connect_error); exit(); } //printf('Connected successfully.'); //let's create a table $sql = "CREATE TABLE DuplicateDeleteDemo(ID int,NAME varchar(100))"; if($mysqli->query($sql)){ printf("DuplicateDeleteDemo table created successfully...!\n"); } //now lets insert some duplicate records; $sql = "INSERT INTO DuplicateDeleteDemo VALUES(1,'John')"; if($mysqli->query($sql)){ printf("First record inserted successfully...!\n"); } $sql = "INSERT INTO DuplicateDeleteDemo VALUES(2,'Johnson')"; if($mysqli->query($sql)){ printf("Second record inserted successfully...!\n"); } $sql = "INSERT INTO DuplicateDeleteDemo VALUES(3,'John')"; if($mysqli->query($sql)){ printf("Third records inserted successfully...!\n"); } $sql = "INSERT INTO DuplicateDeleteDemo VALUES(4,'John')"; if($mysqli->query($sql)){ printf("Fourth record inserted successfully...!\n"); } //display the table records $sql = "SELECT * FROM DuplicateDeleteDemo"; if($result = $mysqli->query($sql)){ printf("Table records(before deleting): \n"); while($row = mysqli_fetch_array($result)){ printf("ID: %d, NAME %s", $row['ID'], $row['NAME']); printf("\n"); } } //now lets count duplicate records $sql = "SELECT NAME, COUNT(NAME) FROM DuplicateDeleteDemo GROUP BY NAME HAVING COUNT(NAME) > 1"; if($result = $mysqli->query($sql)){ printf("Duplicate records: \n"); while($row = mysqli_fetch_array($result)){ print_r($row); } } //lets delete dupliacte records $sql = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name"; if($mysqli->query($sql)){ printf("Duplicate records deleted successfully...!\n"); } $sql = "SELECT ID, NAME FROM DuplicateDeleteDemo"; if($result = $mysqli->query($sql)){ printf("Table records after deleting: \n"); while($row = mysqli_fetch_row($result)){ print_r($row); } } if($mysqli->error){ printf("Error message: ", $mysqli->error); } $mysqli->close();

输出

获得的输出结果如下所示:

DuplicateDeleteDemo table created successfully...!
First record inserted successfully...!
Second record inserted successfully...!
Third records inserted successfully...!
Fourth record inserted successfully...!
Table records(before deleting):
ID: 1, NAME John
ID: 2, NAME Johnson
ID: 3, NAME John
ID: 4, NAME John
Duplicate records:
Array
(
    [0] => John
    [NAME] => John
    [1] => 3
    [COUNT(NAME)] => 3
)
Duplicate records deleted successfully...!
Table records after deleting:
Array
(
    [0] => 2
    [1] => Johnson
)
Array
(
    [0] => 4
    [1] => John
)    

var mysql = require('mysql2'); var con = mysql.createConnection({ host: "localhost", user: "root", password: "Nr5a0204@123" }); // Connecting to MySQL con.connect(function (err) { if (err) throw err; console.log("Connected!"); console.log("--------------------------"); // Create a new database sql = "Create Database TUTORIALS"; con.query(sql); sql = "USE TUTORIALS"; con.query(sql); sql = "CREATE TABLE DuplicateDeleteDemo(ID int,NAME varchar(100));" con.query(sql); sql = "INSERT INTO DuplicateDeleteDemo VALUES(1,'John'),(2,'Johnson'),(3,'John'),(4,'John');" con.query(sql); sql = "SELECT * FROM DuplicateDeleteDemo;" con.query(sql, function(err, result){ if (err) throw err console.log("**Records of DuplicateDeleteDemo Table:**"); console.log(result); console.log("--------------------------"); }); //Fetching records that are duplicated in the table sql = "SELECT NAME, COUNT(NAME) FROM DuplicateDeleteDemo GROUP BY NAME HAVING COUNT(NAME) > 1;" con.query(sql, function(err, result){ if (err) throw err console.log("**Records that are duplicated in the table:**"); console.log(result); console.log("--------------------------"); }); sql = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name"; con.query(sql); sql = "SELECT * FROM DuplicateDeleteDemo;" con.query(sql, function(err, result){ if (err) throw err console.log("**Records after deleting Duplicates:**"); console.log(result); }); });

输出

获得的输出结果如下所示:

 
Connected!
--------------------------
**Records of DuplicateDeleteDemo Table:**
[
  { ID: 1, NAME: 'John' },
  { ID: 2, NAME: 'Johnson' },
  { ID: 3, NAME: 'John' },
  { ID: 4, NAME: 'John' }
]
--------------------------
**Records that are duplicated in the table:**
[ { NAME: 'John', 'COUNT(NAME)': 3 } ]
--------------------------
**Records after deleting Duplicates:**
[ { ID: 2, NAME: 'Johnson' }, { ID: 4, NAME: 'John' } ]
import java.sql.Connection; import java.sql.DriverManager; import java.sql.ResultSet; import java.sql.Statement; public class DeleteDuplicates { public static void main(String[] args) { String url = "jdbc:mysql://localhost:3306/TUTORIALS"; String user = "root"; String password = "password"; ResultSet rs; try { Class.forName("com.mysql.cj.jdbc.Driver"); Connection con = DriverManager.getConnection(url, user, password); Statement st = con.createStatement(); //System.out.println("Database connected successfully...!"); String sql = "CREATE TABLE DuplicateDeleteDemo(ID int,NAME varchar(100))"; st.execute(sql); System.out.println("Table DuplicateDeleteDemo created successfully...!"); //let's insert some records into it... String sql1 = "INSERT INTO DuplicateDeleteDemo VALUES (1,'John'), (2,'Johnson'), (3,'John'), (4,'John')"; st.execute(sql1); System.out.println("Records inserted successfully....!"); //print table records String sql2 = "SELECT * FROM DuplicateDeleteDemo"; rs = st.executeQuery(sql2); System.out.println("Table records(before deleting the duplicate rcords): "); while(rs.next()) { String id = rs.getString("id"); String name = rs.getString("name"); System.out.println("Id: " + id + ", Name: " + name); } //let delete duplicate records using delete join String sql3 = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name"; st.execute(sql3); System.out.println("Duplicate records deleted successfully....!"); String sql4 = "SELECT * FROM DuplicateDeleteDemo"; rs = st.executeQuery(sql4); System.out.println("Table records(after deleting the duplicate rcords): "); while(rs.next()) { String id = rs.getString("id"); String name = rs.getString("name"); System.out.println("Id: " + id + ", Name: " + name); } }catch(Exception e) { e.printStackTrace(); } } }

输出

获得的输出结果如下所示:

Table DuplicateDeleteDemo created successfully...!
Records inserted successfully....!
Table records(before deleting the duplicate rcords): 
Id: 1, Name: John
Id: 2, Name: Johnson
Id: 3, Name: John
Id: 4, Name: John
Duplicate records deleted successfully....!
Table records(after deleting the duplicate rcords): 
Id: 2, Name: Johnson
Id: 4, Name: John
import mysql.connector # Establishing the connection connection = mysql.connector.connect( host='localhost', user='root', password='password', database='tut' ) # Creating a cursor object cursorObj = connection.cursor() # Creating the table 'DuplicateDeleteDemo' create_table_query = '''CREATE TABLE DuplicateDeleteDemo(ID int, NAME varchar(100))''' cursorObj.execute(create_table_query) print("Table 'DuplicateDeleteDemo' is created successfully!") # Inserting records into 'DuplicateDeleteDemo' table sql = "INSERT INTO DuplicateDeleteDemo (ID, NAME) VALUES (%s, %s);" values = [(1, 'John'), (2, 'Johnson'), (3, 'John'), (4, 'John')] cursorObj.executemany(sql, values) print("Values inserted successfully") # Display table display_table = "SELECT * FROM DuplicateDeleteDemo;" cursorObj.execute(display_table) # Printing the table 'DuplicateDeleteDemo' results = cursorObj.fetchall() print("\nDuplicateDeleteDemo Table:") for result in results: print(result) # Retrieve the duplicate records duplicate_records_query = """ SELECT NAME, COUNT(NAME) FROM DuplicateDeleteDemo GROUP BY NAME HAVING COUNT(NAME) > 1; """ cursorObj.execute(duplicate_records_query) dup_rec = cursorObj.fetchall() print("\nDuplicate records:") for record in dup_rec: print(record) # Delete duplicate records delete_query = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name" cursorObj.execute(delete_query) print("Duplicate records deleted successfully") # Verification display_table_after_delete = "SELECT * FROM DuplicateDeleteDemo;" cursorObj.execute(display_table_after_delete) results_after_delete = cursorObj.fetchall() print("\nDuplicateDeleteDemo Table (After Delete):") for result in results_after_delete: print(result) # Closing the cursor and connection cursorObj.close() connection.close()

输出

获得的输出结果如下所示:

Table 'DuplicateDeleteDemo' is created successfully!
Values inserted successfully

DuplicateDeleteDemo Table:
(1, 'John')
(2, 'Johnson')
(3, 'John')
(4, 'John')

Duplicate records:
('John', 3)
Duplicate records deleted successfully

DuplicateDeleteDemo Table (After Delete):
(2, 'Johnson')
(4, 'John')    
广告