MySQL - 删除重复记录



MySQL 删除重复记录

数据库(包括 MySQL)中的重复记录非常常见。MySQL 数据库以包含行和列的表的形式存储数据。现在,当数据库表中的两行或多行具有相同的值时,该记录被认为是重复的。

这种冗余可能由于各种原因而发生:

  • 该行可能被插入两次。
  • 从外部来源导入原始数据时。
  • 数据库应用程序中可能存在错误。

无论原因是什么,删除这种冗余对于提高数据准确性、减少错误或提高数据库性能效率都非常重要。

查找重复值

在删除重复记录之前,我们必须找出它们是否存在于表中。可以使用以下方法:

  • GROUP BY 子句

  • COUNT() 方法

示例

让我们首先创建一个名为“CUSTOMERS”的表,其中包含重复值:

CREATE TABLE CUSTOMERS(
   ID int,
   NAME varchar(100)
);

使用以下 INSERT 查询,将一些记录插入到“CUSTOMERS”表中。在这里,我们添加了“John”作为重复记录 3 次:

INSERT INTO CUSTOMERS 
VALUES (1,'John'), (2,'Johnson'), (3,'John'), (4,'John');

获得的 CUSTOMERS 表如下所示:

id name
1 John
2 Johnson
3 John
4 John

现在,我们使用 COUNT() 方法和 GROUP BY 子句检索表中重复的记录,如下面的查询所示:

SELECT NAME, COUNT(NAME) FROM CUSTOMERS
GROUP BY NAME HAVING COUNT(NAME) > 1;

输出

获得的输出如下所示:

NAME COUNT(NAME)
John 3

删除重复记录

要从数据库表中删除重复记录,我们可以使用 DELETE 命令。但是,此 DELETE 命令可以使用两种方法从表中删除重复项:

  • 使用 DELETE... JOIN

  • 使用 ROW_NUMBER() 函数

使用 DELETE... JOIN

为了使用 DELETE... JOIN 命令从表中删除重复记录,我们对其自身执行内部连接。这适用于并非完全相同的案例。

例如,假设客户记录中存在客户详细信息的重复,但序列号不断递增。在这里,即使 ID 不相同,记录也是重复的。

示例

在下面的查询中,我们使用前面创建的 CUSTOMERS 表来使用 DELETE... JOIN 命令删除重复记录:

DELETE t1 FROM CUSTOMERS t1
INNER JOIN CUSTOMERS t2
WHERE t1.id < t2.id AND t1.name = t2.name;

输出

获得的输出如下所示:

Query OK, 2 rows affected (0.01 sec)

验证

我们可以使用以下 SELECT 语句验证是否已删除重复记录:

SELECT * FROM CUSTOMERS;

我们可以从获得的表中看到,该查询删除了重复项,并在表中保留了不同的记录:

ID NAME
2 Johnson
4 John

使用 ROW_NUMBER() 函数

MySQL 中的 ROW_NUMBER() 函数用于为从查询获得的结果集中的每一行分配一个从 1 开始的顺序号。

使用此函数,MySQL 允许您检测重复行,可以使用 DELETE 语句将其删除。

示例

在这里,我们将 ROW_NUMBER() 函数应用于在“NAME”列中具有重复值的 CUSTOMERS 表。我们将使用以下查询基于“NAME”列在分区内分配行号:

SELECT id, ROW_NUMBER()
OVER (PARTITION BY name ORDER BY name) AS row_num
FROM CUSTOMERS;

获得的输出如下所示:

id row_num
1 1
3 2
4 3
2 1

现在,使用以下语句删除重复行(行号大于 1 的行):

DELETE FROM CUSTOMERS WHERE id IN(
   SELECT id FROM (SELECT id, ROW_NUMBER()
   OVER (PARTITION BY name ORDER BY name) AS row_num
   FROM CUSTOMERS) AS temp_table WHERE row_num>1
);

我们得到如下所示的输出:

Query OK, 2 rows affected (0.00 sec)

要验证是否已删除重复记录,请使用以下 SELECT 查询:

SELECT * FROM CUSTOMERS;

产生的结果如下所示:

ID NAME
1 John
2 Johnson

使用客户端程序删除重复记录

我们还可以使用客户端程序删除重复记录。

语法

要通过PHP程序删除重复记录,需要使用**mysqli**函数**query()**执行包含“DELETE”命令的内连接,如下所示:

$sql = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name";
$mysqli->query($sql);

要通过JavaScript程序删除重复记录,需要使用**mysql2**库的**query()**函数执行包含“DELETE”命令的内连接,如下所示:

sql = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name";
con.query(sql)

要通过Java程序删除重复记录,需要使用**JDBC**函数**execute()**执行包含“DELETE”命令的内连接,如下所示:

String sql = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name";
statement.execute(sql);

要通过Python程序删除重复记录,需要使用**MySQL Connector/Python**的**execute()**函数执行包含“DELETE”命令的内连接,如下所示:

delete_query = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name"
cursorObj.execute(delete_query)

示例

以下是程序示例:

$dbhost = 'localhost';
$dbuser = 'root';
$dbpass = 'password';
$db = 'TUTORIALS';
$mysqli = new mysqli($dbhost, $dbuser, $dbpass, $db);
if ($mysqli->connect_errno) {
    printf("Connect failed: %s
", $mysqli->connect_error); exit(); } //printf('Connected successfully.
'); //let's create a table $sql = "CREATE TABLE DuplicateDeleteDemo(ID int,NAME varchar(100))"; if($mysqli->query($sql)){ printf("DuplicateDeleteDemo table created successfully...!\n"); } //now lets insert some duplicate records; $sql = "INSERT INTO DuplicateDeleteDemo VALUES(1,'John')"; if($mysqli->query($sql)){ printf("First record inserted successfully...!\n"); } $sql = "INSERT INTO DuplicateDeleteDemo VALUES(2,'Johnson')"; if($mysqli->query($sql)){ printf("Second record inserted successfully...!\n"); } $sql = "INSERT INTO DuplicateDeleteDemo VALUES(3,'John')"; if($mysqli->query($sql)){ printf("Third records inserted successfully...!\n"); } $sql = "INSERT INTO DuplicateDeleteDemo VALUES(4,'John')"; if($mysqli->query($sql)){ printf("Fourth record inserted successfully...!\n"); } //display the table records $sql = "SELECT * FROM DuplicateDeleteDemo"; if($result = $mysqli->query($sql)){ printf("Table records(before deleting): \n"); while($row = mysqli_fetch_array($result)){ printf("ID: %d, NAME %s", $row['ID'], $row['NAME']); printf("\n"); } } //now lets count duplicate records $sql = "SELECT NAME, COUNT(NAME) FROM DuplicateDeleteDemo GROUP BY NAME HAVING COUNT(NAME) > 1"; if($result = $mysqli->query($sql)){ printf("Duplicate records: \n"); while($row = mysqli_fetch_array($result)){ print_r($row); } } //lets delete dupliacte records $sql = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name"; if($mysqli->query($sql)){ printf("Duplicate records deleted successfully...!\n"); } $sql = "SELECT ID, NAME FROM DuplicateDeleteDemo"; if($result = $mysqli->query($sql)){ printf("Table records after deleting: \n"); while($row = mysqli_fetch_row($result)){ print_r($row); } } if($mysqli->error){ printf("Error message: ", $mysqli->error); } $mysqli->close();

输出

获得的输出结果如下所示:

DuplicateDeleteDemo table created successfully...!
First record inserted successfully...!
Second record inserted successfully...!
Third records inserted successfully...!
Fourth record inserted successfully...!
Table records(before deleting):
ID: 1, NAME John
ID: 2, NAME Johnson
ID: 3, NAME John
ID: 4, NAME John
Duplicate records:
Array
(
    [0] => John
    [NAME] => John
    [1] => 3
    [COUNT(NAME)] => 3
)
Duplicate records deleted successfully...!
Table records after deleting:
Array
(
    [0] => 2
    [1] => Johnson
)
Array
(
    [0] => 4
    [1] => John
)    

var mysql = require('mysql2');
var con = mysql.createConnection({
    host: "localhost",
    user: "root",
    password: "Nr5a0204@123"
});

// Connecting to MySQL
con.connect(function (err) {
    if (err) throw err;
    console.log("Connected!");
    console.log("--------------------------");

    // Create a new database
    sql = "Create Database TUTORIALS";
    con.query(sql);

    sql = "USE TUTORIALS";
    con.query(sql);

    sql = "CREATE TABLE DuplicateDeleteDemo(ID int,NAME varchar(100));"
    con.query(sql);

    sql = "INSERT INTO DuplicateDeleteDemo VALUES(1,'John'),(2,'Johnson'),(3,'John'),(4,'John');"
    con.query(sql);

    sql = "SELECT * FROM DuplicateDeleteDemo;"
    con.query(sql, function(err, result){
      if (err) throw err
      console.log("**Records of DuplicateDeleteDemo Table:**");
      console.log(result);
      console.log("--------------------------");
    });

    //Fetching records that are duplicated in the table
    sql = "SELECT NAME, COUNT(NAME) FROM DuplicateDeleteDemo GROUP BY NAME HAVING COUNT(NAME) > 1;"
    con.query(sql, function(err, result){
      if (err) throw err
      console.log("**Records that are duplicated in the table:**");
      console.log(result);
      console.log("--------------------------");
    });

    sql = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name";
    con.query(sql);

    sql = "SELECT * FROM DuplicateDeleteDemo;"
    con.query(sql, function(err, result){
      if (err) throw err
      console.log("**Records after deleting Duplicates:**");
      console.log(result);
    });
});    

输出

获得的输出结果如下所示:

 
Connected!
--------------------------
**Records of DuplicateDeleteDemo Table:**
[
  { ID: 1, NAME: 'John' },
  { ID: 2, NAME: 'Johnson' },
  { ID: 3, NAME: 'John' },
  { ID: 4, NAME: 'John' }
]
--------------------------
**Records that are duplicated in the table:**
[ { NAME: 'John', 'COUNT(NAME)': 3 } ]
--------------------------
**Records after deleting Duplicates:**
[ { ID: 2, NAME: 'Johnson' }, { ID: 4, NAME: 'John' } ]
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
public class DeleteDuplicates {
    public static void main(String[] args) {
        String url = "jdbc:mysql://127.0.0.1:3306/TUTORIALS";
        String user = "root";
        String password = "password";
        ResultSet rs;
        try {
            Class.forName("com.mysql.cj.jdbc.Driver");
            Connection con = DriverManager.getConnection(url, user, password);
            Statement st = con.createStatement();
            //System.out.println("Database connected successfully...!");
            String sql = "CREATE TABLE DuplicateDeleteDemo(ID int,NAME varchar(100))";
            st.execute(sql);
            System.out.println("Table DuplicateDeleteDemo created successfully...!");
            //let's insert some records into it...
            String sql1 = "INSERT INTO DuplicateDeleteDemo VALUES (1,'John'),  (2,'Johnson'), (3,'John'), (4,'John')";
            st.execute(sql1);
            System.out.println("Records inserted successfully....!");
            //print table records
            String sql2 = "SELECT * FROM DuplicateDeleteDemo";
            rs = st.executeQuery(sql2);
            System.out.println("Table records(before deleting the duplicate rcords): ");
            while(rs.next()) {
                String id = rs.getString("id");
                String name = rs.getString("name");
                System.out.println("Id: " + id + ", Name: " + name);
            }
            //let delete duplicate records using delete join
            String sql3 = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name";
            st.execute(sql3);
            System.out.println("Duplicate records deleted successfully....!");
            String sql4 = "SELECT * FROM DuplicateDeleteDemo";
            rs = st.executeQuery(sql4);
            System.out.println("Table records(after deleting the duplicate rcords): ");
            while(rs.next()) {
                String id = rs.getString("id");
                String name = rs.getString("name");
                System.out.println("Id: " + id + ", Name: " + name);
            }
        }catch(Exception e) {
            e.printStackTrace();
        }
    }
}    

输出

获得的输出结果如下所示:

Table DuplicateDeleteDemo created successfully...!
Records inserted successfully....!
Table records(before deleting the duplicate rcords): 
Id: 1, Name: John
Id: 2, Name: Johnson
Id: 3, Name: John
Id: 4, Name: John
Duplicate records deleted successfully....!
Table records(after deleting the duplicate rcords): 
Id: 2, Name: Johnson
Id: 4, Name: John
import mysql.connector
# Establishing the connection
connection = mysql.connector.connect(
    host='localhost',
    user='root',
    password='password',
    database='tut'
)
# Creating a cursor object
cursorObj = connection.cursor()
# Creating the table 'DuplicateDeleteDemo'
create_table_query = '''CREATE TABLE DuplicateDeleteDemo(ID int, NAME varchar(100))'''
cursorObj.execute(create_table_query)
print("Table 'DuplicateDeleteDemo' is created successfully!")
# Inserting records into 'DuplicateDeleteDemo' table
sql = "INSERT INTO DuplicateDeleteDemo (ID, NAME) VALUES (%s, %s);"
values = [(1, 'John'), (2, 'Johnson'), (3, 'John'), (4, 'John')]
cursorObj.executemany(sql, values)
print("Values inserted successfully")
# Display table
display_table = "SELECT * FROM DuplicateDeleteDemo;"
cursorObj.execute(display_table)
# Printing the table 'DuplicateDeleteDemo'
results = cursorObj.fetchall()
print("\nDuplicateDeleteDemo Table:")
for result in results:
    print(result)
# Retrieve the duplicate records
duplicate_records_query = """
SELECT NAME,
COUNT(NAME)
FROM DuplicateDeleteDemo
GROUP BY NAME
HAVING COUNT(NAME) > 1;
"""
cursorObj.execute(duplicate_records_query)
dup_rec = cursorObj.fetchall()
print("\nDuplicate records:")
for record in dup_rec:
    print(record)
# Delete duplicate records
delete_query = "DELETE t1 FROM DuplicateDeleteDemo t1 INNER JOIN DuplicateDeleteDemo t2 WHERE t1.id < t2.id AND t1.name = t2.name"
cursorObj.execute(delete_query)
print("Duplicate records deleted successfully")
# Verification
display_table_after_delete = "SELECT * FROM DuplicateDeleteDemo;"
cursorObj.execute(display_table_after_delete)
results_after_delete = cursorObj.fetchall()
print("\nDuplicateDeleteDemo Table (After Delete):")
for result in results_after_delete:
    print(result)
# Closing the cursor and connection
cursorObj.close()
connection.close()

输出

获得的输出结果如下所示:

Table 'DuplicateDeleteDemo' is created successfully!
Values inserted successfully

DuplicateDeleteDemo Table:
(1, 'John')
(2, 'Johnson')
(3, 'John')
(4, 'John')

Duplicate records:
('John', 3)
Duplicate records deleted successfully

DuplicateDeleteDemo Table (After Delete):
(2, 'Johnson')
(4, 'John')    
广告