MySQL - Find Duplicate Records



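Duplicate records in a table can reduce the efficiency of a MySQL database by increasing execution time, using unnecessary space, and so on. Locating duplicates is therefore necessary for using the database efficiently.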

We can prevent users from entering duplicate values into a table by adding constraints, such as PRIMARY KEY and UNIQUE, on the desired columns.
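As a rough illustration of how such constraints block duplicates, the sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for a MySQL server (an assumption made so the example runs without a server). The CONTACTS table and its EMAIL column are hypothetical names. MySQL rejects the same inserts with a duplicate-entry error.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CONTACTS ("
    " ID INT PRIMARY KEY,"
    " EMAIL VARCHAR(50) UNIQUE)"
)

def try_insert(row):
    """Insert one (ID, EMAIL) row; return False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO CONTACTS VALUES (?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

first = try_insert((1, "a@example.com"))
duplicate_email = try_insert((2, "a@example.com"))  # same EMAIL: violates UNIQUE
duplicate_id = try_insert((1, "b@example.com"))     # same ID: violates PRIMARY KEY

row_count = conn.execute("SELECT COUNT(*) FROM CONTACTS").fetchone()[0]
print(first, duplicate_email, duplicate_id, row_count)
```

Only the first row is stored; both later attempts are refused by the constraints.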

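However, duplicates can still enter the database for various reasons, such as human error, application bugs, or data extracted from external sources. There are several ways to find these records. Using the SQL **GROUP BY** and **HAVING** clauses is one of the common ways to filter out the records that contain duplicates.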

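Finding Duplicate Records

Before finding duplicate records in a table, we need to define the criteria by which records count as duplicates. This can be done in two steps:

  • First, use the GROUP BY clause to group all rows by the columns to be checked for duplicates.

  • Then, use the HAVING clause with the COUNT() function to check whether any of the groups formed above contains more than one row.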

Example

First, let us create a table named CUSTOMERS using the following query:

CREATE TABLE CUSTOMERS (
   ID INT NOT NULL,
   NAME VARCHAR (20) NOT NULL,
   AGE INT NOT NULL,
   ADDRESS CHAR (25),
   SALARY DECIMAL (18, 2),
   PRIMARY KEY (ID)
);

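Now, let us insert some records into the table created above using the INSERT INTO statement. The ID values are unique, but two customers (Ramesh and Kaushik) share the same SALARY and AGE:

INSERT INTO CUSTOMERS VALUES
(1, 'Ramesh', 23, 'Ahmedabad', 2000.00),
(2, 'Khilan', 25, 'Delhi', 1500.00),
(3, 'Kaushik', 23, 'Kota', 2000.00),
(4, 'Chaitali', 25, 'Mumbai', 6500.00),
(5, 'Hardik', 27, 'Bhopal', 8500.00),
(6, 'Komal', 22, 'Hyderabad', 4500.00),
(7, 'Muffy', 24, 'Indore', 10000.00);

The table is created as follows:

| ID | NAME | AGE | ADDRESS | SALARY |
|----|------|-----|---------|--------|
| 1 | Ramesh | 23 | Ahmedabad | 2000.00 |
| 2 | Khilan | 25 | Delhi | 1500.00 |
| 3 | Kaushik | 23 | Kota | 2000.00 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 |
| 5 | Hardik | 27 | Bhopal | 8500.00 |
| 6 | Komal | 22 | Hyderabad | 4500.00 |
| 7 | Muffy | 24 | Indore | 10000.00 |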

In the following query, we use the MySQL COUNT() function to count how many records share each SALARY value:

SELECT SALARY, COUNT(SALARY) 
AS "COUNT" FROM CUSTOMERS
GROUP BY SALARY 
ORDER BY SALARY;

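Output

The output of the above query is shown below:

| SALARY | COUNT |
|--------|-------|
| 1500.00 | 1 |
| 2000.00 | 2 |
| 4500.00 | 1 |
| 6500.00 | 1 |
| 8500.00 | 1 |
| 10000.00 | 1 |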

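Using the HAVING Clause

The **HAVING** clause in MySQL filters groups of rows based on a condition. Here, we use the HAVING clause together with the COUNT() function to find duplicate values in one or more columns of a table.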

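Duplicate Values in a Single Column

The following steps find duplicate values in a single column of a table:

**Step 1:** First, use the GROUP BY clause to group all rows by the column to be checked for duplicates.

**Step 2:** Then, to find the duplicate groups, use the COUNT() function in the HAVING clause to check whether any group has more than one element.

Example

Using the following query, we can find all the SALARY values that occur more than once in the CUSTOMERS table: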

SELECT SALARY, COUNT(SALARY) 
FROM CUSTOMERS
GROUP BY SALARY
HAVING COUNT(SALARY) > 1;

Output

The output is as follows:

| SALARY | COUNT(SALARY) |
|--------|---------------|
| 2000.00 | 2 |

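Duplicate Values in Multiple Columns

We can use the AND operator in the HAVING clause to find rows that are duplicated across multiple columns. Rows are considered duplicates only when the combination of the columns is duplicated.

Example

In the following query, we find the rows in the CUSTOMERS table that have duplicate values in the SALARY and AGE columns: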

SELECT SALARY, COUNT(SALARY),
AGE, COUNT(AGE)
FROM CUSTOMERS
GROUP BY SALARY, AGE
HAVING  COUNT(SALARY) > 1
AND COUNT(AGE) > 1;

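Output

The output is as follows:

| SALARY | COUNT(SALARY) | AGE | COUNT(AGE) |
|--------|---------------|-----|------------|
| 2000.00 | 2 | 23 | 2 |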
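When neither column contains NULL, a single COUNT(*) > 1 condition on the grouped columns expresses the same check, and joining the duplicated groups back to the table lists the full rows involved. The sketch below shows one way to do this. It assumes Python's built-in sqlite3 module as a stand-in for MySQL and a small hypothetical PEOPLE table; the SQL uses only GROUP BY, HAVING, and JOIN, which MySQL also supports.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PEOPLE (ID INT PRIMARY KEY, NAME VARCHAR(20),"
    " AGE INT, SALARY DECIMAL(18, 2))"
)
conn.executemany(
    "INSERT INTO PEOPLE VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", 30, 2000),
        (2, "Bram", 30, 2000),
        (3, "Chen", 41, 2000),  # same SALARY, different AGE: not a duplicate
        (4, "Dina", 25, 1500),
    ],
)

# Find (SALARY, AGE) combinations that occur more than once.
duplicate_groups = conn.execute(
    "SELECT SALARY, AGE, COUNT(*) AS CNT FROM PEOPLE"
    " GROUP BY SALARY, AGE HAVING COUNT(*) > 1"
).fetchall()

# Join the duplicated groups back to the table to list every row in them.
duplicate_rows = conn.execute(
    "SELECT p.ID, p.NAME FROM PEOPLE p"
    " JOIN (SELECT SALARY, AGE FROM PEOPLE"
    "       GROUP BY SALARY, AGE HAVING COUNT(*) > 1) d"
    "   ON p.SALARY = d.SALARY AND p.AGE = d.AGE"
    " ORDER BY p.ID"
).fetchall()

print(duplicate_groups)
print(duplicate_rows)
```

Chen shares a salary with the others but not an age, so only Asha and Bram are reported.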

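The ROW_NUMBER() Function with PARTITION BY

In MySQL 8.0 and later, the ROW_NUMBER() function and the PARTITION BY clause can be used to find duplicate records in a table. The PARTITION BY clause divides the rows into partitions based on one or more columns, and the ROW_NUMBER() function then assigns a sequential number, starting from 1, to each row within its partition. Any row whose row number is greater than 1 is a duplicate of an earlier row in the same partition.

Example

In the following query, we assign a row number to each row of the CUSTOMERS table, partitioned by the SALARY and AGE columns: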

SELECT *, ROW_NUMBER() OVER (
   PARTITION BY SALARY, AGE
   ORDER BY SALARY, AGE
) AS row_numbers
FROM CUSTOMERS;

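Output

The output of the above query is shown below:

| ID | NAME | AGE | ADDRESS | SALARY | row_numbers |
|----|------|-----|---------|--------|-------------|
| 2 | Khilan | 25 | Delhi | 1500.00 | 1 |
| 1 | Ramesh | 23 | Ahmedabad | 2000.00 | 1 |
| 3 | Kaushik | 23 | Kota | 2000.00 | 2 |
| 4 | Chaitali | 25 | Mumbai | 6500.00 | 1 |
| 5 | Hardik | 27 | Bhopal | 8500.00 | 1 |
| 6 | Komal | 22 | Hyderabad | 4500.00 | 1 |
| 7 | Muffy | 24 | Indore | 10000.00 | 1 |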
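To keep only the extra copies, the numbered rows can be wrapped in a derived table and filtered on the row number. The sketch below is one possible form. It assumes Python's built-in sqlite3 module as a stand-in for MySQL (window functions need SQLite 3.25 or later, and MySQL 8.0 or later) and uses a hypothetical PEOPLE table.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE PEOPLE (ID INT PRIMARY KEY, NAME VARCHAR(20),"
    " AGE INT, SALARY DECIMAL(18, 2))"
)
conn.executemany(
    "INSERT INTO PEOPLE VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", 30, 2000),
        (2, "Bram", 30, 2000),
        (3, "Chen", 30, 2000),
        (4, "Dina", 25, 1500),
    ],
)

# Number the rows inside each (SALARY, AGE) partition, lowest ID first,
# then keep only rows numbered 2 or higher: the repeats of an earlier row.
extra_copies = conn.execute(
    """
    SELECT ID, NAME
    FROM (
        SELECT ID, NAME,
               ROW_NUMBER() OVER (
                   PARTITION BY SALARY, AGE
                   ORDER BY ID
               ) AS rn
        FROM PEOPLE
    ) AS numbered
    WHERE rn > 1
    ORDER BY ID
    """
).fetchall()

print(extra_copies)
```

Ordering each partition by ID makes the row with the lowest ID the one treated as the original, which is a design choice of this sketch rather than a requirement.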

Finding Duplicate Records Using a Client Program

We can also find duplicate records using a client program.

Syntax

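To find duplicate records through a PHP program, we group all rows by a column using the GROUP BY clause and then count the rows in each group with the COUNT() function. To do so, we execute the SELECT statement using the **mysqli** function **query()**, as follows:

$sql = "SELECT SALARY, COUNT(SALARY) AS 'COUNT' FROM CUSTOMERS GROUP BY SALARY ORDER BY SALARY";
$mysqli->query($sql);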

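To find duplicate records through a JavaScript program, we group all rows by a column using the GROUP BY clause and then count the rows in each group with the COUNT() function. To do so, we execute the SELECT statement using the **query()** function of the **mysql2** library, as follows:

sql = "SELECT SALARY, COUNT(SALARY) AS 'COUNT' FROM CUSTOMERS GROUP BY SALARY ORDER BY SALARY";
con.query(sql);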

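To find duplicate records through a Java program, we group all rows by a column using the GROUP BY clause and then count the rows in each group with the COUNT() function. To do so, we execute the SELECT statement using the **JDBC** function **executeQuery()**, as follows:

String sql = "SELECT SALARY, COUNT(SALARY) AS 'COUNT' FROM CUSTOMERS GROUP BY SALARY ORDER BY SALARY";
statement.executeQuery(sql);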

To find duplicate records through a Python program, we group all rows by a column using the GROUP BY clause and then count the rows in each group with the COUNT() function. To do so, we execute the SELECT statement using the **execute()** function of **MySQL Connector/Python**, as follows:

duplicate_records_query = "SELECT SALARY, COUNT(SALARY) AS 'COUNT' FROM CUSTOMERS GROUP BY SALARY ORDER BY SALARY"
cursorObj.execute(duplicate_records_query)
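Whichever client is used, the query text follows the same pattern. As a small sketch, the hypothetical helper build_duplicate_query() below assembles a duplicate-finding query for any table and list of columns. Table and column names cannot be passed as bound parameters, so the helper only accepts plain identifiers; that rule, the helper name, and the CNT alias are assumptions of this example and not part of any MySQL client library.

```python
import re

# Accept only simple identifiers, because table and column names
# are interpolated into the SQL text rather than bound as parameters.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def build_duplicate_query(table, columns):
    """Return a SELECT that lists column combinations occurring more than once."""
    names = [table, *columns]
    if not columns or not all(_IDENTIFIER.match(name) for name in names):
        raise ValueError("table and column names must be plain identifiers")
    cols = ", ".join(columns)
    return (
        f"SELECT {cols}, COUNT(*) AS CNT FROM {table} "
        f"GROUP BY {cols} HAVING COUNT(*) > 1 ORDER BY {cols}"
    )

query = build_duplicate_query("CUSTOMERS", ["SALARY"])
print(query)
```

The resulting string can be passed to the query() or execute() call of any of the clients shown above.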

Example

Following are the programs:

<?php
$dbhost = 'localhost';
$dbuser = 'root';
$dbpass = 'password';
$db = 'TUTORIALS';
$mysqli = new mysqli($dbhost, $dbuser, $dbpass, $db);
if ($mysqli->connect_errno) {
    printf("Connect failed: %s\n", $mysqli->connect_error);
    exit();
}
// printf("Connected successfully.\n");

// Create the table
$sql = "CREATE TABLE Pets (ID int, DOG_NAME varchar(30) not null, AGE int not null, OWNER_NAME varchar(30) not null)";
if ($mysqli->query($sql)) {
    printf("Pets table created successfully...!\n");
}

// Insert some records, including a duplicate
$sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(1, 'Fluffy', 1, 'Micheal')";
if ($mysqli->query($sql)) {
    printf("First record inserted successfully...!\n");
}
$sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(1, 'Fluffy', 1, 'Micheal')";
if ($mysqli->query($sql)) {
    printf("Second record inserted successfully...!\n");
}
$sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(2, 'Harry', 2, 'Jack')";
if ($mysqli->query($sql)) {
    printf("Third records inserted successfully...!\n");
}
$sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(3, 'Sheero', 1, 'Rose')";
if ($mysqli->query($sql)) {
    printf("Fourth record inserted successfully...!\n");
}
$sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(4, 'Simba', 2, 'Rahul')";
if ($mysqli->query($sql)) {
    printf("Fifth record inserted successfully...!\n");
}

// Display the table records
$sql = "SELECT * FROM Pets";
if ($result = $mysqli->query($sql)) {
    printf("Table records: \n");
    while ($row = $result->fetch_assoc()) {
        printf("ID: %d, DOG_NAME: %s, AGE: %d, OWNER_NAME: %s\n", $row['ID'], $row['DOG_NAME'], $row['AGE'], $row['OWNER_NAME']);
    }
}

// Group all the rows and count how many times each record occurs.
// Every selected column is listed in GROUP BY so the query also runs
// when the ONLY_FULL_GROUP_BY SQL mode is enabled.
$sql = "SELECT ID, DOG_NAME, AGE, OWNER_NAME, COUNT(*) AS 'Count' FROM Pets GROUP BY ID, DOG_NAME, AGE, OWNER_NAME ORDER BY ID";
if ($result = $mysqli->query($sql)) {
    printf("Table duplicate records: \n");
    while ($row = $result->fetch_assoc()) {
        printf("ID: %d, DOG_NAME: %s, AGE: %d, OWNER_NAME: %s, COUNT: %d\n", $row['ID'], $row['DOG_NAME'], $row['AGE'], $row['OWNER_NAME'], $row['Count']);
    }
}
if ($mysqli->error) {
    printf("Error message: %s\n", $mysqli->error);
}
$mysqli->close();

Output

The output obtained is as shown below:

Pets table created successfully...!
First record inserted successfully...!
Second record inserted successfully...!
Third records inserted successfully...!
Fourth record inserted successfully...!
Fifth record inserted successfully...!
Table records:
ID: 1, DOG_NAME: Fluffy, AGE: 1, OWNER_NAME: Micheal
ID: 1, DOG_NAME: Fluffy, AGE: 1, OWNER_NAME: Micheal
ID: 2, DOG_NAME: Harry, AGE: 2, OWNER_NAME: Jack
ID: 3, DOG_NAME: Sheero, AGE: 1, OWNER_NAME: Rose
ID: 4, DOG_NAME: Simba, AGE: 2, OWNER_NAME: Rahul
Table duplicate records:
ID: 1, DOG_NAME: Fluffy, AGE: 1, OWNER_NAME: Micheal, COUNT: 2
ID: 2, DOG_NAME: Harry, AGE: 2, OWNER_NAME: Jack, COUNT: 1
ID: 3, DOG_NAME: Sheero, AGE: 1, OWNER_NAME: Rose, COUNT: 1
ID: 4, DOG_NAME: Simba, AGE: 2, OWNER_NAME: Rahul, COUNT: 1

var mysql = require('mysql2');
var con = mysql.createConnection({
    host: "localhost",
    user: "root",
    password: "password"
});

// Connecting to MySQL
con.connect(function (err) {
    if (err) throw err;
    console.log("Connected!");
    console.log("--------------------------");

    // Create a new database
    let sql = "CREATE DATABASE TUTORIALS";
    con.query(sql);

    sql = "USE TUTORIALS";
    con.query(sql);

    // Creating the Pets table
    sql = "CREATE TABLE Pets (ID int,DOG_NAME varchar(30) not null,AGE int not null,OWNER_NAME varchar(30) not null);"
    con.query(sql);

    sql = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(1,'Fluffy', 1, 'Micheal'),(1,'Fluffy', 1, 'Micheal'),(2,'Harry', 2, 'Jack'),(3,'Sheero', 1, 'Rose'),(4,'Simba', 2, 'Rahul'),(3,'Sheero', 1, 'Rose'),(3,'Sheero', 1, 'Rose');"
    con.query(sql);

    sql = "SELECT * FROM Pets;"
    con.query(sql, function(err, result){
      if (err) throw err
      console.log("**Records in Pets Table**");
      console.log(result);
      console.log("--------------------------");
    });

    sql = "SELECT ID, DOG_NAME, OWNER_NAME, COUNT(*) AS 'Count' FROM Pets GROUP BY ID, DOG_NAME, OWNER_NAME ORDER BY ID";
    con.query(sql, function(err, result){
      if (err) throw err;
      console.log("**Count of duplicate records:**");
      console.log(result);
      // Close the connection once the last result has been printed
      con.end();
    });
});

Output

The output obtained is as shown below:

Connected!
--------------------------
**Records in Pets Table**
[
  { ID: 1, DOG_NAME: 'Fluffy', AGE: 1, OWNER_NAME: 'Micheal' },
  { ID: 1, DOG_NAME: 'Fluffy', AGE: 1, OWNER_NAME: 'Micheal' },
  { ID: 2, DOG_NAME: 'Harry', AGE: 2, OWNER_NAME: 'Jack' },
  { ID: 3, DOG_NAME: 'Sheero', AGE: 1, OWNER_NAME: 'Rose' },
  { ID: 4, DOG_NAME: 'Simba', AGE: 2, OWNER_NAME: 'Rahul' },
  { ID: 3, DOG_NAME: 'Sheero', AGE: 1, OWNER_NAME: 'Rose' },
  { ID: 3, DOG_NAME: 'Sheero', AGE: 1, OWNER_NAME: 'Rose' }
]
--------------------------
**Count of duplicate records:**
[
  { ID: 1, DOG_NAME: 'Fluffy', OWNER_NAME: 'Micheal', Count: 2 },
  { ID: 2, DOG_NAME: 'Harry', OWNER_NAME: 'Jack', Count: 1 },
  { ID: 3, DOG_NAME: 'Sheero', OWNER_NAME: 'Rose', Count: 3 },
  { ID: 4, DOG_NAME: 'Simba', OWNER_NAME: 'Rahul', Count: 1 }
]

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class FindDuplicates {
  public static void main(String[] args) {
    String url = "jdbc:mysql://127.0.0.1:3306/TUTORIALS";
    String user = "root";
    String password = "password";
    ResultSet rs;
    try {
      Class.forName("com.mysql.cj.jdbc.Driver");
      Connection con = DriverManager.getConnection(url, user, password);
      Statement st = con.createStatement();
      //System.out.println("Database connected successfully...!");
      String sql = "CREATE TABLE Pets (ID int,DOG_NAME varchar(30) not null,AGE int not null,OWNER_NAME varchar(30) not null)";
      st.execute(sql);
      System.out.println("Table Pets created successfully...!");
      // Insert some records, including a duplicate
      String sql1 = "INSERT IGNORE INTO Pets(ID, DOG_NAME, AGE, OWNER_NAME) VALUES(1, 'Fluffy', 1, 'Micheal'), (1, 'Fluffy', 1, 'Micheal'),  (3, 'Sheero', 1, 'Rose'), (4, 'Simba', 2, 'Rahul')";
      st.execute(sql1);
      System.out.println("Records inserted successfully....!");
      String sql2 = "SELECT * FROM Pets";
      rs = st.executeQuery(sql2);
      System.out.println("Table records: ");
      while (rs.next()) {
        String id = rs.getString("ID");
        String dog_name = rs.getString("DOG_NAME");
        String age = rs.getString("AGE");
        String owner_name = rs.getString("OWNER_NAME");
        System.out.println("Id: " + id + ", Dog_name: " + dog_name + ", Age: " + age + ", Owner_name: " + owner_name);
      }
      // Count how many times each record occurs. Every selected column is
      // listed in GROUP BY so the query also runs under ONLY_FULL_GROUP_BY.
      String sql3 = "SELECT ID, DOG_NAME, AGE, OWNER_NAME, COUNT(*) AS 'Count' FROM Pets GROUP BY ID, DOG_NAME, AGE, OWNER_NAME ORDER BY ID";
      rs = st.executeQuery(sql3);
      System.out.println("Table records are(with duplicate counts): ");
      while (rs.next()) {
        String id = rs.getString("ID");
        String dog_name = rs.getString("DOG_NAME");
        String age = rs.getString("AGE");
        String owner_name = rs.getString("OWNER_NAME");
        String t_count = rs.getString("Count");
        System.out.println("Id: " + id + ", Dog_name: " + dog_name + ", Age: " + age + ", Owner_name: " + owner_name + ", T_count: " + t_count);
      }
      // Release the database resources
      st.close();
      con.close();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}

Output

The output obtained is as shown below:

Table Pets created successfully...!
Records inserted successfully....!
Table records: 
Id: 1, Dog_name: Fluffy, Age: 1, Owner_name: Micheal
Id: 1, Dog_name: Fluffy, Age: 1, Owner_name: Micheal
Id: 3, Dog_name: Sheero, Age: 1, Owner_name: Rose
Id: 4, Dog_name: Simba, Age: 2, Owner_name: Rahul
Table records are(with duplicate counts): 
Id: 1, Dog_name: Fluffy, Age: 1, Owner_name: Micheal, T_count: 2
Id: 3, Dog_name: Sheero, Age: 1, Owner_name: Rose, T_count: 1
Id: 4, Dog_name: Simba, Age: 2, Owner_name: Rahul, T_count: 1  

import mysql.connector
# Establishing the connection
connection = mysql.connector.connect(
    host='localhost',
    user='root',
    password='password',
    database='tut'
)
# Creating a cursor object
cursorObj = connection.cursor()
# Creating the table 'Pets'
create_table_query = '''
CREATE TABLE Pets (
ID int,
DOG_NAME varchar(30) not null,
AGE int not null,
OWNER_NAME varchar(30) not null
);
'''
cursorObj.execute(create_table_query)
print("Table 'Pets' is created successfully!")
# Inserting records into 'Pets' table
sql = "INSERT IGNORE INTO Pets (ID, DOG_NAME, AGE, OWNER_NAME) VALUES (%s, %s, %s, %s);"
values = [
    (1, 'Fluffy', 1, 'Micheal'),
    (1, 'Fluffy', 1, 'Micheal'),
    (2, 'Harry', 2, 'Jack'),
    (3, 'Sheero', 1, 'Rose'),
    (4, 'Simba', 2, 'Rahul'),
    (3, 'Sheero', 1, 'Rose'),
    (3, 'Sheero', 1, 'Rose')
]
cursorObj.executemany(sql, values)
# Commit so that the inserted rows are saved permanently
connection.commit()
print("Values inserted successfully")
# Display table
display_table = "SELECT * FROM Pets;"
cursorObj.execute(display_table)
# Printing the table 'Pets'
results = cursorObj.fetchall()
print("\nPets Table:")
for result in results:
    print(result)
# Return the count of duplicate records
duplicate_records_query = """
SELECT ID, DOG_NAME, OWNER_NAME, COUNT(*) AS Count FROM Pets
GROUP BY ID, DOG_NAME, OWNER_NAME
ORDER BY ID;
"""
cursorObj.execute(duplicate_records_query)
dup_rec = cursorObj.fetchall()
print("\nDuplicate records:")
for record in dup_rec:
    print(record)
# Closing the cursor and connection
cursorObj.close()
connection.close()

Output

The output obtained is as shown below:

Table 'Pets' is created successfully!
Values inserted successfully

Pets Table:
(1, 'Fluffy', 1, 'Micheal')
(1, 'Fluffy', 1, 'Micheal')
(2, 'Harry', 2, 'Jack')
(3, 'Sheero', 1, 'Rose')
(4, 'Simba', 2, 'Rahul')
(3, 'Sheero', 1, 'Rose')
(3, 'Sheero', 1, 'Rose')

Duplicate records:
(1, 'Fluffy', 'Micheal', 2)
(2, 'Harry', 'Jack', 1)
(3, 'Sheero', 'Rose', 3)
(4, 'Simba', 'Rahul', 1)