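Python - Removing Duplicate Dictionaries from a List

Python is a widely used programming language for web development, data science, machine learning, and automation. Data can be stored in Python using types such as lists, dictionaries, and sets. Dictionaries are mutable, so their contents can be edited and changed as needed.

This article describes several ways to remove duplicate dictionaries from a list. Python has no built-in function that does this directly, and because dictionaries are not hashable, a list of them cannot simply be passed to set(). The methods below work around that limitation.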

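Different Methods to Remove Duplicate Dictionaries

List Comprehension

Dictionaries can be compared with ==, but they cannot be stored in a set because they are unhashable. To detect duplicates efficiently, we convert each dictionary into a hashable form, here a tuple of its sorted key-value pairs, and record the forms already seen in a set. This works when the keys can be sorted and the values are themselves hashable. The following example illustrates the idea.

Example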

def all_duplicate(whole_dict):
    same = set()  # Hashable keys of the dictionaries already seen
    # A tuple of sorted items is hashable and independent of key order.
    keys = (tuple(sorted(dupl.items())) for dupl in whole_dict)
    # set.add() returns None, so "not same.add(key)" records the key and is always True.
    return [dict(key) for key in keys if key not in same and not same.add(key)]

# Example
Whole_Dictionary = [
    {"Place": "Haldwani", "State": 'Uttrakhand'},
    {"Place": "Hisar", "State": 'Haryana'},
    {"Place": "Shillong", "State": 'Meghalaya'},
    {"Place": "Kochi", "State": 'Kerala'},
    {"Place": "Bhopal", "State": 'Madhya Pradesh'},
    {"Place": "Kochi", "State": 'Kerala'},   # Repeats the earlier Kochi entry and will be removed
    {"Place": "Haridwar", "State": 'Uttarakhand'}
]

Final_Dict = all_duplicate(Whole_Dictionary)
print(Final_Dict)   # Prints the list with the duplicate dictionary removed

Output

The output of the above example is shown below.

[{'Place': 'Haldwani', 'State': 'Uttrakhand'}, {'Place': 'Hisar', 'State': 'Haryana'}, {'Place': 'Shillong', 'State': 'Meghalaya'}, {'Place': 'Kochi', 'State': 'Kerala'}, {'Place': 'Bhopal', 'State': 'Madhya Pradesh'}, {'Place': 'Haridwar', 'State': 'Uttarakhand'}]
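To see why the conversion step is needed, the short sketch below checks three things: dictionaries compare equal regardless of key order, they cannot be placed in a set, and their sorted-item tuples can. The variable names are illustrative only and do not come from the example above.

```python
a = {"Place": "Kochi", "State": "Kerala"}
b = {"State": "Kerala", "Place": "Kochi"}  # same content, different key order

# Equality between dictionaries ignores insertion order.
same_content = (a == b)

# Dictionaries are unhashable, so building a set of them raises TypeError.
try:
    {a, b}
    hashable = True
except TypeError:
    hashable = False

# Sorting the items first gives both dictionaries the same hashable key.
key_a = tuple(sorted(a.items()))
key_b = tuple(sorted(b.items()))
distinct_keys = len({key_a, key_b})

print(same_content, hashable, distinct_keys)
```

Sorting matters here: without it, the two tuples would list their pairs in different orders and would be treated as different keys.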

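Pandas Library

This method is most useful for large datasets in which the dictionaries share the same keys and hold simple values such as strings or numbers. Each dictionary becomes a row of a DataFrame, and duplicate rows are then dropped. The following example shows how the Pandas library can be used.

Example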

import pandas as pd  # Requires the pandas package to be installed

def all_duplicate(data):
    dd = pd.DataFrame(data)  # Each dictionary becomes one row of the DataFrame
    dd = dd.drop_duplicates()  # Remove rows that repeat an earlier row
    return dd.to_dict(orient='records')  # Convert the rows back into a list of dictionaries

# Example
Whole_Dictionary = [
    {"Place": "Haldwani", "State": 'Uttrakhand'},
    {"Place": "Hisar", "State": 'Haryana'},
    {"Place": "Shillong", "State": 'Meghalaya'},
    {"Place": "Kochi", "State": 'Kerala'},
    {"Place": "Bhopal", "State": 'Madhya Pradesh'},
    {"Place": "Kochi", "State": 'Kerala'},   # Repeats the earlier Kochi entry and will be removed
    {"Place": "Haridwar", "State": 'Uttarakhand'}
]

Final_Dict = all_duplicate(Whole_Dictionary)
print(Final_Dict)   # Prints the list with the duplicate dictionary removed

Output

[{'Place': 'Haldwani', 'State': 'Uttrakhand'}, {'Place': 'Hisar', 'State': 'Haryana'}, {'Place': 'Shillong', 'State': 'Meghalaya'}, {'Place': 'Kochi', 'State': 'Kerala'}, {'Place': 'Bhopal', 'State': 'Madhya Pradesh'}, {'Place': 'Haridwar', 'State': 'Uttarakhand'}]
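Sometimes two dictionaries should count as duplicates when only some of their fields match. The sketch below uses the subset and keep parameters of drop_duplicates() for that case. The function name dedupe_by_columns and the Visits field are hypothetical, added only for illustration.

```python
import pandas as pd

def dedupe_by_columns(data, columns):
    """Keep the first dictionary for each distinct combination of the given columns."""
    frame = pd.DataFrame(data)
    # subset limits the comparison to the named columns; keep="first" retains the earliest row.
    frame = frame.drop_duplicates(subset=columns, keep="first")
    return frame.to_dict(orient="records")

places = [
    {"Place": "Kochi", "State": "Kerala", "Visits": 1},
    {"Place": "Hisar", "State": "Haryana", "Visits": 2},
    {"Place": "Kochi", "State": "Kerala", "Visits": 3},  # same Place and State, different Visits
]

unique_places = dedupe_by_columns(places, ["Place", "State"])
print(unique_places)
```

Here the third entry is dropped even though its Visits value differs, because only Place and State are compared.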

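Frozen Dictionary

A frozen dictionary is an immutable, hashable form of a dictionary, which works around the fact that ordinary dictionaries are unhashable. Because it is hashable, it can be used as a key in another dictionary or as an element of a set. The third-party frozendict library provides a ready-made implementation; the example below instead uses the built-in frozenset on the dictionary's items, which serves the same purpose here.

Example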

def make_hashable(d):
    # A frozenset of the (key, value) pairs is hashable and ignores key order.
    # It is returned directly rather than passed through hash(), because two
    # different dictionaries could in principle share the same hash value.
    return frozenset(d.items())

def all_duplicate(dicts):
    seen = set()  # Frozen forms of the dictionaries already encountered
    # set.add() returns None, so a new dictionary is recorded and kept, while a repeated one is skipped
    return [d for d in dicts if not (make_hashable(d) in seen or seen.add(make_hashable(d)))]

# Example
Whole_Dictionary = [
    {"Place": "Haldwani", "State": 'Uttrakhand'},
    {"Place": "Hisar", "State": 'Haryana'},
    {"Place": "Shillong", "State": 'Meghalaya'},
    {"Place": "Kochi", "State": 'Kerala'},
    {"Place": "Bhopal", "State": 'Madhya Pradesh'},
    {"Place": "Kochi", "State": 'Kerala'},   # Repeats the earlier Kochi entry and will be removed
    {"Place": "Haridwar", "State": 'Uttarakhand'}
]

Final_Dict = all_duplicate(Whole_Dictionary)
print(Final_Dict)   # Prints the list with the duplicate dictionary removed

Output

[{'Place': 'Haldwani', 'State': 'Uttrakhand'}, {'Place': 'Hisar', 'State': 'Haryana'}, {'Place': 'Shillong', 'State': 'Meghalaya'}, {'Place': 'Kochi', 'State': 'Kerala'}, {'Place': 'Bhopal', 'State': 'Madhya Pradesh'}, {'Place': 'Haridwar', 'State': 'Uttarakhand'}]
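Because a frozen form is hashable, it can also act as a key in another dictionary. As a rough sketch of that idea, the code below counts how often each dictionary occurs by using frozensets of items as Counter keys. The function name count_occurrences is hypothetical, and the approach assumes all values are hashable.

```python
from collections import Counter

def count_occurrences(dicts):
    """Count how often each dictionary appears, using frozensets of items as keys."""
    return Counter(frozenset(d.items()) for d in dicts)

places = [
    {"Place": "Kochi", "State": "Kerala"},
    {"Place": "Hisar", "State": "Haryana"},
    {"State": "Kerala", "Place": "Kochi"},  # same content as the first entry
]

counts = count_occurrences(places)
# Rebuild ordinary dictionaries for the entries that occur more than once.
repeated = [dict(key) for key, n in counts.items() if n > 1]
print(repeated)
```

This reports which dictionaries are repeated instead of silently dropping them, which can help when inspecting data before cleaning it.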

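Helper Function

This method applies the same idea as the list comprehension approach but moves the conversion into a separate helper function, which keeps the code easier to read. The helper converts each dictionary into a sorted tuple of its items. The main function then uses those tuples to detect dictionaries that have already been seen and leaves them out of the result. The following example illustrates this.

Example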

def sorted_dict_to_tuple(d):  # Converts a dictionary into a tuple of its items, sorted by key
    return tuple(sorted(d.items()))

def all_duplicates(dicts):  # Returns the dictionaries in order, skipping any whose content was already seen
    seen = set()  # Sorted tuples of the dictionaries already encountered
    return [d for d in dicts if not (sorted_dict_to_tuple(d) in seen or seen.add(sorted_dict_to_tuple(d)))]

# Example
Whole_Dictionary = [
    {"Place": "Haldwani", "State": 'Uttrakhand'},
    {"Place": "Hisar", "State": 'Haryana'},
    {"Place": "Shillong", "State": 'Meghalaya'},
    {"Place": "Kochi", "State": 'Kerala'},
    {"Place": "Bhopal", "State": 'Madhya Pradesh'},
    {"Place": "Kochi", "State": 'Kerala'},   # Repeats the earlier Kochi entry and will be removed
    {"Place": "Haridwar", "State": 'Uttarakhand'}
]

Final_Dict = all_duplicates(Whole_Dictionary)
print(Final_Dict)   # Prints the list with the duplicate dictionary removed

Output

[{'Place': 'Haldwani', 'State': 'Uttrakhand'}, {'Place': 'Hisar', 'State': 'Haryana'}, {'Place': 'Shillong', 'State': 'Meghalaya'}, {'Place': 'Kochi', 'State': 'Kerala'}, {'Place': 'Bhopal', 'State': 'Madhya Pradesh'}, {'Place': 'Haridwar', 'State': 'Uttarakhand'}]
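The sorted-tuple helper assumes every value is hashable, so it fails when a dictionary contains a nested dictionary or list. One possible variation, not part of the method above, is a helper that serializes each dictionary with json.dumps and sort_keys=True. The sketch below assumes the data is JSON-serializable; the function name dedupe_by_json and the Coordinates field are hypothetical.

```python
import json

def dedupe_by_json(dicts):
    """Remove duplicates by comparing a canonical JSON string of each dictionary."""
    seen = set()
    unique = []
    for d in dicts:
        # sort_keys=True orders keys at every nesting level, so key order does not matter.
        key = json.dumps(d, sort_keys=True)
        if key not in seen:
            seen.add(key)
            unique.append(d)
    return unique

records = [
    {"Place": "Kochi", "Coordinates": {"lat": 9.93, "lon": 76.27}},
    {"Coordinates": {"lon": 76.27, "lat": 9.93}, "Place": "Kochi"},  # same content, reordered
    {"Place": "Bhopal", "Coordinates": {"lat": 23.26, "lon": 77.41}},
]

unique_records = dedupe_by_json(records)
print(len(unique_records))
```

An explicit loop is used instead of a comprehension so that the key is computed only once per dictionary.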

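Conclusion

Removing duplicate dictionaries from a list takes some care, because dictionaries are unhashable and cannot be placed in a set directly. This article covered several ways to do it: a list comprehension over sorted tuples, the Pandas library, a frozen form of each dictionary, and a helper function. You can choose whichever method best suits your data and application.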

Aayush Shukla

Updated on: 1 August 2023
