How to identify the most frequently occurring items in a sequence with Python?


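Problem

You need to identify the most frequently occurring items in a sequence.

Solution

We can use a `Counter` to keep track of the items in a sequence.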

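What is a Counter?

A `Counter` is a mapping that holds an integer count for each key. Updating an existing key adds to its count. The object is used to count instances of hashable objects, or as a multiset.

When you are doing data analysis, `Counter` is one of your best friends.

This object has been part of Python for quite a while (it was added in Python 2.7 and 3.1), so for many of you this will be a quick review. We start by importing `Counter` from `collections`.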

from collections import Counter

A regular dictionary raises a `KeyError` when you look up a key that does not exist.

# An empty dictionary (named d so the built-in dict is not shadowed)
d = {}

# check for a key in an empty dict
d['mystring']

# Error message
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-12-1e03564507c6> in <module>
      3
      4 # check for a key in an empty dict
----> 5 d['mystring']
      6
      7 # Error message

KeyError: 'mystring'

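How do we avoid the `KeyError` exception in this case?

`Counter` is a subclass of `dict` and behaves much like a dictionary, but when you look up a missing key it returns zero instead of raising a `KeyError`.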

# define the counter
c = Counter()

# check for the unavailable key
print(f"Output\n{c['mystring']}")

Output

0

c['mystring'] += 1
print(f"Output\n{c}")

Output

Counter({'mystring': 1})
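To make the difference from a plain dictionary concrete, here is a minimal sketch. It compares `dict.get()` with a default against a `Counter` lookup, and checks that reading a missing key from a `Counter` does not store that key. The variable names (`plain`, `plain_count`, `counter_count`, `key_was_added`) are illustrative only.

```python
from collections import Counter

plain = {}
# dict.get() avoids the KeyError by supplying a default value
plain_count = plain.get('mystring', 0)

c = Counter()
# A Counter lookup of an absent key returns 0 without raising
counter_count = c['mystring']
# The lookup alone does not add the key to the Counter
key_was_added = 'mystring' in c

print(plain_count, counter_count, key_was_added)
```

The key only appears in the `Counter` once something is assigned to it, for example with `c['mystring'] += 1`.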

Example

print(f"Output\n{type(c)}")

Output

<class 'collections.Counter'>

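Most frequently occurring items in a sequence

Another advantage of `Counter` is that you can pass it a list of objects and it will count them for you. This lets us build a counter without writing a loop.

Counter('Peas porridge hot peas porridge cold peas porridge in the pot nine days old'.split())

Output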

Counter({'Peas': 1,
         'porridge': 3,
         'hot': 1,
         'peas': 2,
         'cold': 1,
         'in': 1,
         'the': 1,
         'pot': 1,
         'nine': 1,
         'days': 1,
         'old': 1})

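`split` breaks the string into a list of words. Called without arguments, it splits on whitespace.

`Counter` then iterates over that list and counts every word, giving the counts shown in the output.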
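For comparison, the loop that `Counter` saves us from writing might look like the sketch below. The names `phrase`, `words`, `manual`, and `counted` are illustrative; the phrase is the one used throughout this article.

```python
from collections import Counter

phrase = 'Peas porridge hot peas porridge cold peas porridge in the pot nine days old'
words = phrase.split()

# Manual counting with a plain dict: one pass, one entry per distinct word
manual = {}
for word in words:
    manual[word] = manual.get(word, 0) + 1

# The same tally, built by Counter directly from the list
counted = Counter(words)

# Both approaches produce the same word -> count mapping
print(manual == dict(counted))
```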

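There is more: we can also find the most common words in the phrase.

The `most_common()` method returns the most frequently occurring items, ordered from the highest count to the lowest.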

count = Counter('Peas porridge hot peas porridge cold peas porridge in the pot nine days old'.split())
print(f"Output\n{count.most_common(1)}")

Output

[('porridge', 3)]

Example

print(f"Output\n{count.most_common(2)}")

Output

[('porridge', 3), ('peas', 2)]

Example

print(f"Output\n{count.most_common(3)}")

Output

[('porridge', 3), ('peas', 2), ('Peas', 1)]

Note that it returns a list of tuples. The first element of each tuple is the word and the second is its count.
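The results above count 'Peas' and 'peas' as different words, because `Counter` keys are case-sensitive. If that is not what you want, one option (an assumption about the desired behaviour, not something this article's examples require) is to lowercase each word before counting. The sketch below also shows unpacking the tuples that `most_common()` returns; `lowered` and `top_two` are illustrative names.

```python
from collections import Counter

phrase = 'Peas porridge hot peas porridge cold peas porridge in the pot nine days old'

# Normalise case so that 'Peas' and 'peas' share one key
lowered = Counter(word.lower() for word in phrase.split())

# most_common() returns (item, count) tuples, which unpack naturally
top_two = lowered.most_common(2)
for word, n in top_two:
    print(f"{word}: {n}")
```

With case folded, 'peas' and 'porridge' both occur three times, so they tie for the top spot.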

A lesser-known feature of `Counter` instances is that they can easily be combined using various mathematical operations.

string = 'Peas porridge hot peas porridge cold peas porridge in the pot nine days old'
another_string = 'Peas peas hot peas peas peas cold peas'

a = Counter(string.split())
b = Counter(another_string.split())

# Add counts
add = a + b
print(f"Output\n{add}")

Output

Counter({'peas': 7, 'porridge': 3, 'Peas': 2, 'hot': 2, 'cold': 2, 'in': 1, 'the': 1, 'pot': 1, 'nine': 1, 'days': 1, 'old': 1})

# Subtract counts
sub = a - b
print(f"Output\n{sub}")

Output

Counter({'porridge': 3, 'in': 1, 'the': 1, 'pot': 1, 'nine': 1, 'days': 1, 'old': 1})
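Notice that 'peas', 'Peas', 'hot', and 'cold' are missing from the subtraction result: the `-` operator keeps only positive counts. The sketch below contrasts it with the in-place `subtract()` method, which keeps zero and negative counts, and shows the `&` (minimum) and `|` (maximum) operators as well. The names `diff`, `signed`, `common`, and `union` are illustrative.

```python
from collections import Counter

a = Counter('Peas porridge hot peas porridge cold peas porridge in the pot nine days old'.split())
b = Counter('Peas peas hot peas peas peas cold peas'.split())

# Operator form: results that are zero or negative are dropped
diff = a - b

# Method form: works in place and keeps zero and negative counts
signed = Counter(a)   # copy, so that a itself is left unchanged
signed.subtract(b)

# Intersection keeps the minimum count, union keeps the maximum
common = a & b
union = a | b

print(diff['peas'])     # 0, because 'peas' is not stored in diff
print(signed['peas'])   # -3, that is 2 - 5
print(common['peas'])   # 2
print(union['peas'])    # 5
```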

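Finally, `Counter` is quite smart about how it stores data in the container.

As shown above, it groups identical words together when storing them, so each word is kept once alongside its count. A collection like this is commonly called a multiset.

We can use `elements` to pull the words back out one at a time. It does not reproduce the original word order of the phrase; instead, the repeats of each word are grouped together, in the order in which each word was first seen.

Example

print(f"Output\n{list(a.elements())}")

Output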

['Peas', 'porridge', 'porridge', 'porridge', 'hot', 'peas', 'peas', 'cold', 'in', 'the', 'pot', 'nine', 'days', 'old']

Example

print(f"Output\n{list(a.values())}")

Output

[1, 3, 1, 2, 1, 1, 1, 1, 1, 1, 1]

Example

print(f"Output\n{list(a.items())}")

Output

[('Peas', 1), ('porridge', 3), ('hot', 1), ('peas', 2), ('cold', 1), ('in', 1), ('the', 1), ('pot', 1), ('nine', 1), ('days', 1), ('old', 1)]
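These three views are related in a way that is easy to check. The following minimal sketch (with illustrative names `total_words`, `distinct_words`, `expanded`, and `rebuilt`) sums the values to get the total word count, uses `len()` for the number of distinct words, and rebuilds an equal `Counter` from the output of `elements()`.

```python
from collections import Counter

a = Counter('Peas porridge hot peas porridge cold peas porridge in the pot nine days old'.split())

# Sum of all counts = number of words in the phrase
total_words = sum(a.values())

# Number of keys = number of distinct words
distinct_words = len(a)

# elements() repeats each word as many times as its count
expanded = list(a.elements())

# Counting the expanded list again gives back an equal Counter
rebuilt = Counter(expanded)

print(total_words, distinct_words, len(expanded), rebuilt == a)
```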

Kiran P

Updated on: November 10, 2020
