如何在 Python 函数中消除重复行？

Python 服务器端编程编程

在本文中，我们将讨论如何在 Python 中删除重复的多行。如果文件很小并且只有几行，则可以手动执行删除重复行的过程。但是，在处理大型文件时，Python 可以提供帮助。

使用文件处理方法

Python 具有用于创建、打开和关闭文件的内置方法，这使得处理文件变得更容易。Python 还允许在文件打开时执行多种文件操作，例如读取、写入和追加数据。

要从 Python 文本文件或函数中删除重复行，我们使用 Python 中的文件处理方法。文本文件或函数必须与包含 Python 程序的 .py 文件位于同一目录中。

算法

以下是消除 Python 函数中重复行的步骤

由于我们只需要读取此文件的内容，因此首先以只读模式打开输入文件。
现在，要将内容写入此文件，请以写入模式打开输出文件。
逐行读取输入文件，然后检查输出文件以查看是否已写入任何与该行类似的行。
如果没有，则将此行添加到输出文件并在集合中保存该行的哈希值。我们不会检查和存储整行，而是检查每行的哈希值。在处理大型文件时，这更有效并且占用更少的空间。
如果哈希值已添加到集合中，则跳过该行。
完成后，输出文件将包含输入文件中的每一行，而没有任何重复。

在这里，输入文件即“File.txt”包含以下数据：

Welcome to TutorialsPoint.
Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
eliminate repeated lines.
eliminate repeated lines.
Skip the line.

示例

以下是如何消除 Python 函数中重复行的示例：

import hashlib
# path of the input and output files
OutFile = 'C:\Users\Lenovo\Downloads\Work TP\pre.txt'
InFile = r'C:\Users\Lenovo\Downloads\Work TP\File.txt'
# holding the line which is already seen
lines_present = set()
# opening the output file in write mode to write in it
The_Output_File = open(OutFile, "w")

# loop for opening the file in read mode
for l in open(InFile, "r"):
   # finding the hash value of the current line
      # Before performing the hash, we remove any blank spaces and new lines from the end of the line.
      # Using hashlib library determine the hash value of a line.
      hash_value = hashlib.md5(l.rstrip().encode('utf-8')).hexdigest()
      if hash_value not in lines_present:
         The_Output_File.write(l)
         lines_present.add(hash_value)
# closing the output text file
The_Output_File.close()

输出

我们可以在以下输出中看到，输入文件中的所有重复行都已从输出文件中删除，输出文件包含如下所示的唯一数据：

Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
Skip the line.

示例

以下是另一个消除 Python 函数中重复行的示例：

# path of the input and output files
# Create the output file in write mode
OutFile = open('C:\Users\Lenovo\Downloads\Work TP\pre.txt',"w")
11
# Create an input file in read mode
InFile = open('C:\Users\Lenovo\Downloads\Work TP\File.txt', "r")
# holding the line which is already seen
lines_present = set()
# iterate every line present in the file
for l in InFile:
   # check whether the lines are unique
   if l not in lines_present:
      # writing all the unique lines in the output file
      OutFile.write(l)
      # adding unique lines in the lines_present
      lines_present.add(l)
# closing the output text files
OutFile.close()
InFile.close()

输出

我们可以在以下输出中看到，输入文件中的所有重复行都已从输出文件中删除，输出文件包含如下所示的唯一数据

Welcome to TutorialsPoint.
Python programming language in this file.
eliminate repeated lines.
Skip the line.

Sarika Singh

更新于： 2023-09-09

5K+ 浏览量

开启你的职业生涯

通过完成课程获得认证

开始学习

如何在 Python 函数中消除重复行？

使用文件处理方法

算法

示例

输出

示例

输出

开启你的 职业生涯

开启你的职业生涯