Python - Chunks and Chinks

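Chunking is the process of grouping related words together based on their part-of-speech tags. In the example below, we define a grammar that every chunk must follow. The grammar specifies the sequence of tags, such as determiners, adjectives and nouns, that makes up a chunk. The program prints the resulting chunks and also draws them as a tree.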

import nltk

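# A sentence that is already tagged with parts of speech
sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]
# NP chunk: an optional determiner, any number of adjectives, then a noun
grammar = "NP: {<DT>?<JJ>*<NN>}"
cp = nltk.RegexpParser(grammar)
result = cp.parse(sentence)
print(result)
result.draw()  # opens a window that shows the tree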

When we run the above program, it prints the chunked sentence as a tree and opens a window that draws it.
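The drawn tree is convenient for inspection, but the chunks can also be read from the parse result in code. The following is a minimal sketch of that idea. It assumes the grammar `NP: {<DT>?<JJ>*<NN>}`, and the names `tagged`, `parser` and `noun_phrases` are only illustrative.

```python
import nltk

tagged = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
          ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

# Assumed pattern: an optional determiner, any number of adjectives, then a noun.
parser = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
tree = parser.parse(tagged)

# Every chunk is a subtree labelled NP; its leaves are (word, tag) pairs.
noun_phrases = [
    " ".join(word for word, tag in subtree.leaves())
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
]
print(noun_phrases)  # ['The small red flower', 'the window']
```

Because no window is opened, a loop like this also works in scripts that run without a display.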

Changing the grammar produces different chunks, as the next program shows:

import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
            ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

# NP chunk: an optional determiner followed by any number of adjectives (no noun)
grammar = "NP: {<DT>?<JJ>*}"

chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence) 
print(result)
result.draw()

When we run the above program, it again prints the tree and draws it, this time with different chunks.
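To see how the grammar alone decides what gets chunked, the same sentence can be parsed with two grammars side by side. This is a sketch under stated assumptions: both patterns and the helper `chunk_phrases` are hypothetical, chosen only for illustration.

```python
import nltk

tagged = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
          ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]


def chunk_phrases(grammar, tagged_sentence):
    """Return the words of every NP chunk that the grammar produces."""
    tree = nltk.RegexpParser(grammar).parse(tagged_sentence)
    return [
        " ".join(word for word, tag in subtree.leaves())
        for subtree in tree.subtrees(filter=lambda t: t.label() == "NP")
    ]


# Assumed grammar 1: adjectives are allowed between determiner and noun.
with_adjectives = chunk_phrases("NP: {<DT>?<JJ>*<NN>}", tagged)
# Assumed grammar 2: a determiner must be followed directly by a noun.
determiner_noun_only = chunk_phrases("NP: {<DT><NN>}", tagged)

print(with_adjectives)       # ['The small red flower', 'the window']
print(determiner_noun_only)  # ['the window']
```

The stricter second pattern skips "The small red flower" because adjectives stand between its determiner and its noun.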

Chinking

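Chinking is the process of removing a sequence of tokens from a chunk. If the sequence lies in the middle of a chunk, those tokens are removed and the tokens on either side of them remain as two separate chunks.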

import nltk

sentence = [("The", "DT"), ("small", "JJ"), ("red", "JJ"),("flower", "NN"), ("flew", "VBD"), ("through", "IN"),  ("the", "DT"), ("window", "NN")]

grammar = r"""
  NP:
    {<.*>+}         # Chunk everything
    }<JJ|NN>+{      # Chink sequences of JJ and NN
  """
chunkprofile = nltk.RegexpParser(grammar)
result = chunkprofile.parse(sentence) 
print(result)
result.draw()

When we run the above program, it prints the chunked sentence as a tree and opens a window that draws it.

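As you can see, the tokens that match the chink pattern are taken out of the noun phrase chunk, and the tokens that remain form separate chunks. This process of removing tokens that do not belong in the desired chunk is called chinking.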
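The effect of a chink is easy to check token by token when the tree is converted to IOB tags, where `B-NP` starts a chunk, `I-NP` continues it and `O` marks a token outside any chunk. The sketch below assumes a chink pattern of `<VBD|IN>`, chosen so that the removed tokens sit in the middle of the chunk; the variable names are illustrative.

```python
import nltk
from nltk.chunk import tree2conlltags

tagged = [("The", "DT"), ("small", "JJ"), ("red", "JJ"), ("flower", "NN"),
          ("flew", "VBD"), ("through", "IN"), ("the", "DT"), ("window", "NN")]

# Assumed grammar: chunk the whole sentence, then chink verbs and prepositions.
grammar = r"""
  NP:
    {<.*>+}        # chunk every token into one NP
    }<VBD|IN>+{    # chink runs of VBD and IN
"""
tree = nltk.RegexpParser(grammar).parse(tagged)

# Flatten the tree into (word, tag, IOB label) triples.
iob = tree2conlltags(tree)
for word, tag, label in iob:
    print(f"{word:8}{tag:5}{label}")
```

The two chinked tokens, "flew" and "through", are labelled `O`, and a new `B-NP` label on "the" shows that the single chunk has been split into two.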
