Lucene - StandardAnalyzer

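This is the most sophisticated of the analyzers, able to handle names, email addresses, and similar input. It lowercases each token and removes common words and punctuation, if any.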
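
To make these steps concrete, here is a minimal sketch in plain Java that mimics them: split the text on punctuation, lowercase each token, and drop common words. It is not Lucene's implementation. The class name AnalysisSketch and its short stop-word list are assumptions made for illustration, and the real tokenizer follows more elaborate rules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java approximation of the analysis steps; this is NOT Lucene code.
class AnalysisSketch {

   // Hypothetical sample of common English words, not Lucene's default set.
   static final Set<String> SAMPLE_STOP_WORDS =
      new HashSet<String>(Arrays.asList("a", "an", "and", "is", "of", "the"));

   static List<String> analyze(String text) {
      List<String> tokens = new ArrayList<String>();
      // Splitting on anything that is not a letter or digit drops punctuation.
      for (String raw : text.split("[^\\p{L}\\p{Nd}]+")) {
         if (raw.isEmpty()) {
            continue; // a leading separator yields an empty first element
         }
         String token = raw.toLowerCase(Locale.ROOT);
         if (!SAMPLE_STOP_WORDS.contains(token)) {
            tokens.add(token);
         }
      }
      return tokens;
   }

   public static void main(String[] args) {
      // prints [quick, brown, fox, here]
      System.out.println(analyze("The Quick, brown FOX is here!"));
   }
}
```

Note that "The" and "is" disappear from the result because they are in the sample stop-word list, while the comma and exclamation mark are lost during splitting.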

Class Declaration

Following is the declaration for the org.apache.lucene.analysis.standard.StandardAnalyzer class:

public final class StandardAnalyzer
   extends StopwordAnalyzerBase

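Fields

Following are the fields for the org.apache.lucene.analysis.standard.StandardAnalyzer class:

static int DEFAULT_MAX_TOKEN_LENGTH - The default maximum allowed token length.
static Set<?> STOP_WORDS_SET - An unmodifiable set containing some common English words that are usually not useful for searching.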
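
Because the default stop-word set is unmodifiable, adding your own words means copying it first. The following is a minimal plain-Java sketch of that idea. The class StopSetSketch, the DEFAULTS constant, and its three sample words are hypothetical stand-ins, not Lucene's actual set.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

// Illustrates working with an unmodifiable set; this is NOT Lucene code.
class StopSetSketch {

   // Hypothetical stand-in for an unmodifiable default stop-word set.
   static final Set<String> DEFAULTS = Collections.unmodifiableSet(
      new HashSet<String>(Arrays.asList("a", "is", "the")));

   // Returns a new, modifiable set holding the defaults plus the extra words.
   static Set<String> extend(Set<String> defaults, String... extraWords) {
      Set<String> copy = new HashSet<String>(defaults);
      copy.addAll(Arrays.asList(extraWords));
      return copy;
   }

   public static void main(String[] args) {
      try {
         DEFAULTS.add("lucene"); // rejected: the set is unmodifiable
      } catch (UnsupportedOperationException e) {
         System.out.println("default set cannot be changed");
      }
      System.out.println(extend(DEFAULTS, "lucene").contains("lucene"));
   }
}
```

A set built this way could then be handed to a constructor that accepts custom stop words.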

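Class Constructors

The following table shows the different class constructors:

No.	Constructor and Description
1	StandardAnalyzer(Version matchVersion) Builds an analyzer with the default stop words (STOP_WORDS_SET).
2	StandardAnalyzer(Version matchVersion, File stopwords) Deprecated. Use StandardAnalyzer(Version, Reader) instead.
3	StandardAnalyzer(Version matchVersion, Reader stopwords) Builds an analyzer with the stop words from the given reader.
4	StandardAnalyzer(Version matchVersion, Set<?> stopWords) Builds an analyzer with the given stop words.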
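
The Reader-based constructor takes its stop words from a character stream. As a rough sketch of what reading such a list involves, the plain-Java code below loads one word per line into a set. The one-word-per-line format, the class StopWordReaderSketch, and the load method are assumptions for illustration. Lucene's own loader may treat details such as comments or whitespace differently.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.HashSet;
import java.util.Set;

// Reads a stop-word list from a Reader; this is NOT Lucene code.
class StopWordReaderSketch {

   // Assumes one stop word per line; blank lines are skipped.
   static Set<String> load(Reader reader) {
      Set<String> words = new HashSet<String>();
      try (BufferedReader in = new BufferedReader(reader)) {
         String line;
         while ((line = in.readLine()) != null) {
            String word = line.trim();
            if (!word.isEmpty()) {
               words.add(word);
            }
         }
      } catch (IOException e) {
         // Rethrow unchecked so callers need no throws clause.
         throw new UncheckedIOException(e);
      }
      return words;
   }

   public static void main(String[] args) {
      Set<String> words = load(new StringReader("the\n\n is \nof\n"));
      System.out.println(words.size()); // prints 3
   }
}
```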

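Class Methods

The following table shows the different class methods:

No.	Method and Description
1	protected ReusableAnalyzerBase.TokenStreamComponents createComponents(String fieldName, Reader reader) Creates a new ReusableAnalyzerBase.TokenStreamComponents instance for this analyzer.
2	int getMaxTokenLength() Returns the maximum allowed token length.
3	void setMaxTokenLength(int length) Sets the maximum allowed token length.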
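
The effect of a maximum token length can be sketched in plain Java as a filter over already-split tokens. This sketch assumes that over-long tokens are discarded, which is how the Lucene 3.x documentation describes setMaxTokenLength. Other Lucene versions may behave differently, and the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrates a maximum token length; this is NOT Lucene code.
class MaxTokenLengthSketch {

   // Keeps only tokens whose length does not exceed maxTokenLength.
   static List<String> dropLongTokens(List<String> tokens, int maxTokenLength) {
      List<String> kept = new ArrayList<String>();
      for (String token : tokens) {
         if (token.length() <= maxTokenLength) {
            kept.add(token);
         }
      }
      return kept;
   }

   public static void main(String[] args) {
      List<String> tokens =
         Arrays.asList("lucene", "is", "extraordinarily", "simple");
      // prints [lucene, is, simple]
      System.out.println(dropLongTokens(tokens, 6));
   }
}
```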

Methods Inherited

This class inherits methods from the following classes:

org.apache.lucene.analysis.StopwordAnalyzerBase
org.apache.lucene.analysis.ReusableAnalyzerBase
org.apache.lucene.analysis.Analyzer
java.lang.Object

Usage

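private void displayTokenUsingStandardAnalyzer() throws IOException {
   String text 
      = "Lucene is simple yet powerful java based search library.";
   Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
   TokenStream tokenStream 
      = analyzer.tokenStream(LuceneConstants.CONTENTS,
        new StringReader(text));
   // TermAttribute is deprecated in Lucene 3.x in favor of CharTermAttribute.
   TermAttribute term = tokenStream.addAttribute(TermAttribute.class);
   
   tokenStream.reset();   // prepare the stream before consuming it
   while(tokenStream.incrementToken()) {
      System.out.print("[" + term.term() + "] ");
   }
   tokenStream.end();     // signal that the end of the stream was reached
   tokenStream.close();   // release the underlying reader
}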

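Example Application

Let us create a test Lucene application to test analysis using StandardAnalyzer.

Step	Description
1	Create a project named LuceneFirstApplication under the package com.tutorialspoint.lucene, as explained in the Lucene - First Application chapter. You can also reuse the project created in that chapter to follow the analysis process.
2	Create LuceneConstants.java as explained in the Lucene - First Application chapter. Keep the rest of the files unchanged.
3	Create LuceneTester.java as shown below.
4	Clean and build the application to make sure the business logic works as required.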

LuceneConstants.java

This class provides various constants used across the sample application.

package com.tutorialspoint.lucene;

public class LuceneConstants {
   public static final String CONTENTS = "contents";
   public static final String FILE_NAME = "filename";
   public static final String FILE_PATH = "filepath";
   public static final int MAX_SEARCH = 10;
}

LuceneTester.java

This class is used to test the analysis capability of the Lucene library.

package com.tutorialspoint.lucene;

import java.io.IOException;
import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.TermAttribute;
import org.apache.lucene.util.Version;

public class LuceneTester {
	
   public static void main(String[] args) {
      LuceneTester tester;

      tester = new LuceneTester();
   
      try {
         tester.displayTokenUsingStandardAnalyzer();
      } catch (IOException e) {
         e.printStackTrace();
      }
   }

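   private void displayTokenUsingStandardAnalyzer() throws IOException {
      String text 
         = "Lucene is simple yet powerful java based search library.";
      Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_36);
      TokenStream tokenStream = analyzer.tokenStream(
         LuceneConstants.CONTENTS, new StringReader(text));
      // TermAttribute is deprecated in Lucene 3.x in favor of CharTermAttribute.
      TermAttribute term = tokenStream.addAttribute(TermAttribute.class);
      tokenStream.reset();   // prepare the stream before consuming it
      while(tokenStream.incrementToken()) {
         System.out.print("[" + term.term() + "] ");
      }
      tokenStream.end();     // signal that the end of the stream was reached
      tokenStream.close();   // release the underlying reader
   }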
}

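Running the Program

Once the source files are in place, you can compile and run the program. To do this, keep the LuceneTester.java file tab active and use either the Run option available in the Eclipse IDE or Ctrl + F11 to compile and run the LuceneTester application. If the application runs successfully, it prints the following message in the Eclipse IDE's console: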

[lucene] [simple] [yet] [powerful] [java] [based] [search] [library]
