Python——使用Word2Vec进行词嵌入
词嵌入是一种语言模型技术,用于将单词映射到实数向量。它使用多个维度在向量空间中表示单词或短语。可以使用神经网络、共现矩阵、概率模型等各种方法生成词嵌入。
Word2Vec 由用于生成单词嵌入的模型组成。这些模型是浅层两层神经网络,具有一个输入层、一个隐藏层和一个输出层。
示例
# importing all necessary modules from nltk.tokenize import sent_tokenize, word_tokenize import warnings warnings.filterwarnings(action = 'ignore') import gensim from gensim.models import Word2Vec # Reads ‘alice.txt’ file sample = open("C:\Users\Vishesh\Desktop\alice.txt", "r") s = sample.read() # Replaces escape character with space f = s.replace("\n", " ") data = [] # iterate through each sentence in the file for i in sent_tokenize(f): temp = [] # tokenize the sentence into words for j in word_tokenize(i): temp.append(j.lower()) data.append(temp) # Create CBOW model model1 = gensim.models.Word2Vec(data, min_count = 1, size = 100, window = 5) # Print results print("Cosine similarity between 'alice' " + "and 'wonderland' - CBOW : ", model1.similarity('alice', 'wonderland')) print("Cosine similarity between 'alice' " + "and 'machines' - CBOW : ", model1.similarity('alice', 'machines')) # Create Skip Gram model model2 = gensim.models.Word2Vec(data, min_count = 1, size = 100, window =5, sg = 1) # Print results print("Cosine similarity between 'alice' " + "and 'wonderland' - Skip Gram : ", model2.similarity('alice', 'wonderland')) print("Cosine similarity between 'alice' " + "and 'machines' - Skip Gram : ", model2.similarity('alice', 'machines'))
广告