vec2word

Map embedding vector to word

Syntax

words = vec2word(emb,M)

[words,dist] = vec2word(emb,M)

___ = vec2word(emb,M,k)

___ = vec2word(___,'Distance',distance)

Description

words = vec2word(emb,M) returns the words in the embedding emb that are closest to the embedding vectors in the rows of M.

[words,dist] = vec2word(emb,M) also returns the distance dist from each word to its source vector in M.

___ = vec2word(emb,M,k) returns the k closest words, using any of the output arguments in the previous syntaxes.

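___ = vec2word(___,'Distance',distance) specifies the distance metric, 'cosine' (default) or 'euclidean', in addition to any of the input arguments in the previous syntaxes.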

Examples

Map Words to Vectors and Back

Load a pretrained word embedding using fastTextWordEmbedding. This function requires Text Analytics Toolbox™ Model for fastText English 16 Billion Token Word Embedding support package. If this support package is not installed, then the function provides a download link.

emb = fastTextWordEmbedding

emb = 
  wordEmbedding with properties:

     Dimension: 300
    Vocabulary: [1×1000000 string]

Map the words "Italy", "Rome", and "Paris" to vectors using word2vec.

italy = word2vec(emb,"Italy");
rome = word2vec(emb,"Rome");
paris = word2vec(emb,"Paris");

Map the vector italy - rome + paris to a word using vec2word.

word = vec2word(emb,italy - rome + paris)

word = 
"France"
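
The same vector arithmetic can be wrapped in a small helper for reuse. The following is a minimal sketch, not part of the toolbox: the name analogy is a hypothetical anonymous function introduced here for illustration, and the code assumes the fastText support package described above is installed.

```matlab
% Load the pretrained embedding (requires the fastText support package).
emb = fastTextWordEmbedding;

% Hypothetical helper (not a toolbox function): finds the word closest
% to the vector a - b + c, using only word2vec and vec2word.
analogy = @(a,b,c) vec2word(emb, ...
    word2vec(emb,a) - word2vec(emb,b) + word2vec(emb,c));

% Same query as the example above: Italy - Rome + Paris.
word = analogy("Italy","Rome","Paris")
```

Because the helper captures emb when it is created, redefine it after loading a different embedding.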

Find Closest Words to Vector

Find the five words closest to a word embedding vector, along with their distances. First, load a pretrained word embedding using fastTextWordEmbedding.

emb = fastTextWordEmbedding;

Map the words "Italy", "Rome", and "Paris" to vectors using word2vec.

italy = word2vec(emb,"Italy");
rome = word2vec(emb,"Rome");
paris = word2vec(emb,"Paris");

Use vec2word to find the five words closest to the vector italy - rome + paris under the Euclidean distance metric.

k = 5;
M = italy - rome + paris;
[words,dist] = vec2word(emb,M,k,'Distance','euclidean');

Plot the words and distances in a bar chart.

figure;
bar(dist)
xticklabels(words)
xlabel("Word")
ylabel("Distance")
title("Distances to Vector")

Input Arguments

`emb` — Input word embedding
`wordEmbedding` object

Input word embedding, specified as a wordEmbedding object.

`M` — Word embedding vectors
matrix

Word embedding vectors, specified as a matrix. Each row of M is a word embedding vector. M must have emb.Dimension columns.
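
As an illustration of the row layout, the sketch below builds M from several words at once and maps the rows back to words. It assumes the fastText support package is installed; the variable names queryWords and recovered are chosen here for illustration.

```matlab
% Load the pretrained embedding (requires the fastText support package).
emb = fastTextWordEmbedding;

% Map three words to vectors in one call. Each row of M is one vector.
queryWords = ["Italy" "Rome" "Paris"];
M = word2vec(emb,queryWords);

% M has one row per word and emb.Dimension columns.
size(M)

% Map every row of M back to its closest word.
recovered = vec2word(emb,M)
```

Since each row of M is the exact vector of a vocabulary word, the closest word for each row is expected to be that word itself.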

`k` — Number of closest words
positive integer

Number of closest words to return, specified as a positive integer.

`distance` — Distance metric
`'cosine'` (default) | `'euclidean'`

Distance metric, specified as 'cosine' or 'euclidean'.
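
The two metrics can rank neighbors differently. The sketch below shows this with plain MATLAB on a toy vocabulary; it does not call vec2word and is not the toolbox implementation. It assumes the usual definitions (cosine distance as one minus cosine similarity, Euclidean distance as the norm of the difference), and all words and values are made up for illustration.

```matlab
% Toy vocabulary with 2-D vectors, one row per word (made-up values).
vocab = ["near" "east" "far"];
V = [1 2; 1 0; 10 10];
q = [2 2];                      % query vector

% Cosine distance: one minus the cosine of the angle between vectors.
cosDist = 1 - (V*q') ./ (vecnorm(V,2,2) * norm(q));

% Euclidean distance: length of the difference vector.
eucDist = vecnorm(V - q,2,2);

% Pick the closest word under each metric.
[~,iCos] = min(cosDist);
[~,iEuc] = min(eucDist);
closestCosine = vocab(iCos)       % "far": same direction as q
closestEuclidean = vocab(iEuc)    % "near": smallest difference vector
```

Cosine distance ignores vector length, so "far" wins because it points in exactly the same direction as q, while Euclidean distance favors "near" because it lies closest in absolute terms.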

Output Arguments

`words` — Output words
string vector

Output words, returned as a string vector.

`dist` — Distance of words to source vectors
vector

Distance of words to their source vectors, returned as a vector.
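
One possible use of dist is to keep only the neighbors that lie within a chosen distance. The sketch below assumes the fastText support package is installed; the threshold maxDist is a hypothetical value picked for illustration, not a recommended setting.

```matlab
% Load the pretrained embedding (requires the fastText support package).
emb = fastTextWordEmbedding;

% Find the 10 closest words to the vector for "Paris" (cosine distance).
v = word2vec(emb,"Paris");
[words,dist] = vec2word(emb,v,10);

% Keep only the words within a hypothetical distance threshold.
maxDist = 0.4;
closeWords = words(dist <= maxDist)
```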
