One-Stop Solution to encode sentences to fixed-length vectors from various embedding techniques.
- Inspired by bert-as-service
What is it? • Installation • Getting Started • Supported Embeddings • API
What is it?
Encoding Embedding
Embedding is an upstream task of transforming inputs such as text, image, audio, and video into fixed-length vectors. Embeddings are essential in Natural Language Processing (NLP), where numerous models have emerged to produce them, such as BERT, XLNet, and Word2Vec. The purpose of this project is to provide a comprehensive solution for various embedding techniques, starting with popular text embeddings.
The embedding-as-service enables encoding any given text into fixed-length vectors using supported models and embedding methods.
Installation
Using embedding-as-service as a module
Install embedding-as-service via pip:
pip install embedding-as-service
Note that the code MUST be running on Python >= 3.6. The module does not support Python 2!
Using embedding-as-service as a server
In server mode, install both the server module and the client module:
$ pip install embedding-as-service # server
$ pip install embedding-as-service-client # client
The client module does not require Python 3.6; it supports both Python 2 and Python 3.
Getting Started
1. Initialize encoder using supported embeddings
If using embedding-as-service as a module:
from embedding_as_service.text.encode import Encoder
en = Encoder(embedding='bert', model='bert_base_cased', max_seq_length=256)
If using embedding-as-service as a server:
# start the server
$ embedding-as-service-start --embedding bert --model bert_base_cased --port 8080 --max_seq_length 256
from embedding_as_service_client import EmbeddingClient
en = EmbeddingClient(host='<host_server_ip>', port=8080)
2. Get token embeddings for sentences
vecs = en.encode(texts=['hello aman', 'how are you?'])
The resulting shape is (2, 256, 768), i.e. batch_size x max_seq_length x embedding_size.
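Every sentence is padded (or truncated) to max_seq_length, which is why the second dimension is fixed regardless of sentence length. A minimal numpy sketch of that fixed-length shape, using random placeholder vectors rather than real BERT outputs:

```python
import numpy as np

max_seq_length, emb_size = 256, 768
texts = ['hello aman', 'how are you?']

# Toy token embeddings: one random vector per whitespace token.
token_vecs = [np.random.rand(len(t.split()), emb_size) for t in texts]

# Pad each sentence with zero vectors up to max_seq_length.
vecs = np.stack([
    np.vstack([v, np.zeros((max_seq_length - len(v), emb_size))])
    for v in token_vecs
])
print(vecs.shape)  # (2, 256, 768)
```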
3. Using pooling strategy
Here are the supported pooling methods:
| Strategy | Description |
|---|---|
| None | No pooling at all |
| reduce_mean | Average of all token embeddings |
| reduce_min | Minimum of all token embeddings |
| reduce_max | Maximum of all token embeddings |
| first_token | Embedding of the first token of a sentence |
| last_token | Embedding of the last token of a sentence |
Example usage with reduce_mean:
vecs = en.encode(texts=['hello aman', 'how are you?'], pooling='reduce_mean')
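Each pooling strategy collapses the (batch, seq, emb) tensor along the sequence axis into one vector per sentence. A sketch of what each strategy computes, using numpy on dummy embeddings (the real encoder applies these internally; first_token and last_token here ignore padding for simplicity):

```python
import numpy as np

vecs = np.random.rand(2, 128, 768)   # batch x max_seq_length x embedding_size

reduce_mean = vecs.mean(axis=1)      # average of all token embeddings
reduce_min  = vecs.min(axis=1)       # element-wise minimum over tokens
reduce_max  = vecs.max(axis=1)       # element-wise maximum over tokens
first_token = vecs[:, 0, :]          # embedding of the first token
last_token  = vecs[:, -1, :]         # embedding of the last position

# Every strategy reduces the sequence axis to a single vector per sentence.
for pooled in (reduce_mean, reduce_min, reduce_max, first_token, last_token):
    print(pooled.shape)              # (2, 768)
```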
4. Show embedding tokens
en.tokenize(texts=['hello aman', 'how are you?'])
5. Using your own tokenizer
texts = ['hello aman!', 'how are you?']
tokens = [s.split() for s in texts]
vecs = en.encode(tokens, is_tokenized=True)
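A hypothetical sketch of what is_tokenized changes inside encode: with is_tokenized=True the encoder is assumed to skip its own tokenizer and use your token lists directly. The function name and logic below are illustrative, not the library's internals:

```python
from typing import List, Union

def prepare_tokens(inputs: Union[List[str], List[List[str]]],
                   is_tokenized: bool = False) -> List[List[str]]:
    """Return token lists, tokenizing only when the caller has not."""
    if is_tokenized:
        return inputs  # caller's tokens are used as-is
    return [text.split() for text in inputs]  # fallback: whitespace split

print(prepare_tokens(['hello aman!', 'how are you?']))
# [['hello', 'aman!'], ['how', 'are', 'you?']]
print(prepare_tokens([['hello', 'aman', '!']], is_tokenized=True))
# [['hello', 'aman', '!']]
```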
API
1. class embedding_as_service.text.encoder.Encoder
- embedding: str (required)
- model: str (required)
- max_seq_length: int (default 128)
2. def embedding_as_service.text.encoder.Encoder.encode
- texts: List[str] or List[List[str]] (required)
- pooling: str (optional)
- is_tokenized: bool (default False)
- batch_size: int (default 128)
3. def embedding_as_service.text.encoder.Encoder.tokenize
- texts: List[str] (required)
Supported Embeddings and Models
Here are the supported embeddings and their respective models:
| Embedding | Model | Embedding dimensions | Paper |
|---|---|---|---|
| albert | albert_base | 768 | Read Paper |
| xlnet | xlnet_large_cased | 1024 | Read Paper |
| bert | bert_base_uncased | 768 | Read Paper |
| elmo | elmo_bi_lm | 512 | Read Paper |
| ulmfit | ulmfit_forward | 300 | Read Paper |
| use | use_dan | 512 | Read Paper |
| word2vec | google_news_300 | 300 | Read Paper |
| fasttext | wiki_news_300 | 300 | Read Paper |
| glove | twitter_200 | 200 | Read Paper |
At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.
Troubleshooting
If you encounter issues while using the embedding-as-service, consider the following troubleshooting ideas:
- Ensure you are using Python 3.6 for the module.
- Check if the server is running correctly and accessible from the client.
- If you face version compatibility issues, try creating a virtual environment.
- Make sure required packages are properly installed.
- Validate the embeddings and models are available as per the documentation.
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

