Embedding as Service

Apr 13, 2022 | Educational

A one-stop solution for encoding sentences into fixed-length vectors using a variety of embedding techniques.




What is it?

Encoding Embedding

Encoding is the upstream task of transforming inputs such as text, images, audio, and video into fixed-length vectors. Such embeddings are fundamental to Natural Language Processing (NLP), where numerous models have emerged, such as BERT, XLNet, and Word2Vec. The purpose of this project is to provide a comprehensive solution for a variety of embedding techniques, starting with popular text embeddings.

The embedding-as-service enables encoding any given text into fixed-length vectors using supported models and embedding methods.

Installation

^ Back to top

Using embedding-as-service as a module

Install the embedding-as-service via pip.

pip install embedding-as-service

Note that the code MUST run on Python >= 3.6. The module does not support Python 2!

Using embedding-as-service as a server

Install both the server module and the client module:

$ pip install embedding-as-service  # server
$ pip install embedding-as-service-client  # client

The client module does not require Python 3.6; it supports both Python 2 and Python 3.

Getting Started

^ Back to top

1. Initialize encoder using supported embeddings

If using embedding-as-service as a module:

from embedding_as_service.text.encode import Encoder
en = Encoder(embedding='bert', model='bert_base_cased', max_seq_length=256)

If using embedding-as-service as a server:

# start the server
$ embedding-as-service-start --embedding bert --model bert_base_cased --port 8080 --max_seq_length 256

# connect from the client
from embedding_as_service_client import EmbeddingClient
en = EmbeddingClient(host='host_server_ip', port=8080)

2. Get token embeddings for your sentences

vecs = en.encode(texts=['hello aman', 'how are you?'])

The resulting shape would be (2, 256, 768), i.e. batch x max_seq_length x embedding_size, since the encoder above was initialized with max_seq_length=256.
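To make the shape semantics concrete, here is a small NumPy sketch; the zero-filled array is a stand-in for real encoder output, assuming max_seq_length=256 as in the Encoder call above:

```python
import numpy as np

# Stand-in for the encoder output: 2 sentences, 256 token slots, 768 dims
vecs = np.zeros((2, 256, 768))

batch, seq_len, dim = vecs.shape
first_sentence = vecs[0]     # (256, 768): one row per token of sentence 0
first_token = vecs[0, 0]     # (768,): vector for the first token
print(first_sentence.shape, first_token.shape)
```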

3. Using pooling strategy

Here are the supported pooling methods:

Strategy      Description
None          No pooling at all
reduce_mean   Average over all token embeddings
reduce_min    Element-wise minimum over all token embeddings
reduce_max    Element-wise maximum over all token embeddings
first_token   Embedding of the first token of a sentence
last_token    Embedding of the last token of a sentence
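Each strategy reduces the token axis in a different way. A minimal NumPy sketch of what each pooling step computes (toy values, not real model output):

```python
import numpy as np

# Toy token embeddings: 2 sentences x 4 tokens x 3 dims
token_vecs = np.arange(24, dtype=float).reshape(2, 4, 3)

reduce_mean = token_vecs.mean(axis=1)   # average over tokens -> (2, 3)
reduce_min = token_vecs.min(axis=1)     # element-wise minimum -> (2, 3)
reduce_max = token_vecs.max(axis=1)     # element-wise maximum -> (2, 3)
first_token = token_vecs[:, 0, :]       # first token's vector -> (2, 3)
last_token = token_vecs[:, -1, :]       # last token's vector  -> (2, 3)

print(reduce_mean[0])  # -> [4.5 5.5 6.5]
```

Whatever the strategy, the token axis disappears, so pooled output for the real encoder would have shape (batch, embedding_size) instead of (batch, max_seq_length, embedding_size).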

Example usage with reduce_mean:

vecs = en.encode(texts=['hello aman', 'how are you?'], pooling='reduce_mean')

4. Show embedding tokens

en.tokenize(texts=['hello aman', 'how are you?'])

5. Using your own tokenizer

texts = ['hello aman!', 'how are you?']
tokens = [s.split() for s in texts]
vecs = en.encode(tokens, is_tokenized=True)
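One caveat with the whitespace tokenizer above: punctuation stays attached to its word, which may not match a model's own subword vocabulary. A quick check:

```python
texts = ['hello aman!', 'how are you?']
tokens = [s.split() for s in texts]

# naive whitespace split keeps punctuation glued to the tokens
print(tokens[0])  # -> ['hello', 'aman!']
print(tokens[1])  # -> ['how', 'are', 'you?']
```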

API

^ Back to top

1. class embedding_as_service.text.encoder.Encoder

  • embedding: str (required)
  • model: str (required)
  • max_seq_length: int (default 128)

2. def embedding_as_service.text.encoder.Encoder.encode

  • texts: List[str] or List[List[str]] (required)
  • pooling: str (optional)
  • is_tokenized: bool (default False)
  • batch_size: int (default 128)

3. def embedding_as_service.text.encoder.Encoder.tokenize

  • texts: List[str] (required)

Supported Embeddings and Models

^ Back to top

Here are the supported embeddings and their respective models:

Embedding   Model               Embedding dimensions   Paper
albert      albert_base         768                    Read Paper
xlnet       xlnet_large_cased   1024                   Read Paper
bert        bert_base_uncased   768                    Read Paper
elmo        elmo_bi_lm          512                    Read Paper
ulmfit      ulmfit_forward      300                    Read Paper
use         use_dan             512                    Read Paper
word2vec    google_news_300     300                    Read Paper
fasttext    wiki_news_300       300                    Read Paper
glove       twitter_200         200                    Read Paper

At fxis.ai, we believe that such advancements are crucial for the future of AI, as they enable more comprehensive and effective solutions. Our team is continually exploring new methodologies to push the envelope in artificial intelligence, ensuring that our clients benefit from the latest technological innovations.

Troubleshooting

If you encounter issues while using the embedding-as-service, consider the following troubleshooting ideas:

  • Ensure you are using Python 3.6 for the module.
  • Check if the server is running correctly and accessible from the client.
  • If you face version compatibility issues, try creating a virtual environment.
  • Make sure required packages are properly installed.
  • Validate the embeddings and models are available as per the documentation.
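For the virtual-environment suggestion above, a minimal sketch (it assumes a python3.6 interpreter is on your PATH; adjust the interpreter name to whatever Python >= 3.6 you have installed):

```shell
# create and activate an isolated environment
python3.6 -m venv eas-env
source eas-env/bin/activate

# install the package inside the environment
pip install --upgrade pip
pip install embedding-as-service

# confirm the interpreter version in use
python --version
```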

For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.
