Whatlang

Sep 17, 2023 | Educational

Natural language detection for Rust with a focus on simplicity and performance.

Features

Supports 69 languages
100% written in Rust
Lightweight, fast, and simple
Recognizes not only a language but also a script (Latin, Cyrillic, etc.)
Provides reliability information

Get Started

If you want to dive into language detection with Whatlang, here’s a simple starting point:

```rust
use whatlang::{detect, Lang, Script};

fn main() {
    let text = "Ĉu vi ne volas eklerni Esperanton? Bonvolu! Estas unu de la plej bonaj aferoj!";
    // `detect` returns None when no language can be determined.
    let info = detect(text).unwrap();

    assert_eq!(info.lang(), Lang::Epo);
    assert_eq!(info.script(), Script::Latin);
    assert_eq!(info.confidence(), 1.0);
    assert!(info.is_reliable());
}
```
For more details (e.g., how to blacklist some languages), please check the documentation.

Who Uses Whatlang?

Whatlang is trusted by major projects as a vital component for language recognition. You will be in good company with:

Sonic – a fast search backend in Rust.
Meilisearch – a blazing fast open-source search engine.

Feature Toggles

Whatlang also supports feature toggles for additional functionality:


Feature      Description
-----------------------------------------------------------------------
enum-map     Lang and Script implement Enum trait from enum-map
arbitrary    Support Arbitrary
serde        Implements Serialize and Deserialize for Lang and Script
dev          Enables whatlang::dev module for profiling purposes

How Does It Work?

Whatlang recognizes languages with trigram language models. The approach can be likened to a detective who identifies a mysterious character from patterns in small snippets of behavior.

In the same way, Whatlang dissects text into three-character combinations (trigrams) to identify the language. The detective, or the algorithm in this case, crafts a profile based on these clues, ultimately revealing the likely language.
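The trigram idea can be sketched in a few lines of standard-library Rust. This is an illustration only, not Whatlang's actual implementation: the function names, the toy profiles, and the rank-difference distance are all assumptions chosen to show how three-character windows can be counted, ranked, and compared against a per-language profile.

```rust
use std::collections::HashMap;

/// Count every overlapping three-character window in `text`.
fn trigram_counts(text: &str) -> HashMap<String, usize> {
    // Work on chars, not bytes, so non-ASCII letters stay intact.
    let chars: Vec<char> = text.to_lowercase().chars().collect();
    let mut counts = HashMap::new();
    for window in chars.windows(3) {
        let trigram: String = window.iter().collect();
        *counts.entry(trigram).or_insert(0) += 1;
    }
    counts
}

/// Trigrams ordered from most to least frequent (ties broken alphabetically).
fn ranked_trigrams(text: &str) -> Vec<String> {
    let mut pairs: Vec<(String, usize)> = trigram_counts(text).into_iter().collect();
    pairs.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    pairs.into_iter().map(|(trigram, _)| trigram).collect()
}

/// Hypothetical distance: sum of rank differences between the text's trigrams
/// and a language profile. A trigram missing from the profile costs the
/// maximum penalty (the profile length). Lower means a closer match.
fn profile_distance(text_ranked: &[String], profile: &[&str]) -> usize {
    let max_penalty = profile.len();
    text_ranked
        .iter()
        .enumerate()
        .map(|(i, trigram)| {
            match profile.iter().position(|p| *p == trigram.as_str()) {
                Some(j) => i.abs_diff(j),
                None => max_penalty,
            }
        })
        .sum()
}

fn main() {
    // Toy profiles, invented for this example only.
    let profile_a = ["ana", "ban", "nan"];
    let profile_b = ["the", "and", "ing"];

    let ranked = ranked_trigrams("banana");
    let da = profile_distance(&ranked, &profile_a);
    let db = profile_distance(&ranked, &profile_b);

    println!("distance to A: {da}, distance to B: {db}");
    assert!(da < db); // the text is closer to profile A
}
```

In a real detector each profile would hold the few hundred most frequent trigrams of a language, and the language with the smallest distance would be reported as the best guess.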

How Is is_reliable Calculated?

The reliability of the detected language is calculated based on:

The number of unique trigrams in the text.
The difference between the top detected language and the next language.

It helps to picture a two-dimensional space with these two factors as its axes. Threshold functions split the space into “Reliable” and “Not Reliable” regions, like a map guiding the detective’s next move.
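To make the two-factor idea concrete, here is a minimal sketch of a reliability check. It is not Whatlang's real formula: the function names, the shape of the threshold curve, and the constants 10.0 and 0.05 are assumptions picked for illustration. The only point is that a small text needs a much larger gap between the top two candidates before the result is trusted.

```rust
/// Relative gap between the best and second-best scores (higher score = better match).
fn relative_gap(top_score: f64, second_score: f64) -> f64 {
    if top_score <= 0.0 {
        return 0.0;
    }
    (top_score - second_score) / top_score
}

/// Hypothetical threshold curve: the required gap shrinks as the text
/// provides more unique trigrams. The constants are illustrative only.
fn required_gap(unique_trigrams: usize) -> f64 {
    if unique_trigrams == 0 {
        // No evidence at all: nothing can be considered reliable.
        return f64::INFINITY;
    }
    10.0 / unique_trigrams as f64 + 0.05
}

/// A point (unique_trigrams, gap) is "reliable" when it lies on or above the curve.
fn is_reliable_sketch(unique_trigrams: usize, top_score: f64, second_score: f64) -> bool {
    relative_gap(top_score, second_score) >= required_gap(unique_trigrams)
}

fn main() {
    // Long text, clear winner: lands in the "Reliable" region.
    println!("{}", is_reliable_sketch(200, 0.9, 0.45));
    // Very short text: the same gap is not enough.
    println!("{}", is_reliable_sketch(5, 0.9, 0.45));
    // Long text, but the top two candidates are nearly tied.
    println!("{}", is_reliable_sketch(200, 0.9, 0.88));
}
```

In practice, call `is_reliable()` on the value returned by `detect` and treat a `false` result as a signal to collect more text or fall back to a default language.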

Running Benchmarks and Tests

To ensure everything works like a well-oiled machine, you can run some quick commands:

make bench – Run performance benchmarks.
make doc – Generate and open documentation.
make test – Run tests.
make watch – Watch changes and run tests.

Comparison with Alternatives

Whatlang competes with others using distinct methods:

Implementation	Languages	Algorithm
Whatlang	69	Trigrams
CLD2	83	Quadgrams
CLD3	107	Neural Network

Ports and Clones

If you’re eager to expand your horizons, consider these options:

whatlang-ffi – C bindings.
whatlanggo – Whatlang clone for the Go language.
whatlang-py – Bindings for Python.
whatlang-rb – Bindings for Ruby.
whatlangex – Bindings for Elixir.

Donations

You can support the project by donating NEAR tokens. Details can be found on the NEAR website.

License

Whatlang operates under the MIT License.

Contributors

greyblake – Creator, maintainer.
Dr-Emann – Optimization and improvements.
BaptisteGelez – Improvements.
Vishesh Chopra – Designed the logo.
Joel Natividad – Tagalog support.
ManyTheFish – Crazy optimization.
Kerollmops – Crazy optimization.

Troubleshooting

If you encounter challenges while using Whatlang, here are some suggestions:

Ensure that you are working with text that is suitable for language detection.
Check that you are using the latest version of Whatlang.
Refer to the official documentation for additional insights.
If you still face difficulties, consider reaching out to the community for support.
