Natural language detection for Rust with focus on simplicity and performance.
Try the online demo.
Features
- Supports 69 languages
- 100% written in Rust
- Lightweight, fast, and simple
- Recognizes not only a language but also a script (Latin, Cyrillic, etc.)
- Provides reliability information
Get Started
If you want to dive into language detection with Whatlang, here’s a simple starting point:
```rust
use whatlang::{detect, Lang, Script};

fn main() {
    let text = "Ĉu vi ne volas eklerni Esperanton? Bonvolu! Estas unu de la plej bonaj aferoj!";

    // `detect` returns `None` when no language can be determined.
    let info = detect(text).unwrap();
    assert_eq!(info.lang(), Lang::Epo);
    assert_eq!(info.script(), Script::Latin);
    assert_eq!(info.confidence(), 1.0);
    assert!(info.is_reliable());
}
```
For more details (e.g., how to blacklist some languages), please check the documentation.
Who Uses Whatlang?
Whatlang is used for language recognition in several larger projects, including:
- Sonic – a fast search backend in Rust.
- Meilisearch – a blazing fast open-source search engine.
Feature Toggles
Whatlang also supports feature toggles for additional functionality:
| Feature | Description |
|---|---|
| enum-map | Lang and Script implement the Enum trait from enum-map |
| arbitrary | Support for Arbitrary |
| serde | Implements Serialize and Deserialize for Lang and Script |
| dev | Enables the whatlang::dev module for profiling purposes |
How Does It Work?
Language recognition is based on trigram language models. Whatlang splits the input text into three-character sequences (trigrams) and builds a profile from them. That profile is then compared against the profile of each supported language, and the closest match is reported as the likely language.
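To make the idea concrete, here is a minimal, standard-library-only sketch of trigram profiling. It is not Whatlang's actual implementation: the function names (`trigram_counts`, `ranked_profile`, `distance`), the text normalization, and the rank-based "out-of-place" distance are all assumptions chosen for illustration, and the two one-sentence "language models" are toy data.

```rust
use std::collections::HashMap;

/// Count three-character windows in `text`.
/// Assumed normalization: lowercase, and every non-alphabetic character becomes a space.
fn trigram_counts(text: &str) -> HashMap<String, usize> {
    let chars: Vec<char> = text
        .to_lowercase()
        .chars()
        .map(|c| if c.is_alphabetic() { c } else { ' ' })
        .collect();

    let mut counts = HashMap::new();
    for window in chars.windows(3) {
        let trigram: String = window.iter().collect();
        *counts.entry(trigram).or_insert(0) += 1;
    }
    counts
}

/// Trigrams ordered from most to least frequent.
/// Ties are broken alphabetically so the result is deterministic.
fn ranked_profile(text: &str) -> Vec<String> {
    let mut entries: Vec<(String, usize)> = trigram_counts(text).into_iter().collect();
    entries.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    entries.into_iter().map(|(trigram, _)| trigram).collect()
}

/// Rank-based distance between a text profile and a language profile.
/// A trigram missing from the language profile costs the maximum penalty.
fn distance(text_profile: &[String], lang_profile: &[String]) -> usize {
    let max_penalty = lang_profile.len();
    text_profile
        .iter()
        .enumerate()
        .map(|(rank, trigram)| {
            match lang_profile.iter().position(|t| t == trigram) {
                Some(lang_rank) => rank.abs_diff(lang_rank),
                None => max_penalty,
            }
        })
        .sum()
}

fn main() {
    // Toy "language models" built from single sentences; real models use large corpora.
    let english = ranked_profile("the quick brown fox jumps over the lazy dog");
    let spanish = ranked_profile("el veloz zorro salta sobre el perro perezoso");

    let input = ranked_profile("the dog jumps over the fox");
    let to_english = distance(&input, &english);
    let to_spanish = distance(&input, &spanish);

    // The smaller distance is the closer match.
    let guess = if to_english <= to_spanish { "English" } else { "Spanish" };
    println!("distance to English: {to_english}, to Spanish: {to_spanish}, guess: {guess}");
}
```

The language with the smallest distance is taken as the best guess. A production detector would also narrow the candidates by script first, which this sketch leaves out.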
How Is is_reliable Calculated?
The reliability of the detected language is calculated based on:
- The number of unique trigrams in the text.
- The difference between the top detected language and the next language.
These two factors can be pictured as a two-dimensional space in which a threshold separates a “Reliable” region from a “Not Reliable” region.
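The two inputs above can be combined into a yes/no decision roughly as follows. This is only a sketch under stated assumptions: the free functions `confidence_gap`, `required_gap`, and `is_reliable`, the shape of the threshold curve, and the `SCALE` constant are all hypothetical and are not taken from Whatlang's source.

```rust
/// Relative gap between the best and second-best language scores.
/// Assumes higher scores mean a better match.
fn confidence_gap(best: f64, second: f64) -> f64 {
    if best <= 0.0 {
        return 0.0;
    }
    ((best - second) / best).clamp(0.0, 1.0)
}

/// Hypothetical threshold curve: the more unique trigrams the text has,
/// the smaller the gap needed to call the result reliable.
fn required_gap(unique_trigrams: usize) -> f64 {
    const SCALE: f64 = 12.0; // assumed constant, chosen only for illustration
    SCALE / (unique_trigrams as f64 + SCALE)
}

/// A point (unique_trigrams, gap) is "reliable" if it lies on or above the curve.
fn is_reliable(unique_trigrams: usize, best: f64, second: f64) -> bool {
    if unique_trigrams == 0 {
        return false; // nothing to base a decision on
    }
    confidence_gap(best, second) >= required_gap(unique_trigrams)
}

fn main() {
    // (unique trigrams, best score, second-best score)
    let cases = [(108, 1.0, 0.5), (3, 1.0, 0.5), (108, 1.0, 0.95)];
    for (trigrams, best, second) in cases {
        println!(
            "trigrams={trigrams}, gap={:.2}, required={:.2}, reliable={}",
            confidence_gap(best, second),
            required_gap(trigrams),
            is_reliable(trigrams, best, second)
        );
    }
}
```

With this curve, a long text needs only a small lead over the runner-up, while a very short text needs a large one. That is the intuition behind splitting the space into two regions.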
Running Benchmarks and Tests
The following make targets cover benchmarks, documentation, and tests:
- make bench – Run performance benchmarks.
- make doc – Generate and open documentation.
- make test – Run tests.
- make watch – Watch changes and run tests.
Comparison with Alternatives
The table below compares Whatlang with two alternatives by number of supported languages and detection algorithm:
| Implementation | Languages | Algorithm |
|---|---|---|
| Whatlang | 69 | Trigrams |
| CLD2 | 83 | Quadgrams |
| CLD3 | 107 | Neural Network |
Ports and Clones
Whatlang is also available outside Rust through these bindings and ports:
- whatlang-ffi – C bindings.
- whatlanggo – Whatlang clone for the Go language.
- whatlang-py – Bindings for Python.
- whatlang-rb – Bindings for Ruby.
- whatlangex – Bindings for Elixir.
Donations
You can support the project by donating NEAR tokens. Details can be found on the NEAR website.
License
Whatlang is released under the MIT License.
Contributors
- greyblake – Creator, maintainer.
- Dr-Emann – Optimization and improvements.
- BaptisteGelez – Improvements.
- Vishesh Chopra – Designed the logo.
- Joel Natividad – Tagalog support.
- ManyTheFish – Crazy optimization.
- Kerollmops – Crazy optimization.
Troubleshooting
If you encounter challenges while using Whatlang, here are some suggestions:
- Make sure the input is suitable for language detection: very short texts contain few trigrams, so the result is less likely to be reliable.
- Check that you are using the latest version of Whatlang.
- Refer to the official documentation for additional insights.
- If you still face difficulties, consider reaching out to the community for support.

