If you’ve ever needed to determine the language of a piece of text, you’re not alone! Fortunately, the Whichlang language detection library is here to help. It is designed to be both fast and accurate, which makes it a good fit when you need to process large volumes of multilingual data.
Why Build Whichlang?
While developing Quickwit, a search engine for logs and tracing data, the Quickwit team needed a lightweight, fast, and accurate language detection library. Whichlang was built to meet Quickwit’s high-throughput requirements without sacrificing accuracy.
Features of Whichlang
- No external dependencies
- Throughput exceeds 100 MB/s for both short and long strings
- Good accuracy (around 99.5%), though the exact rate depends on input size
- Supports languages including Arabic, Dutch, English, French, German, Hindi, Italian, Japanese, Korean, Mandarin, Portuguese, Russian, Spanish, Swedish, Turkish, and Vietnamese
How Does Whichlang Work?
Imagine you’re a skilled chef who needs to identify a dish from its ingredients. Each ingredient contributes to the overall taste, just as the character sequences in a text reveal its language. Whichlang works in a similar way: it extracts 2-, 3-, and 4-grams of letters (the ingredients) from the ASCII characters of the input and feeds them to a multiclass logistic regression model that predicts the language (the dish). Rather than storing every possible n-gram, it uses the hashing trick to map these features into a fixed space of 4,096 dimensions.
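To make the description above more concrete, here is a minimal Rust sketch of the two ingredients: letter n-grams hashed into 4,096 buckets, and a multiclass logistic regression (one linear score per language, then softmax). This is not Whichlang’s source code. The FNV-1a hash, the word splitting, the names `features`, `Model` and `toy_model`, and the two-language toy weights (raw n-gram counts of one sample sentence each, not trained weights) are all illustrative assumptions.

```rust
// Hypothetical sketch of the technique described above: hashed letter n-grams
// fed into a multiclass logistic regression. NOT Whichlang's actual code.

/// Size of the hashed feature space (4,096 dimensions, as in the text).
const DIM: usize = 4096;

/// FNV-1a 64-bit hash, used here only as a simple, deterministic stand-in
/// for whatever hash function the real library uses.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// Hashing trick: count every 2-, 3- and 4-gram of lowercase ASCII letters
/// into one of DIM buckets instead of keeping an explicit n-gram vocabulary.
fn features(text: &str) -> Vec<f32> {
    let mut x = vec![0.0f32; DIM];
    // Treat every run of ASCII letters as a word; everything else separates words.
    for word in text.split(|c: char| !c.is_ascii_alphabetic()) {
        let w = word.to_ascii_lowercase();
        for n in 2..=4 {
            // `windows(n)` yields nothing when the word is shorter than n.
            for gram in w.as_bytes().windows(n) {
                x[(fnv1a(gram) % DIM as u64) as usize] += 1.0;
            }
        }
    }
    x
}

/// Multiclass logistic regression: one weight vector and bias per language.
struct Model {
    labels: Vec<&'static str>,
    weights: Vec<Vec<f32>>, // one row of DIM weights per label
    bias: Vec<f32>,
}

impl Model {
    /// Linear score per language, turned into probabilities with softmax.
    fn probabilities(&self, text: &str) -> Vec<f32> {
        let x = features(text);
        let scores: Vec<f32> = self
            .weights
            .iter()
            .zip(&self.bias)
            .map(|(w, b)| w.iter().zip(&x).map(|(wi, xi)| wi * xi).sum::<f32>() + b)
            .collect();
        // Numerically stable softmax: subtract the max score before exp().
        let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
        let exps: Vec<f32> = scores.iter().map(|s| (s - max).exp()).collect();
        let total: f32 = exps.iter().sum();
        exps.iter().map(|e| e / total).collect()
    }

    /// The predicted language is the class with the highest probability.
    fn detect(&self, text: &str) -> &'static str {
        let p = self.probabilities(text);
        let best = (0..p.len())
            .max_by(|&a, &b| p[a].partial_cmp(&p[b]).unwrap())
            .unwrap();
        self.labels[best]
    }
}

/// Toy weights (hypothetical, NOT trained): each language's weight vector is
/// simply the hashed n-gram counts of one sample sentence.
fn toy_model() -> Model {
    let samples = [
        ("English", "the quick brown fox jumps over the lazy dog"),
        ("French", "le renard brun saute par dessus le chien paresseux"),
    ];
    Model {
        labels: samples.iter().map(|(l, _)| *l).collect(),
        weights: samples.iter().map(|(_, s)| features(s)).collect(),
        bias: vec![0.0; samples.len()],
    }
}

fn main() {
    let model = toy_model();
    for text in ["the dog and the fox", "le chien et le renard"] {
        println!("{text:?} -> {}", model.detect(text));
    }
}
```

In a real model the weights are learned by training the logistic regression on labeled text; only the inference path (hash, dot product, softmax, argmax) is shown here. Hash collisions are the price of the fixed 4,096-dimensional space: unrelated n-grams can land in the same bucket, and the learned weights have to tolerate that.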
Comparison with Whatlang
To see how Whichlang compares with Whatlang, a widely used Rust language detection crate, both libraries were benchmarked on throughput and accuracy. On these benchmarks Whichlang is more than ten times faster than Whatlang (about 12× on long strings and about 64× on short strings) and is also more accurate on average (97.03% versus 91.69%).
Throughput Benchmarks
Benchmark           Processing time (µs)   Throughput (MiB/s)
------------------  ---------------------  ------------------
Whatlang (short)    16.62                  1.66
Whatlang (long)     62.00                  9.42
Whichlang (short)   0.26                   105.69
Whichlang (long)    5.21                   112.31
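The two columns of the throughput table are linked by a simple relation, assuming throughput means input bytes divided by processing time and 1 MiB = 2^20 bytes. The short Rust sketch below (the function names are illustrative) recovers the implied input size per call and the speedup from the published numbers.

```rust
// Relationship between the columns of the throughput table, ASSUMING
// throughput = input size / processing time and 1 MiB = 2^20 bytes.

const MIB: f64 = 1024.0 * 1024.0;

/// Bytes processed per call implied by a (time, throughput) row.
fn implied_bytes(time_us: f64, throughput_mib_s: f64) -> f64 {
    time_us * 1e-6 * throughput_mib_s * MIB
}

/// How many times faster the second timing is than the first.
fn speedup(slow_us: f64, fast_us: f64) -> f64 {
    slow_us / fast_us
}

fn main() {
    // Rows copied from the table: (benchmark, time in µs, throughput in MiB/s).
    let rows = [
        ("Whatlang (short)", 16.62, 1.66),
        ("Whatlang (long)", 62.00, 9.42),
        ("Whichlang (short)", 0.26, 105.69),
        ("Whichlang (long)", 5.21, 112.31),
    ];
    for (name, t, mib_s) in rows {
        println!("{name}: ~{:.0} bytes per call", implied_bytes(t, mib_s));
    }
    println!("short speedup: {:.1}x", speedup(16.62, 0.26));
    println!("long speedup:  {:.1}x", speedup(62.00, 5.21));
}
```

Under that assumption, the rows are consistent with inputs of roughly 29 bytes for the short benchmark and roughly 610 bytes for the long one, and with speedups of about 64× (short) and 12× (long) for Whichlang over Whatlang.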
Accuracy Benchmarks
Crate: Whatlang
Language    Accuracy
----------  --------
Arabic      99.68%
Mandarin    96.09%
German      88.57%
English     85.99%
French      90.88%
... and more
Average     91.69%
Crate: Whichlang
Language    Accuracy
----------  --------
Arabic      100.00%
Mandarin    98.65%
German      94.20%
English     97.15%
French      97.59%
... and more
Average     97.03%
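Reading the two averages as error rates makes the gap easier to judge. This is simple arithmetic on the figures above, assuming both averages were computed over the same test set:

```latex
\begin{aligned}
e_{\text{Whatlang}}  &= 100\% - 91.69\% = 8.31\% \\
e_{\text{Whichlang}} &= 100\% - 97.03\% = 2.97\% \\
\frac{e_{\text{Whatlang}}}{e_{\text{Whichlang}}} &= \frac{8.31}{2.97} \approx 2.8
\end{aligned}
```

On this benchmark, Whichlang therefore misclassifies roughly 2.8 times fewer inputs on average than Whatlang.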
Troubleshooting
If you encounter issues while using Whichlang, here are some troubleshooting tips:
- If accuracy is low, check that your input is long enough and written in one of the supported languages: very short strings give the model few n-grams to work with, and accuracy depends on input size.
- Check that the crate version in your Cargo.toml matches the documentation you are following.
- Refer to the official documentation for further clarifications.
- For any unresolved problems, you can reach out to the community for support.
For more insights, updates, or to collaborate on AI development projects, stay connected with fxis.ai.

