How to Use PyTorch Translate for Machine Translation

Aug 11, 2021 | Data Science

Welcome to an exploration of PyTorch Translate – a library for machine translation built on the flexibility and power of PyTorch. It’s important to note up front that PyTorch Translate is now deprecated, and you are encouraged to use fairseq instead. In this blog, we’ll walk you through installing, configuring, and using the library with step-by-step instructions, so that your eventual transition to fairseq goes smoothly.

Getting Started: Installation

If you want to train and evaluate machine translation (MT) models without the Caffe2 model-export feature, follow these steps:

  1. Install PyTorch.
  2. Install Fairseq.
  3. Clone the Translate repository:
    • Command: git clone https://github.com/pytorch/translate.git pytorch-translate
    • Command: cd pytorch-translate
  4. Run: python setup.py install

Provided you have CUDA installed, you’re all set!
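
Taken together, the installation steps above look like the following shell session. (The exact PyTorch install command depends on your platform and CUDA version — check pytorch.org for the right variant; `pip install torch` and `pip install fairseq` are shown here as generic placeholders.)

```shell
# 1–2. Install PyTorch and fairseq (pick the torch build matching your CUDA setup).
pip install torch
pip install fairseq

# 3. Clone the Translate repository.
git clone https://github.com/pytorch/translate.git pytorch-translate
cd pytorch-translate

# 4. Build and install Translate into the current environment.
python setup.py install
```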

Requirements for Full Installation

To run Translate effectively, you will need:

  • A Linux operating system with a CUDA-compatible graphics card.
  • GNU C++ compiler version 4.9.2 or above.
  • A CUDA installation. We recommend CUDA 8.0 or 9.0.
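
A quick way to check the compiler requirement is a version comparison with GNU `sort -V`. Here is a small sketch — the `version_ge` helper is just for illustration, not part of Translate:

```shell
# Succeeds when $1 >= $2 under version-number ordering (GNU sort -V).
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Compare sample compiler versions against the 4.9.2 minimum.
version_ge "5.4.0" "4.9.2" && echo "5.4.0 is new enough"
version_ge "4.8.5" "4.9.2" || echo "4.8.5 is too old"
```

In practice, replace the hard-coded sample versions with the output of `g++ -dumpversion`.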

Using Docker for Installation

If you prefer using Docker, here’s the way to go:

  1. Install Docker and nvidia-docker.
  2. Run these commands:
    • Command: sudo docker pull pytorch/translate
    • Command: sudo nvidia-docker run -i -t --rm pytorch/translate /bin/bash

You should now be able to run the sample commands in the Usage Examples section below.

Building from Source

The following installation procedure has been tested on Ubuntu 16.04.5 LTS with a Tesla M60 card and a CUDA 9 installation. If you run into problems, please file an issue on the repository.

  1. Create an Anaconda environment with Python 3.6 (using Miniconda if necessary).
  2. Clone the Translate repo and run the setup script to build it.
  3. Install ONNX for exporting models.
  4. Build Translate using the provided commands.
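
In outline, those four steps map onto commands like the following. The script and package names here are illustrative — follow the repository README for the exact, version-specific commands:

```shell
# 1. Create and activate a Python 3.6 conda environment.
conda create -n translate python=3.6 -y
source activate translate

# 2. Clone the Translate repo.
git clone https://github.com/pytorch/translate.git pytorch-translate
cd pytorch-translate

# 3. Install ONNX so trained models can be exported.
conda install -c conda-forge onnx -y

# 4. Build Translate itself.
python setup.py install
```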

Example Usage

Once installed, you can start training translation models using pre-existing example scripts. For instance, if you’re working on the IWSLT 2014 German-English translation task, you can train your model as follows:

bash pytorch_translate/examples/train_iwslt14.sh

This command uses a small dataset (~160K sentence pairs), and training should finish within a few hours on a single GPU.

Pretrained Model Evaluation

You can also evaluate a pretrained model with:

bash pytorch_translate/examples/generate_iwslt14.sh

Exporting Your Model

To export a trained PyTorch model to Caffe2 using ONNX, use the following command:

bash pytorch_translate/examples/export_iwslt14.sh

Troubleshooting

If you encounter issues during installation or usage:

  • Verify that you have all the required dependencies installed correctly.
  • Revisit the installation commands to ensure they were executed without errors.
  • If you receive a “Protobuf compiler not found” error while installing ONNX, install the protobuf package first:
    • Command: conda install -c anaconda protobuf
  • Confirm that your CUDA and PyTorch versions are compatible.

Wrapping Up

While PyTorch Translate provides an excellent foundation for machine translation, transitioning to fairseq is the recommended path forward. Happy coding!
