Get started with Python for Text Mining (NLP)

Apr 17, 2021 | Data Science




Want to learn how to use Python for Text Mining & Natural Language Processing (NLP)?
This repository has everything that you need to get started!

Author: Ties de Kok (Personal Page)

These materials accompany a PhD session on NLP for Accounting Research: slides

Quick link to the notebook: open notebook


Introduction

The goal of this GitHub page is to provide you with everything you need to get started with Python and Natural Language Processing (NLP). The following topics are discussed:

(*Note: the neural network part is only a reference to the Stanford course CS224n*)

Who is this repository for?

The topics and techniques demonstrated in this repository are primarily oriented towards empirical research projects in fields such as Accounting, Finance, Marketing, Political Science, and other Social Sciences. However, many of the basics are also perfectly applicable if you are looking to use Python for any other type of Data Science!

How to use this repository?

This repository is written to facilitate learning by doing. All the material is written up in a Jupyter Notebook (NLP_notebook.ipynb), and the topics are split up by task. It is best to view the notebook locally or on nbviewer using this link: click here.

An environment.yml file is provided that you can install using conda; it automatically installs all the packages used in the notebook. Instructions for installing the environment are in the Install environment section.

Not yet familiar with the basic Python syntax?

Please check out my Getting started with Python for Research repository: click here

Using Jupyter

To run the provided notebook file you need to use Jupyter Lab or Jupyter Notebook. Jupyter comes pre-installed with the Anaconda distribution so you should have everything already installed and ready to go. The environment.yml will also install Jupyter Lab if you prefer to use that.

What is the Jupyter Notebook?

From the Jupyter website: The Jupyter Notebook is an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and explanatory text. In other words, the Jupyter Notebook allows you to program Python code straight from your browser!

How does the Jupyter Notebook work in the background?

Jupyter consists of several components that work together a bit like a restaurant kitchen. The heart of Jupyter is the *Jupyter Server*, the chef who manages everything. Your browser is the counter where you place orders (code to run) and receive the dishes (output and visualizations). The *kernel* is the kitchen assistant that actually executes your code and hands back the results. You typically run the server on your local computer, but it can also run on a remote machine, such as a high-performance server in the cloud, that you reach from your browser.
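To make the kernel's role more concrete, here is a toy sketch in plain Python. It is not how Jupyter is implemented; the `ToyKernel` class and its `run_cell` method are made up for illustration. It shows only one idea: a kernel runs each cell against the same long-lived namespace, which is why a variable defined in one cell is still available in the next.

```python
import contextlib
import io


class ToyKernel:
    """Hypothetical stand-in for a kernel: runs cells in one shared namespace."""

    def __init__(self):
        # The namespace lives as long as the kernel does, like a running session.
        self.namespace = {}

    def run_cell(self, source):
        """Execute one cell of code and return whatever it printed."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(source, self.namespace)
        return buffer.getvalue()


kernel = ToyKernel()
kernel.run_cell("x = 21")                 # first cell defines x
output = kernel.run_cell("print(x * 2)")  # second cell can still see x
print(output)
```

Restarting a real kernel corresponds to throwing away this namespace, which is why all variables disappear after a restart.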

How to start a Jupyter Notebook?

The primary method to start a Jupyter Notebook is through the command line (terminal):

  1. Open your command prompt (on Windows, use the Anaconda Prompt).
  2. Activate the environment: conda activate PythonNLPTutorial
  3. Change to the desired starting directory using cd (e.g., cd C:\Files\Work\Project_1). On Windows, cd alone does not change the active drive, so switch drives first if needed (e.g., type D: and press Enter).
  4. Start the Jupyter Notebook server with jupyter notebook or jupyter lab. This opens the interface in your default browser.
  5. You can also manually navigate to localhost:8888 in your browser if needed.
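If you prefer to script the launch step, the following is a minimal sketch, assuming the environment is already activated so that the jupyter executable is on your PATH. The helper names `build_launch_command` and `launch` are hypothetical; the only external pieces used are the `jupyter lab` and `jupyter notebook` commands and their `--port` option.

```python
import shutil
import subprocess


def build_launch_command(interface="lab", port=8888):
    """Return the command that starts Jupyter Lab or Jupyter Notebook."""
    if interface not in ("lab", "notebook"):
        raise ValueError("interface must be 'lab' or 'notebook'")
    return ["jupyter", interface, f"--port={port}"]


def launch(interface="lab", port=8888, start_dir="."):
    """Start the server in start_dir; blocks until the server is stopped."""
    if shutil.which("jupyter") is None:
        raise RuntimeError("jupyter not found on PATH; is the environment activated?")
    return subprocess.run(build_launch_command(interface, port), cwd=start_dir)


# Show the command without starting anything.
print(build_launch_command("lab"))
```

Calling `launch()` has the same effect as steps 3 and 4 above: the server starts in the given directory and keeps running until you stop it.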

How to close a Jupyter Notebook/Lab server?

To close the Jupyter Server, open the command prompt window where the server is running, and press CTRL + C twice. Remember to save any open notebooks!

How to use the Jupyter Notebook?

Some shortcuts are worth noting:

  • Esc: switch to command mode; Enter: switch to edit mode
  • Shift-Enter: run cell, select below
  • Ctrl-Enter: run cell
  • Y (in command mode): cell to code
  • M (in command mode): cell to markdown
  • A (in command mode): insert cell above
  • B (in command mode): insert cell below
  • X (in command mode): cut selected cell

Code along!

Option 1: clone repository

You can get the contents of this repository by cloning it with git or by downloading it as a ZIP file. For the ZIP route, click the Clone or download button on the repository page and then Download ZIP.

Extract the downloaded ZIP to a folder and start Jupyter Notebook or Jupyter Lab from there.
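The extraction step can also be done from Python with the standard library. This is a minimal sketch; the function name `extract_repository_zip` and the file and folder names in the usage comment are placeholders, not names taken from the repository.

```python
import zipfile
from pathlib import Path


def extract_repository_zip(zip_path, target_dir):
    """Extract zip_path into target_dir and return the notebooks found there."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # List the notebook files so you know which folder to start Jupyter in.
    return sorted(target.rglob("*.ipynb"))


# Example usage (placeholder paths, adjust to your own download location):
# notebooks = extract_repository_zip("downloaded_repository.zip", "Project_1")
# print(notebooks)
```

The returned paths show where the notebook ended up, which is the directory to cd into before starting the server.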

Install environment

You can install the environment by following these steps:

  1. Ensure you have Anaconda installed (link).
  2. Open your command prompt (Anaconda Prompt on Windows).
  3. Change to the folder where you extracted the ZIP file (e.g., cd C:\Files\Work\Project_1).
  4. Run the following command: conda env create -f environment.yml
  5. Activate the environment with: conda activate PythonNLPTutorial

A full list of all packages used is in the environment.yml file.
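After activating the environment, you can quickly check from Python that the packages you expect are importable. The sketch below uses only the standard library. The module names in `expected` are assumptions chosen for illustration; replace them with the packages actually listed in environment.yml.

```python
import importlib.util


def find_missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names if importlib.util.find_spec(name) is None]


# Assumed module names for illustration only; check environment.yml for the real list.
expected = ["pandas", "spacy", "nltk"]
missing = find_missing_packages(expected)
print("Missing:", missing if missing else "none")
```

If anything is reported as missing, the most likely cause is that the command is running outside the PythonNLPTutorial environment.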

Option 2: use Binder

Some functionality might not work on Binder. Click here to access Binder:

Questions?

If you have questions or experience problems, please use the issues tab of this repository.

License

MIT – Ties de Kok – 2020

Special Thanks

Special thanks to teles/array-mixer for providing an awesome README template that I used as a reference.

Troubleshooting

Sometimes, you may encounter issues when trying to run your Jupyter Notebook. Here are a few troubleshooting suggestions:

  • If the Jupyter Notebook doesn’t launch, try restarting your Anaconda Prompt and re-running the commands.
  • Make sure you activated the correct environment with conda activate PythonNLPTutorial.
  • Check your Python version; it should be compatible with the installed packages.
  • If you’re having issues with package installations, refer to the Install environment section and ensure you followed all steps accurately.
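Several of the checks above can be gathered in one go. This is a minimal sketch using only the standard library; `collect_diagnostics` is a hypothetical helper name. It relies on the CONDA_DEFAULT_ENV environment variable, which conda sets when an environment is activated, so it reports None when no conda environment is active.

```python
import os
import shutil
import sys


def collect_diagnostics():
    """Gather basic facts about the Python and conda setup currently in use."""
    return {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "python_executable": sys.executable,
        # Name of the active conda environment, or None if none is active.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
        # True when a jupyter executable can be found on PATH.
        "jupyter_on_path": shutil.which("jupyter") is not None,
    }


for key, value in collect_diagnostics().items():
    print(f"{key}: {value}")
```

If conda_env is not PythonNLPTutorial, or jupyter_on_path is False, activate the environment and try again before reinstalling anything.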
