Wit.ai 101: Basics of Natural Language Processing through Wit.ai
This tutorial covers the basic concepts of Natural Language Processing and how to implement them with Wit.ai.
Natural Language Processing (NLP) is the science of extracting the intention and the relevant information from text. One common application of NLP is conversational AI, or chatbots. Assistants like Alexa and Siri, and businesses like Sephora, use bots and virtual assistants to help with daily tasks, reminders, and business processes through human (natural) language.
In this tutorial, we will discuss the essential concepts in NLP, apply them by creating an NLP application with Wit.ai, and then train and test our intelligent application.
The three important concepts in NLP for bots are utterances, intents, and entities:
An utterance is the input from the user that we need to interpret to deliver the correct response. Basically, anything the user says:
- What day is it?
- Thank you!
- I'd like to order a sandwich.
Whether a question, an expression, or a command, the whole statement is called an utterance.
An intent represents a task or action the user wants to perform. It is the purpose or goal expressed in the user's utterance.
Intent | Utterance |
---|---|
Check_Weather | How's the weather today? What's the weather forecast tomorrow? How's the weather like in New York? |
Greetings | Good morning! Hello! |
Travel_Booking | Book me a flight to the Philippines this Friday. I'd like to travel in Singapore this December 25. I need a plane ticket for my trip next week to Bangkok, Thailand. |
The table above shows the user's intent for each sample utterance. For example, asking "How's the weather today?" means that the user wants to check the weather, so we label this intent Check_Weather.
💡 Trivia
Some practitioners classify intents into two types:
(1) Casual intents are small talk, usually used to start or end a conversation ("Hi!", "hello", "bye!"), plus affirmative or negative replies ("Yes please.", "No, thank you!", "Okay cool").
(2) Business intents map directly to business processes. For a flight-booking bot, an utterance like "Book me a flight to the Philippines this Friday" triggers a search for available flights to the Philippines this Friday. The result might be a list of available times, and a whole conversation thread is needed to finish the booking process.
An entity represents a unit of data you want extracted from the utterance, such as names, dates, product names, or any significant group of words. An utterance can include many entities or none at all.
An intent is the intention of the whole utterance, while entities are pieces of data extracted from it. Intents are tied to actions, and entities are the information needed to perform those actions.
🖥️ From a programming perspective, an intent is the trigger for an operation or method, while entities are the parameters passed to that call.
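The trigger-and-parameters analogy can be sketched in a few lines of Python. This is only an illustration: the handler functions, the `HANDLERS` table, and the reply texts are hypothetical and are not part of Wit.ai.

```python
from typing import Callable, Dict

# Hypothetical handlers: one function per intent.
# Entities arrive as keyword arguments (the "parameters" of the call).
def check_weather(location: str = "here", date: str = "today") -> str:
    return f"Looking up the weather in {location} for {date}."

def travel_booking(location: str, date: str = "any date") -> str:
    return f"Searching flights to {location} on {date}."

def greetings() -> str:
    return "Hello! How can I help?"

# The intent name selects which function to run (the "trigger").
HANDLERS: Dict[str, Callable[..., str]] = {
    "Check_Weather": check_weather,
    "Travel_Booking": travel_booking,
    "Greetings": greetings,
}

def handle(intent: str, entities: Dict[str, str]) -> str:
    """Dispatch on the intent and pass the entities as arguments."""
    handler = HANDLERS.get(intent)
    if handler is None:
        return "Sorry, I did not understand that."
    return handler(**entities)

print(handle("Travel_Booking", {"location": "Philippines", "date": "this Friday"}))
```

A casual intent such as Greetings is called with an empty entity dictionary, which matches the idea that entities are optional.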
Utterance | Intent | Entities |
---|---|---|
How's the weather today? What's the weather forecast tomorrow? How's the weather like in New York? | Check_Weather | {date: today} {date: tomorrow} {location: New York} |
Book me a flight to the Philippines this Friday. I'd like to travel in Singapore this December 25. I need a plane ticket for my trip next week to Bangkok, Thailand. | Travel_Booking | {location: Philippines, time: Friday} {location: Singapore, date: December 25} {date: next week, location: Bangkok, Thailand} |
Take this utterance for example:
Book me a flight to the Philippines this Friday.
Clearly, the intent is flight booking, which we labelled Travel_Booking. The utterance also gives us useful data: 'Philippines' is a location (the destination of the trip), and 'this Friday' is a date (the requested day of the flight). These two pieces of data are called entities, and they help us give a more accurate response to the user.
💡 Trivia
Intents are required, but entities are optional. Casual intents don't need entities because they are typically small talk, and no further data is required to understand the whole utterance. An utterance may contain two or more entities of the same data type, but the meaning of each one depends on its context within the utterance.
I want to travel from Philippines to Singapore.
In the example utterance above, there are two locations, 'Philippines' and 'Singapore'. To tell them apart, you create sub-entities: an origin location and a destination location.
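As a rough sketch, the two locations can be modelled as entities that share a data type but carry different roles. The `Entity` class, its field names, and `by_role` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Entity:
    name: str  # data type, e.g. "location"
    role: str  # meaning in context, e.g. "origin" or "destination"
    body: str  # the text taken from the utterance

# Hand-written labels for "I want to travel from Philippines to Singapore."
entities: List[Entity] = [
    Entity(name="location", role="origin", body="Philippines"),
    Entity(name="location", role="destination", body="Singapore"),
]

def by_role(items: List[Entity]) -> Dict[str, str]:
    """Index entity text by role so same-typed entities stay distinguishable."""
    return {item.role: item.body for item in items}

trip = by_role(entities)
print(trip["origin"], "->", trip["destination"])
```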
Composite entities are entities within entities.
I saw 2 black Mercedes.
'2 black Mercedes' is a single entity, but it can also be divided into three entities: {number: 2}, {color: black}, {car: Mercedes}. Composite entities are optional; whether you use them depends on how you train your bot to treat this utterance.
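One way to picture a composite entity is as a small tree: the outer entity holds the inner ones. The nested dictionary below, including the name `car_description`, is a hypothetical representation for illustration and not necessarily the shape Wit.ai returns.

```python
from typing import Any, Dict

# Hypothetical nested representation of "2 black Mercedes".
composite: Dict[str, Any] = {
    "name": "car_description",
    "body": "2 black Mercedes",
    "entities": [
        {"name": "number", "body": "2", "entities": []},
        {"name": "color", "body": "black", "entities": []},
        {"name": "car", "body": "Mercedes", "entities": []},
    ],
}

def flatten(entity: Dict[str, Any]) -> Dict[str, str]:
    """Collect every leaf entity (one with no children) as name -> body."""
    children = entity.get("entities", [])
    if not children:
        return {entity["name"]: entity["body"]}
    flat: Dict[str, str] = {}
    for child in children:
        flat.update(flatten(child))
    return flat

print(flatten(composite))
```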
To follow this tutorial, you just need:
- An active Facebook account
- A terminal or command prompt
- Open Wit.ai.
- Click Continue with Facebook
- Click New App
- Name your App (demoApp) then click Create
Now you are in the Understanding tab of Wit.ai.
We will now add sample utterances to train our app.
- In the utterance bar, type the sample utterance for our app.
Book me a flight to the Philippines this Friday
- Highlight the entities 'this Friday' and 'Philippines', then search for a matching Wit.ai built-in entity
Example: After highlighting 'Philippines', search for 'location' in the entity field, then click it once found to assign the location entity to 'Philippines'.
- After you have assigned the entities, you should see something like this:
Wrap up of step 2 and 3:
- To add intent, after typing the utterance, click Choose or Add Intent
- Create new intent (Travel_Book) then click Create Intent
- Click Train and Validate to train the app with the sample utterance
I repeated steps 2 - 4 to create my demoApp using the sample table of Intents vs. Entities
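To picture what the highlighting step records, each highlighted phrase can be treated as a labeled character span inside the utterance. The sketch below only illustrates that idea; `label_spans` and the dictionary keys are hypothetical, and this is not presented as Wit.ai's training format.

```python
from typing import Dict, List

def label_spans(utterance: str, highlights: Dict[str, str]) -> List[dict]:
    """Turn highlighted substrings into labeled character spans."""
    spans = []
    for body, entity in highlights.items():
        start = utterance.find(body)
        if start == -1:
            raise ValueError(f"{body!r} not found in utterance")
        spans.append({
            "entity": entity,
            "body": body,
            "start": start,
            "end": start + len(body),  # end is exclusive
        })
    return sorted(spans, key=lambda span: span["start"])

utterance = "Book me a flight to the Philippines this Friday"
sample = {
    "text": utterance,
    "intent": "Travel_Book",
    "entities": label_spans(
        utterance, {"Philippines": "location", "this Friday": "datetime"}
    ),
}
for span in sample["entities"]:
    print(span)
```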
- Under the Management tab, click Settings
- Under HTTP API, enter a test utterance: 'good night'
- Copy the generated curl request, then paste it in your terminal
- Hit Enter; a response will be returned shortly
❗ NOTE:
For Windows users: pasting the curl command directly into the Command Prompt might split it into multiple commands. To avoid this, paste it in Notepad first and delete the line breaks and backslashes ('\') so that the whole command is on one line. Also delete the space between the colon (:) and Bearer, then replace the single quotes (') with double quotes ("). Your command should look like this:
curl -H "Authorization:Bearer ERPO3QGEHBB2QKEMRVPHJWYP22PR36GV" "https://api.wit.ai/message?v=20201013&q=good%20night"
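If shell quoting gets in the way, the same request can be made from Python with only the standard library. This is a minimal sketch: `build_request` and `ask_wit` are hypothetical helper names, `YOUR_ACCESS_TOKEN` is a placeholder for the token shown in your app's Settings, and the default version string is simply copied from the curl command above.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.wit.ai/message"

def build_request(utterance: str, token: str,
                  version: str = "20201013") -> urllib.request.Request:
    """Build (but do not send) the GET request that the curl command performs."""
    # quote_via=quote encodes spaces as %20, as in the curl example.
    query = urllib.parse.urlencode(
        {"v": version, "q": utterance}, quote_via=urllib.parse.quote
    )
    return urllib.request.Request(
        f"{API_URL}?{query}",
        headers={"Authorization": f"Bearer {token}"},
    )

def ask_wit(utterance: str, token: str) -> dict:
    """Send the request and decode the JSON reply (needs network and a valid token)."""
    with urllib.request.urlopen(build_request(utterance, token)) as response:
        return json.load(response)

request = build_request("good night", "YOUR_ACCESS_TOKEN")
print(request.full_url)
```

Building the request separately from sending it makes it easy to inspect the URL before any network call is made.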
- The response is in JSON format. To make it easier to read, copy the response, go to JSON Pretty Print Online, paste it in the left editor, then click Make Pretty and see the result on the right side:
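As a local alternative to an online formatter, Python's standard `json` module can pretty-print the response. The `raw` string below is the 'good night' response from this tutorial, collapsed onto one line for the sake of the example.

```python
import json

# The one-line response as it might appear in the terminal.
raw = (
    '{"text":"good night","intents":[{"id":"2396879380619776",'
    '"name":"Greeting","confidence":0.701}],"entities":{},"traits":{}}'
)

# Parse the text, then write it back out with two-space indentation.
pretty = json.dumps(json.loads(raw), indent=2)
print(pretty)
```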
The sample response from 'good night' test utterance
{
"text": "good night",
"intents": [
{
"id": "2396879380619776",
"name": "Greeting",
"confidence": 0.701
}
],
"entities": {},
"traits": {}
}
The response essentially contains:
- text - the user utterance
- intents - contains name, the intent detected, and confidence, the estimated probability that the intent classification is correct.
- entities - contains the entities extracted from the utterance (in this case, no entity is detected)
From this example, we can conclude that our app is 70% confident that 'good night' is a Greeting intent. We can then craft a suitable greeting response for the user.
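A response like this can be turned into a reply with a small amount of code. The sketch below picks the most confident intent and falls back to a generic answer when the confidence is low. The 0.5 threshold, the `REPLIES` table, and the function names are assumptions made for illustration.

```python
from typing import Optional

# The 'good night' response from above, as a Python dictionary.
response = {
    "text": "good night",
    "intents": [
        {"id": "2396879380619776", "name": "Greeting", "confidence": 0.701}
    ],
    "entities": {},
    "traits": {},
}

def top_intent(resp: dict, threshold: float = 0.5) -> Optional[str]:
    """Return the most confident intent name, or None if it is below the threshold."""
    intents = resp.get("intents", [])
    if not intents:
        return None
    best = max(intents, key=lambda intent: intent["confidence"])
    return best["name"] if best["confidence"] >= threshold else None

# Hypothetical canned replies, one per intent.
REPLIES = {"Greeting": "Good night! Talk to you soon."}

def reply(resp: dict, threshold: float = 0.5) -> str:
    """Pick a canned reply for the detected intent, with a generic fallback."""
    return REPLIES.get(top_intent(resp, threshold), "Sorry, could you rephrase that?")

print(reply(response))
```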
Repeating steps 5 and 6, I tried 'Fly me to South Korea on the 30th'.
The response looks like this:
{
"text": "Fly me to South Korea on the 30th",
"intents": [
{
"id": "2617649178488304",
"name": "Travel_Book",
"confidence": 0.547
}
],
"entities": {
"wit$datetime:datetime": [
{
"id": "2709435475941805",
"name": "wit$datetime",
"role": "datetime",
"start": 22,
"end": 33,
"body": "on the 30th",
"confidence": 0.9622,
"entities": [],
"type": "value",
"grain": "day",
"value": "2020-10-30T00:00:00.000-07:00",
"values": [
{
"type": "value",
"grain": "day",
"value": "2020-10-30T00:00:00.000-07:00"
},
{
"type": "value",
"grain": "day",
"value": "2020-11-30T00:00:00.000-08:00"
},
{
"type": "value",
"grain": "day",
"value": "2020-12-30T00:00:00.000-08:00"
}
]
}
],
"wit$location:location": [
{
"id": "3389580727801815",
"name": "wit$location",
"role": "location",
"start": 10,
"end": 21,
"body": "South Korea",
"confidence": 0.8921,
"entities": [],
"resolved": {
"values": [
{
"name": "South Korea",
"domain": "country",
"coords": {
"lat": 36.5,
"long": 127.75
},
"timezone": "Asia/Seoul",
"external": {
"geonames": "1835841",
"wikidata": "Q884",
"wikipedia": "South Korea"
},
"attributes": {}
}
]
},
"type": "resolved"
}
]
},
"traits": {}
}
We can see that:
- text - Fly me to South Korea on the 30th
- intents
- name - Travel_Book
- confidence - 0.547
- entities
- datetime - body: on the 30th
- location - body: South Korea
From this example, we see that the app is only about 55% confident that the utterance 'Fly me to South Korea on the 30th' has a Travel_Book intent. This is understandable, since we have only a limited sample of utterances for our app. We have to train it with more sample data for better performance.
❗ NOTE
In our latest example, we see that the entities also contain other data such as role, coordinates, and timezone.
How you use the rest of the response body is up to you. In our demo, we are only interested in which entities are captured and which intent is classified for the test utterance.
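Keeping only the intent and the captured entity text, as described above, might look like the following. The dictionary is a trimmed copy of the 'Fly me to South Korea on the 30th' response, and `summarize` is a hypothetical helper name.

```python
# Trimmed copy of the response shown earlier (only the fields used here).
response = {
    "text": "Fly me to South Korea on the 30th",
    "intents": [
        {"id": "2617649178488304", "name": "Travel_Book", "confidence": 0.547}
    ],
    "entities": {
        "wit$datetime:datetime": [
            {"name": "wit$datetime", "role": "datetime",
             "start": 22, "end": 33, "body": "on the 30th", "confidence": 0.9622}
        ],
        "wit$location:location": [
            {"name": "wit$location", "role": "location",
             "start": 10, "end": 21, "body": "South Korea", "confidence": 0.8921}
        ],
    },
    "traits": {},
}

def summarize(resp: dict) -> dict:
    """Reduce a response to its text, best intent, and entity bodies grouped by role."""
    intents = resp.get("intents", [])
    best = max(intents, key=lambda intent: intent["confidence"]) if intents else None
    entities: dict = {}
    for matches in resp.get("entities", {}).values():
        for match in matches:
            entities.setdefault(match["role"], []).append(match["body"])
    return {
        "text": resp["text"],
        "intent": best["name"] if best else None,
        "entities": entities,
    }

print(summarize(response))
```

The `start` and `end` values are character offsets into `text`, so `response["text"][10:21]` gives back 'South Korea'.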
You can try the sample app from this tutorial by clicking this link and entering your test utterance in the utterance field under the Understanding tab.
💻 For advanced programmers: you can use this curl request curl -H 'Authorization: Bearer ERPO3QGEHBB2QKEMRVPHJWYP22PR36GV' 'https://api.wit.ai/message?v=20201025&q=' and append your sample utterance. Remember to replace spaces with "%20".
Now we have an intelligent NLP system, thanks to Wit.ai! We can connect our chatbots or web apps to it to process queries from different users and craft appropriate responses based on the intents and entities our system extracts.
You can never train it enough. Languages and their usage change over time, so it is a good idea to check the accuracy of your NLP system regularly and train it with new utterances captured from users.
This is why natural language processing is exciting! You get to stay up to date with how people currently talk and make your bot keep up with the trend 😎
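Checking accuracy can start very simply: keep a list of utterances with the intent you expect, run them through the app, and count the matches. In the sketch below, `toy_classify` is a keyword-based stand-in so that the example runs offline; in practice it would be replaced by a call to your Wit.ai app. All names here are hypothetical.

```python
from typing import Callable, List, Tuple

Labeled = List[Tuple[str, str]]  # (utterance, expected intent)

def intent_accuracy(labeled: Labeled, classify: Callable[[str], str]) -> float:
    """Fraction of utterances whose predicted intent matches the expected one."""
    if not labeled:
        raise ValueError("need at least one labeled utterance")
    correct = sum(1 for text, expected in labeled if classify(text) == expected)
    return correct / len(labeled)

def misses(labeled: Labeled, classify: Callable[[str], str]) -> List[str]:
    """Utterances that were misclassified; good candidates for new training samples."""
    return [text for text, expected in labeled if classify(text) != expected]

def toy_classify(text: str) -> str:
    """Stand-in classifier based on keywords (NOT how Wit.ai works)."""
    lowered = text.lower()
    if "weather" in lowered:
        return "Check_Weather"
    if "flight" in lowered or "fly" in lowered:
        return "Travel_Booking"
    return "Greetings"

labeled = [
    ("How's the weather today?", "Check_Weather"),
    ("Book me a flight to the Philippines this Friday.", "Travel_Booking"),
    ("Good morning!", "Greetings"),
    ("I need a plane ticket for my trip next week to Bangkok, Thailand.", "Travel_Booking"),
]

print(intent_accuracy(labeled, toy_classify))
print(misses(labeled, toy_classify))
```

The misclassified utterances are the ones worth adding as new training samples.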
In the future, I'll write a tutorial for beginners on setting up a basic chatbot and connecting it to Wit.ai.
In the meantime, you can check this link for a tutorial on creating a Messenger bot.