A Bi-LSTM neural network regression model to determine customers’ trust perception from Airbnb hosts’ self-description

Working with Text data

Since the text data are semi-structured, it’s treatment is different from numerical data. A text is a sequence of words or characters, which depends upon each other. The meaning of words in a text are also context based and a word can have many meaning depending upon the context. Thus a simple MLP will not be so effective in the case of text. How we understand a sentence? we separate the sentence in terms of parts of speech, we also established various relationship between words by determining subject, objects etc in a sentence.

There is a large lexical resources available in the corpora such as nltk (natural language tool box) originally developed by university of Pennsylvania have a trained models which are able to provide the structures of the sentence as discussed above. to obtain the part of speech called tag posting, we use various functions available in the nltk. Also we can obtain dependency parsing by all ready train models in the tool kit. A dependency parsing is hierarchy of words as shown in the following figure.

Dependency parsing of a sentence

Text data pre-processing

A text data contains many errors and noise, that are needed to be removed. some words may be present in the vocabulary, so it is necessary to replace these words with a tag that is understood by trained models present in nltk.

Since, any kind of neural network works on numerical data, therefore, the list of words in a sentence needs to be converted into the list of numerical values, which are nothing but the index of the words present in the dictionary of the corpora.

Bi-direction LSTM Neural Network brief introduction

The bi-direction neural network contains two sets of lstm layer, one in the forward direction and other in the backward direction. a schematic diagram of the Bi-directional neural network is shown in the following figure.

A bi-directional neural network

Since in the RNN, the problem of gradient vanishing become severe, therefore, to mitigate that problem a lstm RNN design was proposed. The bi-lstm model was trained and the code and result for entire thing is given in the following section.



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store