A Bi-LSTM neural network regression model to determine customers’ trust perception from Airbnb hosts’ self-description
Airbnb is a platform on which guests (customers) search for accommodation. In such online exchanges, it is important for customers to build trust in the host based on all the information available to them. Customers' trust perception relies on many cues, such as the host's self-description, the host's profile picture, and other information shown to the customer. In this article, a bi-directional LSTM (Bi-LSTM) neural network regression model is built to determine customers' trust perception from the hosts' self-descriptions. The data set, as available on the Airbnb website, was collected on a contractual basis by recruiting people on Amazon Mechanical Turk. The hosts' self-descriptions were collected and annotated on a scale of 1 to 6. To infer customers' trust perception from a host's self-description, which is in text format, a recurrent neural network (RNN) performs better than a simple multi-layer perceptron. A bi-directional LSTM is a type of RNN that has memory and can incorporate information from both directions, forward and backward: in a sentence, the role of a word is determined both by the words that come before it and by the words that come after it.
Working with Text data
Since text data are semi-structured, their treatment differs from that of numerical data. A text is a sequence of words or characters that depend on each other. The meaning of a word in a text is also context dependent, and a word can have many meanings depending on the context. Thus a simple MLP is not very effective for text. How do we understand a sentence? We separate the sentence into parts of speech, and we also establish various relationships between words by determining the subject, object, etc. of the sentence.
Large lexical resources are available in toolkits such as NLTK (the Natural Language Toolkit), originally developed at the University of Pennsylvania, which provides trained models able to produce the sentence structures discussed above. To obtain the parts of speech, a step called POS tagging, we use various functions available in NLTK. We can also obtain a dependency parse with already-trained models in the toolkit. A dependency parse is a hierarchy of words, as shown in the following figure.
Text data pre-processing
Text data contain many errors and much noise that need to be removed. Some words may not be present in the vocabulary, so it is necessary to replace these words with a tag that is understood by the trained models present in NLTK.
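The cleaning-plus-replacement step can be sketched with a small stand-alone function. The token name `<unk>`, the regex-based noise stripping, and the toy vocabulary below are illustrative assumptions, not the article's actual pipeline:

```python
import re

def preprocess(text, vocabulary, unk_token="<unk>"):
    # Lowercase, strip punctuation noise, and tokenize on whitespace.
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    tokens = text.split()
    # Replace out-of-vocabulary words with the unknown tag.
    return [t if t in vocabulary else unk_token for t in tokens]

vocab = {"the", "host", "is", "friendly"}
print(preprocess("The host is VERY friendly!!!", vocab))
# -> ['the', 'host', 'is', '<unk>', 'friendly']
```

The unknown-word tag keeps the downstream model's input vocabulary closed, so unseen words at prediction time cannot break the lookup.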
Since any kind of neural network works on numerical data, the list of words in a sentence needs to be converted into a list of numerical values, which are simply the indices of the words in the dictionary of the corpus.
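The word-to-index conversion can be sketched as follows. Reserving index 0 for padding and 1 for unknown words is a common convention assumed here for illustration; the article does not specify the exact scheme used:

```python
def build_index(sentences):
    # Reserve 0 for padding and 1 for unknown words.
    index = {"<pad>": 0, "<unk>": 1}
    for sent in sentences:
        for word in sent:
            index.setdefault(word, len(index))
    return index

def encode(sentence, index):
    # Map each word to its dictionary index; unseen words map to <unk>.
    return [index.get(w, index["<unk>"]) for w in sentence]

sents = [["the", "host", "is", "friendly"],
         ["the", "room", "is", "clean"]]
idx = build_index(sents)
print(encode(["the", "room", "is", "cosy"], idx))
# -> [2, 6, 4, 1]   ("cosy" is unseen, so it maps to <unk> = 1)
```

These integer sequences, padded to a common length with index 0, are what the embedding layer of the network consumes.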
A brief introduction to the bi-directional LSTM neural network
The bi-directional network contains two sets of LSTM layers, one running in the forward direction and the other in the backward direction. A schematic diagram of the bi-directional neural network is shown in the following figure.
In a plain RNN the problem of vanishing gradients becomes severe, so the LSTM design was proposed to mitigate it. The Bi-LSTM model was trained, and the code and results for the entire pipeline are given in the following section.
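Before the full code, the architecture described above can be illustrated with a minimal Keras sketch: an embedding layer, a bidirectional LSTM, and a single linear output unit for the 1-to-6 trust score. The vocabulary size, embedding width, LSTM width, and sequence length below are placeholder assumptions, not the values used in the study:

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, Bidirectional, LSTM, Dense

vocab_size = 5000   # placeholder: size of the word index
max_len = 100       # placeholder: padded sentence length

model = Sequential([
    Embedding(vocab_size, 64),     # word indices -> dense vectors
    Bidirectional(LSTM(32)),       # forward + backward LSTM passes
    Dense(1),                      # linear output for the trust score
])
model.compile(optimizer="adam", loss="mse")

# Shape check with dummy padded input (2 sentences of max_len indices).
preds = model.predict(np.zeros((2, max_len), dtype="int32"), verbose=0)
print(preds.shape)  # (2, 1): one predicted trust score per sentence
```

Mean squared error is a natural loss for this regression setup, since the annotations are continuous-valued scores on a 1-to-6 scale rather than class labels.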