Sentiment Analysis Implementation


A few months ago I started collecting Dhivehi text to try out sentiment analysis, and so far I have collected about 800 data points. That is not much, but it was enough to experiment with. I was curious to see how it would turn out. This project is mainly experimental, and I am very new to machine learning.

Sentiment Analysis 101: an article I wrote about collecting the data for this project.

Tonight I tried training a model on part of the dataset. Most of the comments were labeled positive, so I removed samples until each label had roughly the same number, which made the predictions somewhat more accurate. It was still not perfect.
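Trimming every label down to the size of the smallest one is usually called random undersampling. Below is a minimal sketch in plain PHP, assuming the samples and labels are held in two parallel arrays like the ones the training script builds. The function name `balanceByUndersampling()` is hypothetical, not part of Rubix ML.

```php
<?php

/**
 * Hypothetical helper: random undersampling. Every label keeps as many
 * randomly chosen samples as the rarest label has, so all classes end up
 * the same size.
 *
 * @param list<string> $samples
 * @param list<string> $labels one label per sample, in the same order
 * @return array{0: list<string>, 1: list<string>}
 */
function balanceByUndersampling(array $samples, array $labels): array
{
    // Group the sample indices by label
    $indicesByLabel = [];
    foreach ($labels as $i => $label) {
        $indicesByLabel[$label][] = $i;
    }

    // The smallest class decides how many samples each class keeps
    $keep = min(array_map('count', $indicesByLabel));

    $balancedSamples = $balancedLabels = [];
    foreach ($indicesByLabel as $indices) {
        shuffle($indices); // random subset, not just the first N files
        foreach (array_slice($indices, 0, $keep) as $i) {
            $balancedSamples[] = $samples[$i];
            $balancedLabels[] = $labels[$i];
        }
    }

    return [$balancedSamples, $balancedLabels];
}
```

The trade-off is that every discarded sample is lost training data, which matters with a dataset of only about 800 comments.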

To train the model I used Rubix ML, a high-level machine learning and deep learning library for PHP.

Data

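The data is structured and labeled: one folder per label (positive, negative and neutral), with each comment saved as its own .txt file.

To structure the data this way, I wrote a script based on the PHP League CSV Reader: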

```php
<?php

require __DIR__ . '/vendor/autoload.php';

use League\Csv\Reader;

// Load the CSV document from a file path; row 0 holds the column names
$csv = Reader::createFromPath('comments.csv', 'r');
$csv->setHeaderOffset(0);

$records = $csv->getRecords();
$num = 0;

// Create the label folder if it does not exist yet
if (!is_dir('data/neutral')) {
    mkdir('data/neutral', 0777, true);
}

// Write the "comment" column of every row to its own .txt file in the
// neutral folder, which is where the training script looks for it
foreach ($records as $record) {
    file_put_contents("data/neutral/{$num}_neutral.txt", $record['comment']);
    $num++;
}
```

Training

To set up the dataset, I use each folder name as the label and the contents of each .txt file inside it as the corresponding sample.
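The training script that follows lists the three labels explicitly. A variant that discovers the labels from the subdirectories themselves might look like this minimal sketch, assuming the same `data/<label>/*.txt` layout. The function name `loadFolderDataset()` is hypothetical.

```php
<?php

/**
 * Hypothetical helper: load a folder-per-label corpus where every
 * subdirectory of $root is a label and every .txt file inside it is one
 * sample of that label.
 *
 * @return array{0: list<array{0: string}>, 1: list<string>}
 */
function loadFolderDataset(string $root): array
{
    $samples = $labels = [];

    // glob() sorts its results, so labels come out in alphabetical order
    foreach (glob(rtrim($root, '/') . '/*', GLOB_ONLYDIR) ?: [] as $dir) {
        $label = basename($dir); // e.g. data/positive -> "positive"

        foreach (glob("$dir/*.txt") ?: [] as $file) {
            // One sample = an array of features; here a single text feature
            $samples[] = [file_get_contents($file)];
            $labels[] = $label;
        }
    }

    return [$samples, $labels];
}
```

With it, `[$samples, $labels] = loadFolderDataset('data');` could stand in for the hard-coded loop, and adding a label would only need a new folder.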

```php
<?php

require __DIR__ . '/vendor/autoload.php';

use Rubix\ML\Other\Loggers\Screen;
use Rubix\ML\Datasets\Labeled;
use Rubix\ML\PersistentModel;
use Rubix\ML\Pipeline;
use Rubix\ML\Transformers\TextNormalizer;
use Rubix\ML\Transformers\WordCountVectorizer;
use Rubix\ML\Other\Tokenizers\NGram;
use Rubix\ML\Transformers\TfIdfTransformer;
use Rubix\ML\Transformers\ZScaleStandardizer;
use Rubix\ML\Classifiers\MultilayerPerceptron;
use Rubix\ML\NeuralNet\Layers\Dense;
use Rubix\ML\NeuralNet\Layers\Activation;
use Rubix\ML\NeuralNet\Layers\PReLU;
use Rubix\ML\NeuralNet\Layers\BatchNorm;
use Rubix\ML\NeuralNet\ActivationFunctions\LeakyReLU;
use Rubix\ML\NeuralNet\Optimizers\AdaMax;
use Rubix\ML\Persisters\Filesystem;
use Rubix\ML\Datasets\Unlabeled;

use function Rubix\ML\array_transpose;

ini_set('memory_limit', '-1');

$logger = new Screen();

$logger->info('Loading data into memory');

$samples = $labels = [];

// The folder name is the label and each .txt file in it is one sample
foreach (['positive', 'negative', 'neutral'] as $label) {
    foreach (glob("data/$label/*.txt") as $file) {
        $samples[] = [file_get_contents($file)];
        $labels[] = $label;
    }
}

$dataset = new Labeled($samples, $labels);

// Normalize the text, count unigrams and bigrams, weight the counts with
// TF-IDF and standardize them, then feed them to a multilayer perceptron
// (batch size 256, AdaMax optimizer with a 0.0001 learning rate)
$estimator = new PersistentModel(
    new Pipeline([
        new TextNormalizer(),
        new WordCountVectorizer(10000, 3, 10000, new NGram(1, 2)),
        new TfIdfTransformer(),
        new ZScaleStandardizer(),
    ], new MultilayerPerceptron([
        new Dense(100),
        new Activation(new LeakyReLU()),
        new Dense(100),
        new Activation(new LeakyReLU()),
        new Dense(100, 0.0, false),
        new BatchNorm(),
        new Activation(new LeakyReLU()),
        new Dense(50),
        new PReLU(),
        new Dense(50),
        new PReLU(),
    ], 256, new AdaMax(0.0001))),
    new Filesystem('sentiment.model', true)
);

$estimator->setLogger($logger);

$estimator->train($dataset);

// Export the scores and losses recorded during training to a CSV file
$scores = $estimator->scores();
$losses = $estimator->steps();

Unlabeled::build(array_transpose([$scores, $losses]))
    ->toCSV(['scores', 'losses'])
    ->write('progress.csv');

$logger->info('Progress saved to progress.csv');

// Only write sentiment.model to disk if the user confirms
if (strtolower(trim(readline('Save this model? (y|[n]): '))) === 'y') {
    $estimator->save();
}
```
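The pipeline above turns each comment into unigram and bigram counts (`NGram(1, 2)`) and reweights them with TF-IDF. To make that step less of a black box, here is a dependency-free sketch of the counting and weighting in plain PHP. It is an illustration, not Rubix ML's implementation: the helper names `ngrams()` and `tfIdf()` and the tokenizing regex are assumptions, and the IDF formula ln(N / df) + 1 is one common formulation that may differ from what `TfIdfTransformer` does internally. One detail that does matter for Dhivehi: Thaana vowel signs (fili) are Unicode combining marks, so a tokenizer that only keeps letters would split words apart at every vowel.

```php
<?php

/**
 * Hypothetical helper: lowercase the text, split it into words and return
 * all word n-grams from $min to $max words long.
 *
 * @return list<string>
 */
function ngrams(string $text, int $min = 1, int $max = 2): array
{
    // strtolower() only changes ASCII letters here; Thaana has no letter case.
    // Split on anything that is not a letter (\p{L}), combining mark (\p{M})
    // or digit (\p{N}), so Thaana vowel signs stay attached to their words.
    $words = preg_split('/[^\p{L}\p{M}\p{N}]+/u', strtolower($text), -1, PREG_SPLIT_NO_EMPTY) ?: [];

    $tokens = [];
    for ($n = $min; $n <= $max; $n++) {
        for ($i = 0; $i + $n <= count($words); $i++) {
            $tokens[] = implode(' ', array_slice($words, $i, $n));
        }
    }

    return $tokens;
}

/**
 * Hypothetical helper: weight each document's n-gram counts by
 * idf(t) = ln(N / df(t)) + 1, where N is the number of documents and
 * df(t) is the number of documents containing term t.
 *
 * @param list<string> $documents
 * @return list<array<array-key, float>> one sparse term => weight map per document
 */
function tfIdf(array $documents): array
{
    // Term counts per document (the "TF" part)
    $counts = array_map(fn (string $doc): array => array_count_values(ngrams($doc)), $documents);

    // Document frequency: in how many documents each term appears
    $df = [];
    foreach ($counts as $termCounts) {
        foreach ($termCounts as $term => $_) {
            $df[$term] = ($df[$term] ?? 0) + 1;
        }
    }

    $n = count($documents);
    $vectors = [];
    foreach ($counts as $termCounts) {
        $vector = [];
        foreach ($termCounts as $term => $tf) {
            // Terms found in every document get weight tf * 1; rarer terms get more
            $vector[$term] = $tf * (log($n / $df[$term]) + 1.0);
        }
        $vectors[] = $vector;
    }

    return $vectors;
}
```

For `['good movie', 'bad movie']`, "movie" appears in both documents and keeps weight 1.0, while "good" and "bad" each get ln(2) + 1 ≈ 1.69. In the real pipeline, `ZScaleStandardizer` then rescales each feature column to zero mean and unit variance before the network sees it.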

Prediction

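```php
<?php

require __DIR__ . '/vendor/autoload.php';

use Rubix\ML\PersistentModel;
use Rubix\ML\Persisters\Filesystem;

ini_set('memory_limit', '-1');

// Load the trained pipeline and network saved by the training script
$estimator = PersistentModel::load(new Filesystem('sentiment.model'));

// Keep asking until some non-blank text is entered; stop on end of input
do {
    $text = readline("Enter some text to analyze:\n");
    if ($text === false) {
        exit(1);
    }
} while (trim($text) === '');

// A sample is an array of features, here a single text feature
$prediction = $estimator->predictSample([$text]);

echo "The sentiment is: $prediction" . PHP_EOL;
```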

Here are some of the predictions the model made. It is still not very accurate, so I will keep working on improving both the model and the dataset. I am still getting into machine learning myself, and this project is mainly a way for me to learn. So far I am really happy with the results and how it turned out.
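To put a number on "not very accurate", one option is to hold some labeled comments back from training, predict their labels, and compare them with the true ones. Below is a minimal scoring sketch in plain PHP. The function name `evaluate()` is hypothetical, and the predictions are assumed to come from running the trained estimator over the held-out comments. Rubix ML also ships its own cross-validation metrics and reports, which would be the natural choice inside the project itself.

```php
<?php

/**
 * Hypothetical helper: compare predicted labels with the true labels of a
 * held-out set and return the accuracy plus a confusion table.
 *
 * @param list<string> $predictions
 * @param list<string> $actual
 * @return array{accuracy: float, confusion: array<string, array<string, int>>}
 */
function evaluate(array $predictions, array $actual): array
{
    if ($actual === [] || count($predictions) !== count($actual)) {
        throw new InvalidArgumentException('Need exactly one prediction per true label.');
    }

    $correct = 0;
    $confusion = []; // $confusion[actual label][predicted label] = count

    foreach ($actual as $i => $label) {
        $predicted = $predictions[$i];
        $confusion[$label][$predicted] = ($confusion[$label][$predicted] ?? 0) + 1;

        if ($predicted === $label) {
            $correct++;
        }
    }

    return ['accuracy' => (float) $correct / count($actual), 'confusion' => $confusion];
}
```

The confusion table shows which labels get mixed up, for example whether neutral comments are mostly being predicted as positive, which says more about what to collect next than a single accuracy number.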

GitHub: sentiment-analysis-implementaion