Linear Regression: Comparing a Framework and a Non-Framework Implementation

The main objective of these linear regression implementations is to predict the net hourly electrical energy output (EP) of a combined cycle power plant, and to compare the results of the two implementations. The dataset contains 9568 data points collected from a Combined Cycle Power Plant over 6 years (2006–2011), when the power plant was set to work with full load. The features are the hourly average ambient variables Temperature (T), Ambient Pressure (AP), Relative Humidity (RH) and Exhaust Vacuum (V).

To run these 2 programs correctly, there are a few installations we need to do first:

python3 -m pip install matplotlib
pip3 install scikit-learn
pip3 install pandas
pip3 install wheel

(Those were the only installations I had to make, so if you run the program and encounter an issue, make sure you install whichever libraries you are missing. As you can see, I am using Python 3.)

For the linear regression implementation by hand, first of all notice that we have the corresponding functions: forward_pass, compute_loss, backward_pass and update_params:
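A minimal sketch of what these four functions could look like is shown here. It is not the exact code of the program: the class layout, the zero initialisation, the half mean squared error loss and the lr (step size) parameter are all assumptions made for illustration.

```python
import numpy as np


class LinearModel:
    """Hypothetical sketch of a linear model: y_hat = X @ W + b."""

    def __init__(self, num_features):
        self.num_features = num_features
        # Assumption: weights start at zero, stored as a column vector.
        self.W = np.zeros((num_features, 1))
        self.b = 0.0

    def forward_pass(self, X):
        # One prediction per row of X, result has shape (m, 1).
        return X @ self.W + self.b

    def compute_loss(self, y_hat, y_true):
        # Assumption: half mean squared error between prediction and real value.
        return np.sum((y_hat - y_true) ** 2) / (2 * y_hat.shape[0])

    def backward_pass(self, X, y_true, y_hat):
        # Gradients of the loss above with respect to W and b.
        m = y_true.shape[0]
        error = y_hat - y_true
        dW = X.T @ error / m
        db = np.sum(error) / m
        return dW, db

    def update_params(self, dW, db, lr):
        # Move the parameters a small step against the gradient.
        self.W = self.W - lr * dW
        self.b = self.b - lr * db
```

With this layout, X is expected as an array of shape (m, 4) and the real values as a column of shape (m, 1).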

Next we have the training function, which is essential for this program to work as expected. First we make a prediction with the forward_pass function, then we measure how far off that prediction is with the help of the compute_loss function. Next, the backward_pass function works out how to adjust the variables to get even closer, and update_params applies those adjustments, as we can see in this part of the code:
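The loop described above can be sketched as follows. This is only an illustration under assumptions: the four steps are written inline instead of as methods, the function signature is hypothetical, and lr is assumed to be a gradient descent step size.

```python
import numpy as np


def train(X, y, iterations, lr):
    """Hypothetical gradient descent loop for y_hat = X @ W + b."""
    m, n = X.shape
    W = np.zeros((n, 1))
    b = 0.0
    losses = []
    for _ in range(iterations):
        y_hat = X @ W + b                           # forward_pass
        loss = np.sum((y_hat - y) ** 2) / (2 * m)   # compute_loss
        dW = X.T @ (y_hat - y) / m                  # backward_pass
        db = np.sum(y_hat - y) / m
        W -= lr * dW                                # update_params
        b -= lr * db
        losses.append(loss)
    return W, b, losses
```

Keeping the list of losses makes it easy to check that the error really goes down from one iteration to the next.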

It is also worth mentioning that get_sets is used to separate the data into train and test sets:
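One possible way to write such a split is sketched here. The signature, the 80/20 proportion and the shuffling with a fixed seed are assumptions for illustration, since the real get_sets may work differently.

```python
import numpy as np


def get_sets(X, y, test_fraction=0.2, seed=0):
    """Hypothetical train/test split: shuffle the rows, then cut once."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(X.shape[0])
    n_test = int(round(X.shape[0] * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # The same indices are used for X and y so rows stay aligned.
    return X[train_idx], y[train_idx], X[test_idx], y[test_idx]
```

Shuffling before cutting matters when the rows of the file follow some order, otherwise the test set could contain only one kind of instance.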

For the next part we read the dataset, which is lr.csv. Then we take the 4 variables that the model uses and put them in a form that our class can consume, and we do the same with the values to be predicted.
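A sketch of this loading step with pandas could look like this. The helper name load_dataset is hypothetical, and the column order (4 feature columns first, then the energy output) is an assumption about how lr.csv is laid out.

```python
import pandas as pd


def load_dataset(path):
    """Hypothetical loader: returns features (m, 4) and targets (m, 1)."""
    df = pd.read_csv(path)
    # Assumption: the first four columns are the features.
    X = df.iloc[:, :4].to_numpy(dtype=float)
    # Assumption: the fifth column is the value to predict.
    y = df.iloc[:, 4].to_numpy(dtype=float).reshape(-1, 1)
    return X, y
```

It would be called as X, y = load_dataset("lr.csv"). Reshaping the targets into a column keeps them compatible with predictions of shape (m, 1).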

Next, we split the data into test and train sets and declare a model with 4 variables (model = LinearModel(4)). Then we train the model, passing the training values as well as the number of iterations (50000) and the error (0.00000001):

Testing: now we make a prediction for the test array and for an individual test instance. We pass an index so the program can show, for that instance of the data, which value was predicted and which one was real:
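The lookup by index can be sketched like this. The function name compare_instance and the column shape (m, 1) of both arrays are assumptions for illustration.

```python
import numpy as np


def compare_instance(predictions, y_true, index):
    """Hypothetical helper: predicted and real value for one test instance."""
    predicted = float(predictions[index, 0])
    real = float(y_true[index, 0])
    return predicted, real
```

For example, predicted, real = compare_instance(preds, y_test, 12) would give the pair of values for instance 12, assuming preds holds the model output for the test set.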

So when we actually run the program we get this result in the terminal, which I will explain in a moment…

Let's focus on the last part. The values next to model are the values learned for the variables x1, x2, x3 and x4, and to the right is the bias. Finally, we get the graph, which plots the real values against the predictions for each instance:
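A graph like this one can be produced with matplotlib along these lines. The function name, the axis labels and the choice of blue for real values and red for predictions are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt


def plot_predictions(y_true, predictions):
    """Hypothetical plot: real and predicted value for each test instance."""
    instances = np.arange(len(y_true))
    fig, ax = plt.subplots()
    # Assumption: blue dots are real values, red dots are predictions.
    ax.scatter(instances, np.ravel(y_true), color="blue", s=8, label="real value")
    ax.scatter(instances, np.ravel(predictions), color="red", s=8, label="prediction")
    ax.set_xlabel("instance")
    ax.set_ylabel("energy output")
    ax.legend()
    return fig, ax
```

Calling plt.show() after this function opens the window. The program continues once the window is closed.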

Also, when we close the graph, the program lets us enter a value, which is the index of an instance. In the following example we can see the predicted value as well as the real value for instance 12:

Now, when we implement the linear regression with the scikit-learn framework it is another story: as we can see when the graph is plotted, the fit is much better than in the non-framework version. The closer the red and blue dots are to each other, the more precise the prediction:
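A minimal sketch of the framework version is shown here, assuming it uses LinearRegression and train_test_split from scikit-learn. The helper name and the 80/20 split are hypothetical. LinearRegression fits ordinary least squares directly instead of iterating step by step, which is one likely reason its results are closer than those of a hand-written gradient descent loop.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split


def fit_framework_model(X, y, test_size=0.2, seed=0):
    """Hypothetical helper: split the data, fit the model, predict the test set."""
    x_train, x_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = LinearRegression()
    model.fit(x_train, y_train)
    predictions = model.predict(x_test)
    return model, x_test, y_test, predictions
```

After fitting, model.coef_ and model.intercept_ hold the learned values for the variables and the bias, so they can be compared with the ones printed by the non-framework program.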

And, as in the non-framework implementation, if we close the graph we can enter a value (instance). In this case the prediction is much closer to the real value than in the non-framework implementation:

So, let's make one final comparison between the 2 of them and draw our conclusion…

On the left we have the graph plotted by the non-framework implementation, and on the right we have the graph plotted by the framework.

Values of the framework
Values of the non framework

So, as we can see in the images above, the prediction values are closer to the real values with the framework than with the non-framework implementation.

In conclusion, the non-framework linear regression implementation did quite well, even though it was not perfect. In terms of competition, however, the winner is without a doubt the framework implementation. Given how well optimized it is, in situations like this we should choose this powerful tool to get the most precise results from our data.

References:

https://archive.ics.uci.edu/ml/datasets/Combined+Cycle+Power+Plant