
Available online at www.sciencedirect.com


Procedia Engineering 153 (2016) 66 - 70

www.elsevier.com/locate/procedia

XXV Polish - Russian - Slovak Seminar "Theoretical Foundation of Civil Engineering"

The influence of input data standardization method on prediction accuracy of artificial neural networks

Hubert Anysz a,*, Artur Zbiciak a, Nabi Ibadov a

a Warsaw University of Technology, Faculty of Civil Engineering, Armii Ludowej 16, 00-637 Warsaw, Poland

Abstract

Achieving good results when applying artificial neural networks (ANN) to prediction requires some preparatory work on the data set. One such step is standardization, which is necessary when a nonlinear activation function is applied. Based on the prediction of the completion period of building contracts by a multi-layer ANN with the error backpropagation algorithm, six different methods of input data standardization were checked in order to determine which one yields the most accurate predictions.

© 2016 The Authors. Published by Elsevier Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Peer-review under responsibility of the organizing committee of the XXV Polish - Russian - Slovak Seminar "Theoretical Foundation of Civil Engineering".

Keywords: input data standardization; artificial neural networks ANN; building contracts completion date predicting

1. Introduction

Artificial neural networks (ANN) are one of the best tools for predicting values when the real process that produces these values is complex and we are not sure of the nature of every phenomenon the process consists of [1]. Delays in the completion dates of building contracts are an example. Whether the data come from a real process is another important issue. If we simulate input and output data, the prediction, i.e. the result of applying the ANN, will reflect the intentions we had when creating the data set, and its usefulness will be very low. So even for testing purposes real data are preferred. Another approach to estimating the completion date of building contracts is fuzzy logic [11].

* Corresponding author. Tel.: +48-606-668-288; fax: +48-22-825-74-15.

E-mail address: h.anysz@il.pw.edu.pl

ISSN 1877-7058. doi:10.1016/j.proeng.2016.08.081

This paper is based on data collected from ZMID (Zakład Miejskich Inwestycji Drogowych), the Warsaw municipal company responsible for street building, upgrading and maintenance. The following ANN is used to predict delays in the completion date of infrastructure building contracts:

• type of ANN: MLP with error backpropagation algorithm

• 6 input neurons

• 2 hidden layers

• 5 neurons in each hidden layer

• 1 output neuron

• logistic activation function
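The topology listed above can be illustrated with a minimal NumPy sketch of the forward pass only. The weight initialization, the random seed and the use of the logistic function on the output neuron are assumptions made for illustration; the paper does not specify them, and training by error backpropagation is omitted.

```python
import numpy as np

def logistic(z):
    """Logistic (sigmoid) activation, mapping any real value into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def init_mlp(layer_sizes, rng):
    """Create random weights and zero biases for each pair of consecutive layers."""
    return [
        (rng.normal(0.0, 0.5, size=(n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
    ]

def forward(params, x):
    """Propagate a batch of records through the network, layer by layer."""
    a = x
    for w, b in params:
        a = logistic(a @ w + b)   # assumption: logistic activation in every layer
    return a

rng = np.random.default_rng(0)       # fixed seed, chosen only for reproducibility
layer_sizes = [6, 5, 5, 1]           # 6 inputs, two hidden layers of 5 neurons, 1 output
params = init_mlp(layer_sizes, rng)
batch = rng.random((4, 6))           # four hypothetical, already standardized records
output = forward(params, batch)      # one value per record
```

Under this assumption the output neuron can only return values between 0 and 1, so a target such as a duration in months has to be rescaled into that range before training.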

One of the inputs was the planned duration of a given contract. The output was the real duration, so the predicted delay of the contract completion date can be calculated from the two. The choice of inputs for delay prediction should be based at least on a hunch that a given factor influences the value being predicted. The following inputs were chosen:

• value of a contract

• number of subcontractors

• type of general contractor (a consortium or one company acting alone)

• share of the road construction works in the total value of a contract

• type of work (construction of a new street or upgrading of an existing street)

• planned duration of work execution

Analyses showing that this is the right choice can be found in [2, 3, 4]. Research on this subject [5, 6] also confirms the relationship between the aforementioned factors and delays in the completion date of building contracts. Only 22 input records were available, so it was decided to multiply them by slightly changing the numerical values of the original records while leaving the non-numerical values unchanged. This produced a data set of 105 records, of which 6 were excluded for testing purposes. A procedure of this kind, applied to a database that is originally too small, allows the ANN to be applied smoothly [9].
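One way such a multiplication of records could be done is sketched below. The paper does not state how large the changes were or how the records to copy were selected, so the 5% relative noise, the random selection with replacement, the column layout and the sample values are all hypothetical. The number of subcontractors is left unperturbed here only to keep it an integer, which is also an assumption.

```python
import numpy as np

def augment(records, numeric_cols, target_size, rel_noise, rng):
    """Grow a data set by appending slightly perturbed copies of randomly chosen records.

    Only the columns listed in numeric_cols are perturbed; all other columns
    (e.g. 0/1 flags) are copied unchanged.
    """
    records = np.asarray(records, dtype=float)
    n_extra = target_size - len(records)
    picks = rng.integers(0, len(records), size=n_extra)   # source record of each copy
    copies = records[picks].copy()
    # Multiplicative noise in [1 - rel_noise, 1 + rel_noise] keeps positive values positive.
    factors = rng.uniform(1.0 - rel_noise, 1.0 + rel_noise,
                          size=(n_extra, len(numeric_cols)))
    copies[:, numeric_cols] *= factors
    return np.vstack([records, copies])

rng = np.random.default_rng(42)
# 22 hypothetical records with columns: contract value, number of subcontractors,
# consortium flag, share of road works, new-street flag, planned duration (months).
original = np.column_stack([
    rng.uniform(1.0, 50.0, 22),
    rng.integers(0, 6, 22),
    rng.integers(0, 2, 22),
    rng.uniform(0.2, 1.0, 22),
    rng.integers(0, 2, 22),
    rng.uniform(2.0, 24.0, 22),
])
augmented = augment(original, numeric_cols=[0, 3, 5],
                    target_size=105, rel_noise=0.05, rng=rng)

# Set 6 records aside for testing, as in the text.
test_idx = rng.choice(len(augmented), size=6, replace=False)
train_mask = np.ones(len(augmented), dtype=bool)
train_mask[test_idx] = False
train, test = augmented[train_mask], augmented[test_idx]
```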

2. Calculation of prediction errors for different standardization methods applied to ANN input data

The following six standardization methods were applied to the original data set: vector - formula (1), Manhattan - formula (2), maximum linear - formula (3), Weitendorf's linear - formula (4), Peldschus' nonlinear - formula (5), Zavadskas and Turskis' logarithmic - formula (6) [7, 8]. Jüttler-Korth linear standardization was not applied after checking that, for non-negative data values, it is identical to maximum linear standardization.

Nomenclature

$A_i$ i-th element of a given data type after standardization

$A_{0i}$ i-th element of a given data type before standardization

$n$ number of elements of a given data type ($i$ varies from 1 to $n$)

APE Absolute Percentage Error

MAPE Mean Absolute Percentage Error

RMSE Root Mean Squared Error

$P_i$ value predicted by the ANN based on the i-th record of input data

$P_{0i}$ original (real) value of $P_i$

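2.1. Standardization formulas

In formulas (1)-(6) the index $j$ runs over all $n$ elements of a given data type.

Vector standardization

$$A_i = \frac{A_{0i}}{\sqrt{\sum_{j=1}^{n} A_{0j}^{2}}} \qquad (1)$$

Manhattan standardization

$$A_i = \frac{A_{0i}}{\sum_{j=1}^{n} |A_{0j}|} \qquad (2)$$

Maximum linear standardization

$$A_i = \frac{A_{0i}}{\max_j A_{0j}} \qquad (3)$$

Weitendorf's linear standardization

$$A_i = \frac{A_{0i} - \min_j A_{0j}}{\max_j A_{0j} - \min_j A_{0j}} \qquad (4)$$

Peldschus' nonlinear standardization

$$A_i = \left( \frac{A_{0i}}{\max_j A_{0j}} \right)^{2} \qquad (5)$$

Zavadskas and Turskis' logarithmic standardization

$$A_i = \frac{\ln A_{0i}}{\ln \prod_{j=1}^{n} A_{0j}} \qquad (6)$$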
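The six standardizations can be sketched in Python as below, using the commonly cited definitions of these methods. This is an illustrative reading and not the authors' code. The function names and the sample durations are hypothetical, and the Peldschus and Zavadskas-Turskis functions use the form for data where larger values are preferred, which is an assumption.

```python
import numpy as np

def vector(a0):
    """Divide by the Euclidean norm of the column."""
    return a0 / np.sqrt(np.sum(a0 ** 2))

def manhattan(a0):
    """Divide by the sum of absolute values of the column."""
    return a0 / np.sum(np.abs(a0))

def maximum_linear(a0):
    """Divide by the largest value of the column."""
    return a0 / np.max(a0)

def weitendorf(a0):
    """Shift by the minimum and divide by the range of the column."""
    return (a0 - np.min(a0)) / (np.max(a0) - np.min(a0))

def peldschus(a0):
    """Square of the maximum linear standardization."""
    return (a0 / np.max(a0)) ** 2

def zavadskas_turskis(a0):
    """ln(a) divided by ln of the product of all values (log of a product = sum of logs)."""
    return np.log(a0) / np.sum(np.log(a0))

# Hypothetical planned durations in months (not taken from the paper's data set).
durations = np.array([2.0, 4.0, 6.0, 8.0])
methods = (vector, manhattan, maximum_linear, weitendorf, peldschus, zavadskas_turskis)
results = {f.__name__: f(durations) for f in methods}
```

For this sample the maximum linear, Weitendorf and Peldschus variants reach 1 at the largest record, while the Manhattan and logarithmic variants sum to 1 over the column, so their individual values become smaller as the number of records grows.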

It has to be mentioned that non-numerical data (e.g. a consortium vs. one company acting alone) were input as 0 or 1 and were not subject to standardization. The output was standardized with the same method as the input, so to obtain the values predicted by the ANN in the original unit (months in this case) it was necessary to apply the calculation inverse to the standardization.
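The round trip described above can be sketched for maximum linear standardization, where the inverse is a multiplication by the stored column maximum. The helper names, the two-column layout and all numbers are hypothetical; the other methods would need their own stored parameters (for example the minimum and the range for Weitendorf's method).

```python
import numpy as np

def fit_max_linear(column):
    """Store the parameter needed to standardize and later de-standardize a column."""
    return float(np.max(column))

def standardize(values, col_max):
    """Maximum linear standardization with a previously stored maximum."""
    return np.asarray(values, dtype=float) / col_max

def destandardize(values, col_max):
    """Inverse of maximum linear standardization: back to the original unit."""
    return np.asarray(values, dtype=float) * col_max

# Hypothetical inputs: column 0 is numerical, column 1 is a 0/1 flag.
inputs = np.array([[10.0, 1.0],
                   [20.0, 0.0],
                   [40.0, 1.0]])
scaled_inputs = inputs.copy()
scaled_inputs[:, 0] = standardize(inputs[:, 0], fit_max_linear(inputs[:, 0]))
# Column 1 is deliberately left as it is: flags are not standardized.

# Hypothetical real durations in months, used as the training target.
real_durations = np.array([3.0, 6.0, 12.0])
duration_max = fit_max_linear(real_durations)
scaled_target = standardize(real_durations, duration_max)

# A hypothetical network output of 0.75 converted back to months.
predicted_months = destandardize([0.75], duration_max)
```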

2.2. Prediction errors

The following three measures of the accuracy of ANN predictions were applied to the test part of the data only:

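$$APE = \max_i \frac{|P_i - P_{0i}|}{|P_{0i}|} \qquad (7)$$

$$MAPE = \frac{1}{n} \sum_{i=1}^{n} \frac{|P_i - P_{0i}|}{|P_{0i}|} \qquad (8)$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (P_i - P_{0i})^{2}} \qquad (9)$$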
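The three error measures can be sketched as follows, assuming that APE denotes the largest absolute percentage error over the test records, MAPE its mean, and RMSE the root of the mean squared difference. The function names and the sample values are hypothetical.

```python
import numpy as np

def ape_max(predicted, actual):
    """Largest absolute percentage error over all records."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return float(np.max(np.abs(predicted - actual) / np.abs(actual)))

def mape(predicted, actual):
    """Mean absolute percentage error over all records."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return float(np.mean(np.abs(predicted - actual) / np.abs(actual)))

def rmse(predicted, actual):
    """Root mean squared error, expressed in the unit of the compared values."""
    predicted, actual = np.asarray(predicted, float), np.asarray(actual, float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))

# Hypothetical real and predicted durations in months.
actual = [10.0, 20.0, 40.0]
predicted = [11.0, 18.0, 40.0]

worst = ape_max(predicted, actual)     # relative errors are 0.1, 0.1 and 0.0
average = mape(predicted, actual)
spread = rmse(predicted, actual)
```

APE and MAPE are relative measures, whereas RMSE carries the unit and scale of the values being compared.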

The results of the aforementioned error calculations, made for ANNs based on data standardized in the six ways, are collected in Table 1.

Table 1. ANN prediction errors evaluation

| Method of standardization | Abbreviation used for figures | APE | MAPE | RMSE |
|---|---|---|---|---|
| Vector | Vect | 0.666 | 0.211 | 0.423 |
| Manhattan | Manh | 0.324 | 0.111 | 0.00121 |
| Maximum linear | Max | 0.179 | 0.047 | 0.031 |
| Weitendorf's linear | Weit | 0.341 | 0.128 | 0.039 |
| Peldschus' nonlinear | Peld | 0.425 | 0.164 | 0.055 |
| Zavadskas and Turskis' | Z&T | 0.244 | 0.097 | 0.00119 |

The prediction accuracy of the ANN based on Manhattan data standardization was so good (by the RMSE measure) that it was necessary to increase the precision to show the comparison with Zavadskas and Turskis' standardization, for which the accuracy was even better. Shown with only 4 or 5 digits, the two values would appear equally good. Figures 1 to 3 show the methods in descending order of accuracy for each of the prediction error measures.
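The ordering of the methods can be reproduced from the values in Table 1 with a short sketch. The numbers are copied from the table; the dictionary layout and the function name are illustrative choices.

```python
# Error values from Table 1, keyed by the abbreviations used for the figures.
errors = {
    "Vect": {"APE": 0.666, "MAPE": 0.211, "RMSE": 0.423},
    "Manh": {"APE": 0.324, "MAPE": 0.111, "RMSE": 0.00121},
    "Max":  {"APE": 0.179, "MAPE": 0.047, "RMSE": 0.031},
    "Weit": {"APE": 0.341, "MAPE": 0.128, "RMSE": 0.039},
    "Peld": {"APE": 0.425, "MAPE": 0.164, "RMSE": 0.055},
    "Z&T":  {"APE": 0.244, "MAPE": 0.097, "RMSE": 0.00119},
}

def ranking(measure):
    """Return the method abbreviations ordered from the lowest to the highest error."""
    return sorted(errors, key=lambda method: errors[method][measure])

ape_order = ranking("APE")
mape_order = ranking("MAPE")
rmse_order = ranking("RMSE")
```

APE and MAPE give the same ordering, while RMSE moves the Zavadskas and Turskis' and Manhattan standardizations ahead of maximum linear standardization.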

[Bar chart; methods on the horizontal axis in the order Max, Z&T, Manh, Weit, Peld, Vect]

Fig. 1. APE for predictions based on different types of standardization

[Bar chart; methods on the horizontal axis in the order Max, Z&T, Manh, Weit, Peld, Vect]

Fig. 2. MAPE for predictions based on different types of standardization

Fig. 3. RMSE for predictions based on different types of standardization (log10 vertical axis scale)

3. Findings

Applying different methods of standardizing the input data of an ANN gives different values of the accuracy measures. In this case maximum linear standardization applied to the input and output data gave the lowest APE and MAPE. To achieve the lowest RMSE, Zavadskas and Turskis' logarithmic standardization or Manhattan standardization should be applied. The choice of evaluation method should depend on the nature of the value being predicted and on the decision maker's loss minimization rule (when a decision is to be made based on the predictions) [10]. As the absolute percentage error and the mean absolute percentage error are lowest for maximum linear standardization, this type of data pre-processing should be chosen here. Every application of an ANN requires several decisions about the type of ANN, the number of hidden layers, the number of neurons, the activation function etc. This paper has shown that the method of standardizing the input and output data can substantially influence the prediction errors made by an artificial neural network. Like the other parameters of an ANN, standardization methods should be checked and adjusted for the given phenomenon we try to predict with an artificial neural network.

References

[1] R. Tadeusiewicz, Sieci neuronowe, 1993, Akademicka Oficyna Wydawnicza.

[2] H. Anysz, Wpływ wybranych parametrów i cech kontraktów na roboty budowlane na możliwość dotrzymania terminu zakończenia budowy, Technika Transportu Szynowego, 9/2012, pp. 2127-2134.

[3] H. Anysz, M. Książek, Wpływ cech własnych przedsiębiorstwa wykonawcy na możliwość dotrzymania terminu zakończenia budowy, Archiwum Instytutu Inżynierii Lądowej, 13/2012, pp. 29-38.

[4] A. Leśniak, E. Plebankiewicz, Opóźnienia w robotach budowlanych, Zeszyty Naukowe WSOWL, 3 (157) 2010, pp. 332-339.

[5] A. Leśniak, Przyczyny opóźnień w opiniach wykonawców, Technical Transactions, 1-B/2012, pp. 57-68.

[6] H. Anysz, A. Zbiciak, Przyczyny powstawania opóźnień w realizacji kontraktów budowlanych - analiza wstępnych wyników badania ankietowego, Autobusy, 2013, R. 14, nr 3.

[7] M. Kaftanowicz, M. Krzemiński, Multiple-criteria Analysis of Plasterboard Systems, 2015, Procedia Engineering 111, pp. 364-370.

[8] E. K. Zavadskas, Z. Turskis, A New Logarithmic Normalization Method in Games Theory, Informatica, 2008, 19 (2), pp. 303-314.

[9] S. Osowski, Sieci neuronowe w ujęciu algorytmicznym, 1997, WNT.

[10] A. Zeliaś, B. Pawełek, S. Wanat, Prognozowanie ekonomiczne. Teoria, przykłady, zadania, 2013, PWN.

[11] N. Ibadov, Fuzzy estimation of activities duration in construction projects, 2015, Archives of Civil Engineering, Vol. 61, Issue 2, pp. 23-34.