Available online at www.sciencedirect.com

ScienceDirect

Procedia Computer Science 85 (2016) 278 - 285

International Conference on Computational Modeling and Security (CMS 2016)

Improving Efficiency of Fuzzy Models for Effort Estimation by Cascading & Clustering Techniques

Rama Sree P (a,*), Ramesh SNSVSC (b)

(a) Dept. of CSE, Aditya Engineering College, JNTUK, Kakinada, 533437, India
(b) Dept. of CSE, Sri Sai Aditya Institute of Science & Technology, JNTUK, Kakinada, 533437, India

Abstract

A central challenge in the software industry is estimating the cost of developing or maintaining a project. Various models have been proposed to relate the size of software to its development effort, and a number of algorithmic cost estimation models exist, each with its own pros and cons. Nowadays, attention has turned towards machine learning techniques, as soft computing addresses a few of the problems associated with earlier models. Nevertheless, accurate effort estimation in software project management remains a challenge. The literature shows the use of the Fuzzy Logic Controller for Software Effort Estimation, but the computational time is very high because the rulebase is large. The main aim of this work is to reduce the rulebase and improve efficiency by cascading Fuzzy Logic Controllers. A case study on the NASA 93 dataset is taken for this purpose. The proposed work cascades Fuzzy Logic Controllers in stages of two and six for Software Effort Estimation. Increasing the number of cascaded Fuzzy Logic Controllers improved efficiency while reducing the size of the rulebase. The limitation of this process is finding the correct number of Fuzzy Logic Controllers to cascade. To overcome this, a Fuzzy Model using Subtractive Clustering is proposed. The rulebase of the models developed using Subtractive Clustering is further reduced. Considering rule minimization and the various assessment criteria, Fuzzy Models developed using Subtractive Clustering provide better software development effort estimates.

Keywords: Fuzzy Logic Controller, Effort Estimation, Cascading Fuzzy Logic Controllers, Rulebase, NASA 93 dataset, Subtractive Clustering;

* Corresponding author. Tel.: +919550002682; E-mail address: Tamasree_p@rediffmail.com

1877-0509 © 2016 Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Peer-review under responsibility of the Organizing Committee of CMS 2016

doi:10.1016/j.procs.2016.05.234

1. Introduction

Since Lotfi Zadeh proposed Fuzzy Logic in 1965, it has been the focus of significant investigation. Fuzzy Logic is a mathematical tool for dealing with imprecise, uncertain, unclear and vague data. Its central theme is mapping an input space to an output space by means of if-then statements called rules. Various attempts [1, 2] have been made to fuzzify some of the existing algorithmic models so that they can handle the uncertainty and imprecision surrounding them.

The literature shows different types of Fuzzy Logic Models [5, 7, 8, 12, 13, 14, 17, 18] for effort estimation [3, 6]. The present work focuses on developing Fuzzy Models based on the Intermediate COCOMO factors [4]. Initially, Fuzzy Models were developed with two inputs (i.e. Mode and Size) [5, 9] and later with three inputs (i.e. Mode, Size and Effort Adjustment Factor). The rulebase was found to be large in both cases: the Fuzzy Model with two inputs had 75 rules, and the Fuzzy Model with three inputs [11] had 375 rules. As the number of inputs to the Fuzzy Logic Controller (FLC) increases, the number of rules needed to train the controller also increases. If the rulebase is large, every rule of the FLC has to be checked for each sample input in order to find the fired rules, which is time consuming. The computational time therefore increases with the size of the rulebase.
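The growth in rules described above can be illustrated with a short Python sketch. It assumes a grid-partitioned rulebase, in which there is one rule for every combination of fuzzy sets across the inputs. The paper reports only the totals (75 and 375), so the per-input counts used in the usage note are hypothetical and were chosen only because their products match those totals.

```python
from math import prod


def grid_rule_count(fuzzy_sets_per_input):
    """Number of rules in a grid-partitioned rulebase.

    Assumption: one rule per combination of fuzzy sets, so the count is
    the product of the number of fuzzy sets defined on each input.
    """
    return prod(fuzzy_sets_per_input)


# Hypothetical partitions whose products match the totals in the text.
two_inputs = grid_rule_count([3, 25])        # e.g. Mode, Size
three_inputs = grid_rule_count([3, 25, 5])   # e.g. Mode, Size, EAF

# Even a coarse partition of 17 inputs (2 fuzzy sets each) explodes.
seventeen_inputs = grid_rule_count([2] * 17)
```

Under these assumptions the two-input and three-input counts come out at 75 and 375, while 17 inputs with only two fuzzy sets each would already need 131072 rules. This is the motivation for splitting the inputs across several small controllers.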

In the present work, the aim is to pass all 17 input parameters of the NASA 93 dataset to the Fuzzy Logic Controller while keeping the rulebase small. The output of the model is the estimated Effort. With a single, non-cascaded FLC the rulebase would be very large, so cascading of FLCs is proposed to minimize it. The work is first carried out by cascading two FLCs and then by cascading six FLCs, and the rulebase is examined in each case. In both configurations the final FLC in the cascade produces the estimated effort. To explore whether the rulebase can be reduced further, Fuzzy Models are also developed using Subtractive Clustering. Section 2 presents Cascading Two Fuzzy Logic Controllers, Section 3 presents Cascading Six Fuzzy Logic Controllers and Section 4 presents the development of a Fuzzy Model using Subtractive Clustering.

2. Cascading Two Fuzzy Logic Controllers

The proposed Fuzzy Models were developed using the NASA 93 dataset, which uses the Intermediate COCOMO equation for calculating the estimated effort. The NASA 93 dataset includes 17 input parameters. The primary inputs are Mode and Size. The secondary inputs are the 15 parameters required to calculate the EAF (Effort Adjustment Factor). In the cascaded model, the 15 secondary inputs are passed to the first Fuzzy Logic Controller, FLC1, whose output is the EAF. The primary inputs Mode and Size, together with the EAF from FLC1, are passed to the second Fuzzy Logic Controller, FLC2, whose output is the Effort. FLC1 needs only 3 rules and FLC2 only 24 rules, so cascading two FLCs reduces the rulebase to 27 rules in total. This reduction in rules helps decrease the computational time and improve the response time. Fig. 1 shows the Fuzzy Model developed by cascading two FLCs.

Fig. 1. Cascading two Fuzzy Logic Controllers
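The data flow of Fig. 1 can be sketched in Python as two chained stages. This is not the authors' fuzzy model: each FLC is replaced here by the crisp Intermediate COCOMO calculation it approximates, using the standard coefficients from Boehm [4]. The function names `eaf_stage`, `effort_stage` and `cascade_two` are hypothetical, and the mode is passed by name rather than by the exponent value listed in the Mode column of the tables.

```python
from math import prod

# Intermediate COCOMO coefficients (a, b) per development mode [4].
COCOMO_MODES = {
    "organic": (3.2, 1.05),
    "semidetached": (3.0, 1.12),
    "embedded": (2.8, 1.20),
}


def eaf_stage(effort_multipliers):
    """Crisp stand-in for FLC1: EAF as the product of the 15 multipliers."""
    return prod(effort_multipliers)


def effort_stage(mode, size_kloc, eaf):
    """Crisp stand-in for FLC2: Effort = a * Size^b * EAF (person-months)."""
    a, b = COCOMO_MODES[mode]
    return a * size_kloc ** b * eaf


def cascade_two(mode, size_kloc, effort_multipliers):
    """Stage 1 feeds its single output (EAF) into stage 2."""
    return effort_stage(mode, size_kloc, eaf_stage(effort_multipliers))
```

The point of the structure is that the second stage sees only three inputs (Mode, Size and EAF) instead of 17, which is what keeps its rulebase small in the fuzzy version.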

2.1. Experimental Results of Cascading Two Fuzzy Logic Controllers

The inputs to FLC1 are 15 parameters required to calculate EAF. The output of FLC1 is EAF. The output of FLC2 is the Effort. The NASA 93 dataset is used for experimentation. The entire Fuzzy Model has 27 rules.

The models are assessed using the following criteria: Variance Accounted For (VAF), Mean Absolute Relative Error (MARE), Variance Absolute Relative Error (VARE), Mean Balance Relative Error (Mean BRE), Mean Magnitude of Relative Error (MMRE) and Prediction (Pred) [9, 10].
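As a rough guide to how such criteria are computed, the following Python sketch implements commonly used definitions of VAF, MARE, VARE, Mean BRE and Pred(n). The exact formulas used in this paper are given in [9, 10] and are not restated here, so these definitions are assumptions. MMRE is omitted because its usual definition coincides with MARE, whereas the paper reports different values for the two.

```python
from statistics import mean, pvariance


def relative_errors(actual, estimated):
    """|actual - estimated| / actual for each project."""
    return [abs(a - e) / a for a, e in zip(actual, estimated)]


def vaf(actual, estimated):
    """Variance Accounted For, in percent (assumed definition)."""
    residuals = [a - e for a, e in zip(actual, estimated)]
    return (1.0 - pvariance(residuals) / pvariance(actual)) * 100.0


def mare(actual, estimated):
    """Mean Absolute Relative Error, in percent."""
    return mean(relative_errors(actual, estimated)) * 100.0


def vare(actual, estimated):
    """Variance of the Absolute Relative Error, in percent."""
    return pvariance(relative_errors(actual, estimated)) * 100.0


def mean_bre(actual, estimated):
    """Mean Balance Relative Error: |a - e| / min(a, e), averaged."""
    return mean(abs(a - e) / min(a, e) for a, e in zip(actual, estimated))


def pred(actual, estimated, threshold_percent=30.0):
    """Percentage of projects whose relative error is within the threshold."""
    errors = relative_errors(actual, estimated)
    hits = sum(1 for r in errors if r * 100.0 <= threshold_percent)
    return hits / len(errors) * 100.0
```

For example, with actual efforts of 100, 200 and 400 and estimates of 110, 180 and 400, these definitions give a VAF of 99.0 and a Pred(30) of 100.0.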

Table 1. Criteria for assessment of Fuzzy Model developed by cascading two FLCs

Model VAF MARE VARE Mean BRE MMRE Pred(30)%

Two FLCs 99.44 16.79 9.24 0.36 33.15 80.6

Table 2. Results of Fuzzy Model developed by cascading two FLCs

S.NO. Project Id Mode Size FLC1 Output FLC2 Output Actual Effort

1 4 1.12 8.2 0.87852 38.9 36

2 93 1.2 3 3.54212 38.0 38

3 26 1.12 10 0.78344 48.3 48

4 35 1.12 14 0.87852 59.1 60

5 27 1.12 15.4 0.96944 66.1 70

6 78 1.12 165 1.17837 97.1 97

7 64 1.05 40 0.53235 150.0 150

8 63 1.05 90 0.53235 162.0 162

9 73 1.12 85 0.91027 311.0 300

10 84 1.2 24 1.15425 430.0 430

11 67 1.2 339 0.86530 444.0 444

12 91 1.2 16.3 3.54212 480.0 480

13 75 1.12 111 0.74234 592.2 600

14 65 1.2 137 0.70999 636.0 636

15 44 1.12 284.7 0.80446 973.0 973

16 56 1.12 227 0.94706 1181.0 1181

17 86 1.2 65 2.77206 1772.5 1772.5

18 46 1.12 423 0.43619 2400.0 2400

19 59 1.12 980 3.68021 4560.0 4560

20 90 1.2 233 2.77206 8211.0 8211

Fig. 2. (a) Assessment Criteria of Fuzzy Model with two FLCs; (b) Comparison of Actual Effort and FLC2 Output

The assessment criteria for the Fuzzy Model developed by cascading two FLCs are shown in Table 1. It is observed that VAF is 99.44% and Pred(30) is 80.6%. The results of this model for selected projects are shown in Table 2. Fig. 2(a) shows the assessment criteria of the Fuzzy Model with two FLCs, and Fig. 2(b) compares the actual effort with the estimated effort of the developed Fuzzy Model for the selected projects.

3. Cascading Six Fuzzy Logic Controllers

The secondary input parameters of the NASA 93 dataset are broadly classified into Product, Platform, Personnel and Project categories [9]. The Product category includes 3 input parameters, the Platform category 4, the Personnel category 5 and the Project category 3. To reduce the rulebase, the basic idea is to pass the input parameters of each category to its own controller: Product to FLC1, Platform to FLC2, Personnel to FLC3 and Project to FLC4. The outputs of FLC1, FLC2, FLC3 and FLC4 are then passed to FLC5, whose output is the EAF. The inputs to FLC6 are Mode, Size and EAF (i.e. the output of FLC5), and its output is the Effort. Fig. 3 shows the Fuzzy Model developed by cascading six FLCs.

Fig. 3. Cascading of six Fuzzy Logic Controllers
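The first five stages of Fig. 3 can be sketched in Python as a grouping of the 15 cost drivers followed by a combining stage. The assignment of individual drivers to categories is an assumption that follows Boehm's grouping of the Intermediate COCOMO cost drivers [4], which matches the 3/4/5/3 split stated above. Plain products stand in for the fuzzy controllers, so the numbers will not reproduce the FLC outputs in Table 4, and the function names are hypothetical.

```python
from math import prod

# Assumed grouping of the 15 NASA 93 cost drivers (after Boehm [4]).
CATEGORIES = {
    "product": ("rely", "data", "cplx"),                      # -> FLC1
    "platform": ("time", "stor", "virt", "turn"),             # -> FLC2
    "personnel": ("acap", "aexp", "pcap", "vexp", "lexp"),    # -> FLC3
    "project": ("modp", "tool", "sced"),                      # -> FLC4
}


def category_stage(drivers):
    """Stand-in for FLC1..FLC4: one aggregate value per category."""
    return {
        name: prod(drivers[key] for key in keys)
        for name, keys in CATEGORIES.items()
    }


def eaf_stage(category_values):
    """Stand-in for FLC5: combine the four category values into the EAF."""
    return prod(category_values.values())


# Usage: a project that is nominal except for reliability and complexity.
nominal = {key: 1.0 for keys in CATEGORIES.values() for key in keys}
example = dict(nominal, rely=1.15, cplx=1.30)
example_eaf = eaf_stage(category_stage(example))
```

The EAF produced this way would then be passed, together with Mode and Size, to the final effort stage (FLC6). No controller in the chain receives more than five inputs.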

3.1. Experimental Results of Cascading Six Fuzzy Logic Controllers

A case study on the NASA 93 dataset is presented. The Fuzzy Model developed has six Fuzzy Logic Controllers. The outputs of FLC1, FLC2, FLC3 and FLC4 are the Product, Platform, Personnel and Project data respectively. These four category outputs are passed to FLC5, whose output is the EAF. The output of FLC6 is the Estimated Effort. In the rulebase, FLC1 has 1 rule, FLC2 has 2 rules, and FLC3, FLC4, FLC5 and FLC6 have 1 rule each. The total number of rules across all six FLCs is thus reduced to 7. As the rulebase is reduced, the computational time is also reduced.

The assessment criteria for the Fuzzy Model developed by cascading six FLCs are shown in Table 3. It can be observed that, for the Fuzzy Model with six FLCs, VAF is 99.45% and Pred(30) is 87.6%. The results of this model for selected projects of the NASA 93 dataset are shown in Table 4. Fig. 4(a) shows the assessment criteria of the Fuzzy Model with six FLCs. Fig. 4(b) compares the actual effort with the estimated effort of the developed Fuzzy Model.

Table 3. Criteria for assessment of Fuzzy Model developed by cascading six FLCs

Model VAF MARE VARE Mean BRE MMRE Pred(30)%

Six FLC's 99.45 14.39 7.24 0.34 19.15 87.6

Table 4. Results of Fuzzy Model developed by cascading six FLCs

S.No. Project Id Mode Size FLC1 Output FLC2 Output FLC3 Output FLC4 Output FLC5 Output FLC6 Output Actual Effort

1 4 1.12 8.2 1.26 0.75 0.95 0.99 1.01 39.9 36

2 93 1.2 3 1.51 1.97 1.12 1.05 3.35 38.5 38

3 15 1.12 20 0.97 1.01 0.81 1 0.8 48 48

4 18 1.12 31.5 1.26 0.75 0.95 0.99 1.01 62.4 60

5 27 1.12 15.4 1.36 1.44 0.68 0.72 0.95 73 70

6 78 1.12 165 1.51 1.06 0.57 1.3 1.32 96 97

7 64 1.05 40 0.97 0.75 0.64 1.1 0.33 147 150

8 63 1.05 90 0.97 0.75 0.64 1.1 0.33 161.9 162

9 72 1.12 98 1.15 1.06 0.57 1.3 1.05 302 300

10 84 1.2 24 1.44 1.74 0.61 0.81 1.31 429 430

11 38 1.12 90 1.65 0.85 0.64 1 0.74 447 444

12 91 1.2 16.3 1.51 1.97 1.12 1.05 3.35 480 480

13 75 1.12 111 0.89 1.06 0.57 1.3 0.89 604 600

14 65 1.2 137 1.34 0.85 0.64 1 0.51 635 636

15 44 1.12 284.7 1.08 1.06 0.79 0.9 0.77 975 973

16 56 1.12 227 1.75 1.55 0.3 0.9 1.04 1184 1181

17 86 1.2 65 1.93 2.6 0.57 0.95 3.05 1775 1772.5

18 46 1.12 423 0.82 0.75 0.52 1.26 0.25 2400 2400

19 59 1.12 980 2.16 1.56 0.71 1.55 3.23 4560 4560

20 90 1.2 233 1.93 2.60 0.57 0.95 3.05 8211 8211

Fig. 4. (a) Assessment Criteria of Fuzzy Model with six FLCs; (b) Comparison of Actual Effort and FLC6 Output

The results of the Fuzzy Model with six FLCs are promising, but as the number of inputs increases, the number of cascaded stages also increases, which makes the process tedious. Increasing the cascading tends to improve the results, but care must be taken to cascade the correct number of FLCs. Therefore, the possibility of reducing the rulebase [16] using other techniques is also examined. The methodology adopted for further reducing the rulebase is clustering [15], and the specific technique adopted here is Subtractive Clustering.

4. Subtractive Clustering

In Data Clustering, data is divided into clusters. A cluster is a set of data points with similarities [15]. Subtractive Clustering is one of the better clustering techniques. It uses the data points themselves as the candidates for cluster centres, which means that the computation is proportional to the problem size instead of the problem dimension. The actual cluster centres are not necessarily located at one of the data points, but in most cases this is a good approximation, especially given the reduced computation that the approach introduces. Let $n$ represent the number of data points. Since each data point is a candidate cluster centre, a density measure at a data point $x_i$ is defined as

$$D_i = \sum_{j=1}^{n} \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{(r_a/2)^2}\right)$$

where $r_a$ is a positive constant representing a neighbourhood radius. Therefore, a data point will have a high density value if it has many neighbouring data points. The first cluster centre $x_{c_1}$ is chosen as the point having the largest density value $D_{c_1}$. Next, the density measure of each data point $x_i$ is revised as follows:

$$D_i \leftarrow D_i - D_{c_1} \exp\left(-\frac{\lVert x_i - x_{c_1} \rVert^2}{(r_b/2)^2}\right)$$

where $r_b$ is a positive constant which defines a neighbourhood that has measurable reductions in density measure. Therefore, the data points near the first cluster centre $x_{c_1}$ will have a significantly reduced density measure. After the density function is revised, the point having the greatest density value is selected as the next cluster centre. This process continues until a sufficient number of clusters is obtained.
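The procedure just described can be sketched in plain Python as follows. The sketch uses Gaussian density terms scaled by the radii $r_a$ and $r_b$, which is the usual form of subtractive clustering. The default `rb = 1.5 * ra` and the stopping rule (stop once the highest remaining density drops below a fraction `stop_ratio` of the first peak) are assumptions, since the text only says the process runs until a sufficient number of clusters is obtained.

```python
import math


def subtractive_clustering(points, ra, rb=None, stop_ratio=0.15):
    """Return cluster centres chosen from `points` (a list of tuples).

    Assumptions: rb defaults to 1.5 * ra, and the search stops when the
    highest remaining density falls below stop_ratio * first peak.
    """
    if not points:
        return []
    if rb is None:
        rb = 1.5 * ra

    def sq_dist(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))

    alpha = 1.0 / (ra / 2.0) ** 2   # scaling of the density measure
    beta = 1.0 / (rb / 2.0) ** 2    # scaling of the density reduction

    # Density of each point: sum of Gaussian contributions of all points.
    density = [
        sum(math.exp(-alpha * sq_dist(p, q)) for q in points) for p in points
    ]
    first_peak = max(density)
    centres = []
    while True:
        idx = max(range(len(points)), key=density.__getitem__)
        peak = density[idx]
        if peak < stop_ratio * first_peak:
            break
        centre = points[idx]
        centres.append(centre)
        # Subtract the influence of the new centre from every density.
        density = [
            d - peak * math.exp(-beta * sq_dist(p, centre))
            for d, p in zip(density, points)
        ]
    return centres
```

Each selected centre ends up with zero density after the subtraction, so the loop terminates after at most one pass per data point. On two well-separated groups of points the sketch returns one centre per group.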

As the clustering process is completed, the number of clusters is known. Each cluster can now be represented using one rule. In this way, the rulebase could be minimized. The Subtractive Clustering Technique is applied for Effort Estimation and the rulebase is examined.

4.1. Experimental Results Using Subtractive Clustering for Effort Estimation

A Fuzzy Model is developed using the Subtractive Clustering technique. The "genfis2" function in MATLAB, which creates a Fuzzy Model using Subtractive Clustering, is used for this purpose with a radius of 1.51. For the present case study on the NASA 93 dataset with 17 input parameters, the data is divided into 3 clusters. Each cluster centre is mapped to one rule, so the rulebase consists of only 3 rules. This is a substantial reduction in the number of rules, and the computational time is correspondingly reduced and the efficiency improved.

The assessment criteria for the Fuzzy Model developed using Subtractive Clustering for the NASA 93 dataset are shown in Table 5. The model produced a VAF of 99.5% and a Pred(30) of 91.4%. The effort estimates of the model for selected projects are shown in Table 6. Fig. 5(a) shows the assessment criteria of the Fuzzy Model developed using Subtractive Clustering. Fig. 5(b) compares the actual effort with the estimated effort of this model.
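To illustrate how a cluster centre becomes a rule, the following Python sketch evaluates a small Sugeno-type model in which every centre defines one rule with a Gaussian membership function. It is a simplification and not the genfis2 output: genfis2 builds a Sugeno-type system whose rule outputs are linear functions of the inputs, whereas here each rule has a constant consequent. The shared width `sigma` and the function names are hypothetical.

```python
import math


def firing_strengths(x, centres, sigma):
    """Firing strength of each rule: Gaussian similarity of x to its centre."""
    strengths = []
    for centre in centres:
        d2 = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
        strengths.append(math.exp(-d2 / (2.0 * sigma ** 2)))
    return strengths


def sugeno_estimate(x, centres, consequents, sigma):
    """Weighted average of the rule outputs (zero-order Sugeno, assumed)."""
    weights = firing_strengths(x, centres, sigma)
    total = sum(weights)
    if total == 0.0:
        raise ValueError("input is too far from every cluster centre")
    return sum(w * c for w, c in zip(weights, consequents)) / total
```

With three cluster centres there are three rules, and every estimate requires only three membership evaluations regardless of how many input parameters each centre has.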

Table 5. Criteria for assessment of Fuzzy Model developed using Subtractive Clustering

Model VAF MARE VARE Mean BRE MMRE Pred(30)%

Fuzzy Model 99.5 3.4 27.1 0.11 18.66 91.4

Table 6. Results of Fuzzy Model developed using Subtractive Clustering

S.No. Project Id Mode Size Actual Effort Estimated Effort

1 7 1.12 3.5 10.8 8.98

2 33 1.12 5.5 18 18.70

3 3 1.12 7.7 31.2 29.39

4 13 1.12 11.3 36 36.11

5 26 1.12 10 48 45.49

6 18 1.12 31.5 60 60.32

7 71 1.12 34 72 71.27

8 78 1.12 165 97 95.29

9 49 1.12 21 107 106.47

10 40 1.12 16 114 115.26

11 1 1.12 25.9 117.6 117.95

12 14 1.12 100 215 214.91

13 48 1.12 47.5 252 251.51

14 17 1.12 150 324 323.89

15 55 1.12 50 370 367.76

16 84 1.2 24 430 442.82

17 83 1.2 41 599 586.25

18 88 1.2 50 1924.5 1923.89

19 62 1.2 271 2460 2456.86

20 59 1.12 980 4560 4560.67

Fig. 5. (a) Assessment Criteria of Fuzzy Model with Subtractive Clustering; (b) Comparison of Actual Effort and Estimated Effort

5. Conclusion

This paper discussed the application of Fuzzy Logic to Software Effort Estimation using the NASA 93 dataset. The rulebase and the various assessment criteria of the Fuzzy Models developed by cascading two FLCs and six FLCs were examined. Of the two cascaded models, the one with six FLCs provided better software development effort estimates. Increasing the cascading tends to improve the results, but care must be taken to cascade the correct number of FLCs. An alternative to this approach is the development of a Fuzzy Model using Subtractive Clustering. This model showed a VAF of 99.5%, MARE of 3.4, VARE of 27.1, Mean BRE of 0.11, MMRE of 18.66 and Pred(30) of 91.4%. Considering rule minimization and the various assessment criteria, the Fuzzy Model developed using Subtractive Clustering provided better software development effort estimates with only 3 rules. As the rules are minimized, the computational time is reduced and hence the efficiency is improved. It can be concluded that the Fuzzy Model developed using the Subtractive Clustering technique provided accurate software effort estimates.

References

1. Alaa F. Sheta, Estimation of the COCOMO Model Parameters Using Genetic Algorithms for NASA Software Projects, Journal of Computer Science 2 (2), 2006, 118-123.

2. Anupama Kaushik, A. K. Soni, and Rachna Soni, A Comparative Study on Fuzzy Approaches for COCOMO's Effort Estimation, International Journal of Computer Theory and Engineering, Vol. 4, No. 6, December 2012.

3. Boehm B., Abts C. and Chulani S., Software Development Cost Estimation Approaches - A Survey, University of Southern California Center for Software Engineering, Technical Report USC-CSE-2000-505, 2000.

4. Boehm B. W., Software Engineering Economics, Englewood Cliffs, NJ, Prentice-Hall, 1981.

5. Idri A. and Abran A., COCOMO Cost Model Using Fuzzy Logic, 7th International Conference on Fuzzy Theory & Technology, Atlantic City, New Jersey, March 2000.

6. Jainendra K. Navlakha, Choosing a Software Cost Estimation Model for Your Organization: A Case Study, Elsevier Science Publishers B.V. (North-Holland), 1990, 255-261.

7. Nisar M. W., Yong-Ji Wang, Elahi M. W. and I. A. Khan, Software Development Effort Estimation Using Fuzzy Logic, Information Technology Journal, Asian Network for Scientific Information, 2009.

8. Pedrycz W., Peters J. F. and Ramanna S., A Fuzzy Set Approach to Cost Estimation of Software Projects, Proceedings of the 1999 IEEE Canadian Conference on Electrical and Computer Engineering, Shaw Conference Center, Edmonton, Alberta, Canada, May 9-12, 1999.

9. Prasad Reddy P.V.G.D., Sudha K.R., Rama Sree P. and Ramesh S.N.S.V.S.C., Fuzzy Based Approach for Predicting Software Development Effort, International Journal of Software Engineering, Vol. 1, Issue 1, June 2010, 1-11.

10. Prasad Reddy P.V.G.D., Sudha K.R. and Rama Sree P., Application of Fuzzy Logic Approach to Software Effort Estimation, International Journal of Advanced Computer Science & Applications, Vol. 2, Issue 5, May 2011, 87-92, ISSN: 2156-5570.

11. Rama Sree P., Analytical Structure of a Fuzzy Logic Controller for Software Development Effort Estimation, in Proceedings of IEEE International Conference on EESCO, 2015, pp. 1-4.

12. Ritu Agarwal et.al, Efficient Estimation of Software System Using Fuzzy Technique, International Journal of Electronics and Computer Science Engineering, ISSN 2277-1956/V1N3-1006-1012.

13. Roheet Bhatnagar, Vandana Bhattacharjee, Mrinal Kanti Ghose A Proposed Novel Framework for Early Effort Estimation using Fuzzy Logic Techniques, Global Journal of Computer Science and Technology Vol. 10 Issue 14 (Ver. 1.0) November 2010, 66-72.

14. Ryder J, Fuzzy Modeling of Software Effort Prediction in Proceeding of IEEE Information Technology Conference, Syracuse, NY, 1-3 Sept 1998, pp: 53-56.

15. Sudha K.R., Butchi Raju Y., Chandra Sekhar A, "Fuzzy C-Means clustering for robust decentralized load frequency control of interconnected power system with Generation Rate Constraint", International Journal of Electrical Power & Energy Systems, Volume 37, Issue 1, May 2012, Pages 58-66.

16. Vakula V.S., Sudha K.R., "Design of Differential Evolution Algorithm Based Robust Fuzzy Logic Power System Stabilizer Using Minimum Rule-Base", IET Generation, Transmission & Distribution, Volume 6, Issue 2, February 2012, p. 121 - 132.

17. Vishal Sharma and Harsh Kumar Verma Optimized Fuzzy Logic Based Framework for Effort Estimation in Software Development, International Journal of Computer Science Issues, Vol. 7, Issue 2, No 2, March 2010, 30-38.

18. Zhiwei Xu, Taghi M. Khoshgoftaar, Identification of fuzzy models of software cost estimation, Fuzzy Sets and Systems, 2004, 141-163.