Article in press

Available online at www.sciencedirect.com

ScienceDirect

Journal of Taibah University for Science xxx (2015) xxx-xxx

Review Article

www.elsevier.com/locate/jtusci

Predictive modelling of the LD50 activities of coumarin derivatives using neural statistical approaches: Electronic descriptor-based DFT

Rachid Hmamouchi a, Majdouline Larif b, Samir Chtita a, Azeddine Adad a, Mohammed Bouachrine c, Tahar Lakhlifi a,*

a Molecular Chemistry and Natural Substances Laboratory, Faculty of Science, University Moulay Ismail, Meknes, Morocco

b Separation Process Laboratories, Faculty of Science, University Ibn Tofail, Kenitra, Morocco

c ESTM, University Moulay Ismail, Meknes, Morocco

Received 16 February 2015; received in revised form 9 June 2015; accepted 10 June 2015

Abstract

A quantitative structure-activity relationship (QSAR) study was performed on a set of 30 coumarin-based molecules using multiple linear regression (MLR) and an artificial neural network (ANN). The predicted values of the antioxidant activities of the coumarins were in good agreement with the experimental results. Several statistical criteria, such as the mean square error (MSE) and the correlation coefficient (R), were used to evaluate the developed models. The best results were obtained with a network architecture [8-4-1] (R = 0.908, MSE = 0.032), the tansig-purelin pair of activation functions and the Levenberg-Marquardt learning algorithm. The proposed model relies on a large set of electronic descriptors to describe these molecules. The results suggest that the proposed combination of calculated parameters may be useful for predicting the antioxidant activities of coumarin derivatives.

©2015 The Authors. Production and hosting by Elsevier B.V. on behalf of Taibah University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Keywords: Predicted; QSAR; MLR; ANN; Learning algorithm; Levenberg-Marquardt

Contents

1. Introduction

2. Material and methods

2.1. Materials

2.2. Methods

* Corresponding author. Tel.: +212 661996170; fax: +212 535536808. E-mail address: tahar.lakhlifi@yahoo.fr (T. Lakhlifi). Peer review under responsibility of Taibah University.

http://dx.doi.org/10.1016/j.jtusci.2015.06.013

1658-3655 © 2015 The Authors. Production and hosting by Elsevier B.V. on behalf of Taibah University. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).

R. Hmamouchi et al. / Journal of Taibah University for Science xxx (2015) xxx-xxx

2.2.1. Theoretical calculations for the molecular modelling

2.2.2. Multiple linear regressions

2.2.3. Artificial Neural Networks (ANNs)

3. Results and discussion

3.1. Multiple linear regressions (MLR)

3.2. Multiple nonlinear regression of the variable antioxidant activity (MNLR)

3.3. Artificial neural networks: PMC type

3.4. Choice of the number of hidden layers

3.5. Choice of transfer functions and the number of iterations

4. Conclusion

Acknowledgements

References

1. Introduction

Medicinal plants are both finished products destined for consumption and raw materials used for the production of active substances; they are a source of considerable value for many people and have many therapeutic qualities that have been demonstrated by experience. Coumarins are an important class of these natural products and have a characteristic odour similar to that of freshly mown hay. Coumarins are derived from the metabolism of phenylalanine via cinnamic acid and can be found in all parts of the plant, including the fruits, and in the essential oils of seeds [1].

Several studies have shown that coumarins are biologically active molecules that express varied activities. Coumarins can prevent the peroxidation of membrane lipids and capture hydroxyl radicals, superoxides and peroxyls [2]. Coumarins have been shown to be effective in blocking cancer chemically induced by ultraviolet radiation (i.e., anticancer activity). Degree and his team have shown that coumarins inhibit the growth of Saccharomyces cerevisiae. Coumarins also have other biological activities, including anti-platelet aggregation [3], anti-inflammatory [4], anticoagulant [5], antitumor [6], diuretic [7], antimicrobial [8], antiviral and analgesic effects [9].

Antioxidants are now regarded as essential candidates for fighting several diseases [10], and much current research has converged on the design and development of new chemical entities with potential antioxidant activities.

Coumarins are a natural source of essential antioxidants; these molecules exhibit activities against free radicals in human tissue via a variety of mechanisms that primarily rely on their structural equivalence with flavonoids and benzophenones [11,12].

The quantitative structure-activity relationship (QSAR) technique [13] has been widely used for years to provide quantitative analyses of the relationships between the structures and biological activities of compounds.

Almost all QSAR techniques are based on the use of molecular descriptors, which are numerical series that codify useful chemical information and enable the identification of correlations between statistical and biological properties [14,15]. Different QSAR studies from different research groups have identified important structural features that are responsible for activities [16,17] and aided the development of toxicity models for diverse chemicals [18-21].

Applications of ANNs in the QSAR analyses of the biological potentials (cytotoxicity, binding affinity, enzyme inhibition, etc.) of different compounds have been presented in previous papers [22-24]. In those papers, the usefulness of the ANN methodology in the QSAR modelling of the complex input-output relationship has been confirmed. These complex relationships are usually relevant to the prediction of biological activities that depend on many factors (e.g., stereochemistry, lipophilicity, functional groups, the type of organism/cell).

In the present work, we relied on a series of 30 coumarin derivatives studied by Andre Kimura et al. with the aim of developing a predictive QSAR model for the antioxidant activities of the coumarin molecules using calculation methods based on quantum chemistry, molecular structure, molecular geometry, the nature of molecular orbitals and molecular properties.

The most relevant molecular properties were calculated. These properties included the highest occupied molecular orbital energy (EHOMO), the lowest unoccupied molecular orbital energy (ELUMO), the energy gap, the dipole moment (μ), the total energy (ET), the activation energy (Ea), the absorption maximum (λmax) and the oscillator strength (factor of oscillation, f(SO)).

We developed a neural model for the prediction of changes in antioxidant activity based on electronic variables, and we show that the best performing model for such predictions uses the tansig transfer function in the hidden layer and the purelin function in the output layer, together with the LM learning algorithm and a PMC-type architecture [8-4-1].

2. Material and methods

2.1. Materials

Andre Kimura Okamoto et al. measured the inhibitory activities (LD50) of a series of 30 coumarin molecules against quinolin mutagenicity in Salmonella typhimurium. Fig. 1 illustrates the chemical structures of the studied compounds and their corresponding experimental LD50 activities.

The experimental toxicities of the studied compounds have been reported in recent work. The antioxidant activity values ranged from 6.07 to 8.03.

2.2. Methods

2.2.1. Theoretical calculations for the molecular modelling

Quantum chemistry has become a powerful complement to experiment, and advances in computer technology will only reinforce this trend. GaussView (03) is one of a very large number of molecular modelling software products used in both research and industry.

Density functional theory (DFT) methods were used in this study. These methods have become very popular in recent years because they can achieve precision levels similar to those of other methods in less time and at lower computational cost. According to DFT, the ground-state energy of a many-electron system can be expressed through the total electron density; in fact, the use of the electron density rather than the wave function for calculating the energy constitutes the fundamental basis of DFT [25,26]. The calculations reported here used the B3LYP functional [27,28] and the 6-31G* basis set. B3LYP is a version of the DFT method that uses Becke's three-parameter functional (B3) and includes a mixture of HF and DFT exchange terms that are associated with the gradient-corrected correlation functional of Lee, Yang and Parr (LYP). The geometries of all of the species under investigation were determined by optimizing all of the geometrical variables without any symmetry constraints.

2.2.2. Multiple linear regressions

The multiple linear regression statistical technique is used to study the relation between one dependent variable and several independent variables. It is a mathematical technique that minimizes the differences between actual and predicted values. The multiple linear regression (MLR) model [29-31] was generated using the software SYSTAT, version 12 [32], to predict the LD50 activities. Multiple regression was also used to select the descriptors for use as the input parameters of a back-propagation artificial neural network (ANN).
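As an illustration of the least-squares idea described above, the following is a minimal sketch in plain Python rather than SYSTAT. It fits an intercept and two descriptors by solving the normal equations. The choice of descriptors (dipole moment and f(SO)) and of molecules 1-6 from Table 1 is an assumption made only to keep the example short; it is not the model reported in this paper, and the function names `fit_mlr` and `predict` are hypothetical.

```python
# Minimal ordinary-least-squares sketch (pure Python, no external libraries).
# The descriptor subset below is an illustrative assumption, not the paper's model.

def fit_mlr(X, y):
    """Return [intercept, b1, b2, ...] minimising the sum of squared residuals."""
    rows = [[1.0] + list(r) for r in X]          # prepend the intercept column
    p = len(rows[0])
    # Augmented normal equations: (X'X | X'y)
    A = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        for k in range(p):
            if k != c:
                f = A[k][c] / A[c][c]
                A[k] = [a - f * b for a, b in zip(A[k], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def predict(coef, x):
    """Evaluate the fitted linear model for one descriptor vector."""
    return coef[0] + sum(b * v for b, v in zip(coef[1:], x))

# Molecules 1-6 of Table 1: (dipole moment in D, f(SO)) and observed LD50.
X = [(6.54, 0.14), (5.29, 0.14), (5.10, 0.15),
     (7.24, 0.04), (4.82, 0.11), (7.38, 0.02)]
y = [7.13, 8.03, 6.75, 7.84, 6.21, 6.14]

coef = fit_mlr(X, y)
residuals = [obs - predict(coef, x) for x, obs in zip(X, y)]
print("coefficients:", coef)
print("residuals:", residuals)
```

With an intercept in the model, the least-squares residuals sum to zero, which gives a quick sanity check on any fit of this kind.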

2.2.3. Artificial Neural Networks (ANNs)

The ANN analysis was performed by applying the Neural Fitting tool (nntool) toolbox of MATLAB software (v 2014a) to a data set of the antioxidant activities of coumarin derivatives [37].

Artificial neural networks are non-linear empirical models [33,34] that are still rarely used in the prediction of biological activities, although their applications are growing rapidly in many disciplines. ANNs are among the interesting alternatives to traditional statistics for data processing. In this work, we explain the key concepts of ANNs and the multilayer perceptron, with a greater focus on the latter.

2.2.3.1. Architectures of neural networks. Typically, a neural network is defined by its architecture, and such architectures are characterized by the transfer function and by how the interconnections between neurons are made. There are several transfer functions, and the selection of a transfer function depends on the problem being solved. Transfer functions are also chosen for the ease of their implementation and differentiation, on which optimization algorithms rely.

In our case, the selected network was a multilayer network. This choice was based on the ease and speed of construction and the fact that our problem has a limited number of input variables [35,36].

2.2.3.2. Multilayer perceptron (PMC). The PMC is a layer propagation network model (Fig. 2). The neurons are organized in layers, i.e., an input layer, an output layer and one or more intermediate layers, which are also called hidden layers.

Although in theory a PMC can have multiple layers, in practice, single hidden layers are sufficient [38]. A PMC was established to select the transfer functions, to identify the relevant inputs and the number of neurons in the hidden layer and to select an algorithm and then optimize and test the network.

Fig. 1. Chemical structures of the studied coumarins.

Fig. 2. Multilayer perceptron [4-5-1].

Transfer functions:

Neural networks are used for the approximation of non-linear models. Nonlinearity is introduced by the selected transfer function, particularly in the nodes of the hidden layer. The transfer function of the output layer is linear. Although in theory any nonlinear function can be used, the functions typically selected are those that are easy to calculate and differentiate.

According to Dawson and Wilby [39], the log-sigmoid transfer function (logsig) is the most used; it is bounded between 0 and 1 and is defined as follows:

$$f(x) = \frac{1}{1 + e^{-x}}$$

Among these functions, those that we use in the context of this work are primarily the linear transfer function (purelin), which is most frequently used in hydrological modelling, and the hyperbolic tangent sigmoid transfer function (tansig) (Fig. 3).

*Purelin (linear transfer function): purelin is a neural transfer function that returns its net input unchanged, a = n. Transfer functions calculate a layer's output based on its net input.

*Tansig (hyperbolic tangent sigmoid transfer function): tansig is a neural transfer function that maps its net input into the interval (-1, 1), a = 2/(1 + e^(-2n)) - 1, which is mathematically equal to tanh(n).
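The three transfer functions named in this section can be written out in a few lines. The sketch below is plain Python, not the MATLAB toolbox code; it simply reuses the MATLAB function names and assumes scalar inputs of moderate size.

```python
import math

def logsig(n: float) -> float:
    """Log-sigmoid: maps any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-n))

def tansig(n: float) -> float:
    """Hyperbolic tangent sigmoid: maps any real input into (-1, 1)."""
    return math.tanh(n)

def purelin(n: float) -> float:
    """Linear transfer function: returns the net input unchanged."""
    return n

# Evaluate the three functions at a few net inputs.
for n in (-2.0, 0.0, 2.0):
    print(n, logsig(n), tansig(n), purelin(n))
```

Only tansig and logsig are bounded, which is why a linear output neuron (purelin) is the usual choice when the predicted quantity, here the LD50 value, is not confined to a fixed interval.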

3. Results and discussion

In this study, we focused on a series of 30 coumarin derivatives to determine the quantitative relationships between the structures of these derivatives and the biological activity LD50 values. In this section, we employ the same approach that we have already used in previous works [29,31].

Table 1 shows the values of the calculated parameters obtained from the structures optimized at the DFT/B3LYP 6-31G(d) level.

3.1. Multiple linear regressions (MLR)

Many attempts were made to relate the toxicity indicator variable LD50 to the descriptors; the best relationship obtained with this method was a linear combination of several descriptors, i.e., the total energy ET, the energy EHOMO, the energy ELUMO, the activation energy Ea, the dipole moment μ, the absorption maximum λmax and the oscillator strength (factor of oscillation) f(SO).

$$\mathrm{LD}_{50} = -19.563 - 4.056\times10^{-4}\,E_T - 8.712\times10^{-3}\,E_{\mathrm{HOMO}} + 0.507\times10^{-2}\,E_{\mathrm{LUMO}} + 3.297\times10^{-2}\,\mu + 4141.438\,E_a + 3.821\times10^{-2}\,\lambda_{\max} + 1.531\,f(\mathrm{SO}) \qquad (1)$$

For our 30 compounds, the correlation between the experimental and calculated toxicities based on this model was quite significant (Fig. 4), as indicated by the following statistical values:

N = 30, R = 0.637, R² = 0.406, RMSE = 0.408
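The statistical criteria quoted above can be computed from paired observed and calculated values. The sketch below is a plain-Python illustration, assuming the usual Pearson definition of R and the root mean square of the residuals for RMSE; the function names are hypothetical. It uses only the first five compounds of Table 4 (observed LD50 and MLR-calculated LD50), so its output will not reproduce the values reported for all 30 compounds.

```python
import math

def pearson_r(obs, calc):
    """Pearson correlation coefficient between observed and calculated values."""
    n = len(obs)
    mo, mc = sum(obs) / n, sum(calc) / n
    cov = sum((o - mo) * (c - mc) for o, c in zip(obs, calc))
    so = math.sqrt(sum((o - mo) ** 2 for o in obs))
    sc = math.sqrt(sum((c - mc) ** 2 for c in calc))
    return cov / (so * sc)

def rmse(obs, calc):
    """Root mean square error of the residuals (observed minus calculated)."""
    return math.sqrt(sum((o - c) ** 2 for o, c in zip(obs, calc)) / len(obs))

# First five compounds of Table 4 (observed LD50 and MLR-calculated LD50).
obs = [7.13, 8.03, 6.75, 7.84, 6.21]
mlr = [6.698, 6.766, 7.295, 7.415, 6.437]

r = pearson_r(obs, mlr)
print("R =", r, "R^2 =", r * r, "RMSE =", rmse(obs, mlr))
```

The reported R² is the square of the reported R: 0.637² rounds to 0.406.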

Fig. 4 shows a very regular distribution of the calculated toxicity values as a function of the experimental values.

3.2. Multiple nonlinear regression of the variable antioxidant activity (MNLR)

We also used a nonlinear regression model to improve the structure-activity relationship quantitatively by accounting for several parameters. MNLR is the most commonly used tool for the study of multidimensional data. The resulting equation was:

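$$\begin{aligned}
\mathrm{LD}_{50} ={}& -23933.109 - 2.812\times10^{-3}\,E_T + 0.187\,E_{\mathrm{HOMO}} - 9.337\,E_{\mathrm{LUMO}} - 0.507\,\mu \\
& + 4141.438\,E_a + 49.468\,\lambda_{\max} - 1.981\,f(\mathrm{SO}) - 6.201\times10^{-7}\,E_T^{2} - 0.39\,E_{\mathrm{HOMO}}^{2} \\
& - 1.179\,E_{\mathrm{LUMO}}^{2} + 0.509\,\mathrm{gap}^{2} + 3.624\times10^{-2}\,\mu^{2} - 268.072\,E_a^{2} \\
& - 3.827\times10^{-2}\,\lambda_{\max}^{2} + 9.177\,f(\mathrm{SO})^{2} \qquad (2)
\end{aligned}$$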

The statistical parameters obtained for this model were as follows:

N = 30, R = 0.755, R² = 0.571, RMSE = 0.451

Table 1

Values of the parameters obtained by DFT/B3LYP 6-31G(d) optimization of the studied compounds.

Molec. LD50 ET (a.u.) EHOMO (eV) ELUMO (eV) Gap (eV) μ (D) Ea (eV) λmax (nm) f(SO)

1 7.13 -591.69 -6.34 -1.69 4.65 6.54 4.22 293.46 0.14

2 8.03 -1071.63 -6.38 -1.52 4.86 5.29 4.00 309.97 0.14

3 6.75 -3257.86 -6.83 -2.02 4.81 5.10 3.75 330.55 0.15

4 7.84 -3182.65 -7.48 -1.83 5.65 7.24 4.06 305.68 0.04

5 6.21 -497.02 -7.46 -1.88 5.59 4.82 4.18 296.27 0.11

6 6.14 -647.45 -6.03 -1.62 4.41 7.38 3.89 318.57 0.02

7 7.18 -670.31 -7.14 -1.27 5.87 6.88 3.80 326.52 0.37

8 6.07 -726.06 -7.62 -1.65 5.96 6.22 4.02 308.08 0.12

9 6.83 -647.46 -5.88 -1.68 4.20 7.17 3.90 318.12 0.22

10 7.30 -650.87 -9.22 -1.63 7.59 6.64 4.16 298.39 0.37

11 6.40 -800.08 -6.81 -2.25 4.55 4.09 3.48 356.75 0.19

12 6.11 -801.28 -7.20 -1.51 5.68 5.80 3.74 331.34 0.09

13 7.00 -611.55 -6.09 -1.66 4.44 6.58 4.16 297.70 0.35

14 6.72 -572.25 -6.93 -1.59 5.33 4.53 4.35 285.27 0.11

15 6.93 -686.76 -8.89 -1.50 7.39 7.93 4.10 302.60 0.21

16 6.90 -690.20 -7.62 -1.37 6.26 5.82 4.20 295.00 0.31

17 6.85 -686.76 -8.72 -1.49 7.23 8.13 4.12 301.11 0.23

18 6.38 -686.76 -9.19 -1.59 7.60 7.66 3.83 324.02 0.04

19 6.77 -726.07 -7.64 -1.44 6.20 6.05 4.12 301.12 0.27

20 6.62 -536.34 -6.92 -1.82 5.10 5.26 4.05 306.16 0.10

21 6.41 -536.34 -6.90 -1.80 5.10 5.04 4.16 297.76 0.16

22 6.44 -992.08 -7.45 -2.28 5.17 4.48 4.06 305.38 0.19

23 6.61 -765.38 -7.21 -1.51 5.70 4.70 3.98 311.42 0.23

24 6.92 -686.78 -8.90 -1.57 7.33 4.29 3.83 323.76 0.16

25 6.76 -611.56 -6.11 -1.58 4.53 6.72 4.23 293.34 0.27

26 6.52 -726.06 -7.64 -1.63 6.01 4.63 3.94 314.38 0.24

27 6.70 -686.76 -8.05 -1.70 6.35 4.20 3.97 312.23 0.21

28 6.54 -801.28 -7.24 -1.49 5.75 6.06 3.94 314.74 0.02

29 6.12 -840.58 -6.40 -1.53 4.87 5.21 4.00 309.71 0.15

30 6.85 -572.24 -6.92 -1.70 5.22 6.42 4.20 294.85 0.29
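Two internal relationships in Table 1 can be checked directly from the tabulated values. The sketch below tests, for molecules 1-4 only, that the gap column equals ELUMO minus EHOMO and that the Ea column agrees with hc/λmax, where hc ≈ 1239.84 eV·nm. The second relation is an observation about the tabulated numbers, not a statement made in the text, and the tolerance of 0.015 is an assumption chosen to absorb two-decimal rounding.

```python
HC_EV_NM = 1239.84  # Planck constant times speed of light, in eV*nm

# Molecules 1-4 of Table 1: (E_HOMO, E_LUMO, gap, Ea, lambda_max)
rows = [
    (-6.34, -1.69, 4.65, 4.22, 293.46),
    (-6.38, -1.52, 4.86, 4.00, 309.97),
    (-6.83, -2.02, 4.81, 3.75, 330.55),
    (-7.48, -1.83, 5.65, 4.06, 305.68),
]

def check_row(e_homo, e_lumo, gap, ea, lam, tol=0.015):
    """Return (gap_ok, ea_ok) for one table row, within a rounding tolerance."""
    gap_ok = abs((e_lumo - e_homo) - gap) <= tol
    ea_ok = abs(HC_EV_NM / lam - ea) <= tol
    return gap_ok, ea_ok

for i, row in enumerate(rows, start=1):
    print(i, check_row(*row))
```

Because the gap and Ea columns are so closely tied to other columns in these rows, the eight descriptors are not all independent, which is worth keeping in mind when reading the regression coefficients.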

The LD50 value predicted by this model is somewhat similar to the observed value. Fig. 5 displays a very regular distribution of the activity values based on the observed values.

The coefficient of determination obtained with Eq. (2) is quite interesting (R² = 0.571). To reduce the standard deviation of the error and complete our model, we employed artificial neural networks (ANNs), as described in the next section.

To conclude this part, the toxicity values obtained by nonlinear regression were highly correlated with the toxicity results obtained with the MLR method.

3.3. Artificial neural networks: PMC type

Concerning the classification or prediction of the antioxidant activities of coumarins, the learning of the PMC occurs in a supervised manner; thus, the ranking variable or the variable to be predicted must be known. In the case of the estimation of antioxidant activities, the collections to be observed are those for which we have this information.

Fig. 3. Graph and symbol of purelin and tansig.

Fig. 4. Relationship between the estimated LD50 values, their predictions and their residuals, established by MLR.

Fig. 5. Relationship between the estimated LD50 values, their predictions and their residuals, established by MNLR.

The choice of the architecture, i.e., a PMC-type neural network, raises the questions of the number of hidden layers, the number of hidden neurons, the number of iterations and the transfer functions. To answer these questions, we randomly divided our database into three parts: 70% for training, 15% for testing and 15% for validation.
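A random 70/15/15 division of 30 compounds can be sketched as follows. This is an illustrative plain-Python version, not the routine used in MATLAB; the rounding rule (integer floor for the training and validation parts, remainder to the test part) and the seed are assumptions, since 15% of 30 is not a whole number.

```python
import random

def split_indices(n, seed=0):
    """Shuffle 0..n-1 and cut it into training, validation and test index lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)      # seeded so the split is repeatable
    n_train = (70 * n) // 100             # 21 of 30
    n_val = (15 * n) // 100               # 4 of 30 (floor of 4.5)
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]          # the remainder, 5 of 30
    return train, val, test

train, val, test = split_indices(30)
print(len(train), len(val), len(test))
```

With so few compounds in the validation and test parts, the reported statistics can change noticeably from one random split to another, so fixing the seed is useful when comparing architectures.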

3.4. Choice of the number of hidden layers

Table 2 presents the calculations for the R and MSE values for one, two, three and four hidden layers.

Increasing the number of hidden layers increased the computational load without any gain in performance. Therefore, we concluded that a single hidden layer was preferable for the PMC model type.

Table 2

Performance of the system according to the number of hidden layers.

Number of hidden layers MSE (×10⁻²) R

1 1.483 0.884

2 2.572 0.762

3 4.788 0.711

4 5.74 0.581

3.5. Choice of transfer functions and the number of iterations

In this study, we used the Levenberg-Marquardt (LM) algorithm as the learning algorithm because it is known for its high performance.
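The core of the Levenberg-Marquardt method is a damped Gauss-Newton update, in which the step d solves (JᵀJ + λI) d = -Jᵀr, with J the Jacobian of the residuals r and λ a damping factor that is decreased after a successful step and increased otherwise. The sketch below applies this update to a toy two-parameter model, y = a·tanh(b·x), fitted to synthetic noise-free data. The model, the data, the starting point and the factor-of-ten damping schedule are all assumptions for illustration; this is not the network training performed in this study.

```python
import math

def residuals(p, xs, ys):
    """Model minus data for the toy model y = a * tanh(b * x)."""
    a, b = p
    return [a * math.tanh(b * x) - y for x, y in zip(xs, ys)]

def jacobian(p, xs):
    """Partial derivatives of each residual with respect to a and b."""
    a, b = p
    rows = []
    for x in xs:
        t = math.tanh(b * x)
        rows.append((t, a * x * (1.0 - t * t)))
    return rows

def lm_fit(p, xs, ys, lam=1e-2, iters=50):
    """Levenberg-Marquardt iterations for a two-parameter least-squares fit."""
    cost = sum(r * r for r in residuals(p, xs, ys))
    for _ in range(iters):
        r = residuals(p, xs, ys)
        J = jacobian(p, xs)
        # Damped normal equations: (J'J + lam*I) d = -J'r
        a11 = sum(j[0] * j[0] for j in J) + lam
        a12 = sum(j[0] * j[1] for j in J)
        a22 = sum(j[1] * j[1] for j in J) + lam
        g1 = sum(j[0] * ri for j, ri in zip(J, r))
        g2 = sum(j[1] * ri for j, ri in zip(J, r))
        det = a11 * a22 - a12 * a12
        d1 = (-g1 * a22 + g2 * a12) / det
        d2 = (-g2 * a11 + g1 * a12) / det
        trial = (p[0] + d1, p[1] + d2)
        new_cost = sum(v * v for v in residuals(trial, xs, ys))
        if new_cost < cost:       # accept the step and trust the model more
            p, cost, lam = trial, new_cost, lam / 10.0
        else:                     # reject the step and increase the damping
            lam *= 10.0
    return p, cost

# Synthetic, noise-free data generated from a = 1.5, b = 0.8 (illustration only).
xs = [-2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
ys = [1.5 * math.tanh(0.8 * x) for x in xs]

p_fit, cost_fit = lm_fit((1.0, 1.0), xs, ys)
print("fitted parameters:", p_fit, "final cost:", cost_fit)
```

When the damping is small the update behaves like Gauss-Newton and converges quickly near a minimum; when it is large the update shrinks towards a short gradient-descent step, which keeps the iteration stable far from the minimum.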

In this case, we changed the number of neurons in the hidden layer and the pairs of transfer functions. The performance was evaluated via the mean squared error (MSE) and the correlation coefficient (R).

Table 3 displays the performances observed for various pairs of transfer functions.

Fig. 6 shows the variation in the mean squared error (MSE) according to the pair of transfer functions for the Levenberg-Marquardt (LM) algorithm.

The results in bold in Table 3 indicate that the tansig-purelin pair of transfer functions produced a correlation coefficient of R = 0.908 and a mean squared error of MSE = 2.93 × 10⁻² with a network architecture [8-4-1]. This configuration gave the best performance of the LM learning algorithm, and this performance was reached after six iterations.

Table 3

Transfer function pairs and their performance.

Appellation  Hidden layer function  Output layer function  R      MSE (×10⁻²)  Number of iterations  Architecture
T-T          Tansig                 Tansig                 0.68   16.6         8                     [8-4-1]
T-L          Tansig                 Logsig                 0.558  15.23        15                    [8-5-1]
T-P          Tansig                 Purelin                0.908  2.93         6                     [8-4-1]
L-L          Logsig                 Logsig                 0.531  44.02        7                     [8-5-1]
L-T          Logsig                 Tansig                 0.722  9.2          7                     [8-7-1]
L-P          Logsig                 Purelin                0.815  17.01        8                     [8-9-1]
P-P          Purelin                Purelin                0.494  4.79         12                    [8-8-1]
P-L          Purelin                Logsig                 0.607  11.2         18                    [8-2-1]
P-T          Purelin                Tansig                 0.556  50.86        8                     [8-10-1]

Fig. 6. Variation of the MSE with the pair of transfer functions for the LM algorithm.

Fig. 7. Architecture of a PMC with eight input variables, four neurons in the hidden layer and one neuron in the output layer.

Fig. 8. Relationship between the estimated LD50 values, their predictions and their residuals, established by the ANN.

Based on these results, the most powerful model for predicting the antioxidant activity of the coumarins was the one that used the tansig transfer function in the hidden layer and the purelin function in the output layer, with the LM learning algorithm and a PMC configuration [8-4-1]. This network contained three layers (Fig. 7), as follows:

• eight neurons in the input layer, which represent the independent electronic variables;

• four neurons in the hidden layer; and

• one neuron in the output layer, which represents the antioxidant activity of the coumarin.
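The layer structure listed above corresponds to a very small computation. The sketch below, in plain Python, counts the adjustable parameters of an [8-4-1] network and evaluates one forward pass with a tansig hidden layer and a purelin output. The weights are random placeholders and the input vector is a hypothetical, already-scaled descriptor vector; neither comes from the trained network of this study.

```python
import math
import random

N_IN, N_HID, N_OUT = 8, 4, 1

def n_parameters(n_in=N_IN, n_hid=N_HID, n_out=N_OUT):
    """Weights plus biases of a single-hidden-layer network."""
    return n_in * n_hid + n_hid + n_hid * n_out + n_out

def forward(x, W1, b1, W2, b2):
    """tansig hidden layer followed by a purelin (identity) output neuron."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    return sum(w * h for w, h in zip(W2, hidden)) + b2

rng = random.Random(0)  # placeholder weights, NOT the trained network
W1 = [[rng.uniform(-1, 1) for _ in range(N_IN)] for _ in range(N_HID)]
b1 = [rng.uniform(-1, 1) for _ in range(N_HID)]
W2 = [rng.uniform(-1, 1) for _ in range(N_HID)]
b2 = rng.uniform(-1, 1)

x = [0.0] * N_IN  # hypothetical, already-scaled descriptor vector
print(n_parameters(), forward(x, W1, b1, W2, b2))
```

An [8-4-1] network has 41 adjustable parameters (32 + 4 hidden-layer weights and biases, 4 + 1 output-layer weights and bias), which exceeds the 30 compounds in the data set.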

The ANN model was developed using the properties of the studied compounds. The correlation between the ANN-calculated and experimental activity values was very significant, as indicated by the R and R² values.

N = 30, R = 0.908, R² = 0.811, RMSE = 0.032

Table 4

The observed LD50 values and the LD50 values calculated by the different methods, with their residuals.

Compound no. LD50 (obs.) MLR (calc.) Residual MNLR (calc.) Residual ANN (calc.) Residual

1 7.13 6.698 0.432 6.682 0.448 6.596 0.534

2 8.03 6.766 1.264 7.253 0.777 7.939 0.091

3 6.75 7.295 -0.545 7.056 -0.306 6.761 -0.011

4 7.84 7.415 0.425 7.531 0.309 7.970 -0.130

5 6.21 6.437 -0.227 6.338 -0.128 6.646 -0.436

6 6.14 6.364 -0.224 6.447 -0.307 6.249 -0.109

7 7.18 7.049 0.131 6.765 0.415 7.297 -0.117

8 6.07 6.576 -0.506 6.395 -0.325 6.686 -0.616

9 6.83 6.637 0.193 6.622 0.208 6.851 -0.021

10 7.30 7.068 0.232 7.373 -0.073 7.260 0.040

11 6.40 6.214 0.186 6.375 0.025 6.473 -0.073

12 6.11 6.485 -0.375 6.306 -0.196 6.232 -0.122

13 7.00 6.985 0.015 7.224 -0.224 6.927 0.073

14 6.72 6.763 -0.043 6.714 0.006 6.634 0.086

15 6.93 6.897 0.033 6.811 0.119 6.716 0.214

16 6.90 7.125 -0.225 7.036 -0.136 7.058 -0.158

17 6.85 6.957 -0.107 7.096 -0.246 6.760 0.090

18 6.38 6.428 -0.048 6.390 -0.010 6.277 0.103

19 6.77 6.974 -0.204 6.870 -0.100 7.023 -0.253

20 6.62 6.364 0.256 6.153 0.467 6.553 0.067

21 6.41 6.553 -0.143 6.318 0.092 6.614 -0.204

22 6.44 6.448 -0.008 6.500 -0.060 6.422 0.018

23 6.61 6.761 -0.151 6.887 -0.277 6.641 -0.031

24 6.92 6.520 0.400 6.614 0.306 6.851 0.069

25 6.76 6.961 -0.201 7.071 -0.311 6.776 -0.016

26 6.52 6.665 -0.145 6.486 0.034 6.754 -0.234

27 6.70 6.579 0.121 6.707 -0.007 6.717 -0.017

28 6.54 6.481 0.059 6.637 -0.097 6.496 0.044

29 6.12 6.669 -0.549 6.786 -0.666 6.070 0.050

30 6.85 6.896 -0.046 6.585 0.265 6.938 -0.088

The relationship between the estimated LD50 values and their residuals, as established with the artificial neural network, is illustrated in Fig. 8.

The correlation coefficient R obtained for this data set of coumarins was 0.908. This finding indicates that the artificial neural network was well suited to building the quantitative structure-activity relationship model. We then compared it with the best regression equations established in this study. A comparison of the MLR and ANN models shows that the ANN model had substantially better predictive capability (R = 0.908 versus R = 0.637 for MLR). The ANN was able to establish satisfactory relationships between the electronic descriptors and the activities of the studied compounds.

4. Conclusion

In this work, we applied QSAR regression to predict the activities of several antioxidant compounds that are based on coumarins.

The results revealed that the relationship between the antioxidant activities and the electronic parameters of the molecules was not linear for the coumarins.

The Levenberg-Marquardt algorithm exhibited better performance in terms of statistical indicators and network architecture [8-4-1] when a non-linear activation function of the tansig type was used in the hidden layer and a linear activation function of the purelin type was used in the output layer. This configuration resulted in very good predictions of the antioxidant activities.

Comparisons of the key statistical terms, such as R and R2, of the different models that involved the use of different statistical tools and various electronic descriptors are illustrated in Table 4.

Acknowledgements

We are grateful to the Association Marocaine des Chimistes Théoriciens (AMCT) for its pertinent help concerning the programs.

References

[1] J.L. Guignard, Abrégé de botanique, Masson, Paris, 1998, pp. 212.

[2] C.M. Anderson, A. Hallberg, T. Hogberg, Advances in the developpement of pharmaceutical antioxidant drug, Food Chem. 28 (1996) 65-180.

[3] R.J. Ochocka, D. Rajzer, H. Kowalski, Lamparczyk, Determination of coumarins from Chrysanthemum segetum L. by capillary electrophoresis, J. Chromatogr. A 709 (1995) 197-202.

[4] G. Taguchi, S. Fujikawa, T. Yazawa, R. Kodaira, N. Hayashida, M. Shimosaka, M. Okazaki, Scopoletin uptake from culture medium and accumulation in the vacuoles after conversion to scopolin in 2,4-D-treated tobacco cells, Plant Sci. 151 (2000) 153-161.

[5] T. Ojala, S. Rames, P. Haansu, H. Vuorela, R. Hiltunen, K. Haahtela, P. Vuerela, Antimicrobial activity of some coumarin containing herbal plants growing in Finland, J. Ethnopharmacol. 73 (2000) 299-305.

[6] C.N. Chen, M.S. Weng, C. Wu, J.K. Lin, Comparison of radical scavenging activity, cytotoxic effects and apoptosis induction in human melanosoma cells, Food Chem. 1 (2) (2004) 175-185.

[7] I. Khan, M.V. Kulkari, M. Gopal, M.S. Shahabuddin, Synthesis and biological evaluation of novel angularly fused polycyclic coumarins, Bioorg. Med. Chem. Lett. 15 (2005) 3584-3587.

[8] B. Thati, A. Noble, R. Rowan, S.B. Creaven, M. Walsh, D. Egan, K. Kavanagh, Mechanism of action of coumarin and silver coumarin complexes against the pathogenic yeast Candida albicans, Toxicol. In Vitro 21 (2007) 801-808.

[9] T. Stefanova, N. Nikolova, A. Michailova, I. Mitov, I. Iancovi, G.I. Zlabinger, H. Neychev, Enhanced resistance to Salmonella enterica serovar typhimurium infection in mice after coumarin treatment, Microb. Infect. 9 (2007) 7-14.

[10] R.L.L. De Compadre, A.K. Debnath, A.J. Shusterman, C. Hansch, LUMO energies and hydrophobicity as determinants of mutagenicity by nitroaromatic compounds in Salmonella typhimurium, Environ. Mol. Mutagen. 15 (1) (1990) 44-55.

[11] J.S. Felton, M.G. Knize, F.T. Hatch, M.J. Tanga, M.E. Colvin, Heterocyclic amine formation and the impact of structure on their mutagenicity, Cancer Lett. 143 (1999) 127-134.

[12] U. Maran, M. Karelson, A.R. Katritzky, A comprehensive QSAR treatment of the genotoxicity of heteroaromatic and aromatic amines, Quant. Struct.-Act. Relatsh. 18 (1) (1999) 3-10.

[13] C. Hansch, R.M. Muir, T. Fujita, P.P. Maloney, F. Geiger, M. Streich, J. Am. Chem. Soc. 85 (1963) 2817-2825.

[14] H. González-Díaz, S. Vilar, L. Santana, E. Uriarte, Medicinal chemistry and bioinformatics - current trends in drugs discovery with networks topological indices, Curr. Top. Med. Chem. 7 (10) (2007) 1015-1029.

[15] R. Concu, G. Podda, F.M. Ubeira, H. González-Díaz, Review of QSAR models for enzyme classes of drug targets: theoretical background and applications in parasites, hosts, and other organisms, Curr. Pharm. Des. 16 (24) (2010) 2710-2723.

[16] A. Sabljic, QSAR models for estimating properties of persistent organic pollutants required in evaluation of their environmental fate and risk, Chemosphere 43 (2001) 363.

[17] A. Sabljic, H. Gusten, H. Verhaar, J. Hermens, QSAR modelling of soil sorption. Improvements and systematics of logKoc vs. logP correlations, Chemosphere 31 (1995) 4489-4514.

[18] R. Benigni, R. Zito, The second national toxicology program comparative exercise on the prediction of rodent carcinogenicity: definitive results, Mutat. Res. 566 (2004) 49-63.

[19] D. Zakarya, E.M. Larfaoui, A. Boulaamail, M. Tollabi, T. Lakhlifi, QSARs for a series of inhibitory anilides, Chemosphere 36 (13) (1998) 2809-2818.

[20] M. Elhallaoui, M. Elasri, F. Ouazzani, A. Mechaqrane, T. Lakhlifi, Quantitative structure-activity relationships of noncompetitive antagonists of the NMDA receptor: a study of a series of MK801 derivative molecules using statistical methods and neural network, Int. J. Mol. Sci. 4 (2003) 249-262.

[21] G. Jing, Z. Zhou, J. Zhuo, Quantitative structure-activity relationship (QSAR) study of toxicity of quaternary ammonium compounds on Chlorella pyrenoidosa and Scenedesmus quadricauda, Chemosphere 86 (2012) 76-82.

[22] H. González-Díaz, D.M. Herrera-Ibatá, A. Duardo-Sánchez, C.R. Munteanu, R.A. Orbegozo-Medina, A. Pazos, ANN multiscale model of anti-HIV drugs activity vs AIDS prevalence in the US at county level based on information indices of molecular graphs and social networks, J. Chem. Inf. Model. 54 (3) (2014) 744-755.

[23] H. González-Díaz, S. Arrasate, N. Sotomayor, E. Lete, C.R. Munteanu, A. Pazos, L. Besada-Porto, J.M. Ruso, MIANN models in medicinal, physical and organic chemistry, Curr. Top. Med. Chem. 13 (5) (2013) 619-641.

[24] E. Tenorio-Borroto, C.G. Peñuelas Rivas, J.C. Vásquez Chagoyán, N. Castañedo, F.J. Prado-Prado, X. García-Mera, H. González-Díaz, ANN multiplexing model of drugs effect on macrophages; theoretical and flow cytometry study on the cytotoxicity of the anti-microbial drug G1 in spleen, Bioorg. Med. Chem. 20 (20) (2012) 6181-6194.

[25] C. Adamo, V. Barone, Chem. Phys. Lett. 330 (2000) 152-160.

[26] M.J. Frisch, et al., Gaussian 03, Revision B.01, Gaussian, Inc., Pittsburgh, PA, 2003.

[27] A.D. Becke, J. Chem. Phys. 98 (1993) 1372.

[28] C. Lee, W. Yang, R.G. Parr, Phys. Rev. B 37 (1988) 785-789.

[29] R. Hmamouchi, A.I. Taghki, M. Larif, A. Adad, A. Abdellaoui, M. Bouachrine, T. Lakhlifi, J. Chem. Pharm. Res. 5 (9) (2013) 198-202.

[30] R. Hmamouchi, M. Larif, A. Adad, M. Bouachrine, T. Lakhlifi, Int. J. Adv. Res. Comput. Sci. Softw. Eng. 4 (2) (2014) 241-251.

[31] R. Hmamouchi, M. Larif, A. Adad, M. Bouachrine, T. Lakhlifi, J. Comput. Methods Mol. Des. 4 (3) (2014) 61-71.

[32] STATITCF Software, Technical Institute of Cereals and Fodder, Paris, France, 1987.

[33] D. Mantzaris, G. Anastassopoulos, Intelligent prediction of vesicoureteral reflux disease, WSEAS Trans. Syst. 4 (2005) 1440-1449.

[34] S. Baboo, I. Shereef, An efficient weather forecasting system using artificial neural network, Int. J. Environ. Sci. 1 (2010) 321-326.

[35] I. Manssouri, M. Manssouri, B. El Kihel, Fault detection by K-NN algorithm and MLP neuronal networks in distillation column, J. Inf. Intell. Knowl. 3 (2011) 72-75.

[36] R. Nayak, L. Jain, B. Ting, Artificial neural networks in biomedical engineering: a review, Proc. 1st Asian-Pacific Congr. Comput. Mech. (2001) 887-892.

[37] H. Demuth, M. Hagan, M. Beale, Neural Network Toolbox: For Use with MATLAB, User's Guide, Version 9, 2011.

[38] K. Hornik, Approximation capabilities of multilayer feedforward networks, Neural Netw. 4 (2) (1991) 251-257.

[39] C.W. Dawson, R.L. Wilby, A comparison of artificial neural networks used for rainfall runoff modelling, Hydrol. Earth Syst. Sci. 3 (2000) 529-540.