New biological insights from better structure models
Academic journal
Journal of Molecular Biology
OECD Field of science
{"structure validation" / rerefinement / rebuilding}

Abstract of research paper on Biological sciences, author of scientific article — Wouter G. Touw, Robbie P. Joosten, Gert Vriend

Abstract Structure validation is a key component of all steps in the structure determination process, from structure building, refinement, deposition, and evaluation all the way to post-deposition optimisation of structures in the Protein Data Bank (PDB) by re-refinement and re-building. Today, many aspects of protein structures are understood better than 10 years ago, and combined with improved software and more computing power, the automated PDB_REDO procedure can significantly improve about 85% of all X-ray structures ever deposited in the PDB. We review structure validation, structure improvement, and a series of validation resources and facilities that give access to improved PDB files and to reports on the quality of the original and the improved structures. Post-deposition optimisation generally leads to improved protein structures and a series of examples will illustrate how that, in turn, leads to improved or even novel biological insights.

Accepted Manuscript

New biological insights from better structure models Wouter G. Touw, Robbie P. Joosten, Gert Vriend

PII: S0022-2836(16)00089-9

DOI: 10.1016/j.jmb.2016.02.002

Reference: YJMBI64988

To appear in: Journal of Molecular Biology

Received date: 16 June 2015

Revised date: 4 January 2016

Accepted date: 1 February 2016

Please cite this article as: Touw, W.G., Joosten, R.P. & Vriend, G., New biological insights from better structure models, Journal of Molecular Biology (2016), doi: 10.1016/j.jmb.2016.02.002

This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

New biological insights from better structure models

Wouter G. Touw (a), Robbie P. Joosten (b), Gert Vriend (a,*)

a. Centre for Molecular and Biomolecular Informatics, Radboud University Medical Center, Geert Grooteplein-Zuid 26-28, 6525 GA Nijmegen, The Netherlands

b. Department of Biochemistry, Netherlands Cancer Institute, Plesmanlaan 121, 1066 CX Amsterdam, The Netherlands

* Corresponding author. E-mail:; Phone: +31 24 361 9521.

Research highlights

• Validation is a key component of the structure determination process.

• Automated post-deposition optimization using PDB_REDO improves X-ray structure models.

• Better structure models often generate improved or novel biological insights.




Structure validation; rerefinement; rebuilding


A-CAT: α-kinase domain of myosin heavy chain kinase;

ATX: autotaxin;

CSD: Cambridge Structural Database;

DACA: Directional Atomic Contact Analysis;

FF: force field;

hPNMT: human phenylethanolamine N-methyltransferase;

LPA: lysophosphatidic acid;

MD: molecular dynamics;

PBD: Polo Box Domain;

PDB: Protein Data Bank;

ProSA: Protein Structure Analysis;

RMS: root-mean-square;

ROS: reactive oxygen species;

VTF: Validation Task Force.


In 1951 Pauling and Corey predicted the α-helix [1] and the β-sheet [2]. In 1958 the first picture of a protein was obtained when Kendrew and colleagues solved the structure of myoglobin at 6 Å using X-ray crystallography [3]. In 1960 the structure of myoglobin was obtained at 2 Å [4] and the structure of haemoglobin was solved [5]. The similarity between the tertiary structures of haemoglobin and myoglobin showed the evolutionary conservation of the globin folds [6,7], and these structures "laid the foundation" [8] for understanding the mechanism of cooperativity in haemoglobin, and hinted already at the possibility to perform homology modelling, which four years later was done for the first time when α-lactalbumin was modelled based on the crystal structure of hen egg-white lysozyme [9]. Myoglobin and haemoglobin illustrate the impact of protein atom coordinates on science in general and on biology in particular. Kendrew and Perutz received a Nobel Prize for their work, an honour later also bestowed on scientists [] for (structure) work on GPCRs, the ribosome, insulin, the photosynthetic reaction centre, ATPase, GFP, ubiquitin, ion and water channels, protein structure NMR in general, and for computational techniques on protein structures. The last of this list was awarded for "the development of multiscale models for complex chemical systems", which is a computational technique that critically depends on the accuracy of the protein structure coordinates.

Novotny, Bruccoleri, and Karplus were the first to ask whether correctly folded protein models could be distinguished from incorrectly folded protein models [10,11]. They modelled the sequence of the α-helical sea worm hemerythrin on the mainly β-stranded mouse immunoglobulin VL domain and vice versa. The incorrect side chains could be incorporated reasonably well and the empirical potential energy of these misfolded models was comparable to that of the correct models. The misfolded models, however, had higher non-covalent energy terms, a larger solvent accessible surface area and more exposed non-polar side chain atoms [10]. Others reported that, compared to the correct models, the incorrect models have a lower solvation free energy [12], are less compact [13], and make only about half as many hydrophobic contacts [14]. These deliberately misfolded proteins have long served to validate protein structure analysis and validation methods [12-24].


Figure 1: Threading errors in protein structures. a) The best superposition possible for ferredoxin I in the correctly traced Protein Data Bank (PDB) entry 5fd1 [25] (orange) and the misthreaded [26] PDB entry 2fd1 [27] (grey). b) β-strands β1 (yellow) and β3 (orange) in the correct PDB entry 5p21 [28]. These two β-strands were traced in each other's density [29] in the structure of human p21 reported in [30]. c) The small subunits of spinach RuBisCo are coloured in the correct structure model 1rcx [31]. The small RuBisCo subunit was threaded incorrectly [32] in the structure reported in [33]. d) Residues 143-203 of the enolase structures 1enl [34] (grey) and 2enl [35] (orange). The first β-strand is followed by a loop and an anti-parallel second β-strand in the correct structure model 2enl. The 1enl structure model is traced backwards. e) In the Cα-only PDB entry 2hvp [36] five Cα atoms (ball-and-sticks) are incorrectly assigned to the C-terminus of the HIV-1 protease rather than correctly to the N-terminus (spheres), resulting in an erroneous dimer interface [37]. The problem can be resolved (PDB entry 3hvp [37]) by breaking the C-terminal connection (scissors) with the stretch of five Cα atoms and connecting (glue) the stretch to the N-terminus instead. The symmetry-related copy is shown in purple. Figures were prepared with CCP4mg [38].

It is generally believed that the field of protein structure validation came into existence in 1989 when serious errors were discovered in a series of deposited crystal structures (see Figure 1). These discoveries led to the CCP4 study weekend "Accuracy and reliability of macromolecular crystal structures" and Branden and Jones subsequently published their seminal commentary on errors and checks to detect them [39]. Soon after this commentary was published, the crystallographic model-building program O could compare rotamers and the position of backbone oxygen atoms to database penta-peptides [40]. The protein structure bioinformatics community also took up the challenge and in 1993 the first three structure validation methods were published in rapid succession: Directional Atomic Contact Analysis (DACA; [21]), PROCHECK [41], and Protein Structure Analysis (ProSA; [23]). These three methods derived rules from protein structures solved at high resolution (and therefore presumed 'correct') to find errors in protein structures in general.

Vriend and Sander determined a contact quality index that measures the agreement between the atom distributions of all possible close contacts in the structure model and equivalent database distributions [21]. This detailed evaluation of atomic packing also allows for the detection of local errors in the protein packing. This method was implemented in WHAT IF [42]. The quality index resulting from this analysis is now known as packing quality or DACA in WHAT_CHECK [43].

Thornton and co-workers described the stereochemical quality of protein structures in terms of the parameters derived by Morris et al. (Cα chirality, disulphide bond length, proline main-chain hydrogen-bond energy, peptide bond planarity, side-chain torsion angles χ1 and χ2 etc.) [44], bond lengths and bond angles [45], and position in the Ramachandran plot [46]. These stereochemical checks were implemented in PROCHECK [41].

Sippl applied the concept of potentials of mean force [47] to Cα-Cα [23] and Cβ-Cβ [16,23] distances. In the ProSA method the pseudo energy of proteins is derived using a combination of mean force potentials, and the mean field energy of a protein structure is transformed into a Z-score by also evaluating the energy for a large number of structure decoys (alternative conformations) [23].
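The Z-score transformation described above can be illustrated with a minimal sketch. It assumes that the pseudo energies of the model and of its decoys have already been computed by some other means; the function name and the numbers are hypothetical and do not reproduce the ProSA potentials or its decoy generation.

```python
from statistics import mean, pstdev


def energy_z_score(model_energy, decoy_energies):
    """Express a model's pseudo energy in standard deviations from the decoy mean.

    A strongly negative value means the model scores much better (lower energy)
    than the alternative conformations it is compared with.
    """
    if len(decoy_energies) < 2:
        raise ValueError("need at least two decoy energies")
    spread = pstdev(decoy_energies)  # population standard deviation of the decoys
    if spread == 0:
        raise ValueError("decoy energies have zero spread")
    return (model_energy - mean(decoy_energies)) / spread


# Hypothetical pseudo energies in arbitrary units, for illustration only.
decoys = [-10.0, -12.0, -8.0, -11.0, -9.0]
z = energy_z_score(-16.0, decoys)
print(f"Z-score: {z:.2f}")
```

The population standard deviation is used here for simplicity; a sample standard deviation would change the value only slightly for the large decoy sets the text refers to.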

More methods have been published that give a score to a whole molecule, often, like ProSA, in threading projects. Eisenberg and colleagues, for example, calculated amino-acid preferences as a function of three residue environment parameters (the area of the residue that is buried; the fraction of side-chain area that is covered by polar atoms; and the local secondary structure) [48] and measured the compatibility of a protein model with its sequence using this so-called 3D profile [19].

Figure 2: Protein structures with improbable features. Left: In 2006 Gros and colleagues [49] identified several unusual and improbable features in a structure of the complement protein C3b (PDB entry 2hr0). The absence of crystal contacts in the c-direction of the unit cell (grey) was the most improbable feature. This triggered an investigation by the University of Alabama as to whether structures solved by H.K.M. Murthy were fabricated [50]. Right: another instance of unusual features was discovered by Rupp in 2012 [51]. The figure shows residues Val134 and Lys135 (red: atoms modelled at zero occupancy; white: full occupancy) of PDB entry 3k78 [52] and the 2mFo-DFc map (calculated using a grid size of 0.1 Å) contoured at +1.0σ (green) and at the noise level +0.4σ (blue). One of the highly improbable model features noticed by Rupp was the complete absence of any 2mFo-DFc density for unoccupied atoms down to near-noise levels while normal main-chain B-factors had been reported. This suggested that the data was indeed calculated from a model with zero occupancy atoms [51].

Initially, protein structure validation was met with some resistance from the protein structure determination field (e.g. [53]), but some high profile cases of structure models with very unusual features (see Figure 2) led the protein structure communities, structure determination and bioinformatics alike, to start Validation Task Forces (VTFs) for X-ray crystallography [54], NMR [55], and electron microscopy [56]. The X-ray and NMR VTFs have written their recommendations, and the wwPDB consortium [57] is presently implementing these recommendations in software that depositors of structures must use. It will take time to implement all VTF recommendations, so depositors who want to validate their structure very extensively before deposition will, for a while, still need to use tools like WHAT_CHECK [], MolProbity [58] and CING [59] [] in addition to the PDB [57,60] validation server [54,61].

Validation tools can be categorized in many ways, for example by their level of detail. Most of the older tools give one score for the whole structure and capture the overall quality of a structure in one number. Although ProSA, PROCHECK, DACA, and QMEAN [62,63], for example, score aspects of individual residues, their strength lies in whole protein quality evaluation. Validation tools can also be categorized by the certainty with which they can call things right or wrong. Most validation options do not determine the quality but merely the normality of a protein structure, i.e. how much this one protein looks like a collection of good, high resolution structures in terms of the validation parameters. Other validation options (notably all nomenclature checks, many administrative validation options, and routines that calculate the agreement of a model with the experimental data) provide answers about quality rather than normality. The Cambridge Structural Database (CSD) [64] holds more than 700,000 structures of small molecules that have been solved at much higher resolution than most PDB entries. Geometric parameters that can be determined from an analysis of CSD files are therefore so accurate that they can for all practical purposes be used as a gold standard when solving or validating PDB entries. The prime example of CSD-derived parameters is the famous Engh and Huber bond length and bond angle data [45,65], which are still used today in most refinement and validation software. Similarly, Hooft et al. used the CSD to determine the normal deviation from planarity in planar groups in proteins [66].

Global scores can detect bad structure models but they are not very useful when validation is used to actually improve model quality. Many tools, fortunately, detect erroneous molecular details that can be used directly to improve the quality of structures.

Hooft et al. [67] used the CSD to arrive at a force field (FF) for hydrogen bond energies and used this FF to optimise the flipping of Asn, Gln, and His side chains. Hydrogen bond network optimisation is part of WHAT_CHECK [43] and MolProbity [68]. Nielsen showed the importance of this validation-based structure improvement for electrostatic calculations [69,70], and the realisation that a series of measured pKa values commonly used for the calibration of electrostatic computation methods were flawed by crystal packing artefacts dramatically improved the entire field of protein electrostatics [70]. Nielsen also showed that electrostatic calculations for most enzymes in the PDB would give significantly better results if the hydrogen bonding network were improved prior to the calculations [69].

Wrong cell dimensions lead to systematic deviations in bond lengths and angles. Hooft et al. wrote software that projects the protein's bond lengths and angles on the axis system of the crystal cell to correct the cell's dimensions [71]. Lamzin and co-workers later improved this method [72].

The growth of the PDB has allowed validation of the Ramachandran plot [58,73] and bond lengths and angles [74,75] to become specific for secondary structure and residue type. Rotamer libraries constructed using high-resolution protein structures [40,76-79] are used in several programs to perform a knowledge-based validation of side-chain conformation. Misfit side chains may also be detected by Cβ position deviations [80] and steric clashes [43,58]. The RosettaHoles software provides a validation score for underpacking [81].

Several groups have developed tools such as VHELIBS [82], ValLigURL [83], Mogul [84], and Twilight [85] to visualize and validate ligands, tools such as pdb-care [86], CARP [87], and privateer validate [88,89] to check carbohydrates [90,91], and programs like ERRASER [92] to rebuild and MolProbity [58] to validate nucleic acids.

CH4, NH3, NH4+, H2O, OH-, Ne, Na+, Mg2+ and Al3+ all contain 10 electrons, and thus scatter X-rays about equally strongly. This makes it hard to see the difference between them in any electron density determined at worse than atomic resolution. The same problem exists for K+ and Ca2+, which both have 18 electrons. Moreover, K+ and Ca2+ at half occupancy scatter X-rays about as strongly as H2O, Na+, Mg2+, etc. Consequently, many ions in the PDB are of the wrong type or actually should be water, while many waters should be ions [93]. Brown has determined empirical bond valence parameters [94,95] that can be used to determine what ion type it should be from the distances between the ion and its coordinating atoms. This method works reasonably well, but only at high resolution, when all surrounding atoms can be seen very well in the density, and when there is no bias caused by refining ion X as ion Y, which forces ion X to adopt the ligand-atom distances of ion Y. The Brown parameters have been implemented in SHELX [96] and WHAT_CHECK (unpublished), and later also in CheckMyMetal [97], and Phenix [98].
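The electron counts quoted above, and the general idea behind a bond valence check, can be sketched as follows. The electron counting is plain arithmetic on atomic numbers. The bond valence function uses the commonly quoted exponential form s = exp((R0 - R) / b); the default b = 0.37 Å and the idea of passing R0 in by hand are assumptions made for this illustration, and real R0 values should be taken from the published tabulations [94,95]. None of this reproduces the implementations in the programs named above.

```python
import math

ATOMIC_NUMBER = {"H": 1, "C": 6, "N": 7, "O": 8, "Ne": 10,
                 "Na": 11, "Mg": 12, "Al": 13, "K": 19, "Ca": 20}

# species name -> (elemental composition, net charge)
SPECIES = {
    "CH4": ({"C": 1, "H": 4}, 0),
    "NH3": ({"N": 1, "H": 3}, 0),
    "NH4+": ({"N": 1, "H": 4}, 1),
    "H2O": ({"O": 1, "H": 2}, 0),
    "OH-": ({"O": 1, "H": 1}, -1),
    "Ne": ({"Ne": 1}, 0),
    "Na+": ({"Na": 1}, 1),
    "Mg2+": ({"Mg": 1}, 2),
    "Al3+": ({"Al": 1}, 3),
    "K+": ({"K": 1}, 1),
    "Ca2+": ({"Ca": 1}, 2),
}


def electron_count(composition, charge=0):
    """Number of electrons: sum of the atomic numbers minus the net charge."""
    return sum(ATOMIC_NUMBER[el] * n for el, n in composition.items()) - charge


def bond_valence_sum(distances, r0, b=0.37):
    """Sum of exp((r0 - d) / b) over all ion-ligand distances d (in Angstrom).

    r0 is specific to the ion/ligand-atom pair and must be supplied by the
    caller; b = 0.37 is an assumed default, not a value taken from the text.
    """
    return sum(math.exp((r0 - d) / b) for d in distances)


counts = {name: electron_count(*spec) for name, spec in SPECIES.items()}
for name, n in counts.items():
    print(f"{name:5s} {n} electrons")

# A half-occupied K+ or Ca2+ site contributes about 9 electrons on average,
# which is close to the 10 electrons of a fully occupied water.
print("K+ at half occupancy:", 0.5 * counts["K+"])
```

In a bond valence check, the sum computed for each candidate ion type would be compared with that ion's formal charge, and the candidate with the closest match would be preferred.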

Alkali and alkaline earth metals are preferentially coordinated by oxygen atoms and not by nitrogen atoms. Dauter et al. recently reported that calcium ions in the so-called strong calcium site of several savinase structure models seem to be coordinated by the nitrogen atom instead of the oxygen atom of an asparagine [99]. A pseudo-octahedral calcium site is normally coordinated by oxygen atoms only. The B-factors of the Asn suggested that the side chain should be flipped. An inspection of the PDBREPORT database [43] reveals that the chemically highly implausible coordination of sodium, potassium, calcium, or magnesium ions by the nitrogen atom instead of the oxygen atom in asparagine or glutamine side chains occurs in 327 sites in 269 PDB entries, including the savinase example and several structures solved at atomic resolution (Figure 3).
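The kind of check described above can be sketched as a simple distance comparison: for an Asn or Gln side chain next to a sodium, magnesium, potassium, or calcium ion, see whether the amide nitrogen lies closer to the metal than the amide oxygen. This is only an illustration under stated assumptions; the function names and the coordinates are hypothetical, and it does not reproduce the PDBREPORT or WHAT_CHECK analysis.

```python
import math

# Metals the text describes as preferring oxygen over nitrogen ligands.
OXYGEN_PREFERRING = {"NA", "MG", "K", "CA"}


def amide_atom_nearest_metal(metal_xyz, oxygen_xyz, nitrogen_xyz):
    """Return 'O' or 'N' for whichever side-chain amide atom is closer to the metal."""
    d_oxygen = math.dist(metal_xyz, oxygen_xyz)
    d_nitrogen = math.dist(metal_xyz, nitrogen_xyz)
    return "O" if d_oxygen <= d_nitrogen else "N"


def needs_flip_review(metal_element, metal_xyz, oxygen_xyz, nitrogen_xyz):
    """Flag an Asn/Gln side chain whose nitrogen, not oxygen, faces the metal."""
    if metal_element.upper() not in OXYGEN_PREFERRING:
        return False
    return amide_atom_nearest_metal(metal_xyz, oxygen_xyz, nitrogen_xyz) == "N"


# Hypothetical coordinates in Angstrom, for illustration only.
sodium = (0.0, 0.0, 0.0)
od1 = (4.4, 0.0, 0.0)  # amide oxygen, pointing away from the ion
nd2 = (2.4, 0.0, 0.0)  # amide nitrogen, pointing at the ion
flagged = needs_flip_review("NA", sodium, od1, nd2)
print("review side-chain flip:", flagged)
```

A flag from a check like this is only a hint; whether the side chain should actually be flipped has to be decided against the electron density and the B-factors, as in the savinase example.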

Figure 3: Asn side-chain flip at atomic resolution. Sodium binding site of transformylase (PDB entry 1kjq, determined at 1.05 Å resolution [100]). The side chain of Asn100 has a highly unlikely conformation because the nitrogen rather than the oxygen coordinates the ion. PDB_REDO flips the side chain so that the ion is coordinated by the side chain oxygen. The PDB files of several hundred incorrect sites like this site contain LINK records specifying the N-metal coordination. Incorrect LINK records between the nitrogen of Asn and Gln side chains and Na+, Mg2+, K+ and Ca2+ are removed allowing structure correction by side-chain rebuilding (since PDB_REDO version 5.37). Future versions of PDB_REDO will also flip metal coordinating side chains based on results from WHAT_CHECK.

Several authors have noted an underrepresentation of cis peptides in the PDB that is partly the result of the a priori assumption in structure determination that all peptides have a trans conformation [101-104]. Trans peptide planes have also been observed rotated by 180°; this is called a peptide plane flip. Peptide plane flips typically are the result of mistakes in the early stages of model building when the electron density maps are not yet very clear. An incorrectly built peptide plane tends to lead to locally distorted geometry. We recently designed a Random Forest based method to detect these problems [105] and found almost 5000 trans-cis errors and many thousands of peptide plane flips.
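The geometric distinction between cis and trans peptides can be made concrete with the ω torsion angle, defined over the atoms Cα(i), C(i), N(i+1), and Cα(i+1): values near 0° are cis and values near 180° are trans. The sketch below computes that angle and classifies it. The 30° tolerance, the function names, and the coordinates are assumptions for illustration; this is not the Random Forest method of [105], which uses more than a single angle.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral_degrees(p0, p1, p2, p3):
    """Torsion angle in degrees around the p1-p2 axis (only |angle| is used below)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(x / length for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * x for x in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def classify_peptide(omega, tolerance=30.0):
    """Label a peptide bond from its omega angle; the tolerance is an assumed cut-off."""
    if abs(omega) <= tolerance:
        return "cis"
    if abs(omega) >= 180.0 - tolerance:
        return "trans"
    return "distorted"


# Idealised planar toy coordinates for CA(i), C(i), N(i+1), CA(i+1).
ca_i, c_i, n_next = (0.0, 1.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
trans_label = classify_peptide(dihedral_degrees(ca_i, c_i, n_next, (1.0, -1.0, 0.0)))
cis_label = classify_peptide(dihedral_degrees(ca_i, c_i, n_next, (1.0, 1.0, 0.0)))
print(trans_label, cis_label)
```

Note that ω alone cannot reveal a trans peptide that was wrongly built where a cis peptide belongs, nor a peptide plane flip, since in both cases the built peptide can have a perfectly normal trans ω; detecting those requires the local geometric distortions mentioned in the text.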

Improving PDB files by rerefinement and rebuilding

Over the years 3299 PDB entries have been made obsolete. Most often these entries were made obsolete because a better version, for example based on higher resolution data, became available, but sometimes the entries were highly improbable and made obsolete without putting a replacement file in the PDB. We also see more and more cases of PDB files that were improved and deposited by people other than the original authors. In 2007 Joosten & Vriend took a more systematic approach and re-refined some 1200 structure models for which data was available to 2.00 Å resolution [106]. More than three quarters of the re-refined models had an improved R-free value and improved geometric characteristics. After this successful small-scale proof of concept, Joosten et al. re-refined all high and medium resolution X-ray structures in the PDB (15,000 at the time) for which the reflection data (including the R-free set) were deposited and useful [107]. They showed that, despite the complication of using many more low resolution models, two thirds of the re-refined models were improved in terms of R-free [107]. The Ramachandran Z-score of the structure models also improved over the entire resolution range. In addition, they showed that the possibility of improving published structure models was not limited to old structure models, but that more than 60% of recently deposited structure models could also be improved. The addition of side-chain rebuilding and peptide-flipping tools [108] plus more advanced refinement parameterisation algorithms improved the success rate of PDB_REDO and extended the scope to active correction of modelling errors or, more poetically, 'constructive validation' [109]. The specific PDB_REDO steps that are applied based on validation algorithms are described in Table 1. Analyses of R-free and six WHAT_CHECK model-quality metrics of 12,000 randomly chosen PDB entries show that 85% of the PDB entries can be improved in terms of overall quality [109].

Table 1: Validation-driven PDB_REDO steps.

PDB_REDO step | Programs involved
Removal of improbable (metal coordination) LINKs | Stripper [109]
Correction of carbohydrate LINK topology | Stripper
Correction of carbohydrate names | pdb-care [86] and stripper
Removal of superfluous carbohydrate oxygens | pdb-care and stripper
Removal of improbable ligand occupancy models | REFMAC [110]
Removal of overly detailed B-factor models | REFMAC and Bselect [109]
Correction of atomic chirality problems | REFMAC, WHAT_CHECK [43], and Chiron [109]
Addition of missing side-chain atoms | SideAide [108] and DSSP [111,112]
Histidine, asparagine, and glutamine flips to improve hydrogen bonding | WHAT_CHECK and SideAide
Peptide flipping | Pepflip [108] and DSSP
Removal of waters not supported by the electron density | Centrifuge [108]

The re-refined structure models (plus electron density maps and a multitude of metadata) are stored in the PDB_REDO databank [112,113]. This repository now holds 99% of all crystallographic PDB entries for which experimental data is deposited (currently more than 90,000). New entries are added automatically with every new PDB release. Older PDB_REDO entries are replaced gradually or whenever a PDB entry is re-released, typically because of changes in the entry's annotation. It should be noted that many changes in annotation of PDB entries are the result of the PDB_REDO project. Over the course of the project nearly 7500 annotation problems that somehow hampered the optimization or interpretation of PDB entries were reported and the PDB staff corrected the majority of these.

In an automated procedure there is always a risk of introducing errors. A particularly difficult step is the restraint generation for ligands. This relies on reasonable input coordinates of a ligand, correct annotation of the chemistry by the PDB and/or proper interpretation of the coordinates by the tools in PDB_REDO. Although ligands generally improve slightly in PDB_REDO [82], sometimes ligands are refined incorrectly. It is therefore highly recommended to critically inspect ligands and their electron density manually, which is, by the way, not different for original PDB entries [83,85,114,115].

Taken together, the PDB_REDO procedure typically leads to structure models that better fit their experimental data, have more plausible molecular geometry, and are more informative for biological interpretation.

Better biology through better structure models

Four decades after Browne's first attempt at homology modelling on α-lactalbumin, the technique has become a research field in itself, and much effort is directed towards selecting good models from a large set of candidates (e.g. [116] and references therein). Using both PDB templates and PDB_REDO templates, we built homology models with YASARA for 33 CASP11 targets for which the alignment is essentially certain while small structural details are important (best GDT_TS > 60%) [116]. We found that the average Cα root-mean-square (RMS) error was reduced from 2.28 Å (using PDB templates) to 2.15 Å (using PDB_REDO templates). The average Cα RMS deviation between PDB and PDB_REDO structure models is 0.15 Å.

These results suggest that the use of PDB_REDO templates certainly does not harm the homology modelling process and that the changes made by PDB_REDO are improvements in the right direction.
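For readers less familiar with the Cα RMS measure used in this comparison, the following minimal sketch computes it for two equally long lists of Cα coordinates. It assumes the two models are already superposed and that atoms are paired by list position; the function name and the toy coordinates are hypothetical, and the sketch does not reproduce the evaluation procedure behind the reported numbers.

```python
import math


def ca_rms_deviation(coords_a, coords_b):
    """Root-mean-square distance between paired C-alpha positions (in Angstrom)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Toy coordinates: every atom in the second model is shifted by 1 Angstrom along z.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
target = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0)]
rms = ca_rms_deviation(model, target)
print(f"C-alpha RMS deviation: {rms:.2f}")

# Difference between the two averages reported in the text.
improvement = 2.28 - 2.15
print(f"average error reduced by {improvement:.2f} Angstrom")
```

The reported reduction of 0.13 Å is of the same order as the 0.15 Å average Cα deviation between PDB and PDB_REDO models, which gives a sense of scale for the template changes involved.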

Better template structures thus lead to better homology models, and both better structure models and better homology models obviously must lead to better answers to biological questions. Sometimes corrections in PDB files do not influence the biology; a bond length correction by 0.18 Å, for example, is crystallographically significant but will not change the answer to a question related to mutability, antigen selection, or intermolecular interactions. Other corrections though, for example replacing a calcium ion near the active site by a zinc ion or flipping the side chain of an asparagine in the ligand binding pocket, are likely to lead to radically different and more reliable answers to questions related to understanding an enzyme's mechanism or drug design.

The next ten sections review examples of 'better biology through better structures'. In each example improvement of the PDB file led to a different view on the biological role of a molecule or to a different answer to a biological question.

Example 1 - Peptide plane flip in Plk1 Polo Box Domain substrate

Garcia-Alvarez et al. reported the structure of the Polo Box Domain (PBD) of the human serine/threonine kinase Plk1 in complex with a 9-mer phosphopeptide substrate derived from Cdc25C (PDB entry: 2ojs; [117]). Plk1 is essential for regulating cell cycle progression and is an important drug target for cancer therapy [118]. They discuss the molecular mechanisms of substrate recognition of Plk1 and the implications for the centrosomal localization and activity.

The PDB_REDO program Pepflip detected that the peptide plane between Leu1 and Leu2 of the Cdc25C phosphopeptide should be flipped to better fit the electron density and improve the Ramachandran plot. In the correct conformation there is an additional hydrogen bond between the peptide and Asp416 in Plk1 (Figure 4). Furthermore, after the peptide plane flip the Cdc25C peptide forms an additional β-strand, thereby extending the β-sheet in Plk1. The substrate conformation was corrected by the depositors who obsoleted PDB entry 2ojs and superseded it by PDB entry 3bzi. Free energy calculations using the corrected phosphopeptide showed that the phosphothreonine residue and the main-chain atoms of the peptide account for the majority of the binding enthalpy [119].

Figure 4: Peptide-plane flip in Plk1 Polo Box Domain substrate. Left: the electron density around the peptide plane between Leu1 and Leu2 of the substrate (white carbons) suggests the peptide should be flipped to allow a hydrogen bond to the carbonyl of Asp416 in the enzyme (pink carbons) in PDB entry 2ojs [117]. Right: the peptide substrate extends the Plk1 β-sheet in the superseding entry 3bzi due to the flipped peptide plane. Unless mentioned otherwise, the 2mFo-DFc and mFo-DFc maps have been sampled with a grid size equal to a third of the resolution and are shown at a contour level of +1.2σ (blue) and +3σ (green) and -3σ (red), respectively, and the entire model has been used for the calculated structure factors. The 2mFo-DFc and mFo-DFc maps are shown up to 2 Å from the displayed peptide atoms.

The extension of the β-sheet is an integral part of the substrate recognition mechanism and is only visible in the corrected structure.

Example 2 - Wishfully modelling a Plk1 Polo Box Domain inhibitor

The Plk1 PBD binds to pThr/pSer-containing motifs [120]. Qian et al. reported a Plk1 PBD structure in complex with an inhibitor in the PBD pocket (PDB entry 4mlu; [121]). Qian et al. designed the inhibitory peptide to mimic a natural substrate but wanted to improve the cellular uptake efficiency by making the inhibitory peptide mono-anionic rather than di-anionic by masking the phosphothreonine. The mono-anionic phosphoester was fitted in the reported structure model 4mlu.

Dauter et al. [99] discovered that the electron density does not justify modelling the phosphoester moiety (Figure 5). Thus, the inhibitor is still di-anionic. Qian et al. then retracted their paper and replaced the phosphoester moiety by water in PDB structure 4o6w that supersedes 4mlu (Figure 5).

Figure 5: The phosphothreonine fragment of a Polo Box Domain inhibitor in the structure of the human Plk1 is di-anionic rather than mono-anionic. Left: although a disordered phosphoester moiety is modelled in PDB entry 4mlu [121], the electron density suggests that this group is absent. Right: water molecules and a di-anionic inhibitor are modelled in the superseding PDB entry 4o6w [121]. The 2mFo-DFc and mFo-DFc maps are shown up to 1.5 Å from the displayed inhibitor fragment and water molecules.

In summary, the design of PBD inhibitors that both mimic the natural substrate and have drug-like physicochemical properties is still an open challenge.

Example 3 - hPNMT ligand identification

Human phenylethanolamine N-methyltransferase (hPNMT) catalyzes the conversion of R-noradrenaline to R-adrenaline. In this reaction a methyl group is transferred from the cofactor S-adenosyl-L-methionine to noradrenaline. Central nervous system-specific PNMT inhibitors potentially are important drug targets for Alzheimer's and Parkinson's disease. In a fragment-based drug design screen, Drinkwater et al. [122] soaked hPNMT crystals with 96 mixtures of four chemically diverse small molecules and modelled 12 hits in the electron density, 9 of which were confirmed by Isothermal Titration Calorimetry (ITC) to bind to hPNMT.

Nair et al. [123] showed that molecular dynamics (MD) simulations reproduced the crystal structure binding mode modelled for these 9 compounds. For one of the other cocktails Drinkwater et al. proposed 6-chlorooxindole as the most likely candidate for explaining the electron density observed in the noradrenaline pocket (PDB entry 3kpy, Figure 6), but they could not confirm binding by ITC. The MD simulations predicted that 6-chlorooxindole cannot stably bind to hPNMT. In contrast, the simulations suggested that the pocket was occupied by two other fragments in the cocktail, benzene-1,3-diol and imidazole. Free energy calculations predicted the binding to be cooperative and rerefinement showed that these two fragments together could also account for the electron density (PDB entry 4dm3, Figure 6).

Figure 6: Ligand identification in the hPNMT active site. The binding pocket is occupied by 6-chlorooxindole in PDB entry 3kpy [122] (left) and with benzene-1,3-diol (in two alternative conformations) and imidazole in PDB entry 4dm3 [123] (right). The binding modes of these two ligands were predicted by MD simulations [123]. The figure shows the possible hydrogen bonds over time. The 2mFo-DFc map is shown up to 1.5 Å from the ligands.

The combined pharmacophores of benzene-1,3-diol and imidazole provide a better basis for rational design and thus for the development of hPNMT inhibitors.

Example 4 - Herceptin-HER2 interface

When overexpressed, the growth factor receptor-like HER2 protein (also known as ErbB2 and Neu) can promote malignant cell transformation [124]. The monoclonal antibody Trastuzumab, commercially known as Herceptin, is known to have an antiproliferative effect on cells transformed by overexpression of HER2 and is therefore used to treat HER2-positive metastatic breast cancers [125]. The structure of the Fab fragment of Herceptin bound to the extracellular domain of HER2, PDB entry 1n8z [126], shows the binding interface of the two proteins. This indicates where Herceptin binds, but as a result of poor side chain fitting, the structure model does not properly show how and why Herceptin binds (Figure 7).

Automated rebuilding of 1n8z reveals numerous additional receptor-antibody interactions, resulting in a much more faithful description of the binding mode of Herceptin.

Figure 7: Improving the binding interface between Herceptin and human growth factor receptor-like HER2 protein. Left: detail of PDB entry 1n8z showing the Fab light chain of Herceptin (pink) with a single hydrogen bond to HER2 [126]. Right: the PDB_REDO optimized version of 1n8z. Flipping Asn30 and refitting Thr31 together with small adjustments to the local HER2 side chains reveals a hydrogen bonding network between the proteins containing four hydrogen bonds and one hydrogen bond that correctly positions the Thr31 and Asn30 side chains. The 2mFo-DFc map is shown up to 1.5 Å from the protein fragments.

The better understanding of the binding mode of Herceptin contributes to the development of other monoclonal antibodies in cancer immunotherapy.

Example 5 - Ion identity in myosin heavy chain kinase regulatory sites

Myosin II plays a central role in cytokinesis, cell migration, and adhesion [127]. The α-kinase domain of myosin heavy chain kinase (A-CAT) is involved in regulating the formation of myosin II filaments and the active site of A-CAT undergoes a conformational switch that is said to be influenced by the magnesium-binding sites [128].

Minor and co-workers recently implemented Brown's bond valence method in the CheckMyMetal webserver [97] for the validation of metals in macromolecular structures. They reported several examples of mis-identified ions, among which the magnesium ions in A-CAT (PDB entry 3lkm, [128]). The validation results, the reported crystallization conditions, the sample preparation, and manual rerefinement all suggest that one magnesium ion should be replaced with water, while the other two should be replaced by potassium and coordinated also by ethylene glycol (Figure 8) [97].
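The idea behind a bond valence check can be sketched in a few lines. The snippet below is a minimal illustration and not the CheckMyMetal implementation: it uses Brown's expression s = exp((R0 - R)/b) with b = 0.37 Å, the R0 values for Mg-O and K-O are assumed to be the commonly tabulated ones, and the coordination distances are hypothetical numbers rather than values taken from 3lkm.

```python
import math

# Assumed bond valence parameters R0 (in Å) for metal-oxygen bonds.
# Commonly tabulated values; they are not taken from the article.
R0_OXYGEN = {"MG": 1.693, "K": 2.132}
FORMAL_CHARGE = {"MG": 2, "K": 1}
B = 0.37  # empirical constant in Å


def bond_valence_sum(metal, distances):
    """Sum exp((R0 - R) / B) over all metal-oxygen distances R (in Å)."""
    r0 = R0_OXYGEN[metal]
    return sum(math.exp((r0 - r) / B) for r in distances)


def best_candidate(distances):
    """Return the ion whose valence sum is closest to its formal charge."""
    return min(
        FORMAL_CHARGE,
        key=lambda m: abs(bond_valence_sum(m, distances) - FORMAL_CHARGE[m]),
    )


# Hypothetical sites with six oxygen ligands each
long_contacts = [2.8] * 6    # distances typical of a larger cation
short_contacts = [2.07] * 6  # distances typical of a small divalent cation

print(best_candidate(long_contacts))
print(best_candidate(short_contacts))
```

A real validation tool also has to consider coordination geometry, ligand atom types, B-factors, and occupancy; the valence sum alone is only one line of evidence.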

Figure 8: Water and potassium rather than magnesium in the α-kinase domain of myosin heavy chain kinase. Left: Mg901 should be replaced by a water molecule in PDB entry 3lkm [128] according to the metal validation software [97]. The contact distance is shown in Angstroms. Right: the site occupied by Mg902 in 3lkm should be occupied by a potassium ion instead [97]. The two water molecules (bottom right) should be replaced by ethylene glycol [97]. The 2mFo-DFc maps are shown at a contour level of +1.5σ.

The presence of potassium rather than magnesium in the regulatory sites casts serious doubt on the role of magnesium and suggests that the role of potassium in regulating the activity of α-kinase is worth investigating.

Example 6 - Trans-cis isomerization in Rab4a switch 2 region

The Ras-like protein Rab4 is involved in endosomal sorting by orchestrating a small GTPase cascade for recruitment of adaptor proteins to early endosomes [129]. Despite the high sequence similarity between members of the Rab family each member targets specific effector proteins. One of the molecular regions involved in the discrimination between different effector proteins is the so-called switch 2 region [130]. The switch 2 region is rearranged upon GTP hydrolysis. The structure of human Rab4a has been solved in the active state with the GTP analog GppNHp bound (PDB entry 2bme; [130]) and in the inactive GDP-bound state (PDB entry 2bmd; [130]).

Residue Phe72 is located at the start of α-helix H2 in the switch 2 region of Rab4a and has the trans conformation in the GppNHp-bound state. The trans conformation is also present in the GDP-bound structure. Recently, a method was created to detect cis-peptides erroneously modelled as trans-peptides [105]. The method predicted that Phe72 in the GDP-bound state should have been modelled as a cis-peptide rather than a trans-peptide and this prediction was validated by rerefinement [105] (see Figure 9).
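The cis/trans distinction itself is defined by the peptide torsion angle omega (Cα(i), C(i), N(i+1), Cα(i+1)), which is close to 0° for cis and close to 180° for trans. The sketch below only shows how that angle can be computed and classified; it is not the detection method of [105], which has to recognise cis-peptides that were built as trans. The function names and the 30° tolerance are assumptions made for illustration.

```python
import math


def _sub(a, b):
    return tuple(ai - bi for ai, bi in zip(a, b))


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def _cross(a, b):
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dihedral(p1, p2, p3, p4):
    """Torsion angle in degrees defined by four points (x, y, z)."""
    b1, b2, b3 = _sub(p2, p1), _sub(p3, p2), _sub(p4, p3)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def peptide_conformation(ca_i, c_i, n_next, ca_next, tolerance=30.0):
    """Classify a peptide bond from its omega torsion angle."""
    omega = abs(dihedral(ca_i, c_i, n_next, ca_next))
    if omega <= tolerance:
        return "cis"
    if omega >= 180.0 - tolerance:
        return "trans"
    return "distorted"


# Idealised planar test geometries (not real coordinates)
print(peptide_conformation((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, 1, 0)))
print(peptide_conformation((0, 1, 0), (0, 0, 0), (1, 0, 0), (1, -1, 0)))
```

In practice the classification of the deposited model is the easy part; deciding that a trans-peptide should have been cis requires the local geometry and the electron density, as described above.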

Figure 9: Rab4a trans-cis flip in the switch 2 region of Rab4a. Left: the peptide between Arg71 (side chain not shown for clarity) and Phe72 in PDB entry 2bmd [130] has the trans conformation but deviating local geometry and the electron density around the peptide bond suggest that the peptide should have the cis conformation. Right: the cis peptide fits the experimental data much better. The 2mFo-DFc maps are shown at a contour level of +1.5σ.

Although it cannot be excluded that the cis conformation was induced by crystallization, these findings strongly suggest that Arg71 - Phe72 trans-cis isomerisation plays a hitherto unknown role in the discrimination between different effector proteins.

Example 7 - Peroxiredoxin active site in MD simulations

The human pathogen Mycobacterium tuberculosis is responsible for millions of deaths every year [131]. The bacterium gets engulfed by host macrophages, exposing it to a toxic environment of reactive oxygen species (ROS), but it can survive these hostile conditions by expressing peroxidases [132] such as the one-cysteine peroxiredoxin AhpE [133]. When AhpE scavenges ROS, Cys45 is sulfenylated. The sulfenic acid form of Cys45 can be reduced by mycothiol or mycoredoxin-1 [134].

Palló et al. carried out MD simulations to study the active site in atomic detail [personal communication, 2015. Palló A, van Bergen L, Alonso M, Nilsson L, de Proft F & Messens J. The revisited AhpE structures affect the molecular dynamics simulations of the Mycobacterium tuberculosis one-cysteine peroxiredoxin]. MD simulations are sensitive to errors in macromolecular structures. Palló et al. observed that simulations were not stable when PDB entry 1xxu [133] was used as a starting structure. The α-helix that contains Cys45 started to unwind during a 30 ns simulation. In contrast, simulations using the PDB_REDO structure were stable, probably because of the optimized hydrogen-bond network in the active site (Figure 10).

Figure 10: Active site of 1-Cys peroxiredoxin AhpE with Cys45 in the reduced state. Left: Cys45 is located in helix α2 that unwinds during MD simulations based on PDB entry 1xxu [133]. Right: in the PDB_REDO structure the flipped Gln46 side chain optimizes the local hydrogen bonding network with Asp50, Trp80, and Ser84, and increases the stability of MD simulations.

The improved AhpE structure model allows for mechanistic studies of the M. tuberculosis peroxiredoxin at atomic detail.

Example 8 - Malaria drugs

Plasmodium falciparum is the parasite that causes malaria, which still ranks as one of the diseases with the highest death toll. The parasitic aspartic acid protease plasmepsin II is involved in degradation of the host cell haemoglobin [135] and is therefore an interesting drug target. The structure of plasmepsin II was thought to be determined in complex with two inhibitors, rs367 and rs370, in PDB entries 1lee and 1lf2, respectively [136]. The difference between the two inhibitors is the position of the amino group, which is meta in the benzamide in rs367 and para in rs370. The structures in 1lf2 and 1lee are nearly identical with an all-atom RMS deviation of just 0.32 Å.
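For readers unfamiliar with the measure, an all-atom RMS deviation can be computed as sketched below. This is a minimal illustration that assumes the two coordinate sets are already superposed and list equivalent atoms in the same order; the coordinates in the example are invented and unrelated to 1lee or 1lf2.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as the input) between
    two equally long lists of (x, y, z) tuples for paired atoms."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Two invented atoms displaced by 0.3 Å and 0.4 Å, respectively
model_1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
model_2 = [(0.0, 0.0, 0.3), (1.0, 0.4, 0.0)]
print(round(rmsd(model_1, model_2), 3))
```

A low overall value such as the one quoted above says that the two models are globally very similar; it does not say that a local detail, such as the substituent position on a ligand, is modelled correctly.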

Inspection of the electron density around the inhibitor in 1lee suggests that the amino group should be modelled as a para-substituent (see Figure 11), which means that both 1lee and 1lf2 contain the same inhibitor, likely as the result of a mix-up during the structure determination. The detection of this mix-up is currently beyond the capabilities of validation routines and instead relies on critical inspection of the electron density by the crystallographer, which should be a key step in determining structures with ligands [115].

Figure 11: The benzamide moiety of the plasmepsin II inhibitor in PDB entry 1lee [136]. The electron density suggests that the amino group of the benzamide moiety should be modelled para instead of meta. The 2mFo-DFc and mFo-DFc maps are shown up to 1.5 Å and 2.5 Å from the displayed inhibitor fragment, respectively. The 2mFo-DFc map is shown at +2σ.

Docking studies on both structures [137] led to new candidate inhibitors, but it is a pity that twice as much time and computing power was used as needed. Low-throughput rational drug design projects aimed at better inhibitors will suffer even more from this para-meta error.

Example 9 - The chemistry of autotaxin inhibitors

Autotaxin (ATX, also known as ENPP2) is a secreted enzyme that converts lysophosphatidylcholine into the lipid signalling molecule lysophosphatidic acid (LPA). The ATX-LPA signalling axis is involved in normal physiology and pathophysiology [138]. Autotaxin expression is found to be upregulated in several carcinomas and is implicated in motility of tumour cells [138] and as such a target for developing drugs for cancer treatment. One class of ATX inhibitors is based on a boronic acid moiety that binds covalently to the hydroxyl group of active site residue Thr209 [139]. In this process, the hybridisation of the boron atom changes from sp2 to sp3, analogous to the formation of tetrahydroxyborate from boric acid.

The structure of ATX with inhibitor 3BoA (PDB entry 3wax [140]) shows the problem of dealing with changing chemistry in refinement. Although the authors correctly report that 3BoA is covalently bound to Thr209, this is not reflected in the structure model because the distance between the boron atom and the Thr-Oγ1 is 2.28 Å and the boron atom is sp2 hybridized in the model (Figure 12). The structure of ATX and 4BoA from the same study (PDB entry 3way) suffers from the same problem. In the PDB_REDO 3wax structure model the B-Oγ1 distance is 1.38 Å and the boron atom is properly sp3 hybridized. The earlier published structure of ATX and the boronic acid inhibitor HA155 also shows the correct geometry (PDB entry 2xrg, [139]), but the other structure models may lead to misinterpretation of the ligand binding interaction.
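Why the two distances lead to different chemical interpretations can be illustrated with a simple distance criterion. The sketch below is a rough heuristic and not how refinement programs or PDB_REDO handle restraints: the covalent radii are approximate assumed values and the 0.4 Å tolerance is an arbitrary choice for illustration.

```python
# Approximate single-bond covalent radii in Å (assumed values).
COVALENT_RADIUS = {"B": 0.84, "O": 0.66}


def is_covalent(elem_a, elem_b, distance, tolerance=0.4):
    """True if the distance (Å) is within the summed covalent radii
    of the two elements plus a tolerance."""
    limit = COVALENT_RADIUS[elem_a] + COVALENT_RADIUS[elem_b] + tolerance
    return distance <= limit


print(is_covalent("B", "O", 2.28))  # distance in the deposited 3wax model -> False
print(is_covalent("B", "O", 1.38))  # distance in the PDB_REDO model -> True
```

A distance test of this kind only flags the inconsistency; representing the bond correctly still requires restraints that describe the sp3 boron, as noted in the caption of Figure 12.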

Figure 12: Correcting chemical representation of boron. Left: fragment of the inhibitor 3BoA (white) bound to the active site of autotaxin (PDB entry 3wax [140], pink). The boron atom (grey) is modelled as sp2 hybridized and covalently bound to the benzene moiety (orange dashed lines) but not covalently bound to Thr209. The blue Glu576 is from a different ATX molecule related by crystal symmetry. Right: PDB_REDO-optimized version of 3wax with the correct boron hybridization and a covalent bond to Thr209. Manually generated restraints are required to deal with the complex chemistry of the inhibitor-autotaxin interaction.

Correct chemical representation is crucial for the design or optimisation of ATX inhibitors.

Example 10 - Xylose isomerase active site

Figure 13 shows the active site pocket of xylose isomerase, an enzyme that catalyzes the interconversion between D-xylose and D-xylulose and between D-glucose and D-fructose [141]. The reaction involves hydrogen transfer and two magnesium ions.

PDB entry 3xia [142] was superseded by 1xya [141] after, for example, packing quality analysis [21] and inspection of the electron density revealed that many amino acids were misidentified or misthreaded. 1xya is in much better agreement with the biochemistry. Neutron diffraction later also showed that a water molecule rather than a hydroxyl ion should have been modelled in the active site [143].

Figure 13: Threading errors around the active site pocket of xylose isomerase. The 1-letter amino acid codes are shown on top of the Cα-trace. Orange: 3xia [142] is misthreaded at many locations. Cyan: 1xya [141]. Magnesium and a hydroxyl ion are shown as yellow and red spheres, respectively. Figure prepared with YASARA [144].

The comparison of 3xia and 1xya clearly shows that a misthreaded structure model can place the wrong amino acids at the wrong positions. Correct answers to biological questions related to the function of the protein can only be obtained from correctly threaded structure models. We are reasonably certain that the work of the X-ray VTF will lead to the situation that structures like 3xia will not be passed on to the life science community. Fortunately, the deposition of reflection data is mandatory now. The problems described here might be identified more readily when experimental data are available.

Validation-related facilities

Many facilities exist to validate protein structures. Several have been mentioned in this article already. Table 2 lists a series of protein structure validation facilities that are freely accessible on the internet [112]. Of course, the validation modules mentioned here can be used not only for checking X-ray structure models but also for checking structure models derived by NMR, EM, or in silico modelling. NMR- and EM-specific validation methods and structure-factor based methods are beyond the scope of this article.

Table 2: Macromolecular structure validation facilities.

Facility                          Description
PDB_REDO [109,145]                Constructive validation by rerefinement and partial rebuilding
WHAT_CHECK [43,112]               Extensive macromolecular validation
MolProbity [58,68]                Macromolecular validation
PROCHECK [41]                     Protein structure geometry checks
PDB validation server [54,61]     Pre- and post-validation of PDB entries
QMEAN [62,146]                    Global model quality estimation
ProSA [23,147]                    Knowledge-based potentials of mean force to evaluate macromolecular structure model accuracy
CheckMyMetal [97]                 Validation of metal-binding sites
VHELIBS [82]                      Validation of ligands and binding sites
ValLigURL [83]                    Ligand validation
Twilight [85]                     Ligand visualisation and validation
PSVS [148]                        Metaserver that includes many of the above

Much of the validation work has found its way already into the PDB_REDO project. PDB_REDO entries are freely available from []. Crystallographers can freely use the PDB_REDO server at [] to optimize their work-in-progress structure models.

Concluding remarks

Today, experimental data deposition is an obligatory aspect of structure deposition in the PDB. Indeed, data are missing for only one recent crystal structure entry deposited in the PDB in 2014 (4ux6 [149]). We congratulate those who have set a great example by depositing datasets that were missing for thirty years (e.g. for the structural studies of leghemoglobin by Steigemann and co-workers [150]).

The highly improbable features in Schwarzenbacher's structure model of the birch pollen allergen Bet v 1 protein [52] were detected [51] by studying anomalies in the statistics of PDB_REDO's model optimisation. Extremely unusual features, such as those shown in Figure 2a, might remain undetected for longer if the corresponding reflection data are not made available.

Deposition of reflection data was not mandatory until recently (1 February 2008). We believe it is beneficial if missing reflection data sets are recovered and deposited because we believe that transparency by depositors and validation by others will lead to a higher quality archive. Not necessarily because we expect any cases of fraud or gross error in PDB entries that do not have deposited reflection data, but because the validation of alternative structure models against reflection data allows post-deposition model improvements. The oldest structure models in the PDB can be improved using today's methods. The average MolProbity clash score of the oldest models, for example, was at the 48th percentile relative to MolProbity's reference set and ended up at the 80th percentile after rerefinement and rebuilding [108]. As better computational crystallographic techniques continue to be developed the quality of the archive can be improved ever further.

As with any scientific endeavor, validation of models and data is a component of sound application of the scientific method. Macromolecular structures solved by X-ray crystallography are the result of experiments performed by humans, and may thus contain experimental and human errors.

Although there were a small number of individuals resistant to validation for many years [53], the majority were in favour. Research into geometric, thermodynamic, electrostatic, and many other aspects of protein structures has continued, and the series of recent highly visible cases of very unlikely structures that had to be retracted have further anchored validation tools in the protein structure solution pipelines.

Our experiences with PDB_REDO lead us to conclude that more emphasis should be placed on the deposition of raw data (diffraction images and/or unmerged reflections) and experiment-related metadata. A computer readable description of the crystallization conditions is an important example. This will allow the development of more and better tools aimed at making the best possible structure models. We also suggest that referees of articles mentioning novel PDB entries should receive a structure validation report, without having to ask for it, rather than be made to assess a structure model's quality from the very limited information in a manuscript and its supplemental data. To this end we reiterate the importance for crystallographers to finish and deposit their structure models and the reflection data before submitting a manuscript, rather than just before it is accepted [151]. Current PDB deposition procedures make this possible and have the option to "suppress entry titles at the time of submission to the PDB until the structure is released" [152]. In the long term, a validation report can be accompanied by a report from PDB_REDO or another automated model optimization procedure that shows whether the model can be improved beyond the effort delivered by the depositor. This will certainly further improve the average quality of PDB structure models. These structure models, like all scientific results, are predestined to be reused by others. Therefore, a higher average model quality will also improve the quality of many projects directed at answering important biomedical questions.


Acknowledgements

The authors thank Elmar Krieger for homology modelling advice, Rob Hooft for critically reading this manuscript, Alan Mark for pointing out to us his work on hPNMT, Anna Palló for showing us her AhpE results, and all structural biologists who have deposited experimental data. Jon Black and Coos Baakman provided technical support.

G.V. acknowledges financial support from NewProt that is funded by the European Commission within its FP7 Programme, under the thematic area KBBE-2011-5 with contract number 289350, and from the research programme 11319, which is financed by STW. R.P.J. is supported by Vidi 723.013.003 from Netherlands Organization for Scientific Research (NWO).


References

[1] Pauling L, Corey RB. Atomic coordinates and structure factors for two helical configurations of polypeptide chains. Proc Natl Acad Sci U S A 1951;37:235-40. doi:10.1073/pnas.37.5.235.

[2] Pauling L, Corey RB. The pleated sheet, a new layer configuration of polypeptide chains. Proc Natl Acad Sci U S A 1951;37:251-6. doi:10.1073/pnas.37.5.251.

[3] Kendrew JC, Bodo G, Dintzis HM, Parrish RG, Wyckoff H, Phillips DC. A three-dimensional model of the myoglobin molecule obtained by x-ray analysis. Nature 1958;181:662-6. doi:10.1038/181662a0.

[4] Kendrew JC, Dickerson RE, Strandberg BE, Hart RG, Davies DR, Phillips DC, et al. Structure of myoglobin: A three-dimensional Fourier synthesis at 2 Å resolution. Nature 1960;185:422-7. doi:10.1038/185422a0.

[5] Perutz MF, Rossmann MG, Cullis AF, Muirhead H, Will G, North ACT. Structure of Hemoglobin: A Three-Dimensional Fourier Synthesis at 5.5-Å Resolution, Obtained by X-Ray Analysis. Nature 1960;185:416-22. doi:10.1038/185416a0.

[6] Perutz MF. Structure and function of haemoglobin: I. A tentative atomic model of horse oxyhaemoglobin. J Mol Biol 1965;13:646-IN2.

[7] Perutz MF, Kendrew JC, Watson HC. Structure and function of haemoglobin: II. Some relations between polypeptide chain configuration and amino acid sequence. J Mol Biol 1965;13:669-78.

[8] Rossmann MG. The beginnings of structural biology. Recollections, special section in honor of Max Perutz. Protein Sci 1994;3:1731-3. doi:10.1002/pro.5560031012.

[9] Browne WJ, North ACT, Phillips DC, Brew K, Vanaman TC, Hill RL. A possible three-dimensional structure of bovine α-lactalbumin based on that of hen's egg-white lysozyme. J Mol Biol 1969;42:65-86. doi:10.1016/0022-2836(69)90487-2.

[10] Novotny J, Bruccoleri R, Karplus M. An analysis of incorrectly folded protein models. Implications for structure predictions. J Mol Biol 1984;177:787-818. doi:10.1016/0022-2836(84)90049-4.

[11] Novotny J, Rashin AA, Bruccoleri RE. Criteria that discriminate between native proteins and incorrectly folded models. Proteins 1988;4:19-30.

[12] Eisenberg D, McLachlan AD. Solvation energy in protein folding and binding. Nature 1986;319:199-203. doi:10.1038/319199a0.

[13] Zehfus MH, Rose GD. Compact units in proteins. Biochemistry 1986;25:5759-65.

[14] Bryant SH, Amzel LM. Correctly folded proteins make twice as many hydrophobic contacts. Int J Pept Protein Res 1987;29:46-52. doi:10.1111/j.1399-3011.1987.tb02228.x.

[15] Baumann G, Frömmel C, Sander C. Polarity as a criterion in protein design. Protein Eng 1989;2:329-34. doi:10.1093/protein/2.5.329.

[16] Hendlich M, Lackner P, Weitckus S, Floeckner H, Froschauer R, Gottsbacher K, et al. Identification of native protein folds amongst a large number of incorrect models. The calculation of low energy conformations from potentials of mean force. J Mol Biol 1990;216:167-80. doi:10.1016/S0022-2836(05)80068-3.

[17] Toma K. Number of residues in a sphere around a certain residue can be used as a hydrophobic penalty function of proteins. J Mol Graph 1991;9:78-84. doi:10.1016/0263-7855(91)85002-G.

[18] Holm L, Sander C. Evaluation of protein models by atomic solvation preference. J Mol Biol 1992;225:93-105. doi:10.1016/0022-2836(92)91028-N.

[19] Lüthy R, Bowie JU, Eisenberg D. Assessment of protein models with three-dimensional profiles. Nature 1992;356:83-5. doi:10.1038/356083a0.

[20] Bryant SH, Lawrence CE. An empirical energy function for threading protein sequence through the folding motif. Proteins 1993;16:92-112. doi:10.1002/prot.340160110.

[21] Vriend G, Sander C. Quality control of protein models: directional atomic contact analysis. J Appl Crystallogr 1993;26:47-60. doi:10.1107/S0021889892008240.

[22] Colovos C, Yeates TO. Verification of protein structures: patterns of nonbonded atomic interactions. Protein Sci 1993;2:1511-9. doi:10.1002/pro.5560020916.

[23] Sippl MJ. Recognition of errors in three-dimensional structures of proteins. Proteins Struct Funct Bioinforma 1993;17:355-62. doi:10.1002/prot.340170404.

[24] Delarue M, Koehl P. Atomic environment energies in proteins defined from statistics of accessible and contact surface areas. J Mol Biol 1995;249:675-90. doi:10.1006/jmbi.1995.0328.

[25] Stout CD. Crystal structures of oxidized and reduced Azotobacter vinelandii ferredoxin at pH 8 and 6. J Biol Chem 1993;268:25920-7.

[26] Stout GH, Turley S, Sieker LC, Jensen LH. Structure of ferredoxin I from Azotobacter vinelandii. Proc Natl Acad Sci 1988;85:1020-2.

[27] Ghosh D, O'Donnell S, Furey W, Robbins AH, Stout CD. Iron-sulfur clusters and protein structure of Azotobacter ferredoxin at 2.0 Å resolution. J Mol Biol 1982;158:73-109. doi:10.1016/0022-2836(82)90451-X.

[28] Pai EF, Krengel U, Petsko GA, Goody RS, Kabsch W, Wittinghofer A. Refined crystal structure of the triphosphate conformation of H-ras p21 at 1.35 Å resolution: implications for the mechanism of GTP hydrolysis. EMBO J 1990;9:2351-9.

[29] Pai EF, Kabsch W, Krengel U, Holmes KC, John J, Wittinghofer A. Structure of the guanine-nucleotide-binding domain of the Ha-ras oncogene product p21 in the triphosphate conformation. Nature 1989;341:209-14.


[30] de Vos AM, Tong L, Milburn MV, Matias PM, Jancarik J, Noguchi S, et al. Three-dimensional structure of an oncogene protein: catalytic domain of human c-H-ras p21. Science 1988;239:888-93. doi:10.1126/science.2448879.

[31] Taylor TC, Andersson I. The structure of the complex between rubisco and its natural substrate ribulose 1,5-bisphosphate. J Mol Biol 1997;265:432-44. doi:10.1006/jmbi.1996.0738.

[32] Knight S, Andersson I, Bränden C-I. Reexamination of the Three-Dimensional Structure of the Small Subunit of RuBisCo from Higher Plants. Science 1989;244:702-5. doi:10.1126/science.244.4905.702.

[33] Chapman MS, Suh SW, Curmi PM, Cascio D, Smith WW, Eisenberg DS. Tertiary structure of plant RuBisCO: domains and their contacts. Science 1988;241:71-4. doi:10.1126/science.3133767.

[34] Lebioda L, Stec B. Crystal structure of enolase indicates that enolase and pyruvate kinase evolved from a common ancestor. Nature 1988;333:683-6. doi:10.1038/333683a0.

[35] Lebioda L, Stec B, Brewer JM. The structure of yeast enolase at 2.25-Å resolution. An 8-fold beta + alpha-barrel with a novel beta beta alpha alpha (beta alpha)6 topology. J Biol Chem 1989;264:3685-93.

[36] Navia MA, Fitzgerald PM, McKeever BM, Leu CT, Heimbach JC, Herber WK, et al. Three-dimensional structure of aspartyl protease from human immunodeficiency virus HIV-1. Nature 1989;337:615-20. doi:10.1038/337615a0.

[37] Wlodawer A, Miller M, Jaskolski M, Sathyanarayana BK, Baldwin E, Weber IT, et al. Conserved folding in retroviral proteases: crystal structure of a synthetic HIV-1 protease. Science 1989;245:616-21. doi:10.1126/science.2548279.

[38] McNicholas S, Potterton E, Wilson KS, Noble MEM. Presenting your structures: the CCP4mg molecular-graphics software. Acta Crystallogr D Biol Crystallogr 2011;67:386-94. doi:10.1107/S0907444911007281.

[39] Bränden C-I, Jones TA. Between objectivity and subjectivity. Nature 1990;343:687-9. doi:10.1038/343687a0.

[40] Jones TA, Zou JY, Cowan SW, Kjeldgaard M. Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr Sect A Found Crystallogr 1991;47:110-9. doi:10.1107/S0108767390010224.

[41] Laskowski RA, MacArthur MW, Moss DS, Thornton JM. PROCHECK: a program to check the stereochemical quality of protein structures. J Appl Crystallogr 1993;26:283-91. doi:10.1107/S0021889892009944.

[42] Vriend G. WHAT IF: A molecular modeling and drug design program. J Mol Graph 1990;8:52-6. doi:10.1016/0263-7855(90)80070-V.

[43] Hooft RWW, Vriend G, Sander C, Abola EE. Errors in protein structures. Nature 1996;381:272. doi:10.1038/381272a0.

[44] Morris AL, MacArthur MW, Hutchinson EG, Thornton JM. Stereochemical quality of protein structure coordinates. Proteins Struct Funct Genet 1992;12:345-64. doi:10.1002/prot.340120407.

[45] Engh RA, Huber R. Accurate bond and angle parameters for X-ray protein structure refinement. Acta Crystallogr Sect A Found Crystallogr 1991;47:392-400. doi:10.1107/S0108767391001071.

[46] Ramachandran GN, Ramakrishnan C, Sasisekharan V. Stereochemistry of polypeptide chain configurations. J Mol Biol 1963;7:95-9. doi:10.1016/S0022-2836(63)80023-6.

[47] Sippl MJ. Calculation of conformational ensembles from potentials of mean force. An approach to the knowledge-based prediction of local structures in globular proteins. J Mol Biol 1990;213:859-83. doi:10.1016/S0022-2836(05)80269-4.

[48] Bowie JU, Lüthy R, Eisenberg D. A method to identify protein sequences that fold into a known three-dimensional structure. Science 1991;253:164-70. doi:10.1126/science.1853201.

[49] Janssen BJC, Read RJ, Brünger AT, Gros P. Crystallography: crystallographic evidence for deviating C3b structure. Nature 2007;448:E1-2; discussion E2-3. doi:10.1038/nature06103.

[50] Borrell B. Fraud rocks protein community. Nature 2009;462:970. doi:10.1038/462970a.

[51] Rupp B. Detection and analysis of unusual features in the structural model and structure-factor data of a birch pollen allergen. Acta Crystallogr Sect F Struct Biol Cryst Commun 2012;68:366-76. doi:10.1107/S1744309112008421.

[52] Zaborsky N, Brunner M, Wallner M, Himly M, Karl T, Schwarzenbacher R, et al. Antigen aggregation decides the fate of the allergic immune response. J Immunol 2010;184:725-35. doi:10.4049/jimmunol.0902080.

[53] Petsko GA. Large cast, but no plot. Nature 1992;359:596-7. doi:10.1038/359596a0.

[54] Read RJ, Adams PD, Arendall WB, Brunger AT, Emsley P, Joosten RP, et al. A new generation of crystallographic validation tools for the protein data bank. Structure 2011;19:1395-412. doi:10.1016/j.str.2011.08.006.

[55] Montelione GT, Nilges M, Bax A, Güntert P, Herrmann T, Richardson JS, et al. Recommendations of the wwPDB NMR Validation Task Force. Structure 2013;21:1563-70. doi:10.1016/j.str.2013.07.021.

[56] Henderson R, Sali A, Baker ML, Carragher B, Devkota B, Downing KH, et al. Outcome of the first electron microscopy validation task force meeting. Structure 2012;20:205-14. doi:10.1016/j.str.2011.12.014.

[57] Berman H, Henrick K, Nakamura H, Markley JL. The worldwide Protein Data Bank (wwPDB): ensuring a single, uniform archive of PDB data. Nucleic Acids Res 2007;35:D301-3. doi:10.1093/nar/gkl971.

[58] Chen VB, Arendall WB, Headd JJ, Keedy DA, Immormino RM, Kapral GJ, et al. MolProbity: All-atom structure validation for macromolecular crystallography. Acta Crystallogr Sect D Biol Crystallogr 2010;66:12-21. doi:10.1107/S0907444909042073.

[59] Doreleijers JF, Sousa da Silva AW, Krieger E, Nabuurs SB, Spronk CAEM, Stevens TJ, et al. CING: an integrated residue-based structure validation program suite. J Biomol NMR 2012;54:267-83. doi:10.1007/s10858-012-9669-7.

[60] Gutmanas A, Alhroub Y, Battle GM, Berrisford JM, Bochet E, Conroy MJ, et al. PDBe: Protein Data Bank in Europe. Nucleic Acids Res 2014;42:D285-91. doi:10.1093/nar/gkt1180.

[61] Gore S, Velankar S, Kleywegt GJ. Implementing an X-ray validation pipeline for the Protein Data Bank. Acta Crystallogr Sect D Biol Crystallogr 2012;68:478-83. doi:10.1107/S0907444911050359.

[62] Benkert P, Tosatto SCE, Schomburg D. QMEAN: A comprehensive scoring function for model quality assessment. Proteins 2008;71:261-77. doi:10.1002/prot.21715.

[63] Haas J, Roth S, Arnold K, Kiefer F, Schmidt T, Bordoli L, et al. The protein model portal - A comprehensive resource for protein structure and model information. Database 2013;2013. doi:10.1093/database/bat031.

[64] Groom CR, Allen FH. The Cambridge Structural Database in retrospect and prospect. Angew Chemie - Int Ed 2014;53:662-71. doi:10.1002/anie.201306438.

[65] Engh RA, Huber R. Structure quality and target parameters. In: Rossmann MG, Arnold E, editors. International Tables for Crystallography, vol. F. 1st ed., Chester, England: International Union of Crystallography; 2001, p. 382-92.

[66] Hooft RWW, Sander C, Vriend G. Verification of Protein Structures: Side-Chain Planarity. J Appl Crystallogr 1996;29:714-6. doi:10.1107/S0021889896008631.

[67] Hooft RWW, Sander C, Vriend G. Positioning hydrogen atoms by optimizing hydrogen-bond networks in protein structures. Proteins Struct Funct Genet 1996;26:363-76. doi:10.1002/(SICI)1097-0134(199612)26:4<363::AID-PROT1>3.0.CO;2-D.

[68] Davis IW, Leaver-Fay A, Chen VB, Block JN, Kapral GJ, Wang X, et al. MolProbity: all-atom contacts and structure validation for proteins and nucleic acids. Nucleic Acids Res 2007;35:W375-83. doi:10.1093/nar/gkm216.

[69] Nielsen JE, Andersen KV, Honig B, Hooft RW, Klebe G, Vriend G, et al. Improving macromolecular electrostatics calculations. Protein Eng 1999;12:657-62. doi:10.1093/protein/12.8.657.

[70] Nielsen JE, Vriend G. Optimizing the hydrogen-bond network in Poisson-Boltzmann equation-based pKa calculations. Proteins Struct Funct Genet 2001;43:403-12. doi:10.1002/prot.1053.

[71] Vriend G, Hooft R. Some WHAT_CHECK Checks Explained. Protein Data Bank Q Newsl 1998;84:4-5.

[72] EU 3-D Validation Network. Who checks the checkers? four validation tools applied to eight atomic resolution structures. J Mol Biol 1998;276:417-36. doi:10.1006/jmbi.1997.1526.

[73] Hooft RWW, Sander C, Vriend G. Objectively judging the quality of a protein structure from a Ramachandran plot. Comput Appl Biosci 1997;13:425-30. doi:10.1093/bioinformatics/13.4.425.

[74] Berkholz DS, Shapovalov MV, Dunbrack RL Jr, Karplus PA. Conformation Dependence of Backbone Geometry in Proteins. Structure 2009;17:1316-25. doi:10.1016/j.str.2009.08.012.

[75] Touw WG, Vriend G. On the complexity of Engh and Huber refinement restraints: The angle τ as example. Acta Crystallogr Sect D Biol Crystallogr 2010;66:1341-50. doi:10.1107/S0907444910040928.

[76] De Filippis V, Sander C, Vriend G. Predicting local structural changes that result from point mutations. Protein Eng 1994;7:1203-8.

[77] Dunbrack RL, Cohen FE. Bayesian statistical analysis of protein side-chain rotamer preferences. Protein Sci 1997;6:1661-81. doi:10.1002/pro.5560060807.

[78] Lovell SC, Word JM, Richardson JS, Richardson DC. The penultimate rotamer library. Proteins 2000;40:389-408.

[79] Berntsen KRM, Vriend G. Anomalies in the refinement of isoleucine. Acta Crystallogr Sect D Biol Crystallogr 2014;70:1037-49. doi:10.1107/S139900471400087X.

[80] Lovell SC, Davis IW, Arendall WB, De Bakker PIW, Word JM, Prisant MG, et al. Structure validation by Cα geometry: φ,ψ and Cβ deviation. Proteins Struct Funct Genet 2003;50:437-50. doi:10.1002/prot.10286.

[81] Sheffler W, Baker D. RosettaHoles: Rapid assessment of protein core packing for structure prediction, refinement, design, and validation. Protein Sci 2009;18:229-39. doi:10.1002/pro.8.

[82] Cereto-Massagué A, Ojeda MJ, Joosten RP, Valls C, Mulero M, Salvado MJ, et al. The good, the bad and the dubious: VHELIBS, a validation helper for ligands and binding sites. J Cheminform 2013;5:36. doi:10.1186/1758-2946-5-36.

[83] Kleywegt GJ, Harris MR. ValLigURL: A server for ligand-structure comparison and validation. Acta Crystallogr Sect D Biol Crystallogr 2007;63:935-8. doi:10.1107/S090744490703315X.

[84] Bruno IJ, Cole JC, Kessler M, Luo J, Motherwell WDS, Purkis LH, et al. Retrieval of crystallographically-derived molecular geometry information. J Chem Inf Comput Sci 2004;44:2133-44. doi:10.1021/ci049780b.

[85] Weichenberger CX, Pozharski E, Rupp B. Visualizing ligand molecules in twilight electron density. Acta Crystallogr Sect F Struct Biol Cryst Commun 2013;69:1-6. doi:10.1107/S1744309112044387.

[86] Lütteke T, von der Lieth C-W. pdb-care (PDB carbohydrate residue check): a program to support annotation of complex carbohydrate structures in PDB files. BMC Bioinformatics 2004;5:69. doi:10.1186/1471-2105-5-69.

[87] Lütteke T, Frank M, von der Lieth CW. Carbohydrate Structure Suite (CSS): Analysis of carbohydrate 3D structures derived from the PDB. Nucleic Acids Res 2005;33. doi:10.1093/nar/gki013.

[88] Agirre J, Cowtan K. Carbohydrate validation in CCP4 6.5. Comput Crystallogr Newsl 2015;6:7-9.

[89] Agirre J, Davies G, Wilson K, Cowtan K. Carbohydrate anomalies in the PDB. Nat Chem Biol 2015;11:303-303. doi:10.1038/nchembio.1798.

[90] Lütteke T. Analysis and validation of carbohydrate three-dimensional structures. Acta Crystallogr Sect D Biol Crystallogr 2009;65:156-68. doi:10.1107/S0907444909001905.

[91] Emsley P, Brunger AT, Lütteke T. Tools to Assist Determination and Validation of Carbohydrate 3D Structure Data, 2015, p. 229-40. doi:10.1007/978-1-4939-2343-4_17.

[92] Chou F-C, Sripakdeevong P, Dibrov SM, Hermann T, Das R. Correcting pervasive errors in RNA crystallography through enumerative structure prediction. Nat Methods 2013;10:74-6. doi:10.1038/nmeth.2262.

[93] Nayal M, Di Cera E. Valence screening of water in protein crystals reveals potential Na+ binding sites. J Mol Biol 1996;256:228-34. doi:10.1006/jmbi.1996.0081.

[94] Brown ID. Predicting bond lengths in inorganic crystals. Acta Crystallogr Sect B Struct Crystallogr Cryst Chem 1977;33:1305-10. doi:10.1107/S0567740877005998.

[95] Brown ID. Chemical and steric constraints in inorganic solids. Acta Crystallogr Sect B Struct Sci 1992;48:553-72. doi:10.1107/S0108768192002453.

[96] Müller P, Köpke S, Sheldrick GM. Is the bond-valence method able to identify metal atoms in protein structures? Acta Crystallogr Sect D Biol Crystallogr 2002;59:32-7. doi:10.1107/S0907444902018000.

[97] Zheng H, Chordia MD, Cooper DR, Chruszcz M, Müller P, Sheldrick GM, et al. Validation of metal-binding sites in macromolecular structures with the CheckMyMetal web server. Nat Protoc 2014;9:156-70. doi:10.1038/nprot.2013.172.

[98] Echols N, Morshed N, Afonine PV, McCoy AJ, Miller MD, Read RJ, et al. Automated identification of elemental ions in macromolecular crystal structures. Acta Crystallogr Sect D Biol Crystallogr 2014;70:1104-14. doi:10.1107/S1399004714001308.

[99] Dauter Z, Wlodawer A, Minor W, Jaskolski M, Rupp B. Avoidable errors in deposited macromolecular structures: an impediment to efficient data mining. IUCrJ 2014;1:1-15. doi:10.1107/S2052252514005442.

[100] Thoden JB, Firestine SM, Benkovic SJ, Holden HM. PurT-encoded glycinamide ribonucleotide transformylase. Accommodation of adenosine nucleotide analogs within the active site. J Biol Chem 2002;277:23898-908. doi:10.1074/jbc.M202251200.

[101] Huber R, Steigemann W. Two cis-prolines in the Bence-Jones protein Rei and the cis-pro-bend. FEBS Lett 1974;48:2-4.

[102] Stewart DE, Sarkar A, Wampler JE. Occurrence and role of cis peptide bonds in protein structures. J Mol Biol 1990;214:253-60. doi:10.1016/0022-2836(90)90159-J.

[103] Weiss MS, Jabs A, Hilgenfeld R. Peptide bonds revisited. Nat Struct Biol 1998;5:676.

[104] Jabs A, Weiss MS, Hilgenfeld R. A method to detect nonproline cis peptide bonds in proteins. J Mol Biol 1999;286:291-304.

[105] Touw WG, Joosten RP, Vriend G. Detection of trans-cis flips and peptide-plane flips in protein structures. Acta Crystallogr Sect D Biol Crystallogr 2015;71.


[106] Joosten RP, Vriend G. PDB improvement starts with data deposition. Science 2007;317:195-6. doi:10.1126/science.317.5835.195.

[107] Joosten RP, Salzemann J, Bloch V, Stockinger H, Berglund A-C, Blanchet C, et al. PDB_REDO: automated re-refinement of X-ray structure models in the PDB. J Appl Crystallogr 2009;42:376-84. doi:10.1107/S0021889809008784.

[108] Joosten RP, Joosten K, Cohen SX, Vriend G, Perrakis A. Automatic rebuilding and optimization of crystallographic structures in the Protein Data Bank. Bioinformatics 2011;27:3392-8. doi:10.1093/bioinformatics/btr590.

[109] Joosten RP, Joosten K, Murshudov GN, Perrakis A. PDB_REDO: constructive validation, more than just looking for errors. Acta Crystallogr Sect D Biol Crystallogr 2012;68:484-96. doi:10.1107/S0907444911054515.

[110] Murshudov GN, Skubak P, Lebedev AA, Pannu NS, Steiner RA, Nicholls RA, et al. REFMAC5 for the refinement of macromolecular crystal structures. Acta Crystallogr D Biol Crystallogr 2011;67:355-67. doi:10.1107/S0907444911001314.

[111] Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 1983;22:2577-637. doi:10.1002/bip.360221211.

[112] Touw WG, Baakman C, Black J, Beek TAH, Krieger E, Joosten P, et al. A series of PDB-related databanks for everyday needs. Nucleic Acids Res 2015;43:D364-8. doi:10.1093/nar/gku1028.

[113] Joosten RP, te Beek TAH, Krieger E, Hekkelman ML, Hooft RWW, Schneider R, et al. A series of PDB related databases for everyday needs. Nucleic Acids Res 2011;39:D411-9. doi:10.1093/nar/gkq1105.

[114] Kleywegt GJ. Crystallographic refinement of ligand complexes. Acta Crystallogr Sect D Biol Crystallogr 2006;63:94-100. doi:10.1107/S0907444906022657.

[115] Pozharski E, Weichenberger CX, Rupp B. Techniques, tools and best practices for ligand electron-density analysis and results from their application to deposited crystal structures. Acta Crystallogr Sect D Biol Crystallogr 2013;69:150-67. doi:10.1107/S0907444912044423.

[116] Krieger E, Joo K, Lee J, Lee J, Raman S, Thompson J, et al. Improving physical realism, stereochemistry, and side-chain accuracy in homology modeling: Four approaches that performed well in CASP8. Proteins 2009;77 Suppl 9:114-22. doi:10.1002/prot.22570.

[117] Garcia-Alvarez B, de Carcer G, Ibanez S, Bragado-Nilsson E, Montoya G. Molecular and structural basis of polo-like kinase 1 substrate recognition: Implications in centrosomal localization. Proc Natl Acad Sci U S A 2007;104:3107-12. doi:10.1073/pnas.0609131104.

[118] Strebhardt K, Ullrich A. Targeting polo-like kinase 1 for cancer therapy. Nat Rev Cancer 2006;6:321-30. doi:10.1038/nrc1841.

[119] Huggins DJ, McKenzie GJ, Robinson DD, Narvaez AJ, Hardwick B, Roberts-Thomson M, et al. Computational Analysis of Phosphopeptide Binding to the Polo-Box Domain of the Mitotic Kinase PLK1 Using Molecular Dynamics Simulation. PLoS Comput Biol 2010;6:e1000880. doi:10.1371/journal.pcbi.1000880.

[120] Elia AEH, Rellos P, Haire LF, Chao JW, Ivins FJ, Hoepker K, et al. The molecular basis for phosphodependent substrate targeting and regulation of Plks by the Polo-box domain. Cell 2003;115:83-95. doi:10.1016/S0092-8674(03)00725-6.

[121] Qian WJ, Park JE, Lim D, Park SY, Lee KW, Yaffe MB, et al. Peptide-based inhibitors of Plk1 polo-box domain containing mono-anionic phosphothreonine esters and their pivaloyloxymethyl prodrugs. Chem Biol 2013;20:1255-64. doi:10.1016/j.chembiol.2013.09.005.

[122] Drinkwater N, Vu H, Lovell KM, Criscione KR, Collins BM, Prisinzano TE, et al. Fragment-based screening by X-ray crystallography, MS and isothermal titration calorimetry to identify PNMT (phenylethanolamine N-methyltransferase) inhibitors. Biochem J 2010;431:51-61. doi:10.1042/BJ20100651.

[123] Nair PC, Malde AK, Drinkwater N, Mark AE. Missing fragments: Detecting cooperative binding in fragment-based drug design. ACS Med Chem Lett 2012;3:322-6. doi:10.1021/ml300015u.

[124] Di Fiore PP, Pierce JH, Kraus MH, Segatto O, King CR, Aaronson SA. erbB-2 is a potent oncogene when overexpressed in NIH/3T3 cells. Science 1987;237:178-82. doi:10.1126/science.2885917.

[125] Slamon DJ, Leyland-Jones B, Shak S, Fuchs H, Paton V, Bajamonde A, et al. Use of chemotherapy plus a monoclonal antibody against HER2 for metastatic breast cancer that overexpresses HER2. N Engl J Med 2001;344:783-92. doi:10.1056/NEJM200103153441101.

[126] Cho H-S, Mason K, Ramyar KX, Stanley AM, Gabelli SB, Denney DW, et al. Structure of the extracellular region of HER2 alone and in complex with the Herceptin Fab. Nature 2003;421:756-60. doi:10.1038/nature01392.

[127] Vicente-Manzanares M, Ma X, Adelstein RS, Horwitz AR. Non-muscle myosin II takes centre stage in cell adhesion and migration. Nat Rev Mol Cell Biol 2009;10:778-90. doi:10.1038/nrm2786.

[128] Ye Q, Crawley SW, Yang Y, Côté GP, Jia Z. Crystal structure of the alpha-kinase domain of Dictyostelium myosin heavy chain kinase A. Sci Signal 2010;3:ra17. doi:10.1126/scisignal.2000525.

[129] D'Souza RS, Semus R, Billings EA, Meyer CB, Conger K, Casanova JE. Rab4 orchestrates a small GTPase cascade for recruitment of adaptor proteins to early endosomes. Curr Biol 2014;24:1187-98. doi:10.1016/j.cub.2014.04.003.

[130] Huber SK, Scheidig AJ. High resolution crystal structures of human Rab4a in its active and inactive conformations. FEBS Lett 2005;579:2821-9. doi:10.1016/j.febslet.2005.04.020.

[131] WHO. Global tuberculosis report 2014 (WHO/HTM/TB/2014.08). 2014. doi:WHO/HTM/TB/2014.08.

[132] Manca C, Paul S, Barry CE, Freedman VH, Kaplan G. Mycobacterium tuberculosis catalase and peroxidase activities and resistance to oxidative killing in human monocytes in vitro. Infect Immun 1999;67:74-9.

[133] Li S, Peterson NA, Kim MY, Kim CY, Hung LW, Yu M, et al. Crystal structure of AhpE from Mycobacterium tuberculosis, a 1-Cys peroxiredoxin. J Mol Biol 2005;346:1035-46. doi:10.1016/j.jmb.2004.12.046.

[134] Van Laer K, Buts L, Foloppe N, Vertommen D, Van Belle K, Wahni K, et al. Mycoredoxin-1 is one of the missing links in the oxidative stress defence mechanism of Mycobacteria. Mol Microbiol 2012;86:787-804. doi:10.1111/mmi.12030.

[135] Le Bonniec S, Deregnaucourt C, Redeker V, Banerjee R, Grellier P, Goldberg DE, et al. Plasmepsin II, an acidic hemoglobinase from the Plasmodium falciparum food vacuole, is active at neutral pH on the host erythrocyte membrane skeleton. J Biol Chem 1999;274:14218-23. doi:10.1074/jbc.274.20.14218.

[136] Asojo OA, Afonina E, Gulnik SV, Yu B, Erickson JW, Randad R, et al. Structures of Ser205 mutant plasmepsin II from Plasmodium falciparum at 1.8 Å in complex with the inhibitors rs367 and rs370. Acta Crystallogr Sect D Biol Crystallogr 2002;58:2001-8. doi:10.1107/S0907444902014695.

[137] Kasam V, Zimmermann M, Maaß A, Schwichtenberg H, Wolf A, Jacq N, et al. Design of new plasmepsin inhibitors: A virtual high throughput screening approach on the EGEE grid. J Chem Inf Model 2007;47:1818-28. doi:10.1021/ci600451t.

[138] Moolenaar WH, Perrakis A. Insights into autotaxin: how to produce and present a lipid mediator. Nat Rev Mol Cell Biol 2011;12:674-9. doi:10.1038/nrm3188.

[139] Hausmann J, Kamtekar S, Christodoulou E, Day JE, Wu T, Fulkerson Z, et al. Structural basis of substrate discrimination and integrin binding by autotaxin. Nat Struct Mol Biol 2011;18:198-204. doi:10.1038/nsmb.1980.

[140] Kawaguchi M, Okabe T, Okudaira S, Nishimasu H, Ishitani R, Kojima H, et al. Screening and X-ray crystal structure-based optimization of autotaxin (ENPP2) inhibitors, using a newly developed fluorescence probe. ACS Chem Biol 2013;8:1713-21. doi:10.1021/cb400150c.

[141] Lavie A, Allen KN, Petsko GA, Ringe D. X-ray crystallographic structures of D-xylose isomerase-substrate complexes position the substrate and provide evidence for metal movement during catalysis. Biochemistry 1994;33:5469-80. doi:10.1021/bi00184a016.

[142] Farber GK, Glasfeld A, Tiraby G, Ringe D, Petsko GA. Crystallographic studies of the mechanism of xylose isomerase. Biochemistry 1989;28:7289-97. doi:10.1021/bi00444a022.

[143] Katz AK, Li X, Carrell HL, Hanson BL, Langan P, Coates L, et al. Locating active-site hydrogen atoms in D-xylose isomerase: time-of-flight neutron diffraction. Proc Natl Acad Sci U S A 2006;103:8342-7. doi:10.1073/pnas.0602598103.

[144] Krieger E, Vriend G. YASARA View - molecular graphics for all devices - from smartphones to workstations. Bioinformatics 2014:1-2. doi:10.1093/bioinformatics/btu426.

[145] Joosten RP, Long F, Murshudov GN, Perrakis A. The PDB_REDO server for macromolecular structure model optimization. IUCrJ 2014;1:213-20. doi:10.1107/S2052252514009324.

[146] Benkert P, Künzli M, Schwede T. QMEAN server for protein model quality estimation. Nucleic Acids Res 2009;37. doi:10.1093/nar/gkp322.

[147] Wiederstein M, Sippl MJ. ProSA-web: Interactive web service for the recognition of errors in three-dimensional structures of proteins. Nucleic Acids Res 2007;35. doi:10.1093/nar/gkm290.

[148] Bhattacharya A, Tejero R, Montelione GT. Evaluating protein structures determined by structural genomics consortia. Proteins Struct Funct Genet 2007;66:778-95. doi:10.1002/prot.21165.

[149] Cheshire DR, Aberg A, Andersson GMK, Andrews G, Beaton HG, Birkinshaw TN, et al. The discovery of novel, potent and highly selective inhibitors of inducible nitric oxide synthase (iNOS). Bioorg Med Chem Lett 2011;21:2468-71. doi:10.1016/j.bmcl.2011.02.061.

[150] Arutynyan EG, Kuranova P, Vainshtein BK, Steigemann W. X-ray structural investigation of leghemoglobin. VI. Structure of acetate-ferrileghemoglobin at a resolution of 2.0 Å. Kristallografiya 1980;25:80-103.

[151] Joosten RP, Soueidan H, Wessels LFA, Perrakis A. Timely deposition of macromolecular structures is necessary for peer review. Acta Crystallogr Sect D Biol Crystallogr 2013;69:2293-5. doi:10.1107/S0907444913024621.

[152] Berman H, Kleywegt GJ, Nakamura H, Markley JL. Comment on timely deposition of macromolecular structures is necessary for peer review by Joosten et al. (2013). Acta Crystallogr Sect D Biol Crystallogr 2013;69:2296. doi:10.1107/S0907444913029168.