The program package consists of several modules that address a variety of problems in computational crystallography. Every module has an interactive graphical user interface and is equipped with a detailed help menu.
This module serves to visualize molecules, crystals, powder diagrams, the potentials used, and the strongest contacts, attractive as well as repulsive, of a crystal structure.
This module was developed to compare crystal structures and/or X-ray powder diagrams with one another. Thanks to an original similarity index, it allows quick comparison and clustering of extended data sets, including crystallographic databases.
This module allows energy scoring and minimization of organic, organometallic, and inorganic crystal structures. The crystal energy score is based on self-consistent empirical potentials for all atom types. The parameters of the potentials are derived by data mining of the Cambridge Structural Database (CSD).
This module predicts the crystal structures of rigid organic, organometallic, and inorganic compounds in the most common space groups. As input it needs the molecular geometry. Flexible molecules can also be calculated if their conformation is already known or has been derived beforehand. The All Atom Types Data Mining Force Field differentiates atom types according to their valence or coordination.
This module serves to determine the background and to analyze the line profiles of experimental powder diagrams. It is a necessary step before performing crystal structure determination with the module Powder.
This module allows crystal structure determination from indexed and un-indexed powder diagrams. It implements a new similarity index for the automated comparison of powder diagrams, which remains valid for refinement even in cases of large deviations in the cell constants and overlapping peaks. As input it needs the powder diagram (analyzed and cleaned beforehand with the module PowderFit) and either the molecular geometry or a set of predicted crystal structures obtained with the module Prediction.
This new module allows the parameters of the All Atom Types Data Mining Force Field to be optimized for specific atom types or series of compounds of interest.
One might assume that the physical and chemical properties of an unknown substance can be predicted if they are known for millions of known substances. At the beginning of last year, the Cambridge Structural Database contained over 900,000 crystal structures (2018). Our research center, CRS4, currently provides computational facilities of 190 TFLOPS (2018). The combination of both should make it possible to derive the Gibbs free energy of molecular interactions as a function of temperature and pressure and to predict phase transitions, co-crystal formation, solubility, melting points, etc. For this purpose we started to develop FlexCryst in 1994. We will present the current state of FlexCryst and its impact on pharmaceutical questions.
The ideas of crystal engineering were introduced as early as 1971. At first, crystal structures were discussed mainly in terms of atom pair distances and hydrogen bonds. Later, Desiraju extended these considerations to more general schemes, which he named synthons. In the sense of machine learning, descriptors were introduced to describe crystal structures, and counting the synthons yields a frequency vector. However, the descriptors carried no weights, and their importance was only estimated from statistics. To obtain more accurate values for their importance, we introduced data mining into crystallography, which assigns exact weights to the different contributions. The presented force field is trained on 100,000 structures and contains force field parameters for all atoms (in contrast to the universal force field). Visualizing the pair interactions gives better insight into the forces acting within crystals. In our work so far, the most important application has been finding faulty structures. The force field is now accurate enough to recognize the small inaccuracies that are intrinsic to crystal structures solved from powder diagrams. Further applications are the targeted substitution of groups or atoms to achieve denser packing, which is important for explosives, or to lower the interaction energy to achieve higher solubility.
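The idea of assigning weights to descriptor counts can be illustrated with a minimal sketch. It assumes a linear score (weights times frequency vector) and a simple margin-perceptron update that pushes each observed structure below its hypothetical alternatives. The function name, the two descriptors, and the toy counts are hypothetical and chosen only for illustration; this is not the FlexCryst training procedure.

```python
import numpy as np

def train_weights(observed, decoys, margin=1.0, lr=0.1, max_epochs=1000):
    """Learn weights w so that w.f(observed) + margin <= w.f(decoy).

    observed: (n, d) descriptor counts of n experimental structures
    decoys:   (n, m, d) counts of m hypothetical structures per compound
    """
    w = np.zeros(observed.shape[1])
    for _ in range(max_epochs):
        changed = False
        for f_obs, f_decs in zip(observed, decoys):
            for f_dec in f_decs:
                # Update only when the observed structure does not score
                # lower than the decoy by at least the margin.
                if w @ f_obs + margin > w @ f_dec:
                    w -= lr * (f_obs - f_dec)
                    changed = True
        if not changed:  # every pair is ranked correctly
            break
    return w

# Toy frequency vectors: [count of descriptor A, count of descriptor B].
# All numbers are invented for this sketch.
observed = np.array([[3.0, 0.0], [2.0, 1.0]])
decoys = np.array([[[1.0, 0.0], [2.0, 1.0]],
                   [[1.0, 1.0], [2.0, 3.0]]])

weights = train_weights(observed, decoys)
scores_observed = observed @ weights   # shape (2,)
scores_decoys = decoys @ weights       # shape (2, 2)
print("weights:", weights)
```

In this toy set the descriptor that is more frequent in the observed structures ends up with a negative (favorable) weight and the other with a positive one, which is the sense in which training replaces estimated importance by fitted weights.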
While the crystal structure prediction of small organic molecules is possible, the prediction of large biomolecules (more than 20 degrees of freedom) and of organometallics is still an open question. For small organic molecules the stereochemistry is generally known; for coordination complexes, however, it is very often unknown. Therefore, a successful prediction of these complexes must include the prediction of the coordination of the metal. From the theoretical point of view, we face the problem that not only the inter-molecular interactions but also the bonds must be predicted correctly. In our approach the required potentials are described as a Taylor series in (1/r^2). The inter- and intra-molecular interactions are merged into one effective potential, as was already elaborated for the reactive force field for water. For the parameterization of the potentials we used a multi-step procedure of data mining on the experimental crystal structures of the Cambridge Structural Database. In a first step we analyzed the radial distribution function of the inter-atomic distances to eliminate unusual and suspect crystal structures. The retained crystal structures were used to derive an approximation of the force field by using the gradients and solving the resulting system of equations by singular value decomposition. The result is a good initial guess for the further refinement of the potentials by classification. The parameterization by classification is based on the idea that observed atomic distances must be preferable to unobserved ones; the parameters are obtained by training the potential until the scoring function correctly recognizes whether a given crystal structure is invented or observed. We will demonstrate the method using the prediction of nickel complexes as an example. We can show that, at the present state of development, the data mining force field is able to predict the correct coordination among the likely candidates.
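The following minimal sketch illustrates the first fitting step under strong simplifying assumptions: a single pair potential written as a polynomial in 1/r^2 with six terms, synthetic gradient data generated from a Lennard-Jones 12-6 curve in reduced units, and a plain least-squares solution obtained by singular value decomposition. The number of terms, the reference curve, and all function names are assumptions made for illustration and are not taken from FlexCryst.

```python
import numpy as np

POWERS = np.arange(1, 7)  # terms (1/r^2)^k for k = 1..6 (assumed cut-off)

def energy(r, coeffs):
    """E(r) = sum_k c_k * (1/r^2)^k."""
    x = 1.0 / np.asarray(r, dtype=float) ** 2
    return sum(c * x**k for k, c in zip(POWERS, coeffs))

def gradient_design_matrix(r):
    """Column k holds d/dr of r^(-2k), i.e. -2k * r^(-2k-1)."""
    r = np.asarray(r, dtype=float)[:, None]
    return -2.0 * POWERS * r ** (-2.0 * POWERS - 1.0)

def solve_by_svd(A, b, rcond=1e-12):
    """Least-squares solution of A c = b through the SVD pseudo-inverse."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > rcond * s[0]  # drop numerically negligible singular values
    return Vt[keep].T @ ((U[:, keep].T @ b) / s[keep])

# Synthetic "observations": gradients of a Lennard-Jones 12-6 potential
# (epsilon = sigma = 1), which is exactly -4*(1/r^2)^3 + 4*(1/r^2)^6.
true_coeffs = np.array([0.0, 0.0, -4.0, 0.0, 0.0, 4.0])
r_samples = np.linspace(0.9, 2.5, 60)
A = gradient_design_matrix(r_samples)
gradients = A @ true_coeffs

fitted = solve_by_svd(A, gradients)
print(np.round(fitted, 6))
```

Because the potential is linear in its coefficients, the gradient conditions form a linear system, which is why a direct SVD solution can serve as a starting guess before any iterative refinement.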
The richest data resources in chemistry are crystallographic databases. The largest one, the Cambridge Structural Database (CSD), currently has around 1,000,000 entries. The value of these data was recognized immediately, and chemists studied and analyzed the structures as soon as they appeared. The related work was honored with 29 Nobel prizes. With the advent of artificial intelligence, it was natural to apply it to crystallographic databases. One of the most powerful tools in this context is classification. We can differentiate between supervised and unsupervised classification. Unsupervised classification can be used to detect and classify polymorphs (different crystal structures of the same molecule) or to identify a substance via its X-ray scattering diagram. Supervised classification allows interaction potentials to be derived, which goes beyond standard DFT methods since it gives direct access to the Gibbs free energy. One example is the isotopic effect in crystal structures, which can be the reason for the formation of different polymorphs. This effect is commonly not accessible to quantum chemistry because of the Born-Oppenheimer approximation. A second example, which we will show, is the temperature dependence of crystal structures. For instance, a temperature-dependent force field allows the correct prediction of the thermal expansion coefficients of crystals.
A new similarity index for the automated comparison of powder diagrams is proposed. In contrast to traditionally used similarity indices, the proposed method is valid in cases of large deviations in the cell constants. Refinement according to this index closes the gap between crystal structure prediction and automated crystal structure determination. The capabilities of the new procedure have been demonstrated by solving crystal structures from un-indexed powder diagrams of some organic pigments (PY111, PR181 and Me-PR170).
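Why a similarity measure can stay informative under large deviations in the cell constants can be illustrated with a minimal sketch. It contrasts a pointwise difference of two normalized diagrams with a difference of their cumulative (integrated) intensities. The cumulative form is an assumption used here only for illustration and is not claimed to be the index proposed above; the peak positions, peak width, and function names are hypothetical.

```python
import numpy as np

def gaussian_pattern(two_theta, peak_positions, width=0.05):
    """Synthetic powder diagram: Gaussian peaks, normalized to unit sum."""
    pattern = np.zeros_like(two_theta)
    for p in peak_positions:
        pattern += np.exp(-0.5 * ((two_theta - p) / width) ** 2)
    return pattern / pattern.sum()

def pointwise_distance(a, b):
    """Sum of absolute intensity differences, point by point."""
    return np.abs(a - b).sum()

def cumulative_distance(a, b):
    """Mean absolute difference of the integrated diagrams."""
    return np.abs(np.cumsum(a) - np.cumsum(b)).mean()

two_theta = np.linspace(5.0, 25.0, 2001)  # degrees, hypothetical grid
reference = gaussian_pattern(two_theta, [10.0, 18.0])
near = gaussian_pattern(two_theta, [10.5, 18.5])  # small peak shift
far = gaussian_pattern(two_theta, [13.0, 21.0])   # large peak shift

point_near = pointwise_distance(reference, near)
point_far = pointwise_distance(reference, far)
cum_near = cumulative_distance(reference, near)
cum_far = cumulative_distance(reference, far)
print(point_near, point_far, cum_near, cum_far)
```

Once sharp peaks no longer overlap, the pointwise measure reports almost the same distance for a small and a large shift and therefore gives a refinement no direction, whereas the cumulative measure still grows with the size of the shift.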
Solubility is one of the crucial properties of drugs. From the thermodynamic point of view the prediction of solubility is simple: the free energy of the drug molecule's interactions must be the same in the crystal (Glattice) and in the solution (Ghydration). In practice it is a tremendous theoretical problem. The free energy includes the effects of temperature and pressure, whereas quantum mechanics a priori gives the enthalpy, and the effects of temperature and pressure have to be added in laborious calculations. Here we would like to show a simple way to predict the free energy of the crystal. Data Mining Force Fields (DMFF) can fulfill this task very quickly and accurately. The basic idea of data mining on crystal structures is that any crystal structure is a global minimum of the free energy. The force field is obtained by optimizing the parameters until it always assigns a lower energy to the experimental structure than to the virtual crystal structures produced during crystal structure prediction. The free energy of the molecule in the crystal can then be easily estimated by a crystal structure prediction with the data mining force field: the lowest energy of the predicted polymorphs corresponds to the free energy. This is true even if the predicted crystal structure does not coincide with the experimental one and another polymorph, close in energy rank, is predicted instead. We performed a crystal structure prediction for all 30 drugs for which we found accurate values of the free energy of hydration. A plot of the free energy of interaction ΔG=Glattice-Ghydration versus the solubility log(S) shows a very high correlation, with a coefficient of determination of 0.92. The linear regression holds over a range of 14 orders of magnitude in solubility.
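The regression step can be sketched as follows: a straight line is fitted to pairs of ΔG and log(S), and the coefficient of determination is computed from the residuals. The numbers below are synthetic placeholders in arbitrary units, not the data of the 30 drugs, and the function name is hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares line y = slope * x + intercept and its R^2."""
    slope, intercept = np.polyfit(x, y, 1)
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)   # residual sum of squares
    ss_tot = np.sum((y - np.mean(y)) ** 2)  # total sum of squares
    return slope, intercept, 1.0 - ss_res / ss_tot

# Synthetic placeholder values (arbitrary units), invented for this sketch.
delta_g = np.array([-30.0, -20.0, -10.0, 0.0, 10.0, 20.0])
noise = np.array([0.10, -0.10, 0.05, -0.05, 0.10, -0.10])
log_s = 0.1 * delta_g - 2.0 + noise

slope, intercept, r_squared = fit_line(delta_g, log_s)
print(f"slope={slope:.4f} intercept={intercept:.4f} R^2={r_squared:.4f}")
```

With real data, the same three numbers summarize how well ΔG alone accounts for the spread in log(S).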
Co-crystals (or multicomponent crystals) have physico-chemical properties that differ from those of crystals of the pure components. This is significant in drug development, since the desired properties, e.g. solubility, stability and bioavailability, can be tailored by binding two substances into a single crystal without chemical modification of the active component. Here, the FlexCryst program package, which implements a data mining force field, was used to estimate the relative stability and, consequently, the relative solubility of co-crystals of flavonoids and agomelatine versus their pure components, as stored in the Cambridge Structural Database. The considerable potential of this approach for in silico screening of co-crystals, as well as of their relative solubility, was demonstrated.
Force Fields (FF) have a long history of development, and at present a great variety of them exist, addressing a wide range of properties of solids. One can distinguish two main approaches: FF fitted to high-accuracy DFT calculations, and FF that systematically determine all parameters from big experimental data sets (Data Mining Force Field). Even though crystal structure prediction has become reliable, many challenging problems remain to be solved: the impact of external conditions, calculations of salts, and so on. A FF that predicts structural temperature effects in crystals was reported earlier. Here we show a modernized data mining approach, concentrating on energy and structural aspects. It includes an improved energy function and allows big data sets to be screened for the possible formation of polymorphs, co-crystals and salts, which plays an important role in the pharmaceutical industry. Co-crystallization is a promising approach to generate novel crystal forms of known API with improved physico-chemical and pharmacokinetic properties. The current accuracy of the energy function allows the solubility to be estimated, which is one of the most important properties of drugs. The results of experimental versus virtual screening are presented for a variety of chemical classes of molecular crystals and salts, including carboxylic and amino acids, which very often cause problems in correct energy estimation and, as a consequence, lead to failure in crystal structure prediction. For these, too, a perfect correlation of experimental and predicted energy was found, and more than 58% of the structures were predicted in ranks 1-20. Further progress in force field development is expected from introducing a larger variety of atom types (All Atom Force Field).