1. Core Chemistry
Any endeavor called “cyber-enabled
chemistry” should share at least two prominent characteristics. First,
cyber-enabled chemistry (CEC) relies on the presence of ubiquitous and
substantial network, information, and computing resources. Second, CEC is
problem-centered, and is directed in particular toward problems so complex
as to resist solution by any single method or approach. Giving some examples
of problems in chemistry that stand to benefit from better
cyberinfrastructure may help bring the term “cyber-enabled chemistry” into
tighter focus. A key feature of the following illustrative problems is that
none of them can be solved using methods from a single subdiscipline alone;
a complete solution must combine different areas of experiment, theory, and
simulation.

‧ Electrochemistry problems such as the molecular structure of double layers and the active chemical species at an electrode surface
‧ Influence of complex environments on chemical reactions, including understanding and predicting environmental contaminant flows
‧ Chemistry of electronically excited states and harnessing light energy
‧ Tribology, for example the molecular origins of macroscopic behavior such as friction and lubrication
‧ Rules for self-assembly at the nanoscale, with emphasis on non-covalent interactions
‧ Combustion, which involves integrating electronic-structure, experiment, and reaction-kinetics models

Cyber-enabled Challenges. Tools for CEC should allow researchers to focus on the problems themselves by freeing them from an enforced focus on simulation details. While these details should be open to examination on demand, their unobtrusiveness would allow researchers to focus on the chemical problem being simulated. Cyberinfrastructure could promote progress in the core chemistries by improving software interoperability, supporting prototyping of models to a prescribed accuracy against well-defined benchmark experiments and simulation data, archiving and data warehousing of experimental and simulation benchmarks, providing education in computational science and model chemistries, and enabling collaborations between experimentalists and theorists.

Software Interoperability. The subdisciplines in theoretical chemistry generally have their own associated public and private software; in many cases, no single code suffices even to solve parts of the problem residing entirely within a single subdiscipline. Therefore, interoperability between codes both within and across these subdisciplines is required. There is a clear opportunity here for the NSF to play a strong role in encouraging the establishment of standards such as file formats to enable interoperability.
However, such standards should be imposed only where the problems involved are sufficiently mature, and these standards should be extensible. A problem-oriented simulation language – a crucial missing ingredient in achieving interoperability across subdisciplines – could be realized through the definition of large-scale tasks with standardized input and output that can be threaded together in a way that avoids detailed specification of what happens inside each task. Along with extensibility, such a language needs to provide rudimentary type-checking to ensure that all the information needed for any task will be available from prior tasks.

Benchmarks for Model Accuracy. Simulation has become increasingly accepted as an essential tool in chemistry: for instance, many experimentalists use quantum-chemistry packages to help interpret results, often with little or no consultation with theoretical or computational chemists. One advance that furthered this acceptance was the introduction of “model chemistries”: levels of theory that were empirically determined to provide a certain level of accuracy for a restricted set of questions. This “prototyping with prescribed accuracy” (PWPA) can be identified as a goal that needs to be achieved in the broader context of chemical simulation. Ideally, each such PWPA scheme would be a hierarchical approach with guaranteed accuracy (given sufficient computational resources) based on standardized protocols that the community can benchmark empirically in order to determine the level of accuracy expected.

Databases and Data Warehousing. Cyberinfrastructure solutions that can enable PWPA include an extensive set of databases containing readily accessible experimental and theoretical results. Such data would allow testing of a specific proposed modular subsystem protocol (say, for use with a new database or new hardware) without carrying out all of the component simulations comprising the protocol.
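
The rudimentary type-checking proposed above for a problem-oriented simulation language can be illustrated with a minimal Python sketch. Everything in it is hypothetical and chosen only for illustration: the `Task` class, the `check_workflow` function, the task names, and the unit-style type labels are assumptions, not part of any existing standard or package. Each task declares only its named inputs and outputs, and the checker confirms that every input is produced, with a matching type label, by an earlier task.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    """A large-scale task described only by its typed inputs and outputs."""
    name: str
    inputs: dict = field(default_factory=dict)   # quantity name -> type label
    outputs: dict = field(default_factory=dict)  # quantity name -> type label


def check_workflow(tasks, initial=None):
    """Return a list of problems; an empty list means all inputs are satisfied."""
    available = dict(initial or {})  # quantities known before the first task
    problems = []
    for task in tasks:
        for key, wanted in task.inputs.items():
            if key not in available:
                problems.append(f"{task.name}: missing input '{key}'")
            elif available[key] != wanted:
                problems.append(
                    f"{task.name}: input '{key}' is {available[key]}, expected {wanted}"
                )
        # Outputs become available to every later task in the chain.
        available.update(task.outputs)
    return problems


# Hypothetical combustion workflow; the names and labels are illustrative only.
workflow = [
    Task("electronic_structure", {"geometry": "xyz"}, {"reaction_energies": "kcal/mol"}),
    Task("kinetics", {"reaction_energies": "kcal/mol"}, {"rate_constants": "1/s"}),
    Task(
        "flame_model",
        {"rate_constants": "1/s", "diffusion_constants": "cm^2/s"},
        {"species_profile": "mol/cm^3"},
    ),
]

problems = check_workflow(workflow, initial={"geometry": "xyz"})
for problem in problems:
    print(problem)
```

In this sketch the flame model is flagged because no earlier task supplies diffusion constants. The check never looks inside a task, which is the point of the design: tasks can be threaded together from their declared inputs and outputs alone.
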
Furthermore, theoretical and experimental results must be stored with an associated “pedigree,” allowing database users to assess the data’s reliability with estimated error bars and sufficient information to allow researchers to revise these estimates if necessary. The NIST WebBook database of gas-phase bond energies is a notable example of a current effort directed along these lines. Ideally, automated implementation of these protocols should proceed from specification not of the protocols themselves but rather of a desired accuracy. The simulation language would allow for automatic tests of the sensitivity of specified final results (for example, the space-time profile of the concentration of some species in a flame) to accuracy in component sub-problems (for example, computed reaction energies, diffusion constants, and fluid properties, to name but a few).

There are simulation cases where a compelling reason exists to archive relatively vast amounts of simulation data. Such “community simulations” have value outside their original stated intent because, for example, they provide initial conditions that can be leveraged to go beyond the original questions asked, because they can be used in benchmarking and creating standardized protocols needed for PWPA, or because they provide useful model-consistent boundary conditions or averages for multiscale methods. In the first category are simulations of activated events such as the microsecond-scale folding simulation of the Kollman group or the simulation of water freezing from the Ohmine group. These simulations were so challenging because of the long time scales involved, which are in turn directly related to the presence of free-energy barriers. Most methods developed to model rare events without brute-force simulation techniques rely on the availability of some representative samples where the event occurs. Once archived, these “heroic” simulations can be put to use in a more statistically significant context.
Simulations of water at different pressures and temperatures would fall in the second category – providing benchmarks for empirical water-potential models and aiding development of standardized protocols for simulations involving water as a solvent. In the last category are simulations of biological membranes, which could be used as boundary conditions for embedding membrane-bound proteins in studies of active-site chemistry.

Education in Computational Science and Engineering. The development of an improved cyberinfrastructure could go a long way towards achieving an equal partnership between simulation and experiment in solving chemical problems. But such an equal partnership implies a drastic change in the culture of chemistry. Thus, it is particularly important that such a change be cultivated at the level of the undergraduate curriculum. In some disciplines, such as physics, it is natural for students to turn to modeling as an aid in understanding the ramifications of a complex problem, but this is rarely the case in chemistry. Improvements in cyberinfrastructure will enable earlier and more aggressive introduction of simulation techniques into the classroom. However, cultivation of a “model it first” attitude among undergraduates, and among chemists as a whole, is more useful if the simulation data come with a trust factor, i.e., error bars. Development of PWPA techniques is therefore critical. Moreover, the concept of error in simulation needs to be emphasized so that simulation tools are not misused.

Collaborations between Experiment and Theory. What would be the practical outcome of “simulation and experiment as equal partners”? Computing and modeling at the lab bench would become routine, both suggesting new experiments and, just as valuably, helping avoid experiments with little or no hope of success. Numerous cyberinfrastructure tools are explicitly designed to enhance or facilitate collaborations.
Thus, there is a double effect whereby cyberinfrastructure promotes collaborations, and these collaborative efforts in turn increase the demand for improved cyberinfrastructure. This is welcome, since increased collaboration between experiment and theory is a must for progress on complex chemical problems and also for the validation of protocols needed for PWPA.

Many of the important potential advances for chemistry in the 21st century involve crossing an interface of one type or another. Significant intersections of chemistry and other disciplines (in parentheses) include:

‧ Understanding the chemistry of living systems in detail, including the development of medicines and therapies (biology, biochemistry, mathematical biology, bioinformatics, medicinal and pharmaceutical chemistry)
‧ Understanding the complex chemistry of the earth (geology, environmental science, atmospheric science)
‧ Designing and producing new materials that can be predicted, tailored, and tuned before production, including investigating self-assembly as a useful approach to synthesis and manufacturing (physics, electrical engineering, materials science, biotechnology)

Developing cyberinfrastructure to make these interfaces as seamless as possible will help address the challenges that arise. It is important to acknowledge multiple types of interfaces in the specification of needed infrastructure. One scientific theme underlying many of the areas described above is the requirement to cross multiple time and length scales. Examples range from representations or models for the breaking of bonds (quantum chemistry) to descriptions of molecular ensembles (force fields, molecular dynamics, Monte Carlo) to modeling of chemistry of complex environments (e.g., stochastic methods) to entire systems. Today, computational scientists are generally trained in depth in one sub-area, but are not expert in the models used for other time and length scales.
Herein lies a challenge, since data from a shorter time or length scale are frequently used as input for the next model. Developing the interfaces between theory, computation, and experiment is also required to understand a new area of science. But again, because of specialization, no seamless interface exists between theorists, computational scientists, and experimentalists.

Other interfaces deserve consideration as well. Interfacing across institutions – academic, industrial labs, government labs, funding agencies – is needed to disseminate advances among the different institutions conducting research in the U.S., as well as across geographical locations, to take advantage of research already done around the globe. Better coordination between research and education is required to introduce new research topics into the undergraduate and K-12 curriculum, as well as for explaining significant new chemistry solutions that impact public policy such as stem-cell research or genetically modified foods.

Cyber-enabled Challenges. Tools for cyber-enabled chemistry should allow for clear communication across the interfaces, broadly defined. The science interfaces listed above, for example, require scientific research involving complexity of representation, of analysis, of models, and of experimental design and measurement, even within individual sub-areas. How can all of the relevant information from one sub-area be conveyed (with advice about its reliability) to scientists in other sub-areas who use different terminology? How do we educate students at scientific boundaries, and also promote collaboration across disciplines?

Chemistry Research and Education Communication. How do we present problems to the broader research and education community in the most engaging manner? Different disciplines may use different terminology and concepts to describe similar chemistry. A science search engine – such as the recently introduced Google Scholar – would be highly desirable.
Research, development, and use of ontologies, thesauri, knowledge representations, and standards for data representation are all ways to tackle these issues, and are computer science research efforts in their own right. In some cases, a problem is best expressed at a higher level of abstraction, e.g., in a way that a mathematical scientist might understand and use generic techniques to solve. Alternatively, a problem may need to be expressed in the language of a different sub-area in order to encourage experts in that area to generate data needed as input to a model at a different scale. It is therefore important to present theories and algorithms in a context that can be understood by those in related fields.

Along another dimension, free access to the scientific literature carries particular importance for projects that span interfaces, because much of the literature required for these projects is not in core chemistry areas, but rather in a multitude of other disciplines. Today, in these cases, cost is a significant inhibitor to learning from the literature. As a result, advancement of the science is slower than it could be.

Members of a newly formed multidisciplinary research team must initially learn about one another’s disciplines, and the results of their research must later be conveyed to the wider academic community. Web pages, links to related literature, and online courses may all be useful. In addition, the academic curriculum should be updated to teach the basic concepts of multiple traditional disciplines (rather than just chemistry) to the next generation of students so that they may more easily understand and contribute to new areas.

Interfacing Data and Software across Disciplines. Potential inhibitors to clear communication across interfaces and to the success of multidisciplinary projects include, first, some sociological issues around data sharing.
For example, traditional scientists, who have been encouraged to become deep experts in a particular field, may not feel the need, or have the time, to make their data available to others. To combat this and to encourage widespread deposition of protein structural data, the Protein Data Bank (PDB) has successfully allied itself with influential journals that frown on submission of articles without deposition of data in the databank. However, it is not clear that this model can be extended to all areas of chemistry. This raises another issue, namely support for centralized data sources that can be centrally curated, as is the PDB, versus distributed autonomous sources where the maintenance and support are managed more cost-effectively by multiple groups, but where the data models and curation protocols are likely to differ, thus hampering integration of the data.

Finding and understanding relevant data requires reliability and accuracy. Users of data need to understand its accuracy and the assumptions used in its derivation in order to use it wisely. Algorithm developers need to know what degree of accuracy is required (i.e., when their algorithm is good enough) for different uses of the data. Cyberinfrastructure “validation services” could provide information on what to compare, how to compare, and what protocols to follow in the comparison. Curation is essential for ensuring improved reliability, although, in general, expert curation (unlike automated curation) is not scalable. Data provenance and annotation are also important.

Standards are essential for interoperability among application programming interfaces (APIs) as well as among data models, although the difficulties of standards adoption should not be underestimated.
Adoption of standards partially addresses this concern, but there is also a significant need for robust software-engineering practices, so that software developed for one subdiscipline can be easily transferred to codes for other subdisciplines. There is likewise a need to build on proven technology where appropriate, and to fund software maintenance and support.

Development of Collaboration Tools. Cyberinfrastructure can help existing, geographically dispersed teams communicate more effectively. Examples of useful collaboration tools are those that would improve point-to-point communication with usable remote-whiteboard technology, or would better enable viable international videoconferencing, such as VRVS (Virtual Rooms Videoconferencing System).

To make many multidisciplinary projects successful, budgets have to cover technical people who can work across several areas, even if not in as much depth as an expert in any one area. New information technology with sophisticated knowledge representation may someday fill this gap, but in the foreseeable future such people will be vital to the success of a multidisciplinary project. Large projects also need expert project managers to facilitate collaborations and supervise design, development, testing, and deployment of robust software. Funding should be available for multidisciplinary scientific projects (as long as such funding does not negatively impinge upon individual PI funding) that focus on novel science and novel computer science as well as for projects that focus on novel science enabled by design and deployment of infrastructure based on current technology. (In fact, any one project may involve both of these aspects.)

3. Computational and Experimental Chemistry Interactions
Increasingly, experimentalists and computational chemists are teaming up to tackle the challenging problems presented by complex chemical and biological systems.
However, many of the most challenging and critical problems are pushing the limits of the capabilities of current simulation methods. A key aspect of the computational/experimental interface is validation. This is a two-way process: high-quality experimental data are necessary for validating computational models, and results from highly accurate computational methods can frequently play an important role in validating experimental data, or provide qualitative insight that permits the development of new experimental directions.

Improved Computational Models that Connect with Experiment. A defining area of intersection between computation and experiment – the use of efficient computational models to drive experiments (for example, to predict optimal experimental conditions for deriving the highest-quality experimental data, or to design "smart experiments") – could lower the cost of discovery and process design. Several key prospective infrastructural advances will help researchers bridge the computational/experimental interface. High-bandwidth networks will allow large amounts of data to move rapidly among researchers. Robust visualization and analysis tools will give researchers better chemical insight from data exchanges. Better data access and database querying tools are also needed.

Continued development of new and improved methods for modeling systems with non-bonded interactions, hard materials (e.g., ceramics), and interfacial processes should be given high priority. For many of these problems, there is a need to develop computational methods that truly bridge different time and length scales. To validate such approaches, it is important to generate and maintain databases with data from experiments and simulations, which in turn means developing mechanisms for certifying data, establishing standardized formats for data from different sources, and developing new tools (expert systems and visualization software) for querying a database and analyzing data.
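
As a rough illustration of how certified, uniformly formatted benchmark data could support validation, the Python sketch below stores each reference value with an error bar and a source note, then compares a set of computed values against it. The `BenchmarkEntry` record, the `assess` function, the field names, and the 1.0 kcal/mol default target are assumptions made for this example, and the numbers are placeholders rather than real measurements.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class BenchmarkEntry:
    """One reference value together with the information needed to judge it."""
    system: str
    reference: float    # experimental value, kcal/mol
    uncertainty: float  # estimated error bar on the reference, kcal/mol
    source: str         # provenance note (who measured it, and how)


def assess(entries, computed, target=1.0):
    """Compare computed values with references against a target accuracy."""
    errors = {e.system: abs(computed[e.system] - e.reference) for e in entries}
    mae = mean(errors.values())
    # Flag a system only when its deviation exceeds the target by more than
    # the error bar on the reference can explain.
    outliers = sorted(
        e.system for e in entries if errors[e.system] > target + e.uncertainty
    )
    return {
        "mae": mae,
        "max_error": max(errors.values()),
        "meets_target": mae <= target,
        "outliers": outliers,
    }


# Placeholder entries for illustration only; not real experimental data.
entries = [
    BenchmarkEntry("A", 100.0, 0.2, "placeholder source 1"),
    BenchmarkEntry("B", 50.0, 0.5, "placeholder source 2"),
    BenchmarkEntry("C", 75.0, 0.1, "placeholder source 3"),
]
computed = {"A": 100.5, "B": 52.0, "C": 74.6}

report = assess(entries, computed)
print(report)
```

Keeping the error bar and source with each entry lets a user of the database decide whether a disagreement reflects the model or the reference data. In this placeholder set the mean error meets the target even though one system is flagged as an outlier, which is why both numbers are reported.
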
Promoting Experimental/Theoretical Collaboration. Real-time interactions between experiments and simulations are needed for groups to collaborate effectively and to gain the most benefit from doing so. Although groups are beginning to explore opportunities to enhance these collaborations, current limits on real-time interaction prevent their full exploitation. One problem is the time required to carry out experiments or simulations. Faster algorithms and peak-performance models and methods should help facilitate interactions across the computational/experimental interface. Better software and analyses of experimental data and databases, perhaps using expert systems, should help computational modelers access experimental data and results.

The computational/experimental interface also has an educational dimension. The Internet era has made it easier than ever for experimental and theoretical groups in different locations to interact. Modern cybertechnology also makes it possible for these interactions to involve students and not just faculty members.

4. Grand Challenges in Chemistry
Some very broadly defined areas of
chemistry may yield only to next-generation technologies and innovations,
which will in turn rely heavily on the development and application of novel
cyberinfrastructures enhancing both computational power and collaborative
efforts. In particular, three key grand challenges were discussed:

‧ Development of modeling protocols that can represent very large sections of potential-energy surfaces of very high dimensionality to chemical accuracy, typically defined as within 1 kcal mol⁻¹ of experiment. This level of accuracy will be critical to the successful modeling of such multiscale problems as protein folding, aggregation, self-assembly, phase separation, and phase changes such as those involved in conversion between crystal polymorphs. With respect to the latter problem, simply predicting the most stable crystal structure for an arbitrary molecule remains an outstanding grand challenge.
‧ Development of algorithms and data-handling protocols capable of providing real-time feedback to control a reacting system actively monitored by sensor technology (for example, controlling combustion of a reactive gas in a flow chamber).

Solving such multiscale problems requires transferring data among adjacent scales, so that smaller-scale results can be the foundation for larger-scale model parameters and, at the same time, the larger-scale results can feed back to the smaller-scale model for refinement (e.g., improved accuracy). Addressing this grand challenge means developing algorithms for propagating deterministic or probabilistic system evolution of fine and coarse scales.

In addition, most systems of interest are expected to be multiphase in nature, e.g., solids in contact with gases, or high polymers in solution, or a substance that is poorly characterized with respect to phase, as is a glass. Characterization of any of these systems will require considering significant ranges of system variables such as temperature and pressure. An added level of difficulty may arise when the system is not limited to its ground electronic state.

Another key point is that it is not really the potential-energy surface, but the free-energy surface, that needs to be modeled accurately.
This requires an accurate modeling of entropy. It is unlikely that ideal-gas molecular partition functions will be sufficiently robust for this task. Improved algorithms for estimating entropy and other thermodynamic parameters will be critical to better modeling in this area.

For problem areas such as combustion and sensor control, attaining the speeds needed for controlling combustion of a reactive gas in a flow chamber may require development of specialized hardware optimized to the algorithms involved. In addition, methods for handling very large data flows arriving from the sensors (and possibly being passed to control mechanisms) will need to be developed.

All three grand challenges share several common features that will place an onus on cyberinfrastructure development. First, model quality cannot be evaluated in the absence of experimental data against which to conduct validation studies. Useful data are not always available, and support for further measurement should not be ignored. Centralization of validation data into convenient databases – ideally with quality review of individual entries and standardization of formats – would contribute to more-efficient development efforts. Second, model/algorithm development at all but the very smallest scale inevitably involves some parameterization. Support for cyberinfrastructure tools that might speed parameter optimization (e.g., via grid computing across multiple sites) and simplify analysis of parameter sensitivity should also be a priority. Third, approaches to grand-challenge problems will benefit from improvements in processor speeds, memory usage, parallel-algorithm development, and grid-management technology. Finally, to ensure the maximum utility of tools developed as cyberinfrastructure, developers need to be multidisciplinary, either individually or as teams, so that the tools themselves will be characterized both by good chemistry and physics and by good software engineering.
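
The analysis of parameter sensitivity mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a method taken from the report: the toy model is a single Arrhenius rate constant, the function names `arrhenius` and `normalized_sensitivities` are hypothetical, and the parameter values are arbitrary. The sketch estimates normalized sensitivities, d ln(output) / d ln(parameter), by central finite differences.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def arrhenius(params, temperature=1000.0):
    """Toy model: rate constant k = A * exp(-Ea / (R * T))."""
    return params["A"] * math.exp(-params["Ea"] / (R_KCAL * temperature))


def normalized_sensitivities(model, params, rel_step=1e-4):
    """Estimate d ln(model) / d ln(p) for each parameter p.

    Uses central finite differences; assumes every parameter and the
    model output are nonzero.
    """
    base = model(params)
    result = {}
    for name, value in params.items():
        up = dict(params, **{name: value * (1 + rel_step)})
        down = dict(params, **{name: value * (1 - rel_step)})
        derivative = (model(up) - model(down)) / (2 * rel_step * value)
        result[name] = derivative * value / base
    return result


# Arbitrary illustrative values: A in 1/s, Ea in kcal/mol.
params = {"A": 1.0e13, "Ea": 20.0}
sens = normalized_sensitivities(arrhenius, params)
for name, value in sens.items():
    print(f"{name}: {value:.4f}")
```

For this toy model the exact values are known (1 for `A` and -Ea/(RT) for `Ea`), so the finite-difference estimate is easy to check. In a real multiscale model each evaluation could be an expensive simulation, and because the perturbed runs are independent of one another they could be distributed across machines.
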
This site was last updated 01/12/05