Deep neural operators can serve as accurate surrogates for shape optimization: A case study for airfoils|
Subtitled arXiv:2302.00807v1, K. Shukla, V. Oommen, A. Peyvan, M. Penwarden, L. Bravo, A. Ghoshal, R.M. Kirby, G. Karniadakis. 2023.
Deep neural operators, such as DeepONets, have changed the paradigm in high-dimensional nonlinear regression from function regression to (differential) operator regression, paving the way for significant changes in computational engineering applications. Here, we investigate the use of DeepONets to infer flow fields around unseen airfoils with the aim of shape optimization, an important design problem in aerodynamics that typically taxes computational resources heavily. We present results that display little to no degradation in prediction accuracy, while reducing the online optimization cost by orders of magnitude. We consider NACA airfoils as a test case for our proposed approach, as their shape can be easily defined by the four-digit parametrization. We successfully optimize the constrained NACA four-digit problem with respect to maximizing the lift-to-drag ratio and validate all results by comparing them to a high-order CFD solver. We find that DeepONets have low generalization error, making them ideal for generating solutions of unseen shapes. Specifically, pressure, density, and velocity fields are accurately inferred in a fraction of a second, hence enabling the use of general objective functions beyond the maximization of the lift-to-drag ratio considered in the current work.
A Metalearning Approach for Physics-Informed Neural Networks (PINNs): Application to Parameterized PDEs|
M. Penwarden, S. Zhe, A. Narayan, R.M. Kirby. In Journal of Computational Physics, Elsevier, 2023.
Physics-informed neural networks (PINNs) as a means of discretizing partial differential equations (PDEs) are garnering much attention in the Computational Science and Engineering (CS&E) world. At least two challenges exist for PINNs at present: an understanding of accuracy and convergence characteristics with respect to tunable parameters and identification of optimization strategies that make PINNs as efficient as other computational science tools. The cost of PINNs training remains a major challenge of Physics-informed Machine Learning (PiML) – and, in fact, machine learning (ML) in general. This paper is meant to move towards addressing the latter through the study of PINNs on new tasks, for which parameterized PDEs provide a good testbed application, as tasks can be easily defined in this context. Following the ML world, we introduce metalearning of PINNs with application to parameterized PDEs. By introducing metalearning and transfer learning concepts, we can greatly accelerate the PINNs optimization process. We present a survey of model-agnostic metalearning, and then discuss our model-aware metalearning applied to PINNs as well as implementation considerations and algorithmic complexity. We then test our approach on various canonical forward parameterized PDEs that have been presented in the emerging PINNs literature.
Accelerating Physics Schemes in Numerical Weather Prediction Codes and Preserving Positivity in the Physics-Dynamics coupling|
Timbwaoga Aime Judicael (TAJO) Ouermi. University of Utah, 2022.
The Materials Commons Data Repository|
G. Tarcea, B. Puchala, T. Berman, G. Scorzelli, V. Pascucci, M. Taufer, J. Allison. In 2022 IEEE 18th International Conference on e-Science (e-Science), pp. 405--406. 2022.
Repositories are increasingly used for publishing and sharing scientific data. The Materials Commons is a data repository that follows the FAIR (Findable, Accessible, Inter-operable, Reusable) principles. We demonstrate the challenges with FAIR and how Materials Commons solves them. We also discuss the National Science Data Fabric (NSDF), a project that is democratizing data access, and show how Materials Commons with the NSDF software stack accelerates data access and scientific research.
NSDF-Catalog: Lightweight Indexing Service for Democratizing Data Delivering|
J. Luettgau, C.R. Kirkpatrick, G. Scorzelli, V. Pascucci, G. Tarcea, M. Taufer. 2022.
Across domains, massive amounts of scientific data are generated. Because of the large volume of information, data discoverability is often hard if not impossible, especially for scientists who have not generated the data or are from other domains. As part of the NSF-funded National Science Data Fabric (NSDF) initiative, we develop a testbed to demonstrate that these boundaries to data discoverability can be overcome. In support of this effort, we identify the need for indexing large amounts of scientific data across scientific domains. We propose NSDF-Catalog, a lightweight indexing service with minimal metadata that complements existing domain-specific and rich-metadata collections. NSDF-Catalog is designed to facilitate multiple related objectives within a flexible microservice to: (i) coordinate data movements and replication of data from origin repositories within the NSDF federation; (ii) build an inventory of existing scientific data to inform the design of next-generation cyberinfrastructure; and (iii) provide a suite of tools for discovery of datasets for cross-disciplinary research. Our service indexes scientific data at a fine granularity, at the file or object level, to inform data distribution strategies and to improve the experience for users from the consumer perspective, with the goal of allowing end-to-end dataflow optimizations.
Meta Learning of Interface Conditions for Multi-Domain Physics-Informed Neural Networks|
Subtitled arXiv preprint arXiv:2210.12669, S. Li, M. Penwarden, R.M. Kirby, S. Zhe. 2022.
Physics-informed neural networks (PINNs) are emerging as popular mesh-free solvers for partial differential equations (PDEs). Recent extensions decompose the domain, applying different PINNs to solve the equation in each subdomain and aligning the solution at the interface of the subdomains. Hence, they can further alleviate the problem complexity, reduce the computational cost, and allow parallelization. However, the performance of the multi-domain PINNs is sensitive to the choice of the interface conditions for solution alignment. While quite a few conditions have been proposed, there is no suggestion about how to select the conditions according to specific problems. To address this gap, we propose META Learning of Interface Conditions (METALIC), a simple, efficient yet powerful approach to dynamically determine the optimal interface conditions for solving a family of parametric PDEs. Specifically, we develop two contextual multi-arm bandit models. The first one applies to the entire training procedure, and updates online a Gaussian process (GP) reward surrogate that, given the PDE parameters and interface conditions, predicts the solution error. The second one partitions the training into two stages, a stochastic phase and a deterministic phase; we update a GP surrogate for each phase to enable different condition selections at the two stages so as to further bolster the flexibility and performance. We have shown the advantage of METALIC on four benchmark PDE families.
Batch Multi-Fidelity Active Learning with Budget Constraints|
Subtitled arXiv:2210.12704v1, S. Li, J.M. Phillips, X. Yu, R.M. Kirby, S. Zhe. 2022.
Learning functions with high-dimensional outputs is critical in many applications, such as physical simulation and engineering design. However, collecting training examples for these applications is often costly, e.g. by running numerical solvers. The recent work (Li et al., 2022) proposes the first multi-fidelity active learning approach for high-dimensional outputs, which can acquire examples at different fidelities to reduce the cost while improving the learning performance. However, this method only queries at one pair of fidelity and input at a time, and hence risks bringing in strongly correlated examples that reduce the learning efficiency. In this paper, we propose Batch Multi-Fidelity Active Learning with Budget Constraints (BMFAL-BC), which can promote the diversity of training examples to improve the benefit-cost ratio, while respecting a given budget constraint for batch queries. Hence, our method can be more practically useful. Specifically, we propose a novel batch acquisition function that measures the mutual information between a batch of multi-fidelity queries and the target function, so as to penalize highly correlated queries and encourage diversity. The optimization of the batch acquisition function is challenging in that it involves a combinatorial search over many fidelities while subject to the budget constraint. To address this challenge, we develop a weighted greedy algorithm that can sequentially identify each (fidelity, input) pair, while achieving a near (1 - 1/e)-approximation of the optimum. We show the advantage of our method in several computational physics and engineering applications.
Finite-Time Analysis of Adaptive Temporal Difference Learning with Deep Neural Networks|
T. Sun, D. Li, B. Wang. In 36th Conference on Neural Information Processing Systems (NeurIPS 2022), October, 2022.
Temporal difference (TD) learning with function approximations (linear functions or neural networks) has achieved remarkable empirical success, giving impetus to the development of finite-time analysis. As an accelerated version of TD, the adaptive TD has been proposed and proved to enjoy finite-time convergence under the linear function approximation. Existing numerical results have demonstrated the superiority of adaptive algorithms to vanilla ones. Nevertheless, the performance guarantee of adaptive TD with neural network approximation remains widely unknown. This paper establishes the finite-time analysis for the adaptive TD with multi-layer ReLU networks approximation whose samples are generated from a Markov decision process. Our established theory shows that if the width of the deep neural network is large enough, the adaptive TD using neural network approximation can find the (optimal) value function with high probabilities under the same iteration complexity as TD in general cases. Furthermore, we show that the adaptive TD using neural network approximation, with the same width and searching area, can achieve theoretical acceleration when the stochastic semigradients decay fast.
Quadrature Sampling of Parametric Models with Bi-fidelity Boosting|
Subtitled arXiv:2209.05705v1, N. Cheng, O.A. Malik, Y. Xu, S. Becker, A. Doostan, A. Narayan. 2022.
Least squares regression is a ubiquitous tool for building emulators (a.k.a. surrogate models) of problems across science and engineering for purposes such as design space exploration and uncertainty quantification. When the regression data are generated using an experimental design process (e.g., a quadrature grid) involving computationally expensive models, or when the data size is large, sketching techniques have shown promise to reduce the cost of the construction of the regression model while ensuring accuracy comparable to that of the full data. However, random sketching strategies, such as those based on leverage scores, lead to regression errors that are random and may exhibit large variability. To mitigate this issue, we present a novel boosting approach that leverages cheaper, lower-fidelity data of the problem at hand to identify the best sketch among a set of candidate sketches. This in turn specifies the sketch of the intended high-fidelity model and the associated data. We provide theoretical analyses of this bi-fidelity boosting (BFB) approach and discuss the conditions the low- and high-fidelity data must satisfy for a successful boosting. In doing so, we derive a bound on the residual norm of the BFB sketched solution relating it to its ideal, but computationally expensive, high-fidelity boosted counterpart. Empirical results on both manufactured and PDE data corroborate the theoretical analyses and illustrate the efficacy of the BFB solution in reducing the regression error, as compared to the non-boosted solution.
Fast Algorithms for Monotone Lower Subsets of Kronecker Least Squares Problems|
Subtitled arXiv:2209.05662v1, O.A. Malik, Y. Xu, N. Cheng, S. Becker, A. Doostan, A. Narayan. 2022.
Approximate solutions to large least squares problems can be computed efficiently using leverage score-based row-sketches, but directly computing the leverage scores, or sampling according to them with naive methods, still requires an expensive manipulation and processing of the design matrix. In this paper we develop efficient leverage score-based sampling methods for matrices with certain Kronecker product-type structure; in particular we consider matrices that are monotone lower column subsets of Kronecker product matrices. Our discussion is general, encompassing least squares problems on infinite domains, in which case matrices formally have infinitely many rows. We briefly survey leverage score-based sampling guarantees from the numerical linear algebra and approximation theory communities, and follow this with efficient algorithms for sampling when the design matrix has Kronecker-type structure. Our numerical examples confirm that sketches based on exact leverage score sampling for our class of structured matrices achieve superior residual compared to approximate leverage score sampling methods.
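For orientation, the naive baseline that the abstract above describes as expensive can be sketched in a few lines of Python. This is only an illustration of plain leverage score-based row sampling on a small dense matrix, not the Kronecker-structured algorithm of the paper; the function names `leverage_scores` and `sample_rows` and the polynomial design matrix are assumptions made for this example.

```python
import math
import random


def leverage_scores(A):
    """Row leverage scores of a full-column-rank matrix A (a list of rows).

    Orthonormalizes the columns with modified Gram-Schmidt; the score of
    row i is the squared norm of row i of the orthonormal factor Q.
    """
    n, d = len(A), len(A[0])
    Q = []  # orthonormal columns, each a list of length n
    for j in range(d):
        v = [A[i][j] for i in range(n)]
        for q in Q:
            proj = sum(a * b for a, b in zip(q, v))
            v = [a - proj * b for a, b in zip(v, q)]
        norm = math.sqrt(sum(a * a for a in v))
        Q.append([a / norm for a in v])
    return [sum(q[i] ** 2 for q in Q) for i in range(n)]


def sample_rows(scores, m, rng):
    """Draw m row indices with probability proportional to the scores.

    Also returns the rescaling weights 1/sqrt(m * p_i) that are applied to
    the sampled rows (and right-hand side) of the sketched problem.
    """
    total = sum(scores)
    probs = [s / total for s in scores]
    idx = rng.choices(range(len(scores)), weights=probs, k=m)
    weights = [1.0 / math.sqrt(m * probs[i]) for i in idx]
    return idx, weights


# Hypothetical design matrix: quadratic polynomial basis at 8 points.
xs = [i / 7 for i in range(8)]
A = [[1.0, x, x * x] for x in xs]
scores = leverage_scores(A)
idx, weights = sample_rows(scores, m=5, rng=random.Random(0))
```

The scores sum to the number of columns, which is a quick sanity check. Forming them this way touches every entry of the design matrix, which is the cost that structured sampling methods aim to avoid.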
Uncertainty quantification for ecological models with random parameters|
J.R. Reimer, F.R. Adler, K.M. Golden, A. Narayan. In Ecology Letters, Wiley, pp. 1--13. 2022.
There is often considerable uncertainty in parameters in ecological models. This uncertainty can be incorporated into models by treating parameters as random variables with distributions, rather than fixed quantities. Recent advances in uncertainty quantification methods, such as polynomial chaos approaches, allow for the analysis of models with random parameters. We introduce these methods with a motivating case study of sea ice algal blooms in heterogeneous environments. We compare Monte Carlo methods with polynomial chaos techniques to help understand the dynamics of an algal bloom model with random parameters. Modelling key parameters in the algal bloom model as random variables changes the timing, intensity and overall productivity of the modelled bloom. The computational efficiency of polynomial chaos methods provides a promising avenue for the broader inclusion of parametric uncertainty in ecological models, leading to improved model predictions and synthesis between models and data.
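The comparison between Monte Carlo and polynomial chaos described above can be illustrated on a toy problem. The sketch below is not the algal bloom model from the paper: it assumes a hypothetical exponential growth model whose rate is uniformly distributed, expands the output in Legendre polynomials, and computes the coefficients with a 5-point Gauss-Legendre rule. All names and parameter values are chosen for illustration.

```python
import math
import random

# 5-point Gauss-Legendre rule on [-1, 1] (standard tabulated values).
NODES = [-0.9061798459386640, -0.5384693101056831, 0.0,
         0.5384693101056831, 0.9061798459386640]
WEIGHTS = [0.2369268850561891, 0.4786286704993665, 0.5688888888888889,
           0.4786286704993665, 0.2369268850561891]


def legendre(n, x):
    """Evaluate the Legendre polynomial P_n(x) by the three-term recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def model(xi, r0=0.5, delta=0.2, t=2.0, n0=1.0):
    """Toy model: population at time t with growth rate r0 + delta * xi."""
    return n0 * math.exp((r0 + delta * xi) * t)


def pc_coefficients(f, degree):
    """Legendre coefficients c_n = (2n + 1)/2 * integral of f * P_n on [-1, 1]."""
    return [
        (2 * n + 1) / 2
        * sum(w * f(x) * legendre(n, x) for x, w in zip(NODES, WEIGHTS))
        for n in range(degree + 1)
    ]


# Polynomial chaos: five model evaluations give mean and variance directly.
coeffs = pc_coefficients(model, degree=3)
pc_mean = coeffs[0]
pc_var = sum(c * c / (2 * n + 1) for n, c in enumerate(coeffs) if n > 0)

# Monte Carlo: many model evaluations with xi ~ Uniform(-1, 1).
rng = random.Random(0)
samples = [model(rng.uniform(-1.0, 1.0)) for _ in range(20000)]
mc_mean = sum(samples) / len(samples)
```

For this smooth one-parameter model the expansion needs only five model evaluations, whereas the Monte Carlo estimate uses thousands of samples to reach a much coarser accuracy. That gap is the kind of efficiency the abstract refers to, although it narrows as the number of random parameters grows.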
Democratizing Science Through Advanced Cyberinfrastructure|
M. Parashar. In Computer, IEEE, 2022.
Democratizing access to cyberinfrastructure is essential to democratizing science. This article explores knowledge, technical, and social barriers to accessing and using cyberinfrastructure and explores approaches to address them. It also highlights recent activities and investments at the National Science Foundation that implement some of these approaches.
Adaptive and Implicit Regularization for Matrix Completion|
Subtitled arXiv preprint arXiv:2208.05640, Z. Li, T. Sun, H. Wang, B. Wang. 2022.
The explicit low-rank regularization, e.g., nuclear norm regularization, has been widely used in imaging sciences. However, it has been found that implicit regularization outperforms explicit ones in various image processing tasks. Another issue is that the fixed explicit regularization limits the applicability to broad images since different images favor different features captured by different explicit regularizations. As such, this paper proposes a new adaptive and implicit low-rank regularization that captures the low-rank prior dynamically from the training data. The core of our new adaptive and implicit low-rank regularization is parameterizing the Laplacian matrix in the Dirichlet energy-based regularization, which we call the regularization AIR. Theoretically, we show that the adaptive regularization of AIR enhances the implicit regularization and vanishes at the end of training. We validate AIR’s effectiveness on various benchmark tasks, indicating that the AIR is particularly favorable for the scenarios when the missing entries are non-uniform. The code can be found at https://github.com/lizhemin15/AIR-Net.
M. Parashar, M.A. Heroux, V. Stodden. In Computer, Vol. 55, No. 8, IEEE, pp. 16--18. August, 2022.
Reproducibility has a foundational role in ensuring robust and trustworthy research, but achieving reproducibility can be challenging. This theme issue explores these challenges along with research and implementations across communities addressing them, with the goal of understanding the impact of existing solutions and synthesizing lessons learned and emerging best practices.
Momentum Transformer: Closing the Performance Gap Between Self-attention and Its Linearization|
Subtitled arXiv preprint arXiv:2208.00579, T. Nguyen, R.G. Baraniuk, R.M. Kirby, S.J. Osher, B. Wang. 2022.
Transformers have achieved remarkable success in sequence modeling and beyond but suffer from quadratic computational and memory complexities with respect to the length of the input sequence. Leveraging techniques including sparse and linear attention and hashing tricks, efficient transformers have been proposed to reduce the quadratic complexity of transformers, but they significantly degrade the accuracy. In response, we first interpret the linear attention and residual connections in computing the attention map as gradient descent steps. We then introduce momentum into these components and propose the momentum transformer, which utilizes momentum to improve the accuracy of linear transformers while maintaining linear memory and computational complexities. Furthermore, we develop an adaptive strategy to compute the momentum value for our model based on the optimal momentum for quadratic optimization. This adaptive momentum eliminates the need to search for the optimal momentum value and further enhances the performance of the momentum transformer. A range of experiments on both autoregressive and non-autoregressive tasks, including image generation and machine translation, demonstrate that the momentum transformer outperforms popular linear transformers in training efficiency and accuracy.
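The linear-complexity claim for linear attention rests on reordering a matrix product. The sketch below shows only that reordering for non-causal kernelized attention; it does not implement the momentum transformer itself. The feature map elu(x) + 1 is one common choice and is an assumption here, as are the function names and the small test matrices.

```python
import math


def phi(v):
    """Feature map elu(x) + 1, applied elementwise (always positive)."""
    return [x + 1.0 if x > 0 else math.exp(x) for x in v]


def quadratic_attention(Q, K, V):
    """Kernelized attention that forms all N x N weights explicitly."""
    fk = [phi(k) for k in K]
    out = []
    for q in Q:
        fq = phi(q)
        w = [sum(a * b for a, b in zip(fq, k)) for k in fk]
        total = sum(w)
        out.append([sum(wj * v[c] for wj, v in zip(w, V)) / total
                    for c in range(len(V[0]))])
    return out


def linear_attention(Q, K, V):
    """Same output, computed by first summarizing keys and values.

    S = sum_j phi(k_j) v_j^T and z = sum_j phi(k_j) are built in one pass,
    so the cost grows linearly with the sequence length.
    """
    d, dv = len(K[0]), len(V[0])
    S = [[0.0] * dv for _ in range(d)]
    z = [0.0] * d
    for k, v in zip(K, V):
        fk = phi(k)
        for a in range(d):
            z[a] += fk[a]
            for c in range(dv):
                S[a][c] += fk[a] * v[c]
    out = []
    for q in Q:
        fq = phi(q)
        denom = sum(fq[a] * z[a] for a in range(d))
        out.append([sum(fq[a] * S[a][c] for a in range(d)) / denom
                    for c in range(dv)])
    return out


Q = [[0.1, -0.4], [1.2, 0.3], [-0.7, 0.5]]
K = [[0.5, 0.2], [-0.3, 0.9], [0.0, -1.1]]
V = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0]]
out_quadratic = quadratic_attention(Q, K, V)
out_linear = linear_attention(Q, K, V)
```

Both functions return the same values up to rounding; only the order of operations differs. The accuracy gap mentioned in the abstract comes from replacing the softmax kernel with a feature map, not from the reordering.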
Assembling Portable In-Situ Workflow from Heterogeneous Components using Data Reorganization|
B. Zhang, P. Subedi, P. E. Davis, F. Rizzi, K. Teranishi, M. Parashar. In 22nd IEEE International Symposium on Cluster, Cloud and Internet Computing (CCGrid), pp. 41--50. 2022.
Heterogeneous computing is becoming common in the HPC world. The fast-changing hardware landscape is pushing programmers and developers to rely on performance-portable programming models to rewrite old and legacy applications and develop new ones. While this approach is suitable for individual applications, outstanding challenges remain when multiple applications are combined into complex workflows. One critical difficulty is the exchange of data between communicating applications, where performance constraints imposed by heterogeneous hardware favor different data layouts. We attempt to solve this problem by exploring asynchronous data layout conversions for applications requiring different memory access patterns for shared data. We implement the proposed solution within the DataSpaces data staging service, extending it to support heterogeneous application workflows across a broad spectrum of programming models. In addition, we integrate heterogeneous DataSpaces with the Kokkos programming model and propose the Kokkos Staging Space as an extension of the Kokkos data abstraction. This new abstraction enables us to express data on a virtual shared space for multiple Kokkos applications, thus guaranteeing the portability of each application when assembling them into an efficient heterogeneous workflow. We present performance results for the Kokkos Staging Space using a synthetic workflow emulator and three different scenarios representing access frequency and use patterns in shared data. The results show that the Kokkos Staging Space is a superior solution in terms of time-to-solution and scalability compared to existing file-based Kokkos data abstractions for inter-application data exchange.
Transforming science through cyberinfrastructure|
M. Parashar, A. Friedlander, E. Gianchandani, M. Martonosi. In Communications of the ACM, Vol. 65, No. 8, pp. 30--32. 2022.
NSF's vision for the U.S. cyberinfrastructure ecosystem for science and engineering in the 21st century.
Adaptive Self-supervision Algorithms for Physics-informed Neural Networks|
Subtitled arXiv:2207.04084, S. Subramanian, R.M. Kirby, M.W. Mahoney, A. Gholami. 2022.
Physics-informed neural networks (PINNs) incorporate physical knowledge from the problem domain as a soft constraint on the loss function, but recent work has shown that this can lead to optimization difficulties. Here, we study the impact of the location of the collocation points on the trainability of these models. We find that the vanilla PINN performance can be significantly boosted by adapting the location of the collocation points as training proceeds. Specifically, we propose a novel adaptive collocation scheme which progressively allocates more collocation points (without increasing their number) to areas where the model is making higher errors (based on the gradient of the loss function in the domain). This, coupled with a judicious restarting of the training during any optimization stalls (by simply resampling the collocation points in order to adjust the loss landscape), leads to better estimates for the prediction error. We present results for several problems, including a 2D Poisson and diffusion-advection system with different forcing functions. We find that training vanilla PINNs for these problems can result in up to 70% prediction error in the solution, especially in the regime of low collocation points. In contrast, our adaptive schemes can achieve up to an order of magnitude smaller error, with similar computational complexity as the baseline. Furthermore, we find that the adaptive methods consistently perform on par with or slightly better than the vanilla PINN method, even for large collocation point regimes. The code for all the experiments has been open sourced.
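The idea of moving a fixed budget of collocation points toward high-error regions can be sketched as weighted resampling. This is a minimal illustration under stated assumptions, not the authors' released code: the error indicator `proxy` is a hypothetical stand-in for the loss-gradient measure mentioned in the abstract, and the names `resample_collocation` and `floor` are invented for this example.

```python
import math
import random


def resample_collocation(candidates, error_proxy, n_points, rng, floor=1e-3):
    """Draw n_points collocation points with probability proportional to
    |error_proxy(x)| + floor.

    The floor keeps a small chance of sampling low-error regions. Sampling
    is with replacement, so the number of points stays fixed at n_points.
    """
    scores = [abs(error_proxy(x)) + floor for x in candidates]
    return rng.choices(candidates, weights=scores, k=n_points)


def proxy(x):
    """Hypothetical error indicator, sharply peaked near x = 0.8."""
    return math.exp(-((x - 0.8) ** 2) / 0.005)


candidates = [i / 999 for i in range(1000)]  # dense grid on [0, 1]
rng = random.Random(0)
uniform_pts = rng.sample(candidates, 50)     # baseline: uniform placement
adaptive_pts = resample_collocation(candidates, proxy, 50, rng)
```

In a training loop, a call like this would be repeated periodically, or when the optimizer stalls, with the proxy re-evaluated from the current model.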
A scalable adaptive-matrix SPMV for heterogeneous architectures|
H. D. Tran, M. Fernando, K. Saurabh, B. Ganapathysubramanian, R. M. Kirby, H. Sundar. In 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE, pp. 13--24. 2022.
In most computational codes, the core computational kernel is the Sparse Matrix-Vector product (SpMV) that enables specialized linear algebra libraries like PETSc to be used, especially in the distributed memory setting. However, optimizing SpMV performance and scalability at all levels of a modern heterogeneous architecture can be challenging as it is characterized by irregular memory access. This work presents a hybrid approach (HyMV) for evaluating SpMV for matrices arising from PDE discretization schemes such as the finite element method (FEM). The approach enables localized structured memory access that provides improved performance and scalability. Additionally, it simplifies the programmability and portability on different architectures. The developed HyMV approach enables efficient parallelization using MPI, SIMD, OpenMP, and CUDA with minimum programming effort. We present a detailed comparison of HyMV with the two traditional approaches in computational code, matrix-assembled and matrix-free approaches, for structured and unstructured meshes. Our results demonstrate that the HyMV approach achieves excellent scalability and outperforms both approaches, e.g., achieving average speedups of 11x for matrix setup, 1.7x for SpMV with structured meshes, 3.6x for SpMV with unstructured meshes, and 7.5x for GPU SpMV.
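To make the irregular memory access concrete, here is a minimal sketch of the matrix-assembled baseline: an SpMV over a matrix stored in compressed sparse row (CSR) form. It is not the HyMV approach of the paper, and the function name and the small example matrix are assumptions for illustration.

```python
def csr_matvec(data, indices, indptr, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    data    : nonzero values, row by row
    indices : column index of each nonzero
    indptr  : indptr[r]..indptr[r + 1] delimits the nonzeros of row r
    """
    n_rows = len(indptr) - 1
    y = [0.0] * n_rows
    for row in range(n_rows):
        acc = 0.0
        for k in range(indptr[row], indptr[row + 1]):
            # x is read through a column index taken from another array:
            # this indirect access is what makes SpMV memory-irregular.
            acc += data[k] * x[indices[k]]
        y[row] = acc
    return y


# Example matrix [[4, 0, 1], [0, 3, 0], [2, 0, 5]] in CSR form.
data = [4.0, 1.0, 3.0, 2.0, 5.0]
indices = [0, 2, 1, 0, 2]
indptr = [0, 2, 3, 5]
y = csr_matvec(data, indices, indptr, [1.0, 2.0, 3.0])
```

The inner loop is short and its access pattern into `x` depends on the sparsity structure, which is why such kernels are hard to vectorize and cache well across architectures.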
Colza: Enabling Elastic In Situ Visualization for High-performance Computing Simulations|
M. Dorier, Z. Wang, U. Ayachit, S. Snyder, R. Ross, M. Parashar. In 2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS), IEEE, pp. 538--548. 2022.
In situ analysis and visualization have grown increasingly popular for enabling direct access to data from high-performance computing (HPC) simulations. As a simulation progresses and interesting physical phenomena emerge, however, the data produced may become increasingly complex, and users may need to dynamically change the type and scale of in situ analysis tasks being carried out and consequently adapt the amount of resources allocated to such tasks. To date, none of the production in situ analysis frameworks offer such an elasticity feature, and for good reason: the assumption that the number of processes could vary during run time would force developers to rethink software and algorithms at every level of the in situ analysis stack. In this paper we present Colza, a data staging service with elastic in situ visualization capabilities. Colza relies on the widely used ParaView Catalyst in situ visualization framework and enables elasticity by replacing MPI with a custom collective communication library based on the Mochi suite of libraries. To the best of our knowledge, this work is the first to enable elastic in situ visualization capabilities for HPC applications on top of existing production analysis tools.