Designed especially for neurobiologists, FluoRender is an interactive tool for multi-channel fluorescence microscopy data visualization and analysis.
Deep brain stimulation
BrainStimulator is a set of networks that are used in SCIRun to perform simulations of brain stimulation such as transcranial direct current stimulation (tDCS) and magnetic transcranial stimulation (TMS).
Developing software tools for science has always been a central vision of the SCI Institute.

Scientific Computing

Numerical simulation of real-world phenomena provides fertile ground for building interdisciplinary relationships. The SCI Institute has a long tradition of building these relationships in a win-win fashion – a win for the theoretical and algorithmic development of numerical modeling and simulation techniques and a win for the discipline-specific science of interest. High-order and adaptive methods, uncertainty quantification, complexity analysis, and parallelization are just some of the topics being investigated by SCI faculty. These areas of computing are being applied to a wide variety of engineering applications ranging from fluid mechanics and solid mechanics to bioelectricity.


martin

Martin Berzins

Parallel Computing
GPUs
mike

Mike Kirby

Finite Element Methods
Uncertainty Quantification
GPUs
pascucci

Valerio Pascucci

Scientific Data Management
chris

Chris Johnson

Problem Solving Environments
amir

Amir Arzani

Scientific machine learning
Data-driven fluid flow modeling

Funded Research Projects:


Publications in Scientific Computing:


COMPUTATIONAL ERROR ESTIMATION FOR THE MATERIAL POINT METHOD IN 1D AND 2D
M. Berzins. In VIII International Conference on Particle-Based Methods, PARTICLES 2023, 2024.

The Material Point Method (MPM) is widely used for challenging applications in engineering, and animation but lags behind some other methods in terms of error analysis and computable error estimates. The complexity and nonlinearity of the equations solved by the method and its reliance both on a mesh and on moving particles makes error estimation challenging. Some preliminary error analysis of a simple MPM method has shown the global error to be first order in space and time for a widely-used variant of the Material Point Method. The overall time dependent nature of MPM also complicates matters as both space and time errors and their evolution must be considered thus leading to the use of explicit error transport equations. The preliminary use of an error estimator based on this transport approach has yielded promising results in the 1D case. One other source of error in MPM is the grid-crossing error that can be problematic for large deformations leading to large errors that are identified by the error estimator used. The extension of the error estimation approach to two space higher dimensions is considered and together with additional algorithmic and theoretical results, shown to give promising results in preliminary computational experiments.



Making Uintah Performance Portable for Department of Energy Exascale Testbeds
J. K. Holmen , M. Garcıa, A. Bagusetty, V. Madananth, A. Sanderson,, M. Berzins. In Euro-Par 2023: Parallel Processing, pp. 1--12. 2024.

To help ease ports to forthcoming Department of Energy (DOE) exascale systems, testbeds have been made available to select users. These testbeds are helpful for preparing codes to run on the same hardware and similar software as in their respective exascale systems. This paper describes how the Uintah Computational Framework, an open-source asynchronous many-task (AMT) runtime system, has been modified to be performance portable across the DOE Crusher, DOE Polaris, and DOE Sunspot testbeds in preparation for portable simulations across the exascale DOE Frontier and DOE Aurora systems. The Crusher, Polaris, and Sunspot testbeds feature the AMD MI250X, NVIDIA A100, and Intel PVC GPUs, respectively. This performance portability has been made possible by extending Uintah’s intermediate portability layer [18] to additionally support the Kokkos::HIP, Kokkos::OpenMPTarget, and Kokkos::SYCL back-ends. This paper also describes notable updates to Uintah’s support for Kokkos, which were required to make this extension possible. Results are shown for a challenging radiative heat transfer calculation, central to the University of Utah’s predictive boiler simulations. These results demonstrate single-source portability across AMD-, NVIDIA-, and Intel-based GPUs using various Kokkos back-ends.



Solving High Frequency and Multi-Scale PDEs with Gaussian Processes
Subtitled “arXiv:2311.04465,” S. Fang, M. Cooley, D. Long, S. Li, R. Kirby, S. Zhe. 2023.

Machine learning based solvers have garnered much attention in physical simulation and scientific computing, with a prominent example, physics-informed neural networks (PINNs). However, PINNs often struggle to solve high-frequency and multi-scale PDEs, which can be due to the spectral bias during neural network training. To address this problem, we resort to the Gaussian process (GP) framework. To flexibly capture the dominant frequencies, we model the power spectrum of the PDE solution with a student t mixture or Gaussian mixture. We then apply inverse Fourier transform to obtain the covariance function (according to the Wiener-Khinchin theorem). The covariance derived from the Gaussian mixture spectrum corresponds to the known spectral mixture kernel. We are the first to discover its rationale and effectiveness for PDE solving. Next, we estimate the mixture weights in the log domain, which we show is equivalent to placing a Jeffreys prior. It automatically induces sparsity, prunes excessive frequencies, and adjusts the remaining toward the ground truth. Third, to enable efficient and scalable computation on massive collocation points, which are critical to capture high frequencies, we place the collocation points on a grid, and multiply our covariance function at each input dimension. We use the GP conditional mean to predict the solution and its derivatives so as to fit the boundary condition and the equation itself. As a result, we can derive a Kronecker product structure in the covariance matrix. We use Kronecker product properties and multilinear algebra to greatly promote computational efficiency and scalability, without any low-rank approximations. We show the advantage of our method in systematic experiments.



An NSF REU Site Based on Trust and Reproducibility of Intelligent Computation: Experience Report
M. Hall, G. Gopalakrishnan, E. Eide, J. Cohoon, J. Phillips, M. Zhang, S. Elhabian, A. Bhaskara, H. Dam, A. Yadrov, T. Kataria. In Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 2023.

This paper presents an overview of an NSF Research Experience for Undergraduate (REU) Site on Trust and Reproducibility of Intelligent Computation, delivered by faculty and graduate students in the Kahlert School of Computing at University of Utah. The chosen themes bring together several concerns for the future in producing computational results that can be trusted: secure, reproducible, based on sound algorithmic foundations, and developed in the context of ethical considerations. The research areas represented by student projects include machine learning, high-performance computing, algorithms and applications, computer security, data science, and human-centered computing. In the first four weeks of the program, the entire student cohort spent their mornings in lessons from experts in these crosscutting topics, and used one-of-a-kind research platforms operated by the University of Utah, namely NSF-funded CloudLab and POWDER facilities; reading assignments, quizzes, and hands-on exercises reinforced the lessons. In the subsequent five weeks, lectures were less frequent, as students branched into small groups to develop their research projects. The final week focused on a poster presentation and final report. Through describing our experiences, this program can serve as a model for preparing a future workforce to integrate machine learning into trustworthy and reproducible applications.



CLASSMix: Adaptive stain separation-based contrastive learning with pseudo labeling for histopathological image classification
Subtitled “arXiv:2312.06978v2,” B. Zhang, H. Manoochehri, M.M. Ho, F. Fooladgar, Y. Chong, B. Knudsen, D. Sirohi, T. Tasdizen. 2023.

Histopathological image classification is one of the critical aspects in medical image analysis. Due to the high expense associated with the labeled data in model training, semi-supervised learning methods have been proposed to alleviate the need of extensively labeled datasets. In this work, we propose a model for semi-supervised classification tasks on digital histopathological Hematoxylin and Eosin (H&E) images. We call the new model Contrastive Learning with Adaptive Stain Separation and MixUp (CLASS-M). Our model is formed by two main parts: contrastive learning between adaptively stain separated Hematoxylin images and Eosin images, and pseudo-labeling using MixUp. We compare our model with other state-of-the-art models on clear cell renal cell carcinoma (ccRCC) datasets from our institution and The Cancer Genome Atlas Program (TCGA). We demonstrate that our CLASS-M model has the best performance on both datasets. The contributions of different parts in our model are also analyzed.



Accelerating Data-Intensive Seismic Research Through Parallel Workflow Optimization and Federated Cyberinfrastructure
M. Adair, I. Rodero, M. Parashar, D. Melgar. In Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, ACM, pp. 1970--1977. 2023.
DOI: 10.1145/3624062.3624276

Earthquake early warning systems use synthetic data from simulation frameworks like MudPy to train models for predicting the magnitudes of large earthquakes. MudPy, although powerful, has limitations: a lengthy simulation time to generate the required data, lack of user-friendliness, and no platform for discovering and sharing its data. We introduce FakeQuakes DAGMan Workflow (FDW), which utilizes Open Science Grid (OSG) for parallel computations to accelerate and streamline MudPy simulations. FDW significantly reduces runtime and increases throughput compared to a single-machine setup. Using FDW, we also explore partitioned parallel HTCondor DAGMan workflows to enhance OSG efficiency. Additionally, we investigate leveraging cyberinfrastructure, such as Virtual Data Collaboratory (VDC), for enhancing MudPy and OSG. Specifically, we simulate using Cloud bursting policies to enforce FDW job-offloading to VDC during OSG peak demand, addressing shared resource issues and user goals; we also discuss VDC’s value in facilitating a platform for broad access to MudPy products.



Deep neural operators as accurate surrogates for shape optimization
K. Shukla, V. Oommen, A. Peyvan, M. Penwarden, N. Plewacki, L. Bravo, A. Ghoshal, R.M. Kirby, G. Karniadakis. In Engineering Applications of Artificial Intelligence, Vol. 129, pp. 107615. 2023.
ISSN: 0952-1976

Deep neural operators, such as DeepONet, have changed the paradigm in high-dimensional nonlinear regression, paving the way for significant generalization and speed-up in computational engineering applications. Here, we investigate the use of DeepONet to infer flow fields around unseen airfoils with the aim of shape constrained optimization, an important design problem in aerodynamics that typically taxes computational resources heavily. We present results that display little to no degradation in prediction accuracy while reducing the online optimization cost by orders of magnitude. We consider NACA airfoils as a test case for our proposed approach, as the four-digit parameterization can easily define their shape. We successfully optimize the constrained NACA four-digit problem with respect to maximizing the lift-to-drag ratio and validate all results by comparing them to a high-order CFD solver. We find that DeepONets have a low generalization error, making them ideal for generating solutions of unseen shapes. Specifically, pressure, density, and velocity fields are accurately inferred at a fraction of a second, hence enabling the use of general objective functions beyond the maximization of the lift-to-drag ratio considered in the current work. Finally, we validate the ability of DeepONet to handle a complex 3D waverider geometry at hypersonic flight by inferring shear stress and heat flux distributions on its surface at unseen angles of attack. The main contribution of this paper is a modular integrated design framework that uses an over-parametrized neural operator as a surrogate model with good generalizability coupled seamlessly with multiple optimization solvers in a plug-and-play mode.



Streaming Factor Trajectory Learning for Temporal Tensor Decomposition
Subtitled “arxiv.org/abs/2310.17021,” S. Fang, X. Yu, S. Li, Z. Wang, R. Kirby, S. Zhe. 2023.

Practical tensor data is often along with time information. Most existing temporal decomposition approaches estimate a set of fixed factors for the objects in each tensor mode, and hence cannot capture the temporal evolution of the objects' representation. More important, we lack an effective approach to capture such evolution from streaming data, which is common in real-world applications. To address these issues, we propose Streaming Factor Trajectory Learning for temporal tensor decomposition. We use Gaussian processes (GPs) to model the trajectory of factors so as to flexibly estimate their temporal evolution. To address the computational challenges in handling streaming data, we convert the GPs into a state-space prior by constructing an equivalent stochastic differential equation (SDE). We develop an efficient online filtering algorithm to estimate a decoupled running posterior of the involved factor states upon receiving new data. The decoupled estimation enables us to conduct standard Rauch-Tung-Striebel smoothing to compute the full posterior of all the trajectories in parallel, without the need for revisiting any previous data. We have shown the advantage of SFTL in both synthetic tasks and real-world applications.



Instance-wise Linearization of Neural Network for Model Interpretation
Subtitled “arXiv:2310.16295v1,” Z. Li, S. Liu, K. Bhavya, T. Bremer, V. Pascucci. 2023.

Neural network have achieved remarkable successes in many scientific fields. However, the interpretability of the neural network model is still a major bottlenecks to deploy such technique into our daily life. The challenge can dive into the non-linear behavior of the neural network, which rises a critical question that how a model use input feature to make a decision. The classical approach to address this challenge is feature attribution, which assigns an important score to each input feature and reveal its importance of current prediction. However, current feature attribution approaches often indicate the importance of each input feature without detail of how they are actually processed by a model internally. These attribution approaches often raise a concern that whether they highlight correct features for a model prediction.

For a neural network model, the non-linear behavior is often caused by non-linear activation units of a model. However, the computation behavior of a prediction from a neural network model is locally linear, because one prediction has only one activation pattern. Base on the observation, we propose an instance-wise linearization approach to reformulates the forward computation process of a neural network prediction. This approach reformulates different layers of convolution neural networks into linear matrix multiplication. Aggregating all layers' computation, a prediction complex convolution neural network operations can be described as a linear matrix multiplication F(x)=Wx+b. This equation can not only provides a feature attribution map that highlights the important of the input features but also tells how each input feature contributes to a prediction exactly. Furthermore, we discuss the application of this technique in both supervise classification and unsupervised neural network learning parametric t-SNE dimension reduction.



Strengthening and Democratizing Artificial Intelligence Research and Development
M. Parashar, T. deBlanc-Knowles, E. Gianchandani, L.E. Parker. In Computer, Vol. 56, No. 11, IEEE, pp. 85-90. 2023.
DOI: 10.1109/MC.2023.3284568

This article summarizes the vision, roadmap, and implementation plan for a National Artificial Intelligence Research Resource that aims to provide a widely accessible cyberinfrastructure for artificial intelligence R&D, with the overarching goal of bridging the resource–access divide.



Energy Stable and Structure-Preserving Schemes for the Stochastic Galerkin Shallow Water Equations
Subtitled “arXiv:2310.06229,” D. Dai, Y. Epshteyn, A. Narayan. 2023.

The shallow water flow model is widely used to describe water flows in rivers, lakes, and coastal areas. Accounting for uncertainty in the corresponding transport-dominated non-linear PDE models presents theoretical and numerical challenges that motivate the central advances of this paper. Starting with a spatially one-dimensional hyperbolicity-preserving, positivity-preserving stochastic Galerkin formulation of the parametric/uncertain shallow water equations, we derive an entropy-entropy flux pair for the system. We exploit this entropy-entropy flux pair to construct structure-preserving second-order energy conservative, and first- and second-order energy stable finite volume schemes for the stochastic Galerkin shallow water system. The performance of the methods is illustrated on several numerical experiments.



Equation Discovery with Bayesian Spike-and-Slab Priors and Efficient Kernels
Subtitled “arXiv:2310.05387v1,” D. Long, W.W. Xing, A.S. Krishnapriyan, R.M. Kirby, S. Zhe, M.W. Mahoney. 2023.

Discovering governing equations from data is important to many scientific and engineering applications. Despite promising successes, existing methods are still challenged by data sparsity as well as noise issues, both of which are ubiquitous in practice. Moreover, state-of-the-art methods lack uncertainty quantification and/or are costly in training. To overcome these limitations, we propose a novel equation discovery method based on Kernel learning and BAyesian Spike-and-Slab priors (KBASS). We use kernel regression to estimate the target function, which is flexible, expressive, and more robust to data sparsity and noises. We combine it with a Bayesian spike-and-slab prior — an ideal Bayesian sparse distribution — for effective operator selection and uncertainty quantification. We develop an expectation propagation expectation-maximization (EP-EM) algorithm for efficient posterior inference and function estimation. To overcome the computational challenge of kernel regression, we place the function values on a mesh and induce a Kronecker product construction, and we use tensor algebra methods to enable efficient computation and optimization. We show the significant advantages of KBASS on a list of benchmark ODE and PDE discovery tasks.



HiPPIS A High-Order Positivity-Preserving Mapping Software for Structured Meshes
T. A. J. Ouermi, R. M Kirby, M. Berzins. In ACM Trans. Math. Softw, ACM, Nov, 2023.
ISSN: 0098-3500
DOI: 10.1145/3632291

Polynomial interpolation is an important component of many computational problems. In several of these computational problems, failure to preserve positivity when using polynomials to approximate or map data values between meshes can lead to negative unphysical quantities. Currently, most polynomial-based methods for enforcing positivity are based on splines and polynomial rescaling. The spline-based approaches build interpolants that are positive over the intervals in which they are defined and may require solving a minimization problem and/or system of equations. The linear polynomial rescaling methods allow for high-degree polynomials but enforce positivity only at limited locations (e.g., quadrature nodes). This work introduces open-source software (HiPPIS) for high-order data-bounded interpolation (DBI) and positivity-preserving interpolation (PPI) that addresses the limitations of both the spline and polynomial rescaling methods. HiPPIS is suitable for approximating and mapping physical quantities such as mass, density, and concentration between meshes while preserving positivity. This work provides Fortran and Matlab implementations of the DBI and PPI methods, presents an analysis of the mapping error in the context of PDEs, and uses several 1D and 2D numerical examples to demonstrate the benefits and limitations of HiPPIS.



Multi-Resolution Active Learning of Fourier Neural Operators
Subtitled “arXiv:2309.16971,” S. Li, X. Yu, W. Xing, R.M. Kirby, A. Narayan, S. Zhe. 2023.

Fourier Neural Operator (FNO) is a popular operator learning framework. It not only achieves the state-of-the-art performance in many tasks, but also is highly efficient in training and prediction. However, collecting training data for the FNO can be a costly bottleneck in practice, because it often demands expensive physical simulations. To overcome this problem, we propose Multi-Resolution Active learning of FNO (MRA-FNO), which can dynamically select the input functions and resolutions to lower the data cost as much as possible while optimizing the learning efficiency. Specifically, we propose a probabilistic multi-resolution FNO and use ensemble Monte-Carlo to develop an effective posterior inference algorithm. To conduct active learning, we maximize a utility-cost ratio as the acquisition function to acquire new examples and resolutions at each step. We use moment matching and the matrix determinant lemma to enable tractable, efficient utility computation. Furthermore, we develop a cost annealing framework to avoid over-penalizing high-resolution queries at the early stage. The over-penalization is severe when the cost difference is significant between the resolutions, which renders active learning often stuck at low-resolution queries and inferior performance. Our method overcomes this problem and applies to general multi-fidelity active learning and optimization problems. We have shown the advantage of our method in several benchmark operator learning tasks.



Toward Democratizing Access to Science Data: Introducing the National Data Platform,
M. Parashar, I. Altintas. In IEEE 19th International Conference on e-Science, IEEE, 2023.
DOI: 10.1109/e-Science58273.2023.10254930

Open and equitable access to scientific data is essential to addressing important scientific and societal grand challenges, and to research enterprise more broadly. This paper discusses the importance and urgency of open and equitable data access, and explores the barriers and challenges to such access. It then introduces the vision and architecture of the National Data Platform, a recently launched project aimed at catalyzing an open, equitable and extensible data ecosystem.



Computer Science Abstractions To Help Reason About Decentralized Stablecoin Design
B. Charoenwong, R.M. Kirby, J. Reiter. In IEEE Access, IEEE, 2023.

Computer science as a discipline is known for its penchant for using abstractions as a tool for reasoning. It is no surprise that computer science might have something valuable to lend to the world of decentralized stablecoin design, as it is in fact a “computing" problem. In this paper, we examine the possibility of a decentralized and capital-efficient stablecoin using smart contracts that algorithmically trade to maintain stability and study the potential new functionality that smart contracts enable. By exploiting traditional abstractions from computer science, we show that a capital-efficient algorithmic stablecoin cannot be provably stable. Additionally, we provide a formal exposition of the workings of Central Bank Digital Currencies, connecting this to the space of possible stablecoin designs. We then discuss several outstanding conjectures from both academics and practitioners and finally highlight the regulatory similarities between money-market funds and working stablecoins. Our work builds upon the current and growing interplay between the realms of engineering and financial services, and it also demonstrates how ways of thinking as a computer scientist can aid practitioners. We believe this research is vital for understanding and developing the future of financial technology.



Dynamic Data-Driven Application Systems for Reservoir Simulation-Based Optimization: Lessons Learned and Future Trends,
M. Parashar, T. Kurc, H. Klie, M.F. Wheeler, J.H. Saltz, M. Jammoul, R. Dong. In Handbook of Dynamic Data Driven Applications Systems: Volume 2, Springer International Publishing, pp. 287--330. 2023.
DOI: 10.1007/978-3-031-27986-7_11

Since its introduction in the early 2000s, the Dynamic Data-Driven Applications Systems (DDDAS) paradigm has served as a powerful concept for continuously improving the quality of both models and data embedded in complex dynamical systems. The DDDAS unifying concept enables capabilities to integrate multiple sources and scales of data, mathematical and statistical algorithms, advanced software infrastructures, and diverse applications into a dynamic feedback loop. DDDAS has not only motivated notable scientific and engineering advances on multiple fronts, but it has been also invigorated by the latest technological achievements in artificial intelligence, cloud computing, augmented reality, robotics, edge computing, Internet of Things (IoT), and Big Data. Capabilities to handle more data in a much faster and smarter fashion is paving the road for expanding automation capabilities. The purpose of this chapter is to review the fundamental components that have shaped reservoir-simulation-based optimization in the context of DDDAS. The foundations of each component will be systematically reviewed, followed by a discussion on current and future trends oriented to highlight the outstanding challenges and opportunities of reservoir management problems under the DDDAS paradigm. Moreover, this chapter should be viewed as providing pathways for establishing a synergy between renewable energy and oil and gas industry with the advent of the DDDAS method.



Strengthening the US Department of Energy's Recruitment Pipeline: The DOE/NNSA Predictive Science Academic Alliance Program (PSAAP) Experience
J. K. Holmen, V. G. Vergara Larrea, E. W. Draeger, E. T. Phipps, P. J. Smith, M. Berzins, S. T. Smith, J. N. Thornock, S. Parete-Koon. In Practice and Experience in Advanced Research Computing, ACM, pp. 137--144. 2023.

The US Department of Energy (DOE) oversees a system of 17 national laboratories responsible for developing unique scientific capabilities beyond the scope of academic and industrial institutions. These labs strive to keep America at the forefront of discovery and are home to some of the Nation’s best minds and the world’s best scientific and research facilities. Collaborations between national laboratories and academic institutions are critical to develop and recruit talent for the DOE workforce. Academia’s cooperative education model poses challenges for DOE recruitment pipelines centered around traditional internships. This paper discusses a promising DOE recruitment pipeline, the National Nuclear Security Administration’s (NNSA) Predictive Science Academic Alliance Program (PSAAP) initiative. As a part of this, experiences capturing the successes and challenges faced by the University of Utah’s Carbon Capture Multidisciplinary Simulation Center (CCMSC) through their participation in the PSAAP-II initiative are shared. These experiences demonstrate the success of Utah’s PSAAP center as a recruitment pipeline with approximately 43% of CCMSC students going to a national laboratory after graduation. Potential opportunities to strengthen the DOE’s recruitment pipeline are also discussed.



TEMA: Event Driven Serverless Workflows Platform for Natural Disaster Management
C. Sicari, A. Catalfamo, L. Carnevale, A. Galletta, D. Balouek-Thomert, M. Parashar, M. Villari. In 2023 IEEE Symposium on Computers and Communications (ISCC), pp. 1-6. 2023.
DOI: 10.1109/ISCC58397.2023.10217920

TEMA project is a Horizon Europe funded project that aims at addressing Natural Disaster Management by the use of sophisticated Cloud-Edge Continuum infrastructures by means of data analysis algorithms wrapped in Serverless functions deployed on a distributed infrastructure according to a Federated Learning scheduler that constantly monitors the infrastructure in search of the best way to satisfy required QoS constraints. In this paper, we discuss the advantages of Serverless workflow and how they can be used and monitored to natively trigger complex algorithm pipelines in the continuum, dynamically placing and relocating them taking into account incoming IoT data, QoS constraints, and the current status of the continuum infrastructure. Therefore we presented the Urgent Function Enabler (UFE) platform, a fully distributed architecture able to define, spread, and manage FaaS functions, using local IOT data managed using the Fiware ecosystem and a computing infrastructure composed of mobile and stable nodes.



Optimizing Data Movement for GPU-Based In-Situ Workflow Using GPUDirect RDMA,
B. Zhang, P.E. Davis, N. Morales, Z. Zhang, K. Teranishi, M. Parashar. In Euro-Par 2023: Parallel Processing, Springer Nature Switzerland, pp. 323--338. 2023.
ISBN: 978-3-031-39698-4
DOI: 10.1007/978-3-031-39698-4_22

The extreme-scale computing landscape is increasingly dominated by GPU-accelerated systems. At the same time, in-situ workflows that employ memory-to-memory inter-application data exchanges have emerged as an effective approach for leveraging these extreme-scale systems. In the case of GPUs, GPUDirect RDMA enables third-party devices, such as network interface cards, to access GPU memory directly and has been adopted for intra-application communications across GPUs. In this paper, we present an interoperable framework for GPU-based in-situ workflows that optimizes data movement using GPUDirect RDMA. Specifically, we analyze the characteristics of the possible data movement pathways between GPUs from an in-situ workflow perspective, and design a strategy that maximizes throughput. Furthermore, we implement this approach as an extension of the DataSpaces data staging service, and experimentally evaluate its performance and scalability on a current leadership GPU cluster. The performance results show that the proposed design reduces data-movement time by up to 53% and 40% for the sender and receiver, respectively, and maintains excellent scalability for up to 256 GPUs.