In-situ and in-transit processing alleviate the gap between computing and I/O capabilities by scheduling data analytics close to the data source. Hybrid in-situ processing splits data analytics into two stages: the data processing that runs in-situ extracts regions of interest, which are then transferred to staging services for further in-transit analytics. To facilitate this type of hybrid in-situ processing, the data staging service needs to support complex intermediate data representations generated/consumed by the in-situ tasks. An unstructured (or irregular) mesh is one such derived data representation that is commonly used to bridge simulation data and analytics. However, how staging services can efficiently support unstructured mesh transfer and processing remains to be explored. This paper investigates design options for transferring and processing unstructured mesh data using staging services. Using polygonal mesh data as an example, we show that staging-based unstructured mesh processing can effectively support hybrid in-situ workflows and can significantly decrease data movement overheads.
In-situ processing alleviates the gap between computation and I/O capabilities by performing data analysis close to the data source. With simulation data varying in size and content during workflow execution, it becomes necessary for in-situ processing to support resource elasticity, i.e., the ability to change resource configurations such as the number of computing nodes/processes during workflow execution. An elastic job may dynamically adjust resource configurations; it may use a few resources at the beginning and more resources towards the end of the job when interesting data appears. However, it is hard to predict a priori how many computing nodes/processes need to be added/removed during the workflow execution to adapt to changing workflow needs. How to efficiently guide elasticity operations, such as growing or shrinking the number of processes used for in-situ analysis during workflow execution, is an open research question. In this paper, we present an adaptive elasticity policy that uses workflow runtime information collected online to predict how to trigger the addition and removal of processes in order to minimize in-situ processing overheads. We integrate the presented elasticity policy into a staging-based elastic workflow and evaluate its efficiency in multiple elasticity scenarios. The results indicate that an adaptive elasticity policy reduces the overhead of finding a proper resource configuration when compared with a static policy that uses a fixed number of processes for each rescaling operation. Finally, we discuss several research opportunities for elastic in-situ processing.
In-situ processing addresses the gap between computing and I/O speeds by processing data close to the data source, i.e., on the same system as the data source (e.g., a simulation). However, the effective implementation of in-situ processing workflows requires the optimization of several design parameters, such as where on the system the workflow's data analysis/visualization (ana/vis) tasks are placed and how their execution, interactions, and data exchanges are coordinated. For example, in the case of hybrid in-situ processing, interacting ana/vis tasks may be tightly or loosely coupled depending on their placement, and this can lead to very different performance and scalability. A key challenge is deciding the most appropriate ana/vis placement, which depends on application, workflow, and system characteristics that might change at runtime. In this paper, we present a framework to support online adaptive data analysis placement during the execution of an in-situ workflow. Specifically, the paper presents a model and architecture, and explores several data analysis placement strategies. Evaluation results show that dynamically choosing appropriate data analysis placement strategies can balance the benefits and overhead of different data analysis placement patterns to reduce in-situ processing time.
W. W. Xing, A. A. Shah, P. Wang, S. Zhe, Q. Fu, R. M. Kirby. Residual Gaussian process: A tractable nonparametric Bayesian emulator for multi-fidelity simulations, In Applied Mathematical Modelling, Vol. 97, Elsevier, pp. 36-56. 2021.
Challenges in multi-fidelity modelling relate to accuracy, uncertainty estimation and high-dimensionality. A novel additive structure is introduced in which the highest fidelity solution is written as a sum of the lowest fidelity solution and residuals between the solutions at successive fidelity levels, with Gaussian process priors placed over the low fidelity solution and each of the residuals. The resulting model is equipped with a closed-form solution for the predictive posterior, making it applicable to advanced, high-dimensional tasks that require uncertainty estimation. Its advantages are demonstrated on univariate benchmarks and on three challenging multivariate problems. It is shown how active learning can be used to enhance the model, especially with a limited computational budget. Furthermore, error bounds are derived for the mean prediction in the univariate case.
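As a rough illustration of the additive structure described above, the sketch below assumes two fidelity levels, nested training inputs, and independent zero-mean Gaussian process priors with a squared-exponential kernel. The toy functions `f_lo` and `f_hi`, the kernel length scale, and all helper names are hypothetical and are not taken from the paper. Under these assumptions the high-fidelity posterior mean and variance are the sums of the closed-form posteriors of the low-fidelity component and the residual.

```python
import numpy as np

def rbf(a, b, length=0.2):
    """Squared-exponential kernel between 1-D input arrays (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length) ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-6):
    """Closed-form zero-mean GP posterior mean and pointwise variance."""
    K = rbf(x_train, x_train) + noise * np.eye(len(x_train))
    Ks = rbf(x_test, x_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    # diag(Kss - Ks K^{-1} Ks^T), using k(x, x) = 1 for this kernel
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.maximum(var, 0.0)

def f_lo(x):
    """Hypothetical cheap, low-fidelity model."""
    return np.sin(2 * np.pi * x)

def f_hi(x):
    """Hypothetical expensive, high-fidelity model."""
    return np.sin(2 * np.pi * x) + 0.3 * x

def residual_gp_predict(x_lo, x_hi, x_test):
    """High fidelity = low-fidelity GP + residual GP (independent priors)."""
    m0, v0 = gp_posterior(x_lo, f_lo(x_lo), x_test)
    residual = f_hi(x_hi) - f_lo(x_hi)          # observed at nested inputs
    m1, v1 = gp_posterior(x_hi, residual, x_test)
    return m0 + m1, v0 + v1, m0

x_lo = np.linspace(0.0, 1.0, 13)   # many cheap samples
x_hi = x_lo[::4]                   # few expensive samples, nested in x_lo
x_test = np.linspace(0.0, 1.0, 50)
mean_hi, var_hi, mean_lo = residual_gp_predict(x_lo, x_hi, x_test)
```

With more than two fidelity levels, the same pattern would add one residual term per successive pair of levels.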
W. W. Xing, R. M. Kirby, S. Zhe. Deep coregionalization for the emulation of simulation-based spatial-temporal fields, In Journal of Computational Physics, Academic Press, pp. 109984. 2021.
Data-driven surrogate models are widely used for applications such as design optimization and uncertainty quantification, where repeated evaluations of an expensive simulator are required. For most partial differential equation (PDE) simulations, the outputs of interest are often spatial or spatial-temporal fields, leading to very high-dimensional outputs. Despite the success of existing data-driven surrogates for high-dimensional outputs, most methods require a significant number of samples to cover the response surface in order to achieve a reasonable degree of accuracy. This demand makes the idea of surrogate models less attractive considering the high computational cost of generating the data. To address this issue, we exploit the multifidelity nature of a PDE simulation and introduce deep coregionalization, a Bayesian nonparametric autoregressive framework for efficient emulation of spatial-temporal fields. To effectively extract the output correlations in the context of multifidelity data, we develop a novel dimension reduction technique, residual principal component analysis. Our model can simultaneously capture the rich output correlations and the fidelity correlations and make high-fidelity predictions with only a small number of expensive, high-fidelity simulation samples. We show the advantages of our model in three canonical PDE models and a fluid dynamics problem. The results show that the proposed method can not only approximate simulation results with significantly less cost (by about 10%-25%) but also further improve model accuracy.
Y. Xu, V. Keshavarzzadeh, R. M. Kirby, A. Narayan. A bandit-learning approach to multifidelity approximation, arXiv preprint arXiv:2103.15342, 2021.
Multifidelity approximation is an important technique in scientific computation and simulation. In this paper, we introduce a bandit-learning approach for leveraging data of varying fidelities to achieve precise estimates of the parameters of interest. Under a linear model assumption, we formulate a multifidelity approximation as a modified stochastic bandit, and analyze the loss for a class of policies that uniformly explore each model before exploiting. Utilizing the estimated conditional mean-squared error, we propose a consistent algorithm, adaptive Explore-Then-Commit (AETC), and establish a corresponding trajectory-wise optimality result. These results are then extended to the case of vector-valued responses, where we demonstrate that the algorithm is efficient without the need to worry about estimating high-dimensional parameters. The main advantage of our approach is that we require neither hierarchical model structure nor a priori knowledge of statistical information (e.g., correlations) about or between models. Instead, the AETC algorithm requires only knowledge of which model is a trusted high-fidelity model, along with (relative) computational cost estimates of querying each model. Numerical experiments are provided at the end to support our theoretical findings.
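For readers unfamiliar with the explore-then-commit idea that AETC builds on, the sketch below shows the textbook, non-adaptive version for a stochastic bandit: sample every arm a fixed number of times, then commit to the arm with the best empirical mean. It is not the paper's AETC algorithm, which chooses its exploration adaptively from an estimated conditional mean-squared error and model costs. The function name, the arm interface, and the Gaussian toy arms are all hypothetical.

```python
import random

def explore_then_commit(arms, m, budget, seed=0):
    """Textbook explore-then-commit.

    arms   : list of callables; arm(rng) returns one random reward
    m      : number of exploration pulls per arm
    budget : total number of pulls allowed
    Returns (index of committed arm, total reward collected).
    """
    k = len(arms)
    if m * k > budget:
        raise ValueError("exploration phase exceeds the budget")
    rng = random.Random(seed)

    # Exploration: pull every arm exactly m times.
    totals = [sum(arm(rng) for _ in range(m)) for arm in arms]
    means = [t / m for t in totals]

    # Commitment: play the empirically best arm for the remaining budget.
    best = max(range(k), key=means.__getitem__)
    reward = sum(totals)
    for _ in range(budget - m * k):
        reward += arms[best](rng)
    return best, reward

# Two hypothetical arms with Gaussian rewards of mean 0.2 and 0.8.
arms = [lambda rng: rng.gauss(0.2, 0.1), lambda rng: rng.gauss(0.8, 0.1)]
best_arm, total_reward = explore_then_commit(arms, m=20, budget=200)
```

The fixed exploration length `m` is the weak point of this baseline: too small and the wrong arm may be chosen, too large and budget is wasted on poor arms.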
Y. Xu, A. Narayan. Randomized weakly admissible meshes, arXiv preprint arXiv:2101.04043, 2021.
A weakly admissible mesh (WAM) on a continuum real-valued domain is a sequence of discrete grids such that the discrete maximum norm of polynomials on the grid is comparable to the supremum norm of polynomials on the domain. The grid sizes and the comparability constant must grow asymptotically in a controlled manner. In this paper we generalize the notion of a WAM to hierarchical subspaces of not necessarily polynomial functions, and we analyze particular strategies for random sampling as a technique for generating WAMs. Our main results show that WAMs and their stronger variant, admissible meshes, can be generated by random sampling, and our analysis provides concrete estimates for growth of both the meshes and the discrete-continuum comparability constants.
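The comparability condition in the definition above can be written out as follows. The notation is an assumption made for illustration: $K$ is the domain, $A_n \subset K$ the $n$-th grid, $\mathbb{P}_n$ the polynomials of degree at most $n$, and $C_n$ the comparability constant.

```latex
\|p\|_{K} \;\le\; C_n \, \|p\|_{A_n}
\quad \text{for all } p \in \mathbb{P}_n,
\qquad
\|p\|_{K} = \sup_{x \in K} |p(x)|,
\quad
\|p\|_{A_n} = \max_{x \in A_n} |p(x)|.
```

In the usual polynomial setting from the literature (not stated in the abstract itself), "controlled" growth is commonly taken to mean that both $|A_n|$ and $C_n$ grow at most polynomially in $n$, and an admissible mesh is one for which $C_n$ stays bounded.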
Y. Xu, A. Narayan. Budget-limited distribution learning in multifidelity problems, arXiv preprint arXiv:2105.04599, 2021.
Multifidelity methods are widely used for statistical estimation of quantities of interest (QoIs) in uncertainty quantification using simulation codes of differing costs and accuracies. Many methods approximate numerical-valued statistics that represent only limited information of the QoIs. In this paper, we introduce a semi-parametric approach that aims to effectively describe the distribution of a scalar-valued QoI in the multifidelity setup. Under a linear model hypothesis, we propose an exploration-exploitation strategy to reconstruct the full distribution of a scalar-valued QoI using samples from a subset of low-fidelity regressors. We derive an informative asymptotic bound for the mean 1-Wasserstein distance between the estimator and the true distribution, and use it to adaptively allocate computational budget for parametric estimation and non-parametric reconstruction. Assuming the linear model is correct, we prove that such a procedure is consistent, and converges to the optimal policy (and hence optimal computational budget allocation) under an upper bound criterion as the budget goes to infinity. A major advantage of our approach compared to several other multifidelity methods is that it is automatic, and its implementation does not require a hierarchical model setup, cross-model information, or a priori known model statistics. Numerical experiments are provided in the end to support our theoretical analysis.
V. Zala, R. M. Kirby, A. Narayan. Structure-preserving Nonlinear Filtering for Continuous and Discontinuous Galerkin Spectral/hp Element Methods, arXiv preprint arXiv:2106.08316, 2021.
Finite element simulations have been used to solve various partial differential equations (PDEs) that model physical, chemical, and biological phenomena. The resulting discretized solutions to PDEs often do not satisfy requisite physical properties, such as positivity or monotonicity. Such invalid solutions pose both modeling challenges, since the physical interpretation of simulation results is not possible, and computational challenges, since such properties may be required to advance the scheme. We, therefore, consider the problem of computing solutions that preserve these structural solution properties, which we enforce as additional constraints on the solution. We consider in particular the class of convex constraints, which includes positivity and monotonicity. By embedding such constraints as a postprocessing convex optimization procedure, we can compute solutions that satisfy general types of convex constraints. For certain types of constraints (including positivity and monotonicity), the optimization is a filter, i.e., a norm-decreasing operation. We provide a variety of tests on one-dimensional time-dependent PDEs that demonstrate the method's efficacy, and we empirically show that rates of convergence are unaffected by the inclusion of the constraints.
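The simplest discrete analogue of the postprocessing step described above is the Euclidean projection of a vector of nodal values onto the nonnegative orthant. The sketch below is an illustration under that assumption only: the paper works with Galerkin function spaces and general convex constraints, and the function name and sample values here are hypothetical. For this particular constraint the minimizer of the distance to `u` subject to nonnegativity has the closed form `max(u, 0)`, and it never increases the 2-norm, which is the "filter" property mentioned in the abstract.

```python
import numpy as np

def positivity_filter(u):
    """Closest vector to u (in the 2-norm) with nonnegative entries.

    Solves  min_v ||v - u||_2  subject to  v >= 0,
    whose solution is the entrywise clamp max(u, 0).
    """
    return np.maximum(u, 0.0)

# Hypothetical nodal values with small negative undershoots.
u = np.array([0.50, -0.10, 1.20, -0.03, 0.00, 0.75])
v = positivity_filter(u)
```

Constraints that couple neighboring values, such as monotonicity, do not reduce to an entrywise clamp and would need an actual convex optimization solve.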
R. Zambre, D. Sahasrabudhe, H. Zhou, M. Berzins, A. Chandramowlishwaran, P. Balaji. Logically Parallel Communication for Fast MPI+Threads Communication, In Proceedings of the Transactions on Parallel and Distributed Computing, IEEE, April, 2021.
Supercomputing applications are increasingly adopting the MPI+threads programming model over the traditional “MPI everywhere” approach to better handle the disproportionate increase in the number of cores compared with other on-node resources. In practice, however, most applications observe slower performance with MPI+threads, primarily because of poor communication performance. Recent research efforts on MPI libraries address this bottleneck by mapping logically parallel communication, that is, operations that are not subject to MPI’s ordering constraints, to the underlying network parallelism. Domain scientists, however, typically do not expose such communication independence information because the existing MPI-3.1 standard’s semantics can be limiting. Researchers had initially proposed user-visible endpoints to combat this issue, but such a solution requires intrusive changes to the standard (new APIs). The upcoming MPI-4.0 standard, on the other hand, allows applications to relax unneeded semantics and provides them with many opportunities to express logical communication parallelism. In this paper, we show how MPI+threads applications can achieve high performance with logically parallel communication. Through application case studies, we compare the capabilities of the new MPI-4.0 standard with those of the existing one and user-visible endpoints (upper bound). Logical communication parallelism can boost the overall performance of an application by over 2x.
B. Zenger, W. W. Good, J. A. Bergquist, L. C. Rupp, M. Perez, G. J. Stoddard, V. Sharma, R. S. MacLeod. Transient recovery of epicardial and torso ST-segment ischemic signals during cardiac stress tests: A possible physiological mechanism, In Journal of Electrocardiology, Churchill Livingstone, 2021.
Acute myocardial ischemia has several characteristic ECG findings, including clinically detectable ST-segment deviations. However, the sensitivity and specificity of diagnosis based on ST-segment changes are low. Furthermore, ST-segment deviations have been shown to be transient and spontaneously recover without any indication the ischemic event has subsided.
Assess the transient recovery of ST-segment deviations on remote recording electrodes during a partial occlusion cardiac stress test and compare them to intramyocardial ST-segment deviations.
We used a previously validated porcine experimental model of acute myocardial ischemia with controllable ischemic load and simultaneous electrical measurements within the heart wall, on the epicardial surface, and on the torso surface. Simulated cardiac stress tests were induced by occluding a coronary artery while simultaneously pacing rapidly or infusing dobutamine to stimulate cardiac function. Postexperimental imaging created anatomical models for data visualization and quantification. Markers of ischemia were identified as deviations in the potentials measured at 40% of the ST-segment. Intramural cardiac conduction speed was also determined using the inverse gradient method. We assessed changes in intramyocardial ischemic volume proportion, conduction speed, clinical presence of ischemia on remote recording arrays, and regional changes to intramyocardial ischemia. We defined the peak deviation response time as the time interval after onset of ischemia at which maximum ST-segment deviation was achieved, and the ST-recovery time as the interval after which ST deviation returned to below the threshold of ST elevation.
In both epicardial and torso recordings, the peak ST-segment deviation response time was 4.9±1.1 min and the ST-recovery time was approximately 7.9±2.5 min, both well before the termination of the ischemic stress. At peak response time, conduction speed was reduced by 50% and returned to near baseline at ST-recovery. The overall ischemic volume proportion initially increased, on average, to 37% at peak response time; however, it recovered only to 30% at the ST-recovery time. By contrast, the subepicardial region of the myocardial wall showed 40% ischemic volume at peak response time and recovered much more strongly to 25% as epicardial ST-segment deviations returned to baseline.
Our data show that remote ischemic signal recovery correlates with a recovery of the subepicardial myocardium, while subendocardial ischemic development persists.
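The two timing metrics defined in the methods above can be computed from a sampled ST-deviation trace roughly as follows. This is a minimal sketch under stated assumptions: times are measured from the onset of ischemia, the deviation is given as a nonnegative magnitude, and the threshold value and sample trace are hypothetical rather than taken from the study.

```python
def st_response_times(times, deviation, threshold):
    """Return (peak response time, ST-recovery time).

    times     : sample times since onset of ischemia
    deviation : ST-segment deviation magnitude at each sample time
    threshold : assumed ST-elevation threshold
    The recovery time is the first sample after the peak at which the
    deviation is below the threshold, or None if that never happens.
    """
    peak = max(range(len(deviation)), key=deviation.__getitem__)
    recovery = None
    for t, d in zip(times[peak + 1:], deviation[peak + 1:]):
        if d < threshold:
            recovery = t
            break
    return times[peak], recovery

# Hypothetical trace sampled once per minute.
times = list(range(11))
deviation = [0.0, 0.5, 1.2, 2.0, 2.6, 3.0, 2.4, 1.5, 0.8, 0.6, 0.5]
peak_time, recovery_time = st_response_times(times, deviation, threshold=1.0)
```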
We propose a data-driven space-filling curve method for 2D and 3D visualization. Our flexible curve traverses the data elements in the spatial domain in a way that the resulting linearization better preserves features in space compared to existing methods. We achieve such data coherency by calculating a Hamiltonian path that approximately minimizes an objective function that describes the similarity of data values and location coherency in a neighborhood. Our extended variant even supports multiscale data via quadtrees and octrees. Our method is useful in many areas of visualization, including multivariate or comparative visualization, ensemble visualization of 2D and 3D data on regular grids, or multiscale visual analysis of particle simulations. The effectiveness of our method is evaluated with numerical comparisons to existing techniques and through examples of ensemble and multivariate datasets.
Y. Zhou, N. Chalapathi, A. Rathore, Y. Zhao, Bei Wang. Mapper Interactive: A Scalable, Extendable, and Interactive Toolbox for the Visual Exploration of High-Dimensional Data, In IEEE Pacific Visualization Symposium, 2021.
The mapper algorithm is a popular tool from topological data analysis for extracting topological summaries of high-dimensional datasets. In this paper, we present Mapper Interactive, a web-based framework for the interactive analysis and visualization of high-dimensional point cloud data. It implements the mapper algorithm in an interactive, scalable, and easily extendable way, thus supporting practical data analysis. In particular, its command-line API can compute mapper graphs for 1 million points of 256 dimensions in about 3 minutes (4 times faster than the vanilla implementation). Its visual interface allows on-the-fly computation and manipulation of the mapper graph based on user-specified parameters and supports the addition of new analysis modules with a few lines of code. Mapper Interactive makes the mapper algorithm accessible to nonspecialists and accelerates topological analytics workflows.
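To make the mapper construction concrete, here is a minimal sketch of the generic algorithm: cover the range of a filter function with overlapping intervals, cluster the points that fall in each interval, and connect clusters that share points. It does not use or reproduce the Mapper Interactive API; every function name, the single-linkage clustering rule, and the parameter values are assumptions chosen for illustration.

```python
import math
from itertools import combinations

def interval_cover(lo, hi, n_intervals, overlap):
    """Equal-length intervals covering [lo, hi]; neighbors overlap by the
    given fraction of the interval length."""
    length = (hi - lo) / (n_intervals - (n_intervals - 1) * overlap)
    step = length * (1.0 - overlap)
    return [(lo + i * step, lo + i * step + length) for i in range(n_intervals)]

def clusters(indices, points, eps):
    """Single-linkage clusters: connected components of the graph joining
    points that are at most eps apart."""
    remaining = set(indices)
    found = []
    while remaining:
        seed = remaining.pop()
        component, stack = {seed}, [seed]
        while stack:
            i = stack.pop()
            near = {j for j in remaining
                    if math.dist(points[i], points[j]) <= eps}
            remaining -= near
            component |= near
            stack.extend(near)
        found.append(frozenset(component))
    return found

def mapper_graph(points, filter_values, n_intervals=4, overlap=0.3, eps=0.3):
    """Nodes are clusters of point indices; edges join clusters that overlap."""
    lo, hi = min(filter_values), max(filter_values)
    tol = 1e-9 * max(hi - lo, 1.0)   # guard against rounding at the ends
    nodes = []
    for a, b in interval_cover(lo, hi, n_intervals, overlap):
        inside = [i for i, f in enumerate(filter_values)
                  if a - tol <= f <= b + tol]
        nodes.extend(clusters(inside, points, eps))
    edges = {(u, v) for u, v in combinations(range(len(nodes)), 2)
             if nodes[u] & nodes[v]}
    return nodes, edges

# 40 points on the unit circle, filtered by their x-coordinate.
points = [(math.cos(2 * math.pi * k / 40), math.sin(2 * math.pi * k / 40))
          for k in range(40)]
nodes, edges = mapper_graph(points, [p[0] for p in points])
```

For the circle example the resulting graph is a single cycle, which is the kind of topological summary the mapper graph is meant to expose.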
Morse complexes are gradient-based topological descriptors with close connections to Morse theory. They are widely applicable in scientific visualization as they serve as important abstractions for gaining insights into the topology of scalar fields. Noise inherent to scalar field data due to acquisitions and processing, however, limits our understanding of the Morse complexes as structural abstractions. We, therefore, explore uncertainty visualization of an ensemble of 2D Morse complexes that arise from scalar fields coupled with data uncertainty. We propose statistical summary maps as new entities for capturing structural variations and visualizing positional uncertainties of Morse complexes in ensembles. Specifically, we introduce two types of statistical summary maps -- the Probabilistic Map and the Survival Map -- to characterize the uncertain behaviors of local extrema and local gradient flows, respectively. We demonstrate the utility of our proposed approach using synthetic and real-world datasets.
The term “in situ processing” has evolved over the last decade to mean both a specific strategy for visualizing and analyzing data and an umbrella term for a processing paradigm. The resulting confusion makes it difficult for visualization and analysis scientists to communicate with each other and with their stakeholders. To address this problem, a group of over fifty experts convened with the goal of standardizing terminology. This paper summarizes their findings and proposes a new terminology for describing in situ systems. An important finding from this group was that in situ systems are best described via multiple, distinct axes: integration type, proximity, access, division of execution, operation controls, and output type. This paper discusses these axes, evaluates existing systems within the axes, and explores how currently used terms relate to the axes.
L. Cinquini, S. Petruzza, Jason J. Boutte, S. Ames, G. Abdulla, V. Balaji, R. Ferraro, A. Radhakrishnan, L. Carriere, T. Maxwell, G. Scorzelli, V. Pascucci. Distributed Resources for the Earth System Grid Advanced Management (DREAM), Final Report, 2020.
The DREAM project was funded more than 3 years ago to design and implement a next-generation ESGF (Earth System Grid Federation) architecture suitable for managing and accessing data and service resources in a distributed and scalable environment. In particular, the project intended to focus on the computing and visualization capabilities of the stack, which at the time were rather primitive. At the beginning, the team had the general notion that a better ESGF architecture could be built by modularizing each component and redefining its interaction with other components through a well-defined, exposed API. Although this was still the high-level principle that guided the work, the DREAM project was able to accomplish its goals by leveraging new practices in IT that started just about 3 or 4 years ago: the advent of containerization technologies (specifically, Docker), the development of frameworks to manage containers at scale (Docker Swarm and Kubernetes), and their application to the commercial Cloud. Thanks to these new technologies, DREAM was able to improve the ESGF architecture (including its computing and visualization services) to a level of deployability and scalability beyond the original expectations.
We present dw2, a flexible and easy-to-use software infrastructure for interactive rendering of large tiled display walls. Our library represents the tiled display wall as a single virtual screen through a display "service", which renderers connect to and send image tiles to be displayed, either from an on-site or remote cluster. The display service can be easily configured to support a range of typical network and display hardware configurations; the client library provides a straightforward interface for easy integration into existing renderers. We evaluate the performance of our display wall service in different configurations using a CPU and GPU ray tracer, in both on-site and remote rendering scenarios using multiple display walls.
Objective: Clinical outcomes from deep brain stimulation (DBS) can be highly variable, and two critical factors underlying this variability are the location and type of stimulation. In this study we quantified how robustly DBS activates a target region when taking into account a range of different lead designs and realistic variations in placement. The objective of the study is to assess the likelihood of achieving target activation.
Approach: We performed finite element computational modeling and established a metric of performance robustness to evaluate the ability of directional and multi-lead configurations to activate target fiber pathways while taking into account location variability. A more robust lead configuration produces less variability in activation across all stimulation locations around the target.
Main results: Directional leads demonstrated higher overall performance robustness compared to axisymmetric leads, primarily 1-2 mm outside of the target. Multi-lead configurations demonstrated higher levels of robustness compared to any single lead due to distribution of electrodes in a broader region around the target.
Significance: Robustness measures can be used to evaluate the performance of existing DBS lead designs and aid in the development of novel lead designs to better accommodate known variability in lead location and orientation. This type of analysis may also be useful to understand how DBS clinical outcome variability is influenced by lead location among groups of patients.
K. A. Johnson, G. Duffley, D. Nesterovich Anderson, J. L. Ostrem, M. Welter, J. C. Baldermann, J. Kuhn, D. Huys, V. Visser-Vandewalle, T. Foltynie, L. Zrinzo, M. Hariz, A. F. G. Leentjens, A. Y. Mogilner, M. H. Pourfar, L. Almeida, A. Gunduz, K. D. Foote, M. S. Okun, C. R. Butson. Structural connectivity predicts clinical outcomes of deep brain stimulation for Tourette syndrome, In Brain, July, 2020.
Deep brain stimulation may be an effective therapy for select cases of severe, treatment-refractory Tourette syndrome; however, patient responses are variable, and there are no reliable methods to predict clinical outcomes. The objectives of this retrospective study were to identify the stimulation-dependent structural networks associated with improvements in tics and comorbid obsessive-compulsive behaviour, compare the networks across surgical targets, and determine if connectivity could be used to predict clinical outcomes. Volumes of tissue activated for a large multisite cohort of patients (n = 66) implanted bilaterally in globus pallidus internus (n = 34) or centromedial thalamus (n = 32) were used to generate probabilistic tractography to form a normative structural connectome. The tractography maps were used to identify networks that were correlated with improvement in tics or comorbid obsessive-compulsive behaviour and to predict clinical outcomes across the cohort. The correlated networks were then used to generate ‘reverse’ tractography to parcellate the total volume of stimulation across all patients to identify local regions to target or avoid. The results showed that for globus pallidus internus, connectivity to limbic networks, associative networks, caudate, thalamus, and cerebellum was positively correlated with improvement in tics; the model predicted clinical improvement scores (P = 0.003) and was robust to cross-validation. Regions near the anteromedial pallidum exhibited higher connectivity to the positively correlated networks than posteroventral pallidum, and volume of tissue activated overlap with this map was significantly correlated with tic improvement (P < 0.017). For centromedial thalamus, connectivity to sensorimotor networks, parietal-temporal-occipital networks, putamen, and cerebellum was positively correlated with tic improvement; the model predicted clinical improvement scores (P = 0.012) and was robust to cross-validation. 
Regions in the anterior/lateral centromedial thalamus exhibited higher connectivity to the positively correlated networks, but volume of tissue activated overlap with this map did not predict improvement (P > 0.23). For obsessive-compulsive behaviour, both targets showed that connectivity to the prefrontal cortex, orbitofrontal cortex, and cingulate cortex was positively correlated with improvement; however, only the centromedial thalamus maps predicted clinical outcomes across the cohort (P = 0.034), and the model was not robust to cross-validation. Collectively, the results demonstrate that the structural connectivity of the site of stimulation is likely important for mediating symptom improvement, and the networks involved in tic improvement may differ across surgical targets. These networks provide important insight into potential mechanisms and could be used to guide lead placement and stimulation parameter selection, as well as refine targets for neuromodulation therapies for Tourette syndrome.