S. Sane, T. Athawale,, C.R. Johnson. Visualization of Uncertain Multivariate Data via Feature Confidence Level-Sets, In EuroVis 2021, 2021.
Recent advancements in multivariate data visualization have opened new research opportunities for the visualization community. In this paper, we propose an uncertain multivariate data visualization technique called feature confidence level-sets. Conceptually, feature level-sets refer to level-sets of multivariate data. Our proposed technique extends the existing idea of univariate confidence isosurfaces to multivariate feature level-sets. Feature confidence level-sets are computed by considering the trait for a specific feature, a confidence interval, and the distribution of data at each grid point in the domain. Using uncertain multivariate data sets, we demonstrate the utility of the technique to visualize regions with uncertainty in relation to the specific trait or feature, and the ability of the technique to provide secondary feature structure visualization based on uncertainty.
S. Sane, A. Yenpure, R. Bujack, M. Larsen, K. Moreland, C. Garth, C. R. Johnson,, H. Childs.
Scalable In Situ Computation of Lagrangian Representations via Local Flow Maps, In Eurographics Symposium on Parallel Graphics and Visualization, The Eurographics Association, 2021.
In situ computation of Lagrangian flow maps to enable post hoc time-varying vector field analysis has recently become an active area of research. However, the current literature is largely limited to theoretical settings and lacks a solution to address scalability of the technique in distributed memory. To improve scalability, we propose and evaluate the benefits and limitations of a simple, yet novel, performance optimization. Our proposed optimization is a communication-free model resulting in local Lagrangian flow maps, requiring no message passing or synchronization between processes, intrinsically improving scalability, and thereby reducing overall execution time and alleviating the encumbrance placed on simulation codes from communication overheads. To evaluate our approach, we computed Lagrangian flow maps for four time-varying simulation vector fields and investigated how execution time and reconstruction accuracy are impacted by the number of GPUs per compute node, the total number of compute nodes, particles per rank, and storage intervals. Our study consisted of experiments computing Lagrangian flow maps with up to 67M particle trajectories over 500 cycles and used as many as 2048 GPUs across 512 compute nodes. In all, our study contributes an evaluation of a communication-free model as well as a scalability study of computing distributed Lagrangian flow maps at scale using in situ infrastructure on a modern supercomputer.
Investigating In Situ Reduction via Lagrangian Representations for Cosmology and Seismology Applications, In Computational Science -- ICCS 2021, Springer International Publishing, pp. 436--450. 2021.
Although many types of computational simulations produce time-varying vector fields, subsequent analysis is often limited to single time slices due to excessive costs. Fortunately, a new approach using a Lagrangian representation can enable time-varying vector field analysis while mitigating these costs. With this approach, a Lagrangian representation is calculated while the simulation code is running, and the result is explored after the simulation. Importantly, the effectiveness of this approach varies based on the nature of the vector field, requiring in-depth investigation for each application area. With this study, we evaluate the effectiveness for previously unexplored cosmology and seismology applications. We do this by considering encumbrance (on the simulation) and accuracy (of the reconstructed result). To inform encumbrance, we integrated in situ infrastructure with two simulation codes, and evaluated on representative HPC environments, performing Lagrangian in situ reduction using GPUs as well as CPUs. To inform accuracy, our study conducted a statistical analysis across a range of spatiotemporal configurations as well as a qualitative evaluation. In all, we demonstrate effectiveness for both cosmology and seismology—time-varying vector fields from these domains can be reduced to less than 1% of the total data via Lagrangian representations, while maintaining accurate reconstruction and requiring under 10% of total execution time in over 80% of our experiments.
A. Singh, M. Bauer, S. Joshi. Physics Informed Convex Artificial Neural Networks (PICANNs) for Optimal Transport based Density Estimation, Subtitled arXiv, 2021.
Optimal Mass Transport (OMT) is a well studied problem with a variety of applications in a diverse set of fields ranging from Physics to Computer Vision and in particular Statistics and Data Science. Since the original formulation of Monge in 1781 significant theoretical progress been made on the existence, uniqueness and properties of the optimal transport maps. The actual numerical computation of the transport maps, particularly in high dimensions, remains a challenging problem. By Brenier's theorem, the continuous OMT problem can be reduced to that of solving a non-linear PDE of Monge-Ampere type whose solution is a convex function. In this paper, building on recent developments of input convex neural networks and physics informed neural networks for solving PDE's, we propose a Deep Learning approach to solve the continuous OMT problem.
To demonstrate the versatility of our framework we focus on the ubiquitous density estimation and generative modeling tasks in statistics and machine learning. Finally as an example we show how our framework can be incorporated with an autoencoder to estimate an effective probabilistic generative model.
RISE: Reducing I/O Contention in Staging-based Extreme-Scale In-situ Workflows, In 2021 IEEE International Conference on Cluster Computing (CLUSTER), pp. 146--156. 2021.P. Subedi, P.E .Davis, M. Parashar.
While in-situ workflow formulations have addressed some of the data-related challenges associated with extreme-scale scientific workflows, these workflows involve complex interactions and different modes of data exchange. In the context of increasing system complexity, such workflows present significant resource management challenges, requiring complex cost-performance tradeoffs. This paper presents RISE, an intelligent staging-based data management middleware, which builds on the DataSpaces framework and performs intelligent scheduling of data management operations to reduce I/O contention. In RISE, data are always written immediately to local buffers to reduce the effect of the transfer impact upon application performance. RISE identifies applications’ data access patterns and moves data towards data consumers only when the network is expected to be idle, reducing the impact of asynchronous …
E. Suchyta, S. Klasky, N. Podhorszki, M. Wolf, A. Adesoji, C.S. Chang, J. Choi, P. E. Davis, J. Dominski, S. Ethier, I. Foster, K. Germaschewski, B. Geveci, C. Harris, K. A. Huck, Q. Liu, J. Logan, K. Mehta, G. Merlo, S. V. Moore, T. Munson, M. Parashar, D. Pugmire, M. S. Shephard, C. W. Smith, P. Subedi, L. Wan, R. Wang, S. Zhang. The Exascale Framework for High Fidelity coupled Simulations (EFFIS): Enabling whole device modeling in fusion science, In The International Journal of High Performance Computing Applications, SAGE Publications, pp. 10943420211019119. 2021.
We present the Exascale Framework for High Fidelity coupled Simulations (EFFIS), a workflow and code coupling framework developed as part of the Whole Device Modeling Application (WDMApp) in the Exascale Computing Project.EFFIS consists of a library, command line utilities, and a collection of run-time daemons. Together, these software products enable users to easily compose and execute workflows that include: strong or weak coupling, in situ (or offline)analysis/visualization/monitoring, command-and-control actions, remote dashboard integration, and more. We describe WDMApp physics coupling cases and computer science requirements that motivate the design of the EFFIS framework. Furthermore, we explain the essential enabling technology that EFFIS leverages: ADIOS for performant data movement, PerfStubs/TAU for performance monitoring, and an advanced COUPLER for transforming coupling data from its native format to the representation needed by another application. Finally, we demonstrate EFFIS using coupled multi-simulation WDMApp workflows and exemplify how the framework supports the project’s needs. We show that EFFIS and its associated services for data movement, visualization, and performance collection does not introduce appreciable overhead to the WDMApp workflow and that the resource-dominant application’s idle time while waiting for data is minimal.
T. Sun, D. Li, B. Wang. Decentralized Federated Averaging, Subtitled arXiv preprint arXiv:2104.11375, 2021.
Federated averaging (FedAvg) is a communication efficient algorithm for the distributed training with an enormous number of clients. In FedAvg, clients keep their data locally for privacy protection; a central parameter server is used to communicate between clients. This central server distributes the parameters to each client and collects the updated parameters from clients. FedAvg is mostly studied in centralized fashions, which requires massive communication between server and clients in each communication. Moreover, attacking the central server can break the whole system's privacy. In this paper, we study the decentralized FedAvg with momentum (DFedAvgM), which is implemented on clients that are connected by an undirected graph. In DFedAvgM, all clients perform stochastic gradient descent with momentum and communicate with their neighbors only. To further reduce the communication cost, we also consider the quantized DFedAvgM. We prove convergence of the (quantized) DFedAvgM under trivial assumptions; the convergence rate can be improved when the loss function satisfies the P\L property. Finally, we numerically verify the efficacy of DFedAvgM.
T. Sun, D. Li, B. Wang. Stability and Generalization of the Decentralized Stochastic Gradient Descent, Subtitled arXiv preprint arXiv:2102.01302, 2021.
The stability and generalization of stochastic gradient-based methods provide valuable insights into understanding the algorithmic performance of machine learning models. As the main workhorse for deep learning, stochastic gradient descent has received a considerable amount of studies. Nevertheless, the community paid little attention to its decentralized variants. In this paper, we provide a novel formulation of the decentralized stochastic gradient descent. Leveraging this formulation together with (non) convex optimization theory, we establish the first stability and generalization guarantees for the decentralized stochastic gradient descent. Our theoretical results are built on top of a few common and mild assumptions and reveal that the decentralization deteriorates the stability of SGD for the first time. We verify our theoretical findings by using a variety of decentralized settings and benchmark machine learning models.
A Gaussian Process Model for Unsupervised Analysis of High Dimensional Shape Data, In Machine Learning in Medical Imaging, Springer International Publishing, pp. 356--365. 2021.
Applications of medical image analysis are often faced with the challenge of modelling high-dimensional data with relatively few samples. In many settings, normal or healthy samples are prevalent while pathological samples are rarer, highly diverse, and/or difficult to model. In such cases, a robust model of the normal population in the high-dimensional space can be useful for characterizing pathologies. In this context, there is utility in hybrid models, such as probabilistic PCA, which learns a low-dimensional model, commensurates with the available data, and combines it with a generic, isotropic noise model for the remaining dimensions. However, the isotropic noise model ignores the inherent correlations that are evident in so many high-dimensional data sets associated with images and shapes in medicine. This paper describes a method for estimating a Gaussian model for collections of images or shapes that exhibit underlying correlations, e.g., in the form of smoothness. The proposed method incorporates a Gaussian-process noise model within a generative formulation. For optimization, we derive a novel expectation maximization (EM) algorithm. We demonstrate the efficacy of the method on synthetic examples and on anatomical shape data.
Uncertainty Quantification of the Effects of Segmentation Variability in ECGI, In Functional Imaging and Modeling of the Heart, Springer International Publishing, pp. 515--522. 2021.
Despite advances in many of the techniques used in Electrocardiographic Imaging (ECGI), uncertainty remains insufficiently quantified for many aspects of the pipeline. The effect of geometric uncertainty, particularly due to segmentation variability, may be the least explored to date. We use statistical shape modeling and uncertainty quantification (UQ) to compute the effect of segmentation variability on ECGI solutions. The shape model was made with Shapeworks from nine segmentations of the same patient and incorporated into an ECGI pipeline. We computed uncertainty of the pericardial potentials and local activation times (LATs) using polynomial chaos expansion (PCE) implemented in UncertainSCI. Uncertainty in pericardial potentials from segmentation variation mirrored areas of high variability in the shape model, near the base of the heart and the right ventricular outflow tract, and that ECGI was less sensitive to uncertainty in the posterior region of the heart. Subsequently LAT calculations could vary dramatically due to segmentation variability, with a standard deviation as high as 126ms, yet mainly in regions with low conduction velocity. Our shape modeling and UQ pipeline presented possible uncertainty in ECGI due to segmentation variability and can be used by researchers to reduce said uncertainty or mitigate its effects. The demonstrated use of statistical shape modeling and UQ can also be extended to other types of modeling pipelines.
J. Tate, S. Rampersad, C. Charlebois, Z. Liu, J. Bergquist, D. White, L. Rupp, D. Brooks, A. Narayan, R. MacLeod. Uncertainty Quantification in Brain Stimulation using UncertainSCI, In Brain Stimulation: Basic, Translational, and Clinical Research in Neuromodulation, Vol. 14, No. 6, Elsevier, pp. 1659-1660. 2021.
Predicting the effects of brain stimulation with computer models presents many challenges, including estimating the possible error from the propagation of uncertain input parameters through the model. Quantification and control of these errors through uncertainty quantification (UQ) provide statistics on the likely impact of parameter variation on solution accuracy, including total variance and sensitivity associated to each parameter. While the need and importance of UQ in clinical modeling is generally accepted, tools for implementing UQ techniques remain limited or inaccessible for many researchers.
M. Thorpe, B. Wang. Robust Certification for Laplace Learning on Geometric Graphs, Subtitled arXiv preprint arXiv:2104.10837, 2021.
Graph Laplacian (GL)-based semi-supervised learning is one of the most used approaches for classifying nodes in a graph. Understanding and certifying the adversarial robustness of machine learning (ML) algorithms has attracted large amounts of attention from different research communities due to its crucial importance in many security-critical applied domains. There is great interest in the theoretical certification of adversarial robustness for popular ML algorithms. In this paper, we provide the first adversarial robust certification for the GL classifier. More precisely we quantitatively bound the difference in the classification accuracy of the GL classifier before and after an adversarial attack. Numerically, we validate our theoretical certification results and show that leveraging existing adversarial defenses for the -nearest neighbor classifier can remarkably improve the robustness of the GL classifier.
J. P. Torres, Z. Lin, M. Watkins, P. F. Salcedo, R. P. Baskin, S. Elhabian, H. Safavi-Hemami, D. Taylor, J. Tun, G. P. Concepcion, N. Saguil, A. A. Yanagihara, Y. Fang, J. R. McArthur, H. Tae, R. K. Finol-Urdaneta, B. D. Özpolat, B. M. Olivera, E. W. Schmidt. Small-molecule mimicry hunting strategy in the imperial cone snail, Conus imperialis, In Science Advances, Vol. 7, No. 11, American Association for the Advancement of Science, 2021.
Venomous animals hunt using bioactive peptides, but relatively little is known about venom small molecules and the resulting complex hunting behaviors. Here, we explored the specialized metabolites from the venom of the worm-hunting cone snail, Conus imperialis. Using the model polychaete worm Platynereis dumerilii, we demonstrate that C. imperialis venom contains small molecules that mimic natural polychaete mating pheromones, evoking the mating phenotype in worms. The specialized metabolites from different cone snails are species-specific and structurally diverse, suggesting that the cones may adopt many different prey-hunting strategies enabled by small molecules. Predators sometimes attract prey using the prey’s own pheromones, in a strategy known as aggressive mimicry. Instead, C. imperialis uses metabolically stable mimics of those pheromones, indicating that, in biological mimicry, even the molecules themselves may be disguised, providing a twist on fake news in chemical ecology.
N. Truong, C. Yuksel, C. Watcharopas, J. A. Levine, R. M. Kirby. Particle Merging-and-Splitting, In IEEE Transactions on Visualization and Computer Graphics, IEEE, 2021.
Robustly handling collisions between individual particles in a large particle-based simulation has been a challenging problem. We introduce particle merging-and-splitting, a simple scheme for robustly handling collisions between particles that prevents inter-penetrations of separate objects without introducing numerical instabilities. This scheme merges colliding particles at the beginning of the time-step and then splits them at the end of the time-step. Thus, collisions last for the duration of a time-step, allowing neighboring particles of the colliding particles to influence each other. We show that our merging-and-splitting method is effective in robustly handling collisions and avoiding penetrations in particle-based simulations. We also show how our merging-and-splitting approach can be used for coupling different simulation systems using different and otherwise incompatible integrators. We present simulation tests …
W. Usher, X. Huang, S. Petruzza, S. Kumar, S. R. Slattery, S. T. Reeve, F. Wang, C. R. Johnson,, V. Pascucci. Adaptive Spatially Aware I/O for Multiresolution Particle Data Layouts, In IPDPS, 2021.
V. Vedam-Mai, K. Deisseroth, J. Giordano, G. Lazaro-Munoz, W. Chiong, N. Suthana, J. Langevin, J. Gill, W. Goodman, N. R. Provenza, C. H. Halpern, R. S. Shivacharan, T. N. Cunningham, S. A. Sheth, N. Pouratian, K. W. Scangos, H. S. Mayberg, A. Horn, K. A. Johnson, C. R. Butson, R. Gilron, C. de Hemptinne, R. Wilt, M. Yaroshinsky, S. Little, P. Starr, G. Worrell, P. Shirvalkar, E. Chang, J. Volkmann, M. Muthuraman, S. Groppa, A. A. Kühn, L. Li, M. Johnson, K. J. Otto, R. Raike, S. Goetz, C. Wu, P. Silburn, B. Cheeran, Y. J. Pathak, M. Malekmohammadi, A. Gunduz, J. K. Wong, S. Cernera, A. W. Shukla, A. Ramirez-Zamora, W. Deeb, A. Patterson, K. D. Foote, M. S. Okun.
Proceedings of the Eighth Annual Deep Brain Stimulation Think Tank: Advances in Optogenetics, Ethical Issues Affecting DBS Research, Neuromodulatory Approaches for Depression, Adaptive Neurostimulation, and Emerging DBS Technologies, In Frontiers in Human Neuroscience, Vol. 15, pp. 169. 2021.
We estimate that 208,000 deep brain stimulation (DBS) devices have been implanted to address neurological and neuropsychiatric disorders worldwide. DBS Think Tank presenters pooled data and determined that DBS expanded in its scope and has been applied to multiple brain disorders in an effort to modulate neural circuitry. The DBS Think Tank was founded in 2012 providing a space where clinicians, engineers, researchers from industry and academia discuss current and emerging DBS technologies and logistical and ethical issues facing the field. The emphasis is on cutting edge research and collaboration aimed to advance the DBS field. The Eighth Annual DBS Think Tank was held virtually on September 1 and 2, 2020 (Zoom Video Communications) due to restrictions related to the COVID-19 pandemic. The meeting focused on advances in: (1) optogenetics as a tool for comprehending neurobiology of diseases and on optogenetically-inspired DBS, (2) cutting edge of emerging DBS technologies, (3) ethical issues affecting DBS research and access to care, (4) neuromodulatory approaches for depression, (5) advancing novel hardware, software and imaging methodologies, (6) use of neurophysiological signals in adaptive neurostimulation, and (7) use of more advanced technologies to improve DBS clinical outcomes. There were 178 attendees who participated in a DBS Think Tank survey, which revealed the expansion of DBS into several indications such as obesity, post-traumatic stress disorder, addiction and Alzheimer’s disease. This proceedings summarizes the advances discussed at the Eighth Annual DBS Think Tank.
A. Venkat, A. Gyulassy, G. Kosiba, A. Maiti, H. Reinstein, R. Gee, P.-T. Bremer, V. Pascucci. Towards replacing physical testing of granular materials with a Topology-based Model, Subtitled arXiv preprint arXiv:2109.08777, 2021.
In the study of packed granular materials, the performance of a sample (e.g., the detonation of a high-energy explosive) often correlates to measurements of a fluid flowing through it. The "effective surface area," the surface area accessible to the airflow, is typically measured using a permeametry apparatus that relates the flow conductance to the permeable surface area via the Carman-Kozeny equation. This equation allows calculating the flow rate of a fluid flowing through the granules packed in the sample for a given pressure drop. However, Carman-Kozeny makes inherent assumptions about tunnel shapes and flow paths that may not accurately hold in situations where the particles possess a wide distribution in shapes, sizes, and aspect ratios, as is true with many powdered systems of technological and commercial interest. To address this challenge, we replicate these measurements virtually on micro-CT images of the powdered material, introducing a new Pore Network Model based on the skeleton of the Morse-Smale complex. Pores are identified as basins of the complex, their incidence encodes adjacency, and the conductivity of the capillary between them is computed from the cross-section at their interface. We build and solve a resistive network to compute an approximate laminar fluid flow through the pore structure. We provide two means of estimating flow-permeable surface area: (i) by direct computation of conductivity, and (ii) by identifying dead-ends in the flow coupled with isosurface extraction and the application of the Carman-Kozeny equation, with the aim of establishing consistency over a range of particle shapes, sizes, porosity levels, and void distribution patterns.
B. Wang, D. Zou, Q. Gu, S. J. Osher. Laplacian smoothing stochastic gradient markov chain monte carlo, In SIAM Journal on Scientific Computing, Vol. 43, No. 1, SIAM, pp. A26-A53. 2021.
As an important Markov chain Monte Carlo (MCMC) method, the stochastic gradient Langevin dynamics (SGLD) algorithm has achieved great success in Bayesian learning and posterior sampling. However, SGLD typically suffers from a slow convergence rate due to its large variance caused by the stochastic gradient. In order to alleviate these drawbacks, we leverage the recently developed Laplacian smoothing technique and propose a Laplacian smoothing stochastic gradient Langevin dynamics (LS-SGLD) algorithm. We prove that for sampling from both log-concave and non-log-concave densities, LS-SGLD achieves strictly smaller discretization error in 2-Wasserstein distance, although its mixing rate can be slightly slower. Experiments on both synthetic and real datasets verify our theoretical results and demonstrate the superior performance of LS-SGLD on different machine learning tasks including posterior …
Z. Wang, W. Xing, R. Kirby, S. Zhe. Multi-Fidelity High-Order Gaussian Processes for Physical Simulation, In International Conference on Artificial Intelligence and Statistics, PMLR, pp. 847-855. 2021.
The key task of physical simulation is to solve partial differential equations (PDEs) on discretized domains, which is known to be costly. In particular, high-fidelity solutions are much more expensive than low-fidelity ones. To reduce the cost, we consider novel Gaussian process (GP) models that leverage simulation examples of different fidelities to predict high-dimensional PDE solution outputs. Existing GP methods are either not scalable to high-dimensional outputs or lack effective strategies to integrate multi-fidelity examples. To address these issues, we propose Multi-Fidelity High-Order Gaussian Process (MFHoGP) that can capture complex correlations both between the outputs and between the fidelities to enhance solution estimation, and scale to large numbers of outputs. Based on a novel nonlinear coregionalization model, MFHoGP propagates bases throughout fidelities to fuse information, and places a deep matrix GP prior over the basis weights to capture the (nonlinear) relationships across the fidelities. To improve inference efficiency and quality, we use bases decomposition to largely reduce the model parameters, and layer-wise matrix Gaussian posteriors to capture the posterior dependency and to simplify the computation. Our stochastic variational learning algorithm successfully handles millions of outputs without extra sparse approximations. We show the advantages of our method in several typical applications.
The main objective for understanding fluorescence microscopy data is to investigate and evaluate the fluorescent signal intensity distributions as well as their spatial relationships across multiple channels. The quantitative analysis of 3D fluorescence microscopy data needs interactive tools for researchers to select and focus on relevant biological structures. We developed an interactive tool based on volume visualization techniques and GPU computing for streamlining rapid data analysis. Our main contribution is the implementation of common data quantification functions on streamed volumes, providing interactive analyses on large data without lengthy preprocessing. Data segmentation and quantification are coupled with brushing and executed at an interactive speed. A large volume is partitioned into data bricks, and only user-selected structures are analyzed to constrain the computational load. We designed a framework to assemble a sequence of GPU programs to handle brick borders and stitch analysis results. Our tool was developed in collaboration with domain experts and has been used to identify cell types. We demonstrate a workflow to analyze cells in vestibular epithelia of transgenic mice.