PEARC17 has ended
Once you’ve registered and arrive in New Orleans, be sure to use our mobile web app to manage your busy schedule so you don’t miss a thing. Also check the website for updates and use the #PEARC17 hashtag to keep up with friends and colleagues.  

Technology
Tuesday, July 11


Invited Talk: Reproducibility and Containers: The Perfect Sandwich

Dear reader, how should you disseminate your software? If you want your recipe to come out just right, we encourage you to put it in a container. One such container, Singularity, is the first of its kind to be securely deployed internationally on more than 40 shared cluster resources. Its registry, Singularity Hub, further supports reproducible science by building and making containers accessible to any user of the software. In this talk, Vanessa will review the primary use cases for both Singularity and Singularity Hub, and how both have been designed to support modern, common workflows. (Greg will participate remotely.) She will discuss current and future challenges for building, capturing metadata for, and organizing the exploding landscape of containers, and present novel work for assessing reproducibility of such containers. Containers are changing scientific computing, and this is something to be excited about.


Tuesday July 11, 2017 11:00am - 12:00pm
Strand 11


Challenges of workload analysis on large HPC systems; a case study on NCSA Blue Waters
Blue Waters is a petascale-level supercomputer whose mission is to greatly accelerate insight into the most challenging computational and data analysis problems. We performed a detailed workload analysis of Blue Waters using Open XDMoD. The analysis used approximately 35,000 node hours to process the roughly 95 TB of input data from over 4.5M jobs that ran on Blue Waters during the period that was studied (April 1, 2013 - September 30, 2016).

This paper describes the work that was done to collate, process and analyze the data that was collected on Blue Waters, the design decisions that were made, the tools that we created and the various software engineering problems that we encountered and solved. In particular, we describe the challenges to data processing unique to Blue Waters engendered by the extremely large jobs that it typically executed.

Tuesday July 11, 2017 12:00pm - 12:30pm
Strand 11


A buffering approach to manage I/O in a normalized cross-correlation earthquake detection code for large seismic datasets
Continued advances in high-performance computing architectures constantly move computational performance forward, widening the performance gap with I/O. As a result, I/O plays an increasingly critical role in modern data-intensive scientific applications.

We have developed a high-performance GPU-based software package called cuNCC, which is designed to calculate seismic waveform similarity for applications such as hypocenter estimation and small earthquake detection. GPU acceleration greatly reduced the compute time, and we are currently investigating I/O optimizations to tackle this new performance bottleneck.

To find an optimal I/O solution for our cuNCC code, we performed a series of I/O benchmark tests and implemented buffering in CPU memory to manage the output transfers. With this preliminary work, we were able to establish that buffering improves the I/O bandwidth achieved, but is only beneficial when I/O bandwidth is limited, since the cost of the additional memory copy may exceed the improvement in I/O. However, in a realistic environment where I/O bandwidth per node is limited and small I/O transfers are penalized, this technique will improve overall performance. In addition, by using a large memory system, the point at which computing has to stop to wait for I/O is delayed, enabling fast computations on larger data sets.
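The general idea of buffering output in CPU memory can be illustrated with a minimal Python sketch. This is not the cuNCC implementation: the class name `BufferedOutput`, the buffer capacity, and the record sizes are all hypothetical, and an in-memory `io.BytesIO` stands in for a real file. The sketch only shows how many small writes can be coalesced into a few large ones, at the cost of one extra copy in host memory.

```python
import io


class BufferedOutput:
    """Hold small output records in CPU memory and write them in large chunks.

    Hypothetical illustration only; not taken from cuNCC.
    """

    def __init__(self, sink, capacity_bytes):
        self.sink = sink                # any object with a write(bytes) method
        self.capacity = capacity_bytes  # assumed buffer size, chosen for the demo
        self.chunks = []                # records waiting to be written
        self.pending = 0                # bytes currently held in the buffer
        self.flushes = 0                # number of large writes issued so far

    def write(self, record):
        # The extra memory copy: the record is kept in host memory for now.
        self.chunks.append(record)
        self.pending += len(record)
        if self.pending >= self.capacity:
            self.flush()

    def flush(self):
        # Issue one large write instead of many small ones.
        if self.pending == 0:
            return
        self.sink.write(b"".join(self.chunks))
        self.chunks.clear()
        self.pending = 0
        self.flushes += 1


def run_demo(n_records=1000, record_size=64, capacity=16 * 1024):
    """Write n_records small records and report (large writes, total bytes)."""
    sink = io.BytesIO()
    out = BufferedOutput(sink, capacity)
    for i in range(n_records):
        out.write(bytes([i % 256]) * record_size)
    out.flush()  # write whatever is left in the buffer
    return out.flushes, len(sink.getvalue())


if __name__ == "__main__":
    flushes, total = run_demo()
    print(f"{total} bytes written in {flushes} large writes")
```

With these assumed sizes, 1000 small records reach the sink in 4 large writes. A larger buffer postpones the point at which the producer has to wait on I/O, which is the effect the abstract attributes to a large memory system.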

Tuesday July 11, 2017 2:00pm - 2:30pm
Bolden 5


Analytics Environments on Demand: Providing Interactive and Scalable Research Computing with Windows
Historically, the experimental and observational sciences have been well served by traditional High Performance Computing (HPC). More recently, researchers from the life sciences and other domains have joined the HPC ranks. Cloud Computing offers promising alternatives to HPC, yet neither HPC nor Cloud is sufficient to meet the computational needs of researchers in other academic domains -- those newer to research computing and big data -- for example, from the social sciences, digital humanities and from professional schools, such as Law and Business. This paper describes the development and practice of a research computing service that provides interactive and scalable computing in a Windows environment, including the technical and end-user support challenges that were overcome to provide the service.

Tuesday July 11, 2017 2:00pm - 2:30pm
Strand 11


Demonstrating Distributed Workflow Computing with a Federating Wide-Area File System
We have demonstrated the synergy of a wide-area SLASH2 file system with remote bioinformatics workflows between Extreme Science and Engineering Discovery Environment sites using the Galaxy Project’s web-based platform for reproducible data analysis. Wide-area Galaxy workflows were enabled by establishing a geographically-distributed SLASH2 instance between the Greenfield system at Pittsburgh Supercomputing Center and virtual machines incorporating storage within the Corral file system at the Texas Advanced Computing Center. Analysis tasks submitted through a single Galaxy instance seamlessly leverage data available from either site. In this paper, we explore the advantages of SLASH2 for enabling workflows from Galaxy Main.

Tuesday July 11, 2017 2:30pm - 3:00pm
Strand 11


Evaluation of Intel Omni-Path on the Intel Knights Landing Processor
When a new technology is introduced into the HPC community, it is necessary to understand its performance and how it can affect the way applications interact with the hardware. Intel has recently introduced two new elements into the HPC ecosystem that are being widely adopted by many centers: the Intel Omni-Path high-performance network and the Intel Knights Landing processor. While it is possible to find different studies that analyze the efficiency of the Knights Landing processor, the same is not true for Omni-Path, the new 100 Gb/s fabric from Intel. This paper presents a set of studies that investigate the effectiveness of a system composed of this processor and network. The outcomes of this work can be used as guidelines for better exploitation of these resources on production systems. Also, the methodology employed during our tests can be replicated on a variety of systems and centers to find the ideal configurations of their hardware resources and provide users with recommendations that can improve the performance of their codes and the overall throughput of the clusters.

Tuesday July 11, 2017 3:00pm - 3:30pm
Strand 11
Wednesday, July 12


We Have an HPC -- Now What?
If you build it, will they come? Not necessarily. A critical need exists for knowledge in managing and properly utilizing supercomputing at mid-level and smaller research institutions. Simply having HPC hardware and some software is not enough. This paper relates the administrative experience of the first several months of a mid-level doctoral university providing a new enterprise XSEDE Compatible Basic Cluster (XCBC) high performance computing cluster to faculty and other researchers, including the experiences of first-day urgencies, initial problems in the first few weeks, and establishing an ongoing management system.


Wednesday July 12, 2017 11:00am - 11:30am
Strand 11


Deploying RMACC Summit: An HPC Resource for the Rocky Mountain Region
RMACC Summit is a heterogeneous supercomputer cluster with an aggregate floating point performance of 379 TFLOPS (Rmax, as currently configured) that provides about 85 million core-hours/yr to researchers from institutions participating in the Rocky Mountain Advanced Computing Consortium (RMACC). The development of Summit was a collaborative effort toward specifying a system that meets the needs of researchers at multiple universities, and included implementation and testing of several new technologies. We discuss our experiences in creating and maintaining a successful ongoing collaboration between the two universities that are RMACC Summit's primary operators, and consider both the technical and support challenges of extending that collaboration to other regional users.

Wednesday July 12, 2017 11:30am - 12:00pm
Strand 11


Stampede 2: The Evolution of an XSEDE Supercomputer
The Stampede 1 supercomputer was a tremendous success as an XSEDE resource, providing more than eight million successful computational simulations and data analysis jobs to more than ten thousand users. In addition, Stampede 1 introduced new technology that began to move users towards many-core processors. As Stampede 1 reaches the end of its production life, it is being replaced in phases by a new supercomputer, Stampede 2, that will not only take up much of the original system’s workload, but also continue the bridge to technologies on the path to exascale computing. This paper provides a brief summary of the experiences of Stampede 1, and details the design and architecture of Stampede 2. Early results are presented from a subset of Intel Knights Landing nodes that are bridging between the two systems.

Wednesday July 12, 2017 12:00pm - 12:30pm
Strand 11


A real-time machine learning and visualization framework for scientific workflows
High-performance computing resources are currently widely used in science and engineering. Typical post-hoc approaches use persistent storage to save the data produced by a simulation, so data analysis tasks must read from storage back into memory. For large-scale scientific simulations, such I/O operations produce significant overhead. In-situ/in-transit approaches bypass I/O by accessing and processing in-memory simulation results directly, which suggests simulations and analysis applications should be more closely coupled. This paper constructs a flexible and extensible framework to connect scientific simulations with multi-step machine learning processes and in-situ visualization tools, thus providing plug-in analysis and visualization functionality over complex workflows in real time. A distributed simulation-time clustering method is proposed to detect anomalies in real turbulent flows.

Wednesday July 12, 2017 2:00pm - 2:30pm
Strand 11


GenApp Integrated with OpenStack Supports Elastic Computing on Jetstream
GenApp is a universal and extensible tool for rapid deployment of applications. GenApp builds fully functioning science gateways and standalone GUI applications from collections of definition files and libraries of code fragments. Among the main features are the minimal technical expertise requirement for the end user and an open-end design ensuring sustainability of generated applications. Because of the conceptual simplicity of use, GenApp is ideally suited to scientists who are not professional developers, to disseminate their theoretical and experimental expertise as embodied in their code to their communities by rapidly deploying advanced applications. GenApp has an open extensible resource execution model. To support efficient elastic cloud computing on NSF Jetstream, GenApp has recently integrated OpenStack as a target resource with optional job-specific XSEDE project accounting.

Wednesday July 12, 2017 2:30pm - 3:00pm
Strand 11


XSEDE Technology Investigation Service (TIS)
The Evaluating and Enhancing the eXtreme Digital Cyberinfrastructure for Maximum Usability and Science Impact project, known currently as the Technology Investigation Service (TIS), was a collaboration between University of Illinois at Urbana-Champaign National Center for Supercomputing Applications, Pittsburgh Supercomputing Center, The University of Texas at Austin Texas Advanced Computing Center, University of Tennessee National Institute for Computational Sciences, and University of Virginia which identified and evaluated potential technologies to close the gap between the XSEDE (Extreme Science and Engineering Discovery Environment, http://www.xsede.org) service offerings and the needs of XSEDE’s users. This project was funded by the NSF Division of Advanced Cyberinfrastructure (award ACI 09-46505) in response to the “Technology Audit and Insertion Service” component of the “TeraGrid Phase III: eXtreme Digital Resources for Science and Engineering (XD)” solicitation (NSF 08-571). Over the project lifetime the two major goals of TIS were: 1) identifying, tracking, evaluating and making recommendations of new technologies to XSEDE for consideration of adoption and 2) raising awareness of TIS to XSEDE and other stakeholders to solicit their input on technologies for consideration for evaluation. In accomplishing the goals, the following four significant outcomes from TIS resulted: the development and deployment of the XSEDE Technology Evaluation Database; the development of the significantly improved XSEDE software search capability; the technology evaluation process; and the evaluations performed along with their corresponding technology adoption recommendations to XSEDE. This paper highlights the life-cycle of the TIS project, including lessons learned and project outcomes.

John Towns

Director of Collaborative eScience Programs, National Center for Supercomputing Applications
Director of Collaborative eScience Programs at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. NCSA's Collaborative eScience Directorate works to reduce barriers to the use of cyberinfrastructure so more people in more fields of inquiry can...

Wednesday July 12, 2017 3:00pm - 3:30pm
Strand 11
Thursday, July 13


A Platform for Computationally Advanced Collaborative AgroInformatics Data Discovery and Analysis
The International Agroinformatics Alliance (IAA) is a coalition of public and private institutions that are cooperating to develop a platform for computationally advanced collaborative analysis of agricultural data. By combining large agricultural data sets with advanced analysis techniques, IAA seeks to catalyze agricultural research, leading to improved agricultural productivity and stability. IAA has constructed a platform that combines Jupyterhub web notebooks for interactive data analysis, relational databases for storage of crop genetic and geospatial data, and the Globus file transfer system for efficient data transfer and authentication. The platform uses a data permissions system that allows users to share data with collaborators. The central platform is located at the Minnesota Supercomputing Institute, at the University of Minnesota, which allows access to the large storage and compute resources required for advanced agroinformatics analysis pipelines.

Thursday July 13, 2017 9:00am - 9:30am
Strand 11


Shared research group storage solution with integrated access management
Management of research data storage areas for projects and research groups has become a time consuming task for research and data support staff. Purdue Research Computing staff have developed a web-based system to allow quick provisioning of group storage areas on the Research Data Depot shared filesystem operated by Purdue's Research Computing group. The system allows technical and non-technical staff to provision Research Data Depot storage allocations and configure data access management for research groups quickly, consistently, and reliably via an intuitive web interface. This system also allows for tracking allocation usage of these storage spaces and customizable alerting of the users of these spaces when allocations are nearly consumed.

Thursday July 13, 2017 9:30am - 10:00am
Strand 11


Flexible Enforcement of Multi-factor Authentication with SSH via Linux-PAM for Federated Identity Users
A computational science project with restricted-access data was awarded an allocation by XSEDE in 2016 to use the Bridges supercomputer at the Pittsburgh Supercomputing Center (PSC). As a condition of the license agreement for access to the data, multi-factor authentication (MFA) with XSEDE's Duo MFA service is required for users of this project to login to Bridges via SSH, in addition to filesystem access controls. Since not all Bridges users are required to authenticate to Bridges in this manner, a solution was implemented via Linux-PAM to require XSEDE Duo MFA for SSH login access by specific users, as identified by their local account name or membership in a local group. This paper describes the implementation on Bridges and its extensibility to other systems and environments with similar needs.
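The selection rule described above (require MFA only for specific users, identified by local account name or local group membership) can be sketched in a few lines of Python. This is an illustration of the decision logic only, not the Linux-PAM configuration used on Bridges; the function name `mfa_required` and the example account and group names are hypothetical.

```python
def mfa_required(account, groups, mfa_users, mfa_groups):
    """Return True if this login should be challenged for a second factor.

    account    -- local account name of the user logging in
    groups     -- local groups the account belongs to
    mfa_users  -- account names that must always use MFA (assumed list)
    mfa_groups -- local groups whose members must use MFA (assumed list)
    """
    if account in mfa_users:
        return True
    # Any overlap between the user's groups and the restricted groups
    # is enough to require the second factor.
    return bool(set(groups) & set(mfa_groups))


if __name__ == "__main__":
    restricted_users = {"alice"}           # hypothetical account name
    restricted_groups = {"licensed-data"}  # hypothetical group name

    print(mfa_required("alice", ["staff"], restricted_users, restricted_groups))
    print(mfa_required("bob", ["licensed-data"], restricted_users, restricted_groups))
    print(mfa_required("carol", ["staff"], restricted_users, restricted_groups))
```

Users who match neither list log in as before, which is how the requirement can be limited to the restricted-data project without affecting other users.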

Thursday July 13, 2017 10:00am - 10:30am
Strand 11


Performance Benchmarking of the R Programming Environment on the Stampede 1.5 Supercomputer
We present performance results obtained with a new single-node performance benchmark of the R programming environment on the many-core Xeon Phi Knights Landing and standard Xeon-based compute nodes of the Stampede supercomputer cluster at the Texas Advanced Computing Center. The benchmark package consists of microbenchmarks of linear algebra kernels and machine learning functionality that includes clustering and neural network training from the R distribution. The standard Xeon-based nodes outperformed their Xeon Phi counterparts for matrices of small to medium dimensions, performing approximately twice as fast for most of the linear algebra microbenchmarks. For most of the same microbenchmarks the Knights Landing compute nodes were competitive with or outperformed the standard Xeon-based nodes for matrices of medium to large dimensions, executing as much as five times faster than the standard Xeon-based nodes. For the clustering and neural network training microbenchmarks, the standard Xeon-based nodes performed up to four times faster than their Xeon Phi counterparts for many large data sets, indicating that commonly used R packages may need to be reengineered to take advantage of existing optimized, scalable kernels.

Thursday July 13, 2017 11:00am - 11:30am
Strand 11


Building Bridges - The System Administration Tools and Techniques used to Deploy Bridges
High Performance Computing is continually growing in scope and areas of research. To cover these new areas of research, HPC has to become more flexible to handle the wide variety of workloads. As the computing becomes more flexible, the infrastructure becomes more complex to accommodate these new varied workloads.

At Pittsburgh Supercomputing Center, a Level 1 XSEDE Service Provider, we faced this exact problem for Bridges, our new NSF-funded configurable computing resource. This talk will cover the technical decisions that were made for Bridges, why we made them, the tools we chose, and what the users gain. We chose OpenStack Ironic for system installation, OpenStack for managing virtual machines, Puppet for configuration, Slurm for scheduling, and Naemon, Elasticsearch / Logstash / Kibana, and InfluxDB for monitoring and reporting. This software gives users a wide range of ways to do computing at PSC. Additionally, it lets us maintain an even higher level of monitoring and reporting that changes automatically as the systems change functionality.

Thursday July 13, 2017 11:30am - 12:00pm
Strand 11


OpenMP 4 Fortran Modernization of WSM6 for KNL
Parallel code portability in the petascale era requires modifying existing codes to support new architectures with large core counts and SIMD vector units. OpenMP is a well established and increasingly supported vehicle for portable parallelization. As architectures mature and compiler OpenMP implementations evolve, best practices for code modernization change as well. In this paper, we examine the impact of newer OpenMP features (in particular OMP SIMD) on the Intel Xeon Phi Knights Landing (KNL) architecture, applied in optimizing loops in the single moment 6-class microphysics module (WSM6) in the US Navy's NEPTUNE code.
We find that with functioning OMP SIMD constructs, low thread invocation overhead on KNL and reduced penalty for unaligned access compared to previous architectures, one can leverage OpenMP 4 to achieve reasonable scalability with relatively minor reorganization of a production physics code.

Thursday July 13, 2017 12:00pm - 12:30pm
Strand 11