Once you’ve registered and arrive in New Orleans, be sure to use our mobile web app to manage your busy schedule so you don’t miss a thing. Also check the website for updates and use the #PEARC17 hashtag to keep up with friends and colleagues.  


Sunday, July 9
 

3:00pm

Registration
Conference registration and information

Sunday July 9, 2017 3:00pm - 7:00pm
2nd Floor Registration Area (behind escalators)

4:30pm

Student Welcome Event
PEARC17 welcomes student attendees to the conference with this mixer, ice breaker session, and cybersecurity talk. Hors d'oeuvres will be provided. Special Agents Tracey Lin and Tracie Smith from the FBI Cyber Squad in New Orleans will be attending to give a talk about cybersecurity, and a local police officer will also give a short briefing on safety in New Orleans. This is a closed event limited to students participating in the PEARC17 student program.

Sunday July 9, 2017 4:30pm - 7:00pm
Imperial 5AB (Level 4)
 
Monday, July 10
 

7:30am

Registration
Conference registration and information

Monday July 10, 2017 7:30am - 7:30pm
2nd Floor Registration Area (behind escalators)

8:00am

Tutorials Breakfast
Monday July 10, 2017 8:00am - 9:00am
Empire AB

9:00am

Developing Science Gateways for Campuses using Apache Airavata
Science gateways, or Web portals, are an important mechanism for broadening and simplifying access to computational grids, clouds, and campus resources. Gateways provide science-specific user interfaces for scientific applications to end users who are unfamiliar with command-line interfaces or who need more capabilities than those interfaces provide. Gateways are thus force multipliers for computing center user support staff because they provide users with self-service, safe ways to access scientific applications and data. In this tutorial, we present the Apache Airavata middleware for creating science gateways. Our goal is to show participants how to build and run science gateways that securely access campus computing resources, integrate with campus identity management systems, and comply with local usage policies. We further show how Apache Airavata can provide fine-grained access control for resources, forming the basis for statewide and regional university consortia to build regional cyberinfrastructure. This tutorial will build on the XSEDE14, XSEDE15, and XSEDE16 tutorials. Extensive tutorial material is available from https://s.apache.org/scigap-xsede14, https://s.apache.org/scigap-xsede15, and https://s.apache.org/xsede16-airavata-tutorial. Prerequisites: Java will be required for part of the tutorial. Sample input files and other information can be found at https://s.apache.org/pearc17


Monday July 10, 2017 9:00am - 12:30pm
Strand 10B

9:00am

Engineering Your Application for Peak Performance with TAU and MVAPICH2
This tutorial presents tools and techniques to optimize the runtime tunable parameters exposed by the MPI library using the TAU Performance System. MVAPICH2 exposes MPI performance and control variables through the MPI_T interface, which is now part of the MPI-3 standard. The tutorial will describe how to use TAU and MVAPICH2 to assess application and runtime system performance. We present the complete workflow of performance engineering, including instrumentation, measurement (profiling and tracing, timing and PAPI hardware counters), data storage, analysis, and visualization. Emphasis is placed on how the tools are used in combination to identify performance problems and investigate optimization alternatives. We will request remote access to the Stampede system at TACC for hands-on exercises. We will also provide the HPC Linux [http://www.hpclinux.org] OVA image containing all of the necessary tools (running within a virtual machine) for the hands-on sessions. Participants will learn how to use the TAU Performance System with MPI and OpenMP and use the MPI_T interface from the MVAPICH2 library on the Stampede system at TACC and on the VM. This will help prepare participants to locate and diagnose performance bottlenecks in their own parallel programs.
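As a rough sketch of what such a profiling run might look like (tau_exec and paraprof are standard TAU commands; the MPI_T tracking variable shown is an assumption based on TAU's documented environment-variable interface, and the handout may differ):

    # Hypothetical profiling run; exact variables may differ from the tutorial materials
    export TAU_TRACK_MPI_T_PVARS=1     # assumed: ask TAU to sample MPI_T performance variables
    mpirun -np 16 tau_exec ./my_app    # tau_exec profiles the unmodified binary
    paraprof                           # browse the collected profiles graphically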


Monday July 10, 2017 9:00am - 12:30pm
Bolden 6

9:00am

Getting Started with OpenHPC
This tutorial aims to provide attendees with a thorough introduction to and overview of the OpenHPC community project (http://openhpc.community). Formalized under the Linux Foundation in 2016, OpenHPC is a collaborative project comprising over 30 members from academia, research labs, and industry. OpenHPC is focused on providing HPC-centric package builds for a variety of common components in an effort to minimize duplication and provide a platform to share configuration recipes from a variety of sites. The tutorial will be organized into two primary phases: (1) a lecture phase highlighting currently available software components, packaging conventions and hierarchical modules, the integration test suite, and the building-block nature of the project, and (2) a live demonstration/walk-through of a bare-metal cluster installation using OpenHPC repositories and multiple cluster-building recipes. After installation, attendees will be able to log in to the installed clusters to gain interactive experience with the development environment and to run example jobs through the resource manager(s). Prerequisites: attendees need to bring a laptop with wireless connectivity and a working SSH client to participate in the hands-on lab.


Monday July 10, 2017 9:00am - 12:30pm
Strand 11A

9:00am

High-Throughput Computation on the Open Science Grid and AWS
Would you like to use the distributed resources of the Open Science Grid, or do you just want to learn how to do large-scale high-throughput computing? The format of this tutorial is a mix of lecture and hands-on exercises, so please bring your laptop and make sure you have an SSH client installed. You will be provided with a training account on OSG Connect which, during the tutorial, will be upgraded to a full user account. After the session, you will have full access to the OSG and will know how to run and scale up workloads, manage your data, and submit to Amazon cloud resources. If time permits, the OSG User Support team will also help get your own workload set up for execution on OSG. Topics include: Introduction to OSG Connect; Job scheduling with HTCondor; Scaling up workloads; Managing data; High-throughput submission to the Amazon cloud. Please check the background material and setup requirements for this tutorial at https://swc-osg-workshop.github.io/OSG-UserTraining-PEARC17/
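For a flavor of what job scheduling with HTCondor looks like in practice, here is a minimal, generic submit description file (the script and file names are placeholders, not taken from the tutorial materials):

    # hello.sub -- minimal HTCondor submit description file (hypothetical names)
    executable = hello.sh            # script to run on the remote worker
    arguments  = $(Process)          # each queued job receives its own index
    output     = hello.$(Process).out
    error      = hello.$(Process).err
    log        = hello.log           # HTCondor records job lifecycle events here
    request_cpus   = 1
    request_memory = 1GB
    queue 10                         # submit ten independent jobs

Submitting with "condor_submit hello.sub" and then watching "condor_q" is typically the first scaling-up exercise in this style of tutorial.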


Monday July 10, 2017 9:00am - 12:30pm
Strand 3

9:00am

Introduction to Python
This tutorial is a quick immersion in the basics of the Python programming language, including all the tools needed to participate in the PEARC17 Student Modeling Day on Tuesday. Topics covered will be variables, types, operators, input/output, control flow, functions, classes, lists, libraries, plotting, data files, and Jupyter notebooks. The tutorial is intended for Python beginners, so hands-on experience will be emphasized. Most techniques will be presented in live-demo mode, and each section will feature an exercise so participants can try out the commands or methods for themselves. To participate fully in the exercises, attendees should come with the Anaconda Python 2.7 package downloaded and installed on their computer. You can get Anaconda Python at https://www.continuum.io/downloads.
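For a sense of the level, a beginner exercise might combine several of the listed topics. This sketch is illustrative rather than tutorial material; it runs under the Anaconda Python 2.7 mentioned above as well as under Python 3:

    from __future__ import print_function  # keeps print() working on Python 2.7

    def fahrenheit_to_celsius(temp_f):
        """Convert one Fahrenheit reading to Celsius."""
        return (temp_f - 32.0) * 5.0 / 9.0

    readings_f = [32.0, 68.5, 98.6, 212.0]                    # a list of input values
    readings_c = [fahrenheit_to_celsius(t) for t in readings_f]

    for f, c in zip(readings_f, readings_c):                  # control flow + output
        print("{:6.1f} F = {:6.1f} C".format(f, c))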


Monday July 10, 2017 9:00am - 12:30pm
Strand 12B

9:00am

Programmable Cyberinfrastructure: Introduction to Building Clusters in the Cloud
Cloud computing is a growing area for educating students and performing meaningful scientific research. The challenge for many educators and researchers is knowing how to use some of the unique aspects of computing in the cloud. One key feature is true elastic computing: resources on demand. This can be as simple as quickly and easily provisioning a single purpose-driven virtual machine by hand. It can be taken a step further by scripting the launch of additional resources as needed. Beyond that, elastic computing techniques can create modest virtual clusters on demand. While cloud resources won't replace traditional HPC environments for large research projects, there are many smaller research and education projects that would benefit from the highly customizable, highly configurable, programmable cyberinfrastructure afforded by cloud computing environments. This tutorial will discuss the basic methods required for interacting with elastic computing environments. It will then show a hands-on approach to creating virtual clusters in an OpenStack environment. Please come prepared with a laptop with working wireless internet and the following packages installed: Python 2.7 or later, setuptools, pip (the following pip packages may be installed in a virtualenv if you are comfortable with that!), "pip install python-glanceclient python-cinderclient python-openstackclient python-novaclient python-neutronclient python-keystoneclient python-heatclient shade". The goal is to have a working OpenStack client interface on your machine; for more details, see: https://docs.openstack.org/user-guide/common/cli-install-openstack-command-line-clients.html Required reading: it would be useful to explore the Jetstream wiki: https://wiki.jetstream-cloud.org - particularly the sections on using the Jetstream API! We will provide training accounts and working openrc.sh files during the session!
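To give a flavor of the by-hand provisioning step described above, booting a single instance from the command line might look like the following (the image, flavor, and key names are placeholders; the openrc.sh credentials file is supplied during the session, as noted above):

    source openrc.sh                      # load OpenStack credentials into the shell
    openstack image list                  # discover available images
    openstack flavor list                 # discover available instance sizes
    openstack server create \
        --image ubuntu-16.04 \
        --flavor m1.small \
        --key-name mykey \
        my-test-node                      # boot a single purpose-driven VM
    openstack server list                 # confirm the instance reaches ACTIVE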


Monday July 10, 2017 9:00am - 12:30pm
Strand 12A

9:00am

Simplified Research Data Management with Globus SaaS
Over the past six years, Globus has become a preferred service for moving, sharing, and publishing research data on a wide variety of HPC and campus computing resources. While usage across the XSEDE ecosystem continues to grow, there are many institutions and investigators who are either not aware of the capabilities and benefits Globus can provide, or have limited-scope deployments that they would like to expand. In this session, participants will learn about the features of the Globus service, and how to use it for delivering robust research data management services that span campus systems, national cyberinfrastructure, and public cloud resources. Globus is installed at all XSEDE service providers, many national facilities, and dozens of campus research computing centers. We will draw on experiences from this broad user base to highlight the challenges they face in delivering scalable research data management services. Attendees will be introduced to Globus and will have multiple opportunities for hands-on interaction with the service, both as end-users and system administrators tasked with deploying Globus endpoints on their storage systems.


Monday July 10, 2017 9:00am - 12:30pm
Strand 2

9:00am

Using Comet’s Virtual Clusters
Comet is an XSEDE HPC resource hosted and operated at SDSC. This tutorial introduces the virtual cluster capability of Comet, a unique feature that provides research groups, projects, and campuses with the ability to fully define their own software environment with a set of dynamically allocated virtual machines, and includes hands-on material covering the different modes of usage anticipated for virtual clusters. We begin the tutorial with an overview of the Comet system architecture, the design and architecture of the virtual cluster capability, and how it compares to other virtualized and cloud services. We will also cover the high performance of the virtualized clusters, which combine the full AVX2 feature set of the Haswell processors with InfiniBand HCAs using SR-IOV for MPI. We then follow with information on how to build, configure, and manage virtual clusters using the Cloudmesh client, a tool to easily interface with multiple clouds from the command line and a command shell. The hands-on section of the tutorial is divided into three parts: installing and configuring a virtual cluster; running MPI applications within the virtual cluster; and simple automation to start and stop virtual machines dynamically. SDSC and IU staff will be available to meet with individual users, to further discuss usage of Comet, at the conclusion of the tutorial. This tutorial is appropriate for people with Linux system administration or management experience. HPC cluster management experience is not required, but participants with this experience will benefit the most from this tutorial. Tutorial attendees wishing to participate in the hands-on portion will require a laptop with either VirtualBox or Python 2.7.x with virtualenv. Instructors will provide a Linux VM for VirtualBox users, or attendees can install the Cloudmesh client tool in a Python virtualenv directly on their laptop.
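As a rough sketch of the laptop setup described above (the PyPI package name is an assumption based on the Cloudmesh client project, and the exact commands may differ from the instructors' handout):

    # Hypothetical setup of the Cloudmesh client in a Python 2.7 virtualenv
    virtualenv ~/cm-env
    source ~/cm-env/bin/activate
    pip install cloudmesh_client   # assumed package name
    cm help                        # 'cm' is the Cloudmesh command shell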


Monday July 10, 2017 9:00am - 12:30pm
Strand 11B

9:00am

XSEDE New User Tutorial: Allocations and Resource Usage
This tutorial will provide training and hands-on activities to help new users learn and become comfortable with the basic steps necessary to first obtain, and then successfully employ, an XSEDE allocation to accomplish their research or educational goals. The tutorial will consist of three sections. The first part will explain the XSEDE allocations process and how to write and submit successful allocation proposals. The instructor will describe the contents of an outstanding proposal and the process for generating each part. Topics covered will include the scientific justification, the justification of the request for resources, techniques for producing meaningful performance and scaling benchmarks, and navigating the XRAS system through the XSEDE Portal for electronic submission of proposals. The second section, "Information Security Training for XSEDE Researchers," will review basic information security principles for XSEDE users, including: how to protect yourself from online threats and risks, how to secure your desktop/laptop, safe practices for social networking, email, and instant messaging, how to choose a secure password, and what to do if your account or machine has been compromised. The third part will cover the New User Training material that has been delivered remotely each quarter, but will delve deeper into those topics. New topics will also be covered, including how to troubleshoot a job that has not run and how to improve job turnaround by understanding differences in batch job schedulers on different platforms. We anticipate significant interest from Campus Champions, and therefore we will explain how attendees can assist others, as well as briefly describe projects currently being carried out in non-traditional HPC disciplines. Prerequisites: Java will be required for part of the tutorial. Sample input files and other information can be found at https://s.apache.org/pearc17


Monday July 10, 2017 9:00am - 12:30pm
Strand 1

9:00am

The 1st International Workshop on the US and China Collaboration in Experience and Best Practice in Supercomputing
See the workshop web site for details. This workshop welcomes all individuals and institutes interested in participating as speakers, panelists, or audience members. To attend the workshop, participants must register with the PEARC17 conference for Tutorials Day.

This workshop is the first of a series providing a forum for leaders, technology developers, and resource operators from leading international supercomputing centers to share ideas and experiences in deploying and servicing supercomputing technologies for open science users. The first workshop will have a particular focus on many-core technology development and application in China and the U.S. The development and deployment of supercomputing technologies in China have grown rapidly over the past decade. Together, China and the U.S. currently host seven of the ten fastest supercomputers in the world. Service providers in both countries face many similar challenges, such as large potential user communities, ever-increasing needs for high-performance computing resources, rapid hardware technology development, and evolving demands.
https://www.tacc.utexas.edu/conference/pearc17


Monday July 10, 2017 9:00am - 5:00pm
Strand 10A

9:00am

Advanced Manycore Programming (KNL)
As processors continue to eke out more performance at the cost of complexity, an understanding of the underlying hardware is essential to developing code that runs well on new platforms such as the KNL. To take advantage of these features, application development now requires the consideration of at least three different levels of parallelism (MPI, threading, SIMD), proper task/thread placement, and the allocation of limited resources such as high-bandwidth memory. This tutorial is designed for experienced programmers familiar with OpenMP who wish to familiarize themselves with Intel’s next-generation manycore processor, the 2nd Generation Intel Xeon Phi “Knights Landing” (KNL). We will start by discussing the evolution of manycore processing and provide an overview of the KNL hardware and its various memory modes. Then, we will briefly show the use of reports and directives to improve vectorization and the implementation of proper memory access. We will next focus on new Intel VTune Amplifier XE capabilities that allow for in-depth memory access analysis and hybrid code profiling, as well as Intel Advisor capabilities for vectorization analysis. Hands-on exercises will be executed on the KNL-upgraded Stampede system at the Texas Advanced Computing Center (TACC). Prerequisites: Participants should plan to bring their laptop configured with an SSH client (e.g., PuTTY for Windows; Mac and Linux have terminals built in), an X11 client (e.g., Xming for Windows; XQuartz for Mac; the X Window System on Linux), and a VNC client (e.g., TigerVNC).


Monday July 10, 2017 9:00am - 5:00pm
Strand 4

9:00am

ARCC Tutorial: Enabling and Advancing Research Computing on Campuses
This PEARC/ARCC tutorial aims to educate attendees about enabling and advancing research computing on their campuses. Presentations will discuss the evolution of cyberinfrastructure (CI), strategies for advocating for and supporting research computing, funding models, and facilitation / user support. In addition, there will be presentations on typical local needs and available national resources, end-to-end connectivity capabilities, the CI ecosystem, and proposal writing and data management plans. The tutorial includes leaders in the field who have already committed to attending and presenting at this full-day tutorial.

8:45am – 9:00am 
Welcome/Audience introductions – what do you want from the workshop?


9:00am – 10:00am
CI evolution panel discussion – case studies providing a range of experience; topics include: executive buy-in, strategies for getting support, getting started, funding models
 - Barr von Oehsen, Rutgers University
 - Gwen Jacobs, University of Hawaii
 - Patrick Schmitz, UC Berkeley
 - Michael Erickson, Colorado School of Mines
 
10:00am – 10:30am
Finding and supporting faculty users (Henry Neeman, University of Oklahoma)
 
10:30am – 11:00am
Break
 
11:00am – 11:45am
Providing computing: local compute environment and accessing national resources (e.g., OSG, XSEDE, INCITE, NIH Cloud, etc.) (Tom Cheatham, University of Utah)
 
11:45am – 12:30pm
End to end connectivity (Science DMZ, data transfer, working with your regional network) (Joe Breen, University of Utah & Greg Monaco, Great Plains Network)
 
12:30pm – 1:30pm
Lunch
 
1:30pm – 2:15pm
Community of people: ACI-REF / CaRC (Lauren Michael, University of Wisconsin & Anita Orendt, University of Utah)
 
2:15pm – 3:00pm
Cybersecurity (Von Welch, NSF Cybersecurity Center of Excellence, Indiana University)
 
3:00pm – 3:30pm
Break
 
3:30pm – 5:00pm
Breakout sessions:
 - Funding opportunities & proposal writing (Dustin Atkins, Clemson University)
 - Creating CI and data management plans for your campus (Jill Gemmill, Clemson University, John Hicks, Internet2)
 - Monitoring tools (perfSONAR and XDMoD) (Joe Breen, University of Utah & Tom Furlani, University at Buffalo)
 
 
 


Monday July 10, 2017 9:00am - 5:00pm
Strand 13

10:30am

Morning Break
Monday July 10, 2017 10:30am - 11:00am
Strand Foyer

12:30pm

Tutorials Lunch
Monday July 10, 2017 12:30pm - 1:30pm
Empire AB

1:30pm

Building Data Portals and Science Gateways with Globus PaaS
Globus software-as-a-service (SaaS) is widely used by researchers to manage their data on XSEDE, DOE, and campus computing resources. The Globus platform-as-a-service (PaaS) makes APIs available for developers to use in external applications and services. The platform exposes identity and access management functionality that simplifies access to storage and computing resources using campus logins, and facilitates the integration of XSEDE and other research cyberinfrastructure services into web and mobile applications. In this tutorial, we will describe and demonstrate how developers can build web applications and services that leverage Globus and the Science DMZ to provide a broad range of researchers with access to advanced data management capabilities.  Prerequisites: if you would like to participate in the hands-on exercise you will need a laptop with an SSH client, a modern web browser, and Python 2.7+ (or a recent version of a Python development environment). It is helpful if you have some familiarity with web application development (creating and managing HTTP GET, PUT, and POST requests/responses) and are knowledgeable in Python, although most of the capabilities presented will be accessible using other programming languages.
To ensure that you are ready to participate in the exercises, we suggest that you install the pip (https://pip.pypa.io/en/stable/) and virtualenv (https://virtualenv.pypa.io/en/stable/) tools for Python, and verify that you can create and activate a virtualenv.
If you do not wish to install the Python exercises on your personal computer, we will also have cloud-hosted virtual machines available for use for the duration of the tutorial.
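As a minimal sketch of the kind of platform call the tutorial covers, assuming the Globus SDK for Python ("pip install globus-sdk") and a registered application client ID (the ID below is a placeholder):

    import globus_sdk

    CLIENT_ID = "your-app-client-id"  # placeholder: register an app at developers.globus.org

    # Native-app OAuth2 flow: open the printed URL, then paste the code back in.
    client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
    client.oauth2_start_flow()
    print("Log in at:", client.oauth2_get_authorize_url())
    code = input("Paste the authorization code: ").strip()  # raw_input on Python 2.7

    tokens = client.oauth2_exchange_code_for_tokens(code)
    access_token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]

    # Use the Transfer API to search for endpoints visible to this identity.
    tc = globus_sdk.TransferClient(
        authorizer=globus_sdk.AccessTokenAuthorizer(access_token))
    for ep in tc.endpoint_search(filter_fulltext="XSEDE", num_results=5):
        print(ep["display_name"], ep["id"])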


Monday July 10, 2017 1:30pm - 5:00pm
Strand 2

1:30pm

Enabling Science Beyond the Campus Edge: HPC Integration with the Open Science Grid
Enabling campus researchers to share computational and data resources with external collaborators is a powerful multiplier in advancing science. Sharing spare capacity for even short durations gives an institutional HPC resource a cost-efficient means of participating in a larger cyber ecosystem. In this session we will show you how to integrate your HPC cluster resource with the Open Science Grid to support collaborative, multi-institutional science. The only requirements are that your cluster can provide SSH access to a single OSG staff member, that your cluster's job submission and worker nodes have outbound IP connectivity, that the operating system is CentOS/RHEL 6.x, 7.x, or similar, and that a common batch scheduler is used (e.g., SLURM, PBS, HTCondor). During the tutorial we will configure OSG managed services to deliver workloads from science communities using the OSG to your HPC cluster.

Additional information can be found at:

https://support.opensciencegrid.org/support/solutions/articles/12000025149-osg-managed-services




Monday July 10, 2017 1:30pm - 5:00pm
Strand 3

1:30pm

How to Accelerate Your Big Data Applications with Hadoop and Spark
Apache Hadoop and Spark are gaining prominence in handling Big Data and analytics. Recent studies have shown that default Hadoop and Spark cannot efficiently leverage the high-performance networking and storage architectures of modern HPC clusters, such as Remote Direct Memory Access (RDMA)-enabled high-performance interconnects and heterogeneous, high-speed storage systems (e.g., HDD, SSD, NVMe-SSD, and Lustre). These middleware stacks are traditionally written with sockets and do not deliver the best performance on modern high-performance networks. In this tutorial, we will provide an in-depth overview of the architecture of the Hadoop components (HDFS, MapReduce, etc.) and Spark. We will examine the challenges in re-designing the networking and I/O components of these middleware stacks for modern interconnects and protocols (such as InfiniBand and RoCE) with RDMA and for modern storage architectures. Using the publicly available software packages from the High-Performance Big Data (HiBD, http://hibd.cse.ohio-state.edu) project, we will provide case studies of the new designs for several Hadoop/Spark components and their associated benefits. Through these case studies, we will also examine the interplay between high-performance interconnects, high-speed storage systems, and multi-core platforms to achieve the best solutions for these components and Big Data applications on modern HPC clusters. This tutorial will include hands-on sessions with Hadoop and Spark on the SDSC Comet supercomputer.
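For orientation, the programming model itself is untouched by the redesigns described above; a canonical Spark job in Python (with a placeholder HDFS path) looks like:

    from pyspark import SparkContext

    sc = SparkContext(appName="WordCount")                 # connect to the cluster manager
    counts = (sc.textFile("hdfs:///data/sample.txt")       # placeholder input path
                .flatMap(lambda line: line.split())        # line -> words
                .map(lambda word: (word, 1))               # word -> (word, 1)
                .reduceByKey(lambda a, b: a + b))          # sum the counts per word
    for word, n in counts.take(10):
        print(word, n)
    sc.stop()

The intent of the HiBD-style redesigns is that the networking and I/O internals change while application code like this stays the same.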


Monday July 10, 2017 1:30pm - 5:00pm
Strand 11B

1:30pm

Interactive Science and Geospatial Visualization Tools for HUBzero Science Gateways
HUBzero is a powerful, open source software platform for creating dynamic science gateways that support scientific research and educational activities (hubzero.org). Used by communities such as nanotechnology, earth sciences, data curation, and healthcare, it is a proven framework for building science gateways and a key part of many organizations’ cyberinfrastructure. There are more than 60 science gateways powered by HUBzero, totaling more than 2 million visitors each year. Among capabilities for collaboration, project work, and publishing, the HUBzero platform also provides an application framework for developing and deploying interactive computational tools. Historically these tools have been oriented toward mathematical modeling and simulation. Currently, however, there is a growing emphasis on interactive data visualization and exploration applications that are not as computationally intense and from which users expect a dynamic experience. Recent additions to HUBzero that further enhance the interactive and collaborative scientific discovery capabilities of the platform support this need. The addition of the Jupyter Notebook and RStudio Shiny web application frameworks enables the creation of dynamic applications supporting both researchers seeking on-the-fly modifications of computational models and data views and instructors seeking to develop course materials with significant practical components. Using a HUBzero hub, researchers and educators can publish their notebooks and applications to a wide audience of people with similar interests. Further, the HUBzero middleware provides a secure place to share notebooks and applications, which can also be disconnected from and reconnected to as a user changes locations or client computers. The GABBs (Geospatial Data Analysis Building Blocks) project, independently funded by the NSF, is an example of the new breed of interactive data analysis and visualization needs. The project adds both general-purpose and geospatial-specific software modules to HUBzero. The nature of the integration of these modules with HUBzero lends itself to similar enhancements for other science gateways leveraging HUBzero.
This tutorial will start with a brief overview of HUBzero, but the focus will be on these recent enhancements to the platform. We will walk through the salient features of the Jupyter and Shiny frameworks through a series of hands-on activities. The latter half of the tutorial will describe the new data management and interactive visualization capabilities built by the GABBs project. Additional hands-on activities will walk attendees through the management of geospatial data, demonstrations of the metadata capture and preview capabilities and toolkits that enable the creation of interactive geospatial visualizations with minimal programming.
Prerequisites: this tutorial has a significant hands-on component, so you must bring your own laptop to participate fully. It would be best to install Oracle VM VirtualBox on your laptop beforehand. We will be distributing VMs for use on USB sticks.


Monday July 10, 2017 1:30pm - 5:00pm
Strand 10B

1:30pm

Introduction to Scientific Visualization and Data Sharing
Visualization is largely understood and used by researchers as an excellent communication tool. This narrow view often keeps scientists from fully recognizing and developing their visualization skill set. This tutorial will provide a "from the ground up" understanding of visualization and its utility in error diagnostics and in exploring data for scientific insight. When used effectively, visualization provides a complementary and effective toolset for data analysis, one of the most challenging problems in computational domains. In this tutorial we plan to bridge these gaps by providing end users with fundamental visualization concepts, execution tools, customization, and usage examples. Finally, short hands-on tutorials on data sharing using SeedMe.org will be provided.

Pre-requisites: None
Level: Introductory    
Tutorial Requirements:
1. Computer, mouse with scroll wheel (tablets are not sufficient for this tutorial)
2. VisIt software version 2.12.2 must be installed. Download the executable/binary version (do not compile unless you are adventurous) for your operating system from https://wci.llnl.gov/simulation/computer-codes/visit/executables
3. Download sample data from https://wci.llnl.gov/content/assets/docs/simulation/computer-codes/visit/visit_data_files.tar.gz
4. Create an account on SeedMe.org for the data sharing portion (optional)
EXPECTED ATTENDEE OUTCOMES
The bulk of this tutorial is hands-on, so attendees should be prepared to follow along to get the most out of it.
a)    Gain an understanding of common visualization techniques for mesh (grid) based data
b)    Learn about sample use case scenarios and success stories
c)    Learn to use the VisIt software for visualization and try out the standard visualization techniques discussed in Session 1; VisIt is one of the two most powerful and popular open-source visualization packages for HPC resources (hands-on)
d)    Perform remote visualization with HPC clusters (hands-on)
e)    Share data and visualizations via SeedMe.org
SESSION DETAILS
Session 1 (Lecture): Visualization Fundamentals
In this session we will provide a rapid introduction to fundamental visualization concepts. We will provide a survey of available visualization techniques accompanied by example application scenarios. We will also discuss best practices and shortcomings of visualization techniques. These fundamentals will help attendees apply and adapt existing techniques for their own research.
·      Introduction to Visualization
·      Perception overview with eye color sensitivity
·      Visualization Techniques
·      Application Examples
·      Best Practices

Session 2 (Hands on): Visualization with VisIt
This session will provide a quick overview of VisIt, and the bulk of the session will be devoted to hands-on experience with the VisIt application. Attendees will create several visualizations on their laptops by following the instructor’s guidance; a short scripting sketch follows the topic list below.
·      VisIt Introduction
·      VisIt basics (how VisIt works, one plot & 2 operators)
·      VisIt plot survey
·      Expressions
·      Commands and Scripting
·      Moviemaking
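As a taste of the "Commands and Scripting" topic above, a session in VisIt's Python command-line interface (started with "visit -cli") might look like this sketch; the file and variable names are examples, and the actual names depend on which of the sample datasets you open:

    # Run inside VisIt's Python CLI: visit -cli
    OpenDatabase("example.silo")     # placeholder; use a file from the sample data download
    AddPlot("Pseudocolor", "temp")   # plot type and variable name are illustrative
    DrawPlots()                      # render the active plots
    SaveWindow()                     # write the current frame to an image file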
Session 3 (Hands on): Remote Interactive Visualization
This session will provide instructions on how to create a system host profile, connect to an XSEDE host such as Gordon, and perform remote interactive visualization.
·      Remote Visualization (network permitting)
Session 4 (Hands on): Data Sharing using SeedMe.org
This session will provide instructions on how to leverage the SeedMe infrastructure to share visualizations within and outside your research group.
·      SeedMe overview
·      Command line interaction with SeedMe.org
·      SeedMe integration with VisIt


Monday July 10, 2017 1:30pm - 5:00pm
Strand 11A

1:30pm

Parallel I/O for Reading and Writing Large Files in Parallel
Developing an understanding of efficient parallel I/O and adapting your application accordingly can result in orders-of-magnitude performance gains without overloading the parallel file system. This half-day tutorial will provide an overview of practices and strategies for the efficient use of parallel file systems through parallel I/O to achieve high performance. The target audience is analysts and application developers who do not have prior experience with MPI I/O, HDF5, or the T3PIO library. However, they should be familiar with C/C++/Fortran programming and basic MPI. A brief overview of the related basic concepts will be included in the tutorial where needed.
Prerequisites: each participant will need to bring a personal laptop with an SSH client.
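The tutorial's examples are in C/C++/Fortran, but the core MPI I/O idea, every rank writing its own slice of one shared file with a collective call, can be sketched compactly with mpi4py (an illustrative stand-in, not tutorial material):

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    data = np.full(1024, rank, dtype="i4")     # each rank's slice of the output

    fh = MPI.File.Open(comm, "shared.dat",
                       MPI.MODE_WRONLY | MPI.MODE_CREATE)
    fh.Write_at_all(rank * data.nbytes, data)  # collective write at a per-rank offset
    fh.Close()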


Monday July 10, 2017 1:30pm - 5:00pm
Bolden 6

1:30pm

The Data Scientist’s Python Toolbox
This tutorial is an intermediate-level course on tackling the problems facing data scientists using Python. Python is a high-level object-oriented language that has found wide acceptance in the scientific computing / data science community. Ease of use and an abundance of software packages are among the reasons for this extensive adoption. Pandas is a high-level open-source library that provides data analysis tools for Python. It provides an efficient and comprehensive platform for a large number of analytics problems. For generating sophisticated visualizations, two packages are introduced: Seaborn and Plotly. While Seaborn is aimed at statisticians, Plotly provides a rich, interactive visualization framework which is ideal for visualizing large data. Plotly also allows visualization-rich dashboards which can be shared online. To conclude, out-of-core computing with Dask/Blaze is introduced for those datasets that won’t quite fit into memory. The goal of Dask is to “extend the size of convenient datasets from ‘fits in memory’ to ‘fits on disk’”, effectively fitting between Pandas and PySpark in the Python ecosystem for analytics. Additional materials for the tutorial are available at https://bitbucket.org/sjraj/pearc/downloads/ (a short sketch of the Pandas-to-Dask progression follows the prerequisites below).
Prerequisites: participants should bring a laptop and have one of the following
a. Anaconda distribution with Python 3 installed. Anaconda is a Python distribution and can be downloaded from https://www.continuum.io/downloads
b. A VirtualBox installation.
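As promised above, a minimal sketch of moving the same computation from in-memory Pandas to out-of-core Dask (file names and column names are placeholders):

    import pandas as pd
    import dask.dataframe as dd

    # In-memory analysis with Pandas
    df = pd.read_csv("measurements-2017-01.csv")          # placeholder file
    print(df.groupby("site")["value"].mean())

    # The same computation, out-of-core across many files with Dask
    ddf = dd.read_csv("measurements-*.csv")               # lazily scans all chunks
    print(ddf.groupby("site")["value"].mean().compute())  # .compute() triggers execution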


Monday July 10, 2017 1:30pm - 5:00pm
Strand 12B

1:30pm

Using R and RStudio on Jetstream
Jetstream is a first-of-its-kind production cloud resource intended to support education, science, and research. As part of XSEDE, its goal is to aid researchers across the United States who need modest amounts of interactive computing power. Part of the goal in implementing Jetstream is to increase the disciplinary diversity of the XD ecosystem as well as to reach non-traditional researchers who may have HPC needs but have not had adequate access or have faced other barriers to HPC usage. In our session, we will spend the first portion of the tutorial discussing the architecture and use of Jetstream, followed by a short question and answer session, and then spend some time having attendees log on and try Jetstream. The second portion of the tutorial will build on the Jetstream introduction by using the R/RStudio virtual machine (launched by attendees) to run some sample workflows in RStudio. The instructor will discuss R and RStudio and talk about the types of research and analysis a researcher could carry out with R. The next step will be to use the web desktop environment with RStudio on Jetstream to run a Twitter sentiment analysis.


Monday July 10, 2017 1:30pm - 5:00pm
Strand 12A

1:30pm

XSEDE New User Tutorial: Using Science Gateways
This tutorial will build upon the XSEDE15 (http://sched.co/3YdG) and XSEDE16 tutorials (http://sched.co/7F7l). The purpose of this tutorial is to supplement the standard XSEDE new user tutorial with overviews of how to use science gateways, so that new users can start using XSEDE for scientific research right away, at the conference, and continue at their home institution without getting bogged down in the allocation process at the beginning. The tutorial is also appropriate for XSEDE Campus and Domain Champions who are interested in using science gateways to support outreach and support work on their local campuses. The target audience members are scientists in particular domains (chemistry, neuroscience, atmospheric science) who are new to XSEDE and who, optionally, are familiar with common software packages in their field but do not have deep experience with using supercomputers and clusters. Campus Champions who work closely with new users are also encouraged to attend. The tutorial will provide a brief overview of XSEDE and the science gateway program, including a list of other available gateways not covered by the tutorial. The bulk of the tutorial will be a sequence of hands-on activities that introduce attendees to domain-specific gateways. The tutorial organizers will work with XSEDE conference organizers and the outreach team to recruit new user attendees from the selected domains. Prerequisites: Java will be required for part of the tutorial. Sample input files and other information can be found at https://s.apache.org/pearc17



Monday July 10, 2017 1:30pm - 5:00pm
Strand 1

3:00pm

Afternoon Break
Monday July 10, 2017 3:00pm - 3:30pm
Strand Foyer

6:30pm

Student-Mentor Dinner
Students and volunteer mentors will enjoy dinner followed by a presentation from Dell along with a short Q&A and networking session. This is a closed event limited to students participating in the PEARC17 student program and their designated mentors. If you're interested, you can volunteer to be a mentor when you register!

Monday July 10, 2017 6:30pm - 8:00pm
8 Block Restaurant (Level 1)
 
Tuesday, July 11
 

7:30am

Registration
Conference registration and information

Tuesday July 11, 2017 7:30am - 7:30pm
2nd Floor Registration Area (behind escalators)

8:00am

Breakfast
Tuesday July 11, 2017 8:00am - 9:00am
Empire AB

9:00am

Plenary: Paula Stephan, Georgia State, 'How Economics Shapes Science'
Paula Stephan is a Fellow of the American Association for the Advancement of Science and a member of the Board of Reviewing Editors, Science. Science Careers named Stephan its first “Person of the Year” in December 2012. Stephan has published numerous articles in such journals as The American Economic Review, The Journal of Economic Literature, Management Science, Nature, Organization Science, Research Policy and Science. Her book, How Economics Shapes Science, was published by Harvard University Press. Her research has been supported by the Alfred P. Sloan Foundation, the Andrew W. Mellon Foundation, and the National Science Foundation. Stephan serves on the National Academies Committee on the Next Generation of Researchers Initiative and the Research Council of The State University of New York (SUNY) System. In the recent past she served on the National Research Council’s Board on Higher Education and Workforce and the Committee to Review the State of the Postdoctoral Experience for Scientists and Engineers. She served on the National Advisory General Medical Sciences Council, National Institutes of Health, 2005-2009, and also served on the Advisory Committee of the Social, Behavioral, and Economics Program, National Science Foundation, 2001-2008 (CEOSE, 2001-2003). She has held visiting positions at the Max Planck Institute, Munich, Germany; KU Leuven, Leuven, Belgium; Harvard University; the International Center for Economic Research, Turin, Italy; and the Wissenschaftszentrum Berlin für Sozialforschung, Berlin, Germany. Stephan received her undergraduate degree in economics from Grinnell College and her PhD from the University of Michigan.

Tuesday July 11, 2017 9:00am - 10:30am
Empire CD

10:30am

Morning Break
Tuesday July 11, 2017 10:30am - 11:00am
Empire Foyer

11:00am

De Novo Assembly of Lucina pectinata Genome using Ion Torrent Reads
Lucina pectinata is a bivalve that lives in sulfide-rich environments and houses an intracellular sulfide-oxidizing endosymbiont. This organism is an ideal model for understanding adaptive mechanisms and chemoautotrophic endosymbiosis in organisms living in sulfide-rich environments. However, only three hemoglobins have been completely characterized at the protein and gene level, leaving a gap in understanding the biology of this organism. In this work, we produced draft genomic assemblies with data produced by the Ion Proton Next Generation Sequencing System, using both the MIRA4 and SPAdes assemblers. We compare and contrast these draft assemblies using metrics such as N50, total assembled length, number of predicted genes, and other measures. We conclude that de novo assembly of eukaryotic organisms with NGS data from the Ion technology family remains complicated and may benefit from the use of multiple genome assemblers.
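One of the comparison metrics named above, N50, is simple to state: it is the contig length at which the contigs, taken from longest to shortest, first cover at least half of the total assembled length. A small illustrative sketch (not the authors' code):

    def n50(contig_lengths):
        """Return the N50 of an assembly given its contig lengths."""
        lengths = sorted(contig_lengths, reverse=True)
        half_total = sum(lengths) / 2.0
        running = 0
        for length in lengths:
            running += length
            if running >= half_total:   # first contig crossing the halfway mark
                return length

    print(n50([900, 600, 400, 200, 100]))  # -> 600, since 900 + 600 >= 1100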


Tuesday July 11, 2017 11:00am - 11:30am
Bolden 5

11:00am

A CyberGIS-Jupyter Framework for Geospatial Analytics at Scale
The interdisciplinary field of cyberGIS (aka geographic information science and systems (GIS) based on advanced cyberinfrastructure) has a major focus on data- and computation-intensive geospatial analytics. The rapidly growing needs across many application and science domains for such analytics based on disparate geospatial big data pose significant challenges for conventional GIS approaches. This paper describes CyberGIS-Jupyter, an innovative cyberGIS framework for achieving data-intensive, reproducible, and scalable geospatial analytics using the Jupyter Notebook, based on ROGER - the first cyberGIS supercomputer. The framework adapts the Notebook with built-in cyberGIS capabilities to accelerate gateway application development and sharing, while associated data, analytics, and workflow runtime environments are encapsulated into application packages that can be elastically reproduced through cloud computing approaches. As a desirable outcome, data-intensive and scalable geospatial analytics can be efficiently developed and improved, and seamlessly reproduced among multidisciplinary users in a novel cyberGIS science gateway environment.


Tuesday July 11, 2017 11:00am - 11:30am
Strand 12

11:00am

ARCC: Reproducibility and Containers—Singularity and Singularity Hub
ARCC workshop attendees are encouraged to attend the Technology Track invited talk by Vanessa Sochat and Greg Kurtzer. For details, see the description of the Reproducibility and Containers session.

Tuesday July 11, 2017 11:00am - 12:00pm
Strand 11

11:00am

Invited Talk: Reproducibility and Containers: The Perfect Sandwich

Dear reader, how should you disseminate your software? If you want your recipe to come out just right, we encourage you to put it in a container. One such container, Singularity, is the first of its kind to be securely deployed internationally on more than 40 shared cluster resources. Its registry, Singularity Hub, further supports reproducible science by building and making containers accessible to any user of the software. In this talk, Vanessa will review the primary use cases for both Singularity and Singularity Hub, and how both have been designed to support modern, common workflows. (Greg will participate remotely.) She will discuss current and future challenges for building, capturing metadata for, and organizing the exploding landscape of containers, and present novel work for assessing reproducibility of such containers. Containers are changing scientific computing, and this is something to be excited about.
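For readers new to the tools, a typical Singularity-plus-Hub workflow on a shared cluster looks roughly like this (the shub:// path is a placeholder, and the syntax follows the Singularity 2.x era of this talk; the pulled image filename may differ):

    singularity pull shub://someuser/somecontainer        # fetch an image built by Singularity Hub
    singularity exec somecontainer.img python analyze.py  # run a tool inside the container
    singularity shell somecontainer.img                   # or explore it interactively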

 



Tuesday July 11, 2017 11:00am - 12:00pm
Strand 11

11:00am

Crossing the Chasm: Best Practices in Reducing the Gender Gap in HPC

Attracting, retaining and developing female talent across the world is not only essential to an organization's performance; it's a business imperative. Learn how to be a leader of positive change for women in HPC. Learn how to create the social infrastructure to give women and other underrepresented groups access to the education, resources and opportunities they need to fully reach their potential. Also hear about the importance of keeping girls in the science, technology, engineering and math pipeline along their learning paths. Come hear about what IBM Canada and XSEDE are doing to help drive engagement of girls in STEM, and be prepared to have an open discussion about what still needs to be done to increase the numbers of girls pursuing STEM education and careers.

Presenters
  • Lorna Rivera, Georgia Institute of Technology
  • Krista Shibata, IBM Canada Program Leader STEM


Tuesday July 11, 2017 11:00am - 12:30pm
Bolden 6

11:00am

Student Modeling Day
Student Modeling Day is a fantastic opportunity to work in teams to understand an underlying model and its behavior, simulate a range of conditions, write and test the computer codes (using the Python programming language) needed to solve real-world problems, improve the user interface to the code, and validate code against real data or more sophisticated models.

Tuesday July 11, 2017 11:00am - 5:00pm
Imperial 5AB (Level 4)

11:30am

Optimizing High Performance Big Data Cancer Workflows
Appropriate optimization of bioinformatics workflows is vital to improving the timely discovery of variants implicated in cancer genomics. Sequenced human brain tumor data was assembled to optimize tool implementations and run various components of RNA sequence (RNA-seq) workflows. The measurable information produced by these tools accounts for the success rate and overall efficiency of a standardized and simultaneous analysis. We used the National Center for Biotechnology Information Sequence Read Archive (NCBI-SRA) database to retrieve two transcriptomic datasets containing over 104 million reads as input data. We used these datasets to benchmark various file systems on the Bridges supercomputer to improve overall workflow throughput. Based on program and job timings, we report critical recommendations on selecting appropriate file systems and node types to efficiently execute these workflows.


Tuesday July 11, 2017 11:30am - 12:00pm
Bolden 5

11:30am

Advancing analysis of high resolution topography using distributed HPC resources in OpenTopography
The OpenTopography science gateway provides efficient online access to high resolution topographic data and processing tools for a broad spectrum of research communities. We have integrated XSEDE HPC resources into the OpenTopography processing workflow to meet the growing demand for more complex and resource intensive algorithms from the wider community.


Tuesday July 11, 2017 11:30am - 12:00pm
Strand 12

12:00pm

A voice for Bioinformatics
One of the challenges to adoption of HPC is the disjunction between those who need it and those who know it. Biology (specifically, genomics) is a growing field for computational use, but the typical biologist does not have an established informatics background. The National Center for Genome Analysis Support (NCGAS) aids users in getting past the initial shock of the command line and guides them toward savvy cluster use.
NCGAS is initiating a push to become domain champions alongside Oklahoma State's Brian Cougar. Our position at IU gives us a close relationship with XSEDE and we already fulfill a role in pushing users toward XSEDE resources when our local clusters are ill-suited to the job. We currently act as liaison between biologists and Jetstream, IU and TACC's research computing cloud.
Typical issues include: software installation; software usage (what parameters do I choose, and how do I interpret the results?); batch job submission; understanding how queues and job handlers work; data movement; and spinning up VMs on Jetstream.
We will discuss how we have structured our support, and illustrate our impact on XSEDE resources.
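Many of the typical issues above reduce to getting a first working batch script; a generic minimal example of the kind NCGAS walks users through might look like the following (directives shown for a PBS/TORQUE-style scheduler; module and input names are placeholders):

    #!/bin/bash
    #PBS -N blast_test          # job name (placeholder)
    #PBS -l nodes=1:ppn=4       # one node, four cores
    #PBS -l walltime=04:00:00   # four-hour limit
    cd $PBS_O_WORKDIR           # run from the submission directory
    module load blast           # placeholder module name
    blastn -query input.fa -db nt -num_threads 4 -out results.txt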


Tuesday July 11, 2017 12:00pm - 12:30pm
Bolden 5

12:00pm

ARCC: Clouds, Containers, and Related Topics
  • Nancy Wilkins-Diehr, SDSC—Science Gateway Software Institute
  • Steve Tuecke, U Chicago—The new and improved Globus Connect Architecture


Tuesday July 11, 2017 12:00pm - 12:30pm
Strand 13

12:00pm

Cloud-enabling a Collaborative Research Platform: The GABBs Story
Modern cyberinfrastructures typically involve tightly integrated compute, storage and web application resources. They also form the basis of science gateways, which add their own science-specific processing or visualization capabilities. While some science gateways are intended as the central resource provider for a certain scientific community, others provide generic capabilities that are intended for further customization at each installation site. However, replicating their setup is a non-trivial task, often involving specific operating system, software package and configuration choices while also requiring allocation of the actual physical computing resources. Cloud computing provides an attractive alternative, simplifying resource provision and enabling reliable replication. We describe our ongoing efforts to cloud-enable a science gateway that supports geospatial data management, visualization, processing and publication. In particular, we describe our use of Amazon Web Services (AWS), the automation of software installation and configuration, as well as some challenges encountered.


Tuesday July 11, 2017 12:00pm - 12:30pm
Strand 12

12:00pm

Challenges of workload analysis on large HPC systems; a case study on NCSA Blue Waters
Blue Waters is a petascale supercomputer whose mission is to greatly accelerate insight into the most challenging computational and data analysis problems. We performed a detailed workload analysis of Blue Waters using Open XDMoD. The analysis used approximately 35,000 node hours to process the roughly 95 TB of input data from over 4.5M jobs that ran on Blue Waters during the period studied (April 1, 2013 - September 30, 2016).

This paper describes the work that was done to collate, process and analyze the data collected on Blue Waters, the design decisions that were made, the tools that we created, and the various software engineering problems that we encountered and solved. In particular, we describe the challenges to data processing unique to Blue Waters engendered by the extremely large jobs that it typically executed.


Tuesday July 11, 2017 12:00pm - 12:30pm
Strand 11

12:30pm

Lunch
Tuesday July 11, 2017 12:30pm - 2:00pm
Empire AB

12:30pm

XD Metrics Service (XMS) Advisory Committee (invitation only)
The XMS Advisory Committee meeting (invitation only) serves to provide the XDMoD development team with strategic guidance for the development of XDMoD (XD Metrics on Demand for XSEDE) and Open XDMoD (XDMoD for HPC centers). Open XDMoD is an open source tool to facilitate the management of high performance computing resources. It is widely deployed at academic, industrial and governmental HPC centers. Open XDMoD's management capabilities include monitoring standard metrics such as utilization, providing quality-of-service metrics designed to proactively identify underperforming system hardware and software, and reporting job-level performance data for every job running on the HPC system.


Tuesday July 11, 2017 12:30pm - 2:00pm
Bolden 3

2:00pm

Insights into Research Computing Operations using Big Data-Powered Log Analysis
Research computing centers provide researchers with a wide variety of services, including large-scale computing resources, data storage, high-speed interconnects and scientific software repositories, to facilitate continuous competitive research. Efficient management of these complex resources and services, as well as ensuring their fair use by a large number of researchers from different scientific domains, are key to a center's success. Almost all research centers use monitoring services based on real-time data gathered from systems and services, but they often lack tools to perform deeper analysis on large volumes of historical logs to identify insightful trends from recurring events. The size of the collected data can be massive, posing significant challenges for conventional analysis tools.

This paper describes a big data pipeline based on Hadoop and Spark technologies, developed in close collaboration between TACC and Georgia Tech. The pipeline is capable of processing large volumes of data collected from schedulers using PBSTools, making it possible to run a deep analysis in minutes, as opposed to the hours that would be required by conventional tools. Our component-based pipeline design adds the flexibility of plugging in different components and promotes data reuse. Using this data pipeline, we demonstrate the process of formulating several critical operational questions around researcher behavior, systems health, operational aspects and software usage trends, all of which are critical factors in determining solutions and strategies for efficient management of research computing centers.
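As a generic illustration of the pattern (not the authors' pipeline; the record layout is invented), a Spark job over scheduler accounting logs might look like:

    from pyspark import SparkContext

    sc = SparkContext(appName="SchedulerLogTrends")
    # Hypothetical whitespace-delimited accounting records: user, app, nodes, hours
    records = (sc.textFile("hdfs:///logs/pbs/*.log")       # placeholder path
                 .map(lambda line: line.split())
                 .filter(lambda f: len(f) == 4))
    # Total node-hours per application: a typical software-usage trend query
    usage = (records.map(lambda f: (f[1], int(f[2]) * float(f[3])))
                    .reduceByKey(lambda a, b: a + b))
    for app, node_hours in usage.takeOrdered(10, key=lambda kv: -kv[1]):
        print(app, node_hours)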


Tuesday July 11, 2017 2:00pm - 2:30pm
Strand 12

2:00pm

A buffering approach to manage I/O in a normalized cross-correlation earthquake detection code for large seismic datasets
Continued advances in high-performance computing architectures constantly push computational performance forward, widening the performance gap with I/O. As a result, I/O plays an increasingly critical role in modern data-intensive scientific applications.

We have developed high-performance GPU-based software called cuNCC, which is designed to calculate seismic waveform similarity for subjects such as hypocenter estimation and small earthquake detection. GPU acceleration greatly reduced the compute time, and we are currently investigating I/O optimizations to tackle this new performance bottleneck.

In order to find an optimal I/O solution for our cuNCC code, we performed a series of I/O benchmark tests and implemented buffering in CPU memory to manage the output transfers. With this preliminary work, we were able to establish that buffering improves the I/O bandwidth achieved, but is only beneficial when I/O bandwidth is limited, since the cost of the additional memory copy may exceed the improvement in I/O. However, in a realistic environment where I/O bandwidth per node is limited and small I/O transfers are penalized, this technique will improve overall performance. In addition, by using a large-memory system, the point at which computation has to stop to wait for I/O is delayed, enabling fast computation on larger datasets.
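The buffering idea described above can be sketched in a few lines of Python (an illustrative sketch, not the cuNCC implementation): accumulate results in memory and issue fewer, larger writes.

    class BufferedWriter(object):
        """Coalesce many small writes into fewer large ones (sketch)."""
        def __init__(self, path, threshold_bytes=64 * 1024 * 1024):
            self.fh = open(path, "wb")
            self.threshold = threshold_bytes
            self.chunks, self.size = [], 0

        def write(self, data):
            self.chunks.append(data)
            self.size += len(data)
            if self.size >= self.threshold:       # flush only when the buffer fills
                self.flush()

        def flush(self):
            self.fh.write(b"".join(self.chunks))  # one large I/O transfer
            self.chunks, self.size = [], 0

        def close(self):
            self.flush()
            self.fh.close()

As the abstract notes, the extra in-memory copy only pays off when small writes are penalized, which is exactly the shared-file-system regime the benchmarks target.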


Tuesday July 11, 2017 2:00pm - 2:30pm
Bolden 5

2:00pm

Analytics Environments on Demand: Providing Interactive and Scalable Research Computing with Windows
Historically, the experimental and observational sciences have been well served by traditional High Performance Computing (HPC). More recently, researchers from the life sciences and other domains have joined the HPC ranks. Cloud computing offers promising alternatives to HPC, yet neither HPC nor the cloud is sufficient to meet the computational needs of researchers in other academic domains -- those newer to research computing and big data -- for example, from the social sciences, digital humanities, and professional schools such as Law and Business. This paper describes the development and practice of a research computing service that provides interactive and scalable computing in a Windows environment, including the technical and end-user support challenges that were overcome to provide the service.


Tuesday July 11, 2017 2:00pm - 2:30pm
Strand 11

2:00pm

ARCC: ACI-REF Facilitators
Tuesday July 11, 2017 2:00pm - 2:45pm
Strand 13

2:00pm

NVIDIA: The Convergence of HPC and Deep Learning

Science is being enabled by supercomputing, whether it's climate science, combustion science, or understanding the fundamentals of how the human body works. What's exciting is that the same technology enabling this powerful science is also driving the revolution in deep learning, and both are powered by GPUs. This session will explain how the rapid advancement of deep learning for artificial intelligence has created an enormous demand for computational resources outside the traditional supercomputing domains. NVIDIA is uniquely suited to address these evolving needs with accelerated computing and will present recent GPU hardware and software advances and how they address computational needs in both AI and HPC. There will also be an update on the Deep Learning Institute (DLI) Teaching Kit, for which NVIDIA has partnered with Professor Yann LeCun of New York University and Facebook AI Research. The kit covers the academic theory and application of deep learning on GPUs using the PyTorch and Torch frameworks and includes detailed lecture slides, hands-on labs/source code solutions, quiz/exam problem sets, and free access to online deep learning labs using GPUs in the cloud.
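For context, the PyTorch framework covered by the Teaching Kit expresses a training step in a few lines; this is a generic sketch for a recent PyTorch release, not kit material:

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 2)                          # a toy classifier
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

    x = torch.randn(8, 4)                            # a random mini-batch
    y = torch.randint(0, 2, (8,))                    # random integer labels

    loss = nn.functional.cross_entropy(model(x), y)  # forward pass + loss
    optimizer.zero_grad()
    loss.backward()                                  # backpropagation (CPU or GPU)
    optimizer.step()
    print(loss.item())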



Tuesday July 11, 2017 2:00pm - 3:00pm
Bolden 2

2:00pm

XDMoD Users Group
The XDMoD Users Group, which is open to all PEARC17 attendees, provides an interchange of ideas regarding the continued development of Open XDMoD, with a focus on helping to better meet the needs of the user community. Open XDMoD is an open source tool to facilitate the management of high performance computing resources. It is widely deployed at academic, industrial, and governmental HPC centers. Open XDMoD's management capabilities include monitoring standard metrics such as utilization, providing quality-of-service metrics designed to proactively identify underperforming system hardware and software, and reporting job-level performance data for every job running on the HPC system.


Tuesday July 11, 2017 2:00pm - 3:30pm
Bolden 3

2:00pm

Finding the Path Forward: Expanding Diversity in Academic Research Computing
Speakers

Sharon Broude Geva

Director of Advanced Research Computing (ARC), University of Michigan


Tuesday July 11, 2017 2:00pm - 3:30pm
Bolden 6

2:30pm

Visual exploration and analysis of time series earthquake data
Earthquake hazard estimation requires systematic investigation of past records as well as of the fundamental processes that cause quakes. Robust risk estimation requires detailed long-term records of earthquakes at all scales (magnitude, space, time), which are not available. Hence a synthetic method based on first principles can generate records that bridge this critical gap of missing data. RSQSim is such a simulator, generating seismic event catalogs spanning several thousand years at various scales. This synthetic catalog contains rich detail about the events and their corresponding properties.
Exploring this data is of vital importance for validating the simulator as well as for identifying features of interest such as quake time histories and conducting analyses such as the mean recurrence interval of events on each fault section. This work describes and demonstrates a prototype web-based visual tool that enables scientists and students to explore this rich dataset, and also discusses refining and streamlining the data management and analysis so that it is less error-prone and more scalable.
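As an illustration of the kind of analysis the tool supports, a mean recurrence interval per fault section can be computed from an event catalog in a few lines. The sketch below uses pandas with hypothetical column names, not the actual RSQSim schema.

```python
import pandas as pd

# Toy stand-in for a synthetic catalog; fault_id and event time (years)
# are hypothetical column names, not the RSQSim output format.
catalog = pd.DataFrame({
    "fault_id": [7, 7, 7, 12, 12],
    "t_years":  [103.2, 411.8, 752.1, 88.0, 590.4],
})

# Mean recurrence interval per fault section: the average gap between
# successive events on the same fault.
recurrence = (catalog.sort_values("t_years")
                     .groupby("fault_id")["t_years"]
                     .apply(lambda t: t.diff().mean()))
print(recurrence)  # fault 7 -> ~324.45 yr, fault 12 -> 502.4 yr
```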


Tuesday July 11, 2017 2:30pm - 3:00pm
Bolden 5

2:30pm

Spark on the ARC - Big data analytics frameworks on HPC clusters
In this paper we document our approach to overcoming service discovery and configuration of the Apache Hadoop and Spark frameworks with dynamic resource allocations in a batch-oriented Advanced Research Computing (ARC) High Performance Computing (HPC) environment. ARC efforts have produced a wide variety of HPC architectures. A common HPC architectural pattern is multi-node compute clusters with low-latency, high-performance interconnect fabrics and shared central storage. This pattern enables processing of workloads with high data co-dependency, frequently solved with message passing interface (MPI) programming models and executed as batch jobs. Unfortunately, many HPC programming paradigms are not well suited to big data workloads, which are often easily separable. Our approach lowers barriers to entry to HPC environments by enabling end users to utilize the Apache Hadoop and Spark frameworks, which support big-data-oriented programming paradigms appropriate for separable workloads, in batch-oriented HPC environments.
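A minimal sketch of the general pattern (not the authors' implementation): discover the nodes granted by the batch scheduler, start a standalone Spark master on the first node, and point workers at it for the lifetime of the job. The environment variable name is an assumption; the start scripts ship with the standalone Spark distribution.

```python
import os
import subprocess

# Assumed: the scheduler exposes an expanded, comma-separated node list
# (e.g. derived from SLURM_JOB_NODELIST) and Spark's standalone scripts
# are on PATH on every allocated node.
nodes = os.environ["JOB_NODELIST"].split(",")
master, workers = nodes[0], nodes[1:]

# One master, started on the first allocated node.
subprocess.run(["ssh", master, "start-master.sh"], check=True)
master_url = f"spark://{master}:7077"

# Every remaining node becomes a Spark worker bound to that master.
for w in workers:
    subprocess.run(["ssh", w, "start-worker.sh", master_url], check=True)

print("Submit Spark jobs against", master_url, "until the allocation ends.")
```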


Tuesday July 11, 2017 2:30pm - 3:00pm
Strand 12

2:30pm

Demonstrating Distributed Workflow Computing with a Federating Wide-Area File System
We have demonstrated the synergy of a wide-area SLASH2 file system with remote bioinformatics workflows between Extreme Science and Engineering Discovery Environment sites using the Galaxy Project’s web-based platform for reproducible data analysis. Wide-area Galaxy workflows were enabled by establishing a geographically-distributed SLASH2 instance between the Greenfield system at Pittsburgh Supercomputing Center and virtual machines incorporating storage within the Corral file system at the Texas Advanced Computing Center. Analysis tasks submitted through a single Galaxy instance seamlessly leverage data available from either site. In this paper, we explore the advantages of SLASH2 for enabling workflows from Galaxy Main.


Tuesday July 11, 2017 2:30pm - 3:00pm
Strand 11

3:00pm

Extracting Meaningful Data from Decomposing Bodies
We present Decomposing Bodies, a digital humanities project that examines the late-19th-century system of anthropometric measurement introduced in France by Alphonse Bertillon. "Bertillonnage," as this system is commonly known, was the first measurement-based, state-controlled system used for criminal identification. Currently, researchers resort to tedious manual transcription in order to study the data on these cards in bulk. Here, we propose an end-to-end system for extracting handwritten text and numbers from scanned Bertillon cards in a semi-automated fashion, along with a web interface for browsing the original data and generated metadata. The proposed system will enable historians and humanities researchers to study the data produced by the Bertillon system with far more ease than before. To the best of our knowledge, this is the first system to automate Bertillon card analysis through the application of existing handwritten digit and word recognition methods. We present our current results on performing document analysis on a selected set of scanned Bertillon cards from the Ohio State Reformatory and Ohio Penitentiary. We conclude with a few recommendations for increasing the likelihood of success for collaborations between computer science and digital humanities researchers.


Tuesday July 11, 2017 3:00pm - 3:30pm
Bolden 5

3:00pm

Improving Uintah's Scalability Through the Use of Portable Kokkos-Based Data Parallel Tasks
The University of Utah's Carbon Capture Multidisciplinary Simulation Center (CCMSC) is using the Uintah Computational Framework to predict performance of a 1000 MWe ultra-supercritical clean coal boiler. The center aims to utilize the Intel Xeon Phi-based DOE systems, Theta and Aurora, through the Aurora Early Science Program by using the Kokkos C++ library to enable node-level performance portability. This paper describes infrastructure advancements and portability improvements made possible by our integration of Kokkos within Uintah. Scalability results are presented that compare serial and data parallel task execution models for a challenging radiative heat transfer calculation, central to the center's predictive boiler simulations. These results demonstrate good strong-scaling characteristics up to 256 Knights Landing (KNL) processors on the NSF Stampede system, and show the KNL-based calculation to be competitive with prior GPU-based results for the same calculation.


Tuesday July 11, 2017 3:00pm - 3:30pm
Strand 12

3:00pm

Evaluation of Intel Omni-Path on the Intel Knights Landing Processor
When a new technology is introduced into the HPC community, it is necessary to understand its performance and how it can affect the way applications interact with the hardware. Intel has recently introduced two new elements into the HPC ecosystem that are being widely adopted by many centers: the Intel Omni-Path high performance network and the Intel Knights Landing processor. While it is possible to find various studies that analyze the efficiency of the Knights Landing processor, the same is not true for Omni-Path, the new 100 Gb/s fabric from Intel. This paper presents a set of studies that investigate the effectiveness of a system comprising this processor and network. The outcomes of this work can be used as guidelines for better exploitation of these resources on production systems. The methodology employed during our tests can also be replicated on a variety of systems and centers to find the ideal configurations of their hardware resources and provide users with recommendations that can improve the performance of their codes and the overall throughput of the clusters.


Tuesday July 11, 2017 3:00pm - 3:30pm
Strand 11

3:30pm

Afternoon Break
Tuesday July 11, 2017 3:30pm - 4:00pm
Empire Foyer

4:00pm

Building & Managing Research Data Services
One of the most common and greatest challenges facing research computing today is meeting the data needs of modern research. Devices at all scales, from the Large Hadron Collider to gene sequencers to networks of low-cost sensors, generate large amounts of data. The BoF will facilitate a conversation around strategies for building and managing research data services with a focus on data storage. This will be a community-driven discussion that may include technologies, charge models, the type of IT group running the systems/services, compliance issues, data movement, data sharing, and beyond. The University of Iowa, Case Western Reserve University, Northwestern University, and the University of Minnesota will act as a panel sharing experiences before engaging in a community conversation on best practices for tackling the research data challenge. To encourage the discovery of collaboration opportunities, all panelists will also present their top three research data challenges.


Tuesday July 11, 2017 4:00pm - 5:00pm
Strand 1

4:00pm

Challenges and Opportunities of VM Image Curation and Discovery
Virtual Machine (VM) Images are being used more and more widely, across a range of research computation services. To realize the promise of this technology, there is a need for maintenance regimes, discovery mechanisms, and curation practices. We propose a BoF session that will bring together practitioners from a range of institutions to discuss use-cases, needs, current tools and activities that can contribute, and next steps to address the needs across the community.


Tuesday July 11, 2017 4:00pm - 5:00pm
Bolden 6

4:00pm

CyberInfrastructure Forward-Looking Topics of Interest to Campus Champions

Participants will divide into three groups on topics selected by campus champions; BoF attendees select the topic group of their choice. Each group will discuss its topic for roughly 45 minutes; at the culmination of the individual group discussions, each group lead will be allotted 5 minutes for a lightning summary talk on their topic.

1. Designing and sustaining a financial model for CI to enable centralized and shared resources institutionally. As the number of campuses involved in research computation continues to grow, a primary concern is sustaining the CI program financially over the long term.

  • Discussion will highlight the progressive models and methods currently being employed at institutions, as well as future opportunities and directions such as regionally shared centers, agency diversification, realistic expectations and needs, staffing, etc.
  • Special attention will be given to how to design a financial model for CI that can be bolstered through a more intense on-campus awareness program, targeting the key financial stakeholders, such as the PIs, the Directors/C-level officers of research computing, and the institution’s administration that governs the research office budget.
2. InfoSCi – Information Security for Cyberinfrastructure.
In light of new mandates from federal awarding agencies requiring grant proposals to incorporate cybersecurity throughout a research project's award lifetime, covering the Data Management Plan and data governance and meeting the control sets for HRCI (High Risk and Confidential Information) and CUI (Controlled Unclassified Information), information security has become of paramount importance to PIs, campus champions, and universities' C-level research leaders.
As a result, this topic - InfoSCi - is receiving much due diligence, and rightfully so.
  • Information security has begun to be incorporated into existing awards and is being built into new award solicitations, spurred by the need to heighten the security of research infrastructure and data through the various stages of the data and system life cycle.
  • How do we develop, equip, and grow the "next generation" of cyber practitioners and users, specifically the student body? Introducing security best practices can incorporate information security into the various arms of cyberinfrastructure (personnel, compute, storage, networking, visualization), interweaving InfoSCi into the fabric of research computing while achieving project goals and meeting the security requirements of the awarding agencies in this evolving future of cyberinfrastructure.

3. Storage Models for Research Computing
Data continues to proliferate rapidly, driven by the availability of large research repositories and the growth of high-resolution local instrumentation. This increase in the volume of data being generated has put pressure on Research Computing organizations to create a sustainable storage model that is flexible, scalable, and low cost. A well-organized plan (or the lack thereof) can have a profound impact on the local research community.

  • Who are the key parties responsible for the data/storage plan? PIs commonly rely on their research students to design and support their compute and storage resources in small silos, but should they?
  • How do the storage needs of STEM and Social Sciences differ?
  • Student engagement in the CI plan can positively impact the move to a centralized storage model, especially at small research institutions; large research institutions are less likely to face the same financial constraints that smaller institutions face. Most early-career PIs rely on their students to design and support their award-purchased HPC cluster silos, and the same can be said of some seasoned researchers. Hence, there are benefits to engaging students in an institution's CI plan, and the onus is on the Campus CI Working Group to involve PI-associated research students from both the STEM and social sciences fields in aiding the build-out of a centralized storage model for research computing.


Tuesday July 11, 2017 4:00pm - 5:00pm
Strand 11

4:00pm

High Performance Computing for Humanities, Arts, and Social Science
"This BOF is for anyone interested in high performance computing for Humanities, Arts, or Social Science. High Performance Computing, and XSEDE, offer capabilities that enable new research and teaching methodologies for scholars in humanities, arts, and social science.
This Birds of a Feather session will offer the opportunity to discuss and learn about how others are using HPC in their research, especially with respect to text analytics, video analytics, and image analytics, but the floor is open for topics of participant interest. Additionally, this BOF will provide an update on the XSEDE science gateways that are being developed and being made available for text analysis, image analysis, and video analysis.
Science gateways offer a low barrier of entry to high performance computing. XSEDE is working with humanities scholars to create gateways to allow others to easily use a variety of tools with their own data.
Topics we will discuss related to the gateways include (but are not limited to):
• Current Status of the Gateways
• Use Cases for Gateways
• Potential Data Sources
• Anything the group would like to discuss
Whether you are in the humanities, arts, or social sciences, or you are supporting scholars on your campus in these areas, or you are just interested, we’d love to have you be part of the discussion.
The format of this BOF will be a 20-30 minute presentation by Alan Craig (XSEDE HASS Specialist), followed by open questions, answers, and discussion with all attendees. As their schedules permit, representatives from current XSEDE humanities projects will also be present and share their work. There will be someone to take minutes and to track follow-up action items.


Tuesday July 11, 2017 4:00pm - 5:00pm
Bolden 2

4:00pm

Lustre: Present, Future, and Community
Lustre is the leading open-source parallel file system for HPC. Since 2011 Lustre has transitioned from a single-vendor focus to a community-developed file system with worldwide contributors, and is now more widely used and in more mission-critical installations than ever. Lustre currently supports many HPC infrastructures, from scientific research to financial services, oil and gas, advanced manufacturing, and visual effects. At the BoF, attendees will have a chance to learn about and discuss the current state of the Lustre community and software development, including the organizations that support the Lustre community, e.g., OpenSFS. The goal will be an honest conversation about the benefits, challenges, and pitfalls of using and supporting Lustre, and where to get help, hopefully growing a network among the attendees. The panelists will offer perspectives from the user, system administration, and user support viewpoints.


Tuesday July 11, 2017 4:00pm - 5:00pm
Strand 12

4:00pm

The Road to Sustainable Predictable Funding
"Developing your storyline and strategy on your journey to securing sustainable predictable funding for scientific computing is essential if you are to be successful.
Connecting the importance of digital infrastructure investment to tangible outcomes such as research excellence, competitiveness, workforce development and innovation requires sophisticated approaches in order to influence key decision makers at your institution, with funding bodies and/or elected officials.
Learn from the experts on how to align your initiatives to ensure you engage key influencers to successfully share your message and achieve your goals. This is where technical jargon will fail and storytelling and engagement will deliver impact."


Tuesday July 11, 2017 4:00pm - 5:00pm
Bolden 5

5:30pm

Posters Reception
Tuesday evening is your chance to speak with poster presenters about their efforts to improve and enhance advanced research computing. Judging for student posters will also take place. Join us for posters, discussion, and light hors d'oeuvres.

Tuesday July 11, 2017 5:30pm - 7:00pm
Strand Foyer

 
Wednesday, July 12
 

7:30am

XSEDE ECSS Training: Part 1
Training session for XSEDE ECSS staff.

Wednesday July 12, 2017 7:30am - 9:00am
Bolden 6

7:30am

Registration
Conference registration and information

Wednesday July 12, 2017 7:30am - 7:30pm
2nd Floor Registration Area (behind escalators)

8:00am

Breakfast
Wednesday July 12, 2017 8:00am - 9:00am
Empire AB

9:00am

Plenary: Paul Morin, U Minnesota, 'Mapping the Poles with Petascale'
Paul Morin is Founder and Director of the Polar Geospatial Center, an NSF science and logistics support center at the University of Minnesota. Morin leads a team of two dozen responsible for imaging, mapping, and monitoring the Earth’s polar regions for the National Science Foundation’s Division of Polar Programs. Morin is the liaison between the National Science Foundation and the National Geospatial-Intelligence Agency’s commercial imagery program. Before founding PGC, Morin was at the National Center for Earth-Surface Dynamics at the University of Minnesota, and he has worked at the University of Minnesota since 1987. Morin serves as the National Academy of Sciences-appointed U.S. representative to the Standing Committee on Antarctic Geographic Information under the Scientific Committee for Antarctic Research (i.e., the Antarctic Treaty System). One of his current projects is ArcticDEM, a White House initiative to produce a high-resolution, time-dependent elevation model of the Arctic using Blue Waters. Morin’s professional interests include mapping areas of the Earth that are difficult to reach, scientific visualization and using scientific art for formal and informal education. Morin has dozens of publications in a variety of fields including remote sensing, geoscience education, the carbon cycle, and scientific visualization.

Wednesday July 12, 2017 9:00am - 10:30am
Empire CD

10:30am

Morning Break
Wednesday July 12, 2017 10:30am - 11:00am
Empire Foyer

11:00am

Advanced Computing for Social Change—Educating and Engaging Our Students to Compete in a Changing Workforce
Visualization taps into the very best capabilities of our brains, transforming fundamentally abstract numerical data into something that communicates and illuminates information ranging from the simple to the complex. Visualization researchers, developers, practitioners, and educators routinely work across traditional discipline boundaries, oftentimes in teams of people from a diverse blend of backgrounds. With a looming global shortage of workers educated in high tech, we have adopted advanced computing for social change as a means of engaging our current student population to acquire high-tech skills in the context of working on relevant social issues that are important to us all. This paper provides the rationale for our recent approaches, discussing the looming shortage in the high-tech sector. We also present case study data, including evaluation data, and discuss the merits of possible changes to our approach. We discuss bringing the universal language of visualization to bear on problems that have the potential to make significant societal impact, and we encourage and foster innovation at every step.


Wednesday July 12, 2017 11:00am - 11:20am
Bolden 6

11:00am

Accelerating High-energy Physics Exploration with Deep Learning
Every year, up to 30 petabytes of data are captured from the Large Hadron Collider (LHC) at CERN, the European Organization for Nuclear Research. One petabyte of this data is processed offline every day using 11,000 servers with 100,000 processor cores. This huge amount of data represents only a very small fraction of the total raw data generated by sensors in the collider's trigger system at a rate of 40 million events per second. For the Compact Muon Solenoid (CMS) [1] experiment, the design and construction of a large portion of the Level-1 trigger for muon detection will become even more challenging with nearly 1 billion collisions occurring per second as a result of the increased luminosity from proposed upgrades to the LHC. To adhere to the very strict requirements of the experiment, efficient algorithms must be conceived and implemented to perform the required physics analytics very quickly. In this work, we present our approach to applying deep learning to the identification of rarely produced physics particles (such as the Higgs boson) amid a majority of background- or noise-dominated data. Because latency is of the essence for the detector electronics, a fast and efficient discrimination system translates to less data being stored and, subsequently, reduced processing time. We examine how a generalized version of our approach could improve the state of the art in experimenting with deep learning models for research in high-energy physics.
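As a hedged illustration of this style of signal-versus-background discrimination (not the authors' network), a small PyTorch classifier over hypothetical per-event features might look like the following; the pos_weight term is one common way to handle the extreme rarity of signal events.

```python
import torch
import torch.nn as nn

# Small binary classifier over 16 hypothetical per-event features
# (e.g. reconstructed momenta); architecture and sizes are illustrative.
model = nn.Sequential(
    nn.Linear(16, 64), nn.ReLU(),
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 1),                  # logit: signal vs background
)
# Upweight the rare positive (signal) class in the loss.
loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([99.0]))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(512, 16)                   # stand-in event features
y = (torch.rand(512, 1) < 0.01).float()    # ~1% signal labels
for _ in range(100):                       # toy training loop
    opt.zero_grad()
    loss_fn(model(x), y).backward()
    opt.step()
```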


Wednesday July 12, 2017 11:00am - 11:30am
Bolden 5

11:00am

ARM/Allinea: Tools and Methodology for Ensuring HPC Programs Correctness and Performance

In this presentation we will discuss best practices and methodology for HPC software engineering. We will provide illustrations of how the Allinea debugging and performance analysis tools can be used to ensure that you obtain optimal performance from your codes and that your codes run correctly.



Wednesday July 12, 2017 11:00am - 11:30am
Bolden 2

11:00am

We Have an HPC -- Now What?
If you build it, will they come? Not necessarily. A critical need exists for knowledge in managing and properly utilizing supercomputing at mid-level and smaller research institutions. Simply having HPC hardware and some software is not enough. This paper relates the administrative experience of the first several months of a mid-level doctoral university providing a new enterprise XSEDE Compatible Basic Cluster (XCBC) high performance computing cluster to faculty and other researchers, including the experiences of first-day urgencies, initial problems in the first few weeks, and establishing an ongoing management system.


Wednesday July 12, 2017 11:00am - 11:30am
Strand 11

11:00am

Invited Talk: Networking for Research
The Internet's conceptual model is that of the classic phone system: connect two instruments (or interfaces) together with a wire. Today this is still an appropriate model for some applications—e.g., connecting a remote user to the login host of an HPC system. But even this simple model admits of diversity for varied applications without analogs in classical telephony—quality of service in terms of guaranteed bandwidth and latency control, and elaborations such as the Science DMZ and DTNs. Yet in the computational sciences, it is often the case that an investigator's focus is on the dataset(s) to be analyzed, while the interfaces and the hosts on which they reside are of no interest. These ideas lead to the concept of "information-centric networking" and to network architectures different from the Internet's, which are of growing interest both commercially and in the academy. Until quite recently, network designers were forced to choose a single communications model and network architecture to implement. Today, however, the availability of very fast commodity hardware, inexpensive storage, and abundant bandwidth allows multiple architectures to co-exist on the same network substrate, and particular instances to be stood up and torn down under end-user control. This talk will elaborate on these ideas and offer some examples.

Speakers

Stephen Wolff

Principal Scientist, Internet2
Stephen Wolff is Principal Scientist at Internet2, with interest in the convergence of computing, storage, and networking, and in workforce development, integration, and professionalization to support converged high performance computing.


Wednesday July 12, 2017 11:00am - 12:00pm
Strand 12

11:00am

ARCC: Identity and Access Management, Security, and Related Topics
  • Jim Basney, U Illinois—CILogon 2.0: An Integrated Identity and Access Management Platform for Science
  • Von Welch, Indiana U—Scientific Data Integrity Challenges to be addressed in Pegasus
  • Vas Vasiliadis, U Chicago—Foundational Identity Management Services for Research Computing
  • Florence Hudson, Internet2—NSF Eager Cybersecurity Transition to Practice Program
  • Maureen Dougherty, USC—Secure Data Without a Budget
  • Steve Harper, U Utah—Leveraging custom LDAP authorization while using Active Directory authentication via SSSD in Linux



Wednesday July 12, 2017 11:00am - 12:30pm
Strand 13

11:20am

Undergraduate Educational Pathways for Developing High-Performance Computing Workforce
The need for college graduates with high-performance computing (HPC) skills is rapidly increasing with the increased interest in tasks requiring big data processing. Major needs in medicine, geosciences, and data analytics, among other disciplines, will drive demand for an HPC-literate workforce. This paper presents a new Computer Science minor covering high-performance computing (HPC) and a state-wide competition, which together provide a structured environment for developing graduates with the skills needed to enter the HPC workforce. The new HPC minor is designed to provide undergraduate science and engineering students with a foundation in HPC basics to pursue advanced HPC study in graduate school or enter the HPC workforce. The minor is available to a wide range of disciplines with a minimum of additional required coursework to minimize overall college cost. The HPC competition provides a structured environment for high-school, undergraduate, and graduate students to gain experience in HPC and to spark their interest.


Wednesday July 12, 2017 11:20am - 11:40am
Bolden 6

11:30am

Implicit and Implicit-Explicit Strong Stability Preserving Runge-Kutta Methods with High Linear Order
When evolving in time the solution of a hyperbolic partial differential equation, it is often desirable to use high order strong stability preserving (SSP) time discretizations. These time discretizations preserve the monotonicity properties satisfied by the spatial discretization when coupled with the first-order forward Euler method, under a certain time-step restriction. While the allowable time-step depends on both the spatial and temporal discretizations, the contribution of the temporal discretization can be isolated by taking the ratio of the allowable time-step of the high order method to the forward Euler time-step. This ratio is called the strong stability coefficient. The search for strong stability preserving time-stepping methods with high order and a large allowable time-step has been an active area of research. It is known that implicit SSP Runge-Kutta methods exist only up to sixth order. However, if we restrict ourselves to solving only linear autonomous problems, the order conditions simplify and we can find implicit SSP Runge-Kutta methods of any linear order. In the current work we aim to find very high linear order implicit SSP Runge-Kutta methods that are optimal in terms of allowable time-step. Next, we formulate an optimization problem for implicit-explicit (IMEX) SSP Runge-Kutta methods and find implicit methods with large linear stability regions that pair with known explicit SSP Runge-Kutta methods of linear orders p_lin = 3, 4, 6, as well as optimized IMEX SSP Runge-Kutta pairs that have high linear order and nonlinear orders p = 2, 3, 4. These methods are then tested on sample problems to verify order of convergence and to demonstrate the sharpness of the SSP coefficient and the typical behavior of these methods on test problems.
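For readers new to the notation, the defining relations behind the SSP coefficient described above can be written as follows (a standard restatement, not taken verbatim from the paper):

```latex
% Forward Euler is assumed strongly stable under a step restriction:
\| u + \Delta t \, F(u) \| \le \| u \|
  \qquad \text{for } 0 \le \Delta t \le \Delta t_{\mathrm{FE}} ,
% and an SSP method of higher order preserves the same monotonicity whenever
\Delta t \le \mathcal{C} \, \Delta t_{\mathrm{FE}} ,
% where the largest such ratio C is the strong stability (SSP) coefficient.
```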


Wednesday July 12, 2017 11:30am - 12:00pm
Bolden 5

11:30am

DDN: Architecting Academic Research Central Data Stores for Fast, Archive & Remote Data

In this talk Roger Goff, DDN's Senior Technologist for Academic Research, will present three customer case studies highlighting common I/O problems in shared research environments and recent examples of how advanced computing centers are solving them with innovative shared infrastructure.

Examples will highlight problems and practical solutions for fast data tiers, active archive, deep archive, and multi-site collaboration and data protection, and will include technology specifics on NVMe SSDs, flash caching, automated tiering, private/hybrid cloud infrastructure, active archive, multi-site shared namespaces, and production-scale neural networks.



Wednesday July 12, 2017 11:30am - 12:00pm
Bolden 2

11:30am

Deploying RMACC Summit: An HPC Resource for the Rocky Mountain Region
RMACC Summit is a heterogeneous supercomputer cluster with an aggregate floating point performance of 379 TFLOPS (Rmax, as currently configured) that provides about 85 million core-hours/yr to researchers from institutions participating in the Rocky Mountain Advanced Computing Consortium (RMACC). The development of Summit was a collaborative effort toward specifying a system that meets the needs of researchers at multiple universities, and included implementation and testing of several new technologies. We discuss our experiences in creating and maintaining a successful ongoing collaboration between the two universities that are RMACC Summit's primary operators, and consider both the technical and support challenges of extending that collaboration to other regional users.


Wednesday July 12, 2017 11:30am - 12:00pm
Strand 11

11:40am

The Impacts of Outreach Efforts by Research Computing at the University of Colorado Boulder
An evaluation of services offered by Research Computing at the University of Colorado Boulder is presented. This evaluation is attempting to determine the impact of services on existing users, as well as how to introduce new users to HPC resources offered by the group. Preliminary results indicate changes informed by user input have been successful.


Wednesday July 12, 2017 11:40am - 12:00pm
Bolden 6

12:00pm

Introducing Protein 3-D Visualization Software to Freshman Undergraduate Students: Making Connections and Building Skills
The structure and dynamics of proteins are an essential part of understanding the molecular foundations of complex biological processes and serve an important role in the field of computational biology. Biomolecular visualization software can serve as an entryway to begin exploration of, and expose students to, protein structure-function relationships and aid in their development of basic science knowledge. In addition, technical skills and effective writing and presentation of scientific material are critical for students entering the field of bioinformatics and computational biology. Training and education utilizing biomolecular visualization software and honing writing and presentation skills are often reserved for special studies or higher-level coursework. Presenting more advanced concepts and skills that can connect ideas from introductory-level classes in chemistry, biology, and physics earlier in the curriculum is imperative to success in advanced classes and application in research settings for undergraduate students. Thus, student-centered activities that can satisfy the development of new skills and critical thinking, in addition to computational knowledge and practice, provide foundational principles for developing future scientists and increasing their chances of success. Through a one-credit-hour introduction to biomolecular visualization course, freshman biochemistry undergraduate students were exposed to higher-level thinking and application, and gained skills in biomolecular visualization and scientific presentation. In addition, biomolecular visualization and molecular modeling were used to introduce the students to the field of computational biology and computational skills such as command-line usage, Unix interfaces, and parallel computing. Overall, this study discusses the benefits of introducing biomolecular visualization software early in the undergraduate curriculum and the potential for implementation on a larger scale in order to prepare students by providing discipline-specific foundations relevant to the use of computational tools.


Wednesday July 12, 2017 12:00pm - 12:20pm
Bolden 6

12:00pm

Extracting, Assimilating, and Sharing the Results of Image Analysis on the FSA/OWI Photography Collection
This paper reports on the continued image-analysis work of the Farm Security Administration – Office of War Information (FSA-OWI) Photography Collection team, supported through an XSEDE (Extreme Science and Engineering Discovery Environment) grant and the Extended Collaborative Support Service (ECSS). The team is refining existing algorithms, developing new ones, and running them on the Comet supercomputer to analyze the FSA-OWI corpus from 1935-1944, held by the Library of Congress (LOC). The project spans many fields within the humanities and beyond, including photography, art, visual rhetoric, linguistics, American history, anthropology, and geography, as well as appealing to the general public. Progress includes refining image, metadata, and lexical semantics analysis, as well as developing a search, retrieval, and sorting interface through Clowder, which will serve as the public portal. Methods and tool refinement for this project are suitable for use on other large image corpora.


Wednesday July 12, 2017 12:00pm - 12:30pm
Bolden 5

12:00pm

Convergent CFD:

Wednesday July 12, 2017 12:00pm - 12:30pm
Bolden 2

12:00pm

Building Community Informed and Driven Data Services at the National Center for Atmospheric Research
At the National Center for Atmospheric Research (NCAR), being able to enable and facilitate scientific progress by providing advanced computing capabilities has been a key focus for multiple decades. Additional developments of dedicated technical infrastructures for supporting data-related activities, such as visualization, analysis, processing, and transformation, have also been prioritized. However, in order to meet the diverse existing and emerging data needs from the scientific community, data services must couple technical development with human and organizational services. Data services have a better chance to be accepted and used by the target community if the community members are represented and consulted during the design and implementation process. This paper discusses NCAR’s Data Stewardship Engineering Team (DSET), which was created to facilitate cross-organizational participation and communication. The NCAR DSET includes representatives and stakeholders from the NCAR science and technical divisions to leverage the expertise of both groups, and ensure that feedback from users is built into the process. The goal of DSET is to enable data services, including the technical infrastructures, to be developed in alignment with the community’s research and data needs.


Wednesday July 12, 2017 12:00pm - 12:30pm
Strand 12

12:00pm

Stampede 2: The Evolution of an XSEDE Supercomputer
The Stampede 1 supercomputer was a tremendous success as an XSEDE resource, providing more than eight million successful computational simulations and data analysis jobs to more than ten thousand users. In addition, Stampede 1 introduced new technology that began to move users towards many core processors. As Stampede 1 reaches the end of its production life, it is being replaced in phases by a new supercomputer, Stampede 2, that will not only take up much of the original system’s workload, but continue the bridge to technologies on the path to exascale computing. This paper provides a brief summary of the experiences of Stampede 1, and details the design and architecture of Stampede 2. Early results are presented from a subset of Intel Knights Landing nodes that are bridging between the two systems.


Wednesday July 12, 2017 12:00pm - 12:30pm
Strand 11

12:30pm

Lunch
Wednesday July 12, 2017 12:30pm - 2:00pm
Empire AB

12:30pm

Student Mentor Lunch
Wednesday July 12, 2017 12:30pm - 2:00pm
Imperial 5AB (Level 4)

2:00pm

Value Analytics: A Financial Module for the Open XDMoD Project
Understanding the value of campus-based cyberinfrastructure (CI) to the institutions that invest in such CI is intrinsically difficult. Given today's financial pressures, administrative support for campus-based CI centers offering resources to local campus users is under constant budgetary pressure. This is partly due to the difficulty in obtaining quantitative metrics that clearly demonstrate the utility of investment in campus CI centers in enhancing scientific research and the financial aspects of enhanced competitive ability in seeking funding for research. We propose here the addition of a new realm of metrics to the standard cyberinfrastructure tool Open XDMoD (XD Metrics on Demand) that will allow us to correlate usage of high performance computing with funding and publications. The modules to be added will allow CI centers to view metrics relevant to both scientific output, in terms of publications, and financial data, in terms of awarded grants.
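The proposed correlation is, at its core, a join of usage records with financial and publication records keyed by investigator. A toy sketch with hypothetical column names (not the Open XDMoD schema):

```python
import pandas as pd

# Hypothetical stand-ins for job records, grant awards, and publications.
jobs   = pd.DataFrame({"pi": ["smith", "chen", "smith"],
                       "core_hours": [80000, 45000, 40000]})
grants = pd.DataFrame({"pi": ["smith", "chen"],
                       "award_usd": [900000, 250000]})
pubs   = pd.DataFrame({"pi": ["smith", "smith", "chen"],
                       "doi": ["10.x/a", "10.x/b", "10.x/c"]})

# Correlate HPC usage with funding and publications, per PI.
value = (grants.groupby("pi")["award_usd"].sum().to_frame()
               .join(jobs.groupby("pi")["core_hours"].sum())
               .join(pubs.groupby("pi")["doi"].count().rename("publications")))
print(value)
```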


Wednesday July 12, 2017 2:00pm - 2:30pm
Bolden 5

2:00pm

Data in Science Technologies (DST): Sharpening Your Scalpel: How to get the most from your cluster

The topic is broken into three components centered on efficiency in data lifecycle, job scheduling, and operational management. Through a series of tools and processes, DST will show the research community how to utilize HPC environment resources more effectively.


Wednesday July 12, 2017 2:00pm - 2:30pm
Bolden 2

2:00pm

Data Access for LIGO on the OSG
During 2015 and 2016, the Laser Interferometer Gravitational-Wave Observatory (LIGO) conducted a three-month observing campaign. These observations delivered the first direct detection of gravitational waves from binary black hole mergers. To search for these signals, the LIGO Scientific Collaboration uses the PyCBC search pipeline. To deliver science results in a timely manner, LIGO collaborated with the Open Science Grid (OSG) to distribute the required computation across a series of dedicated, opportunistic, and allocated resources. To deliver the petabytes necessary for such a large-scale computation, our team deployed a distributed data access infrastructure based on the XRootD server suite and the CernVM File System (CVMFS). This data access strategy grew from simply accessing remote storage to a POSIX-based interface underpinned by distributed, secure caches across the OSG.
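The practical upshot of the POSIX-based interface is that analysis code reads detector data as if it were local. A brief sketch (the repository path and file name below are illustrative, not real LIGO data):

```python
# Under CVMFS, remote frame files appear as ordinary POSIX files; the
# cache layer fetches bytes over XRootD on demand. Illustrative path only.
frame_path = "/cvmfs/ligo.osgstorage.org/frames/O1/example.gwf"

with open(frame_path, "rb") as f:
    header = f.read(4096)   # a plain read; the remote fetch is transparent
print(len(header), "bytes read without any storage-specific client code")
```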


Wednesday July 12, 2017 2:00pm - 2:30pm
Bolden 6

2:00pm

Interactive Code Adaptation Tool for Modernizing Applications for Intel Knights Landing Processors
The process of code adaptation to take advantage of the latest innovations in a supercomputing platform begins with learning about the details of the platform’s underlying hardware. It can be challenging for many users to spend time and effort in developing an understanding of the innovative features in a supercomputing platform - such as deep memory hierarchies - and to harness their maximum possible performance by manually modernizing their applications. To mitigate the aforementioned challenge, we are developing an Interactive Code Adaptation Tool (ICAT). In its current form, ICAT can assist the users in modifying, compiling, and optimally running their applications on the latest HPC platforms that are equipped with the Intel Knights Landing (KNL) processors. ICAT detects a given application’s characteristics such as memory usage pattern, type of memory allocation, and execution time. Depending upon the application’s characteristics, it advises the user on optimal ways to take advantage of the KNL processor and its memory-hierarchy.
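A toy version of the kind of advice such a tool automates, under the assumption that in flat mode the KNL's 16 GB MCDRAM appears as a separate NUMA node (the script and threshold logic are illustrative, not ICAT itself):

```python
import resource
import subprocess
import sys

MCDRAM_BYTES = 16 * 2**30   # KNL on-package MCDRAM capacity

def advise(cmd):
    """Run a short representative job, measure its peak resident memory
    (Linux reports ru_maxrss in KB), and suggest a simple MCDRAM strategy."""
    subprocess.run(cmd, check=True)
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss * 1024
    if peak <= MCDRAM_BYTES:
        print("Footprint fits in MCDRAM: in flat mode, try binding it "
              "there, e.g. `numactl --membind=<mcdram NUMA node> ...`")
    else:
        print("Footprint exceeds MCDRAM: prefer cache mode, or allocate "
              "only bandwidth-critical arrays from MCDRAM")

advise([sys.executable, "-c", "x = bytearray(10**8)"])  # ~100 MB test run
```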


Wednesday July 12, 2017 2:00pm - 2:30pm
Strand 12

2:00pm

A real-time machine learning and visualization framework for scientific workflows
High-performance computing resources are currently widely used in science and engineering areas. Typical post-hoc approaches use persistent storage to save the data produced by a simulation, so analysis tasks must read it back from storage into memory. For large-scale scientific simulations, such I/O operations produce significant overhead. In-situ/in-transit approaches bypass I/O by accessing and processing in-memory simulation results directly, which suggests that simulations and analysis applications should be more closely coupled. This paper constructs a flexible and extensible framework to connect scientific simulations with multi-step machine learning processes and in-situ visualization tools, thus providing plugged-in analysis and visualization functionality over complex workflows in real time. A distributed simulation-time clustering method is proposed to detect anomalies in real turbulence flows.
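A compact sketch of the in-situ idea (illustrative, not the paper's framework): analysis consumes each simulation step directly from memory, so nothing is staged through the file system.

```python
import numpy as np

def in_situ_anomalies(steps, k=8, z_thresh=3.0):
    """Cluster each in-memory simulation step and flag points that sit
    far from their centroid, bypassing persistent storage entirely."""
    rng = np.random.default_rng(0)
    for step, x in enumerate(steps):              # x: (n_points, n_features)
        centroids = x[rng.choice(len(x), k, replace=False)]
        for _ in range(5):                        # a few crude k-means passes
            d = np.linalg.norm(x[:, None] - centroids[None], axis=2)
            labels = d.argmin(axis=1)
            for j in range(k):
                members = x[labels == j]
                if len(members):
                    centroids[j] = members.mean(axis=0)
        dist = np.linalg.norm(x - centroids[labels], axis=1)
        z = (dist - dist.mean()) / (dist.std() + 1e-12)
        yield step, np.flatnonzero(z > z_thresh)  # anomalous point indices

# Usage: for step, idx in in_situ_anomalies(simulation_chunks()): ...
```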


Wednesday July 12, 2017 2:00pm - 2:30pm
Strand 11

2:00pm

ARCC: Advanced Systems, Technology, Storage, and Network Topics
  • Peter Ruprecht, CU Boulder—Standing up the Summit and integrating GPFS with OPA
  • Amit Chourasia, SDSC—SeedMe: Data sharing building blocks.
  • Thomas Cheatham, U Utah—NIH S10 equipment proposal challenges and standing up protected environments for research computing on restricted data
  • Bob Freeman, Harvard U—Challenges in engaging the social and business sciences
  • Christina Koch, U Wisconsin—Providing basic computing skills to researchers and building a campus community around good computing practices
  • Henry Neeman, U Oklahoma—The ACI-REF Virtual Residency: An Update on Training and Workforce Development for Research Computing Facilitators


Wednesday July 12, 2017 2:00pm - 3:30pm
Strand 13

2:30pm

Performance of image matching in the Computational Anatomy Gateway: CPU and GPU implementations in OpenCL
The Computational Anatomy Gateway is a software as a service that provides tools for analysis of structural MRI to the neuroimaging community by calculating diffeomorphic mappings between a user's data and well characterized atlas images. These tools include automatic parcellation of brain images into labeled regions, described by dense 3D arrays; and shape analysis of regions described by triangulated surfaces, for hypothesis testing in specific populations. We have developed mapping techniques that combine the benefits of working with triangulated surfaces with those of working with dense images, and have been working toward uniting these two tools: to automatically perform shape analysis on each segmented subcortical structure simultaneously.

In this work we investigate the performance of our algorithm across a wide range of input data, examining the effect of number of voxels in 3D images, number of vertices in triangulated surfaces, and number of structures being mapped onto simultaneously. Further, we investigate the performance of our OpenCL code implemented in two different environments: the Intel OpenCL environment on a CPU, and the CUDA OpenCL environment on a GPU.

We identify a range of inputs, generally smaller datasets, for which the CPU outperforms the GPU. Finally, we show the feasibility of mapping onto all the human gray matter subcortical structures simultaneously, and discuss our strategy for extending to higher resolution images and more labeled structures in mouse brain imaging at the micrometer scale.
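The crossover measurement generalizes to a simple harness: time the same mapping kernel in both environments across input sizes and record where the GPU starts winning. The sketch below assumes user-supplied run_cpu/run_gpu callables rather than the authors' OpenCL code.

```python
import time

def find_crossover(run_cpu, run_gpu, make_input, sizes):
    """Time both implementations over increasing problem sizes and return
    the smallest tested size at which the GPU beats the CPU (or None)."""
    crossover = None
    for n in sizes:
        x = make_input(n)
        t0 = time.perf_counter(); run_cpu(x); t_cpu = time.perf_counter() - t0
        t0 = time.perf_counter(); run_gpu(x); t_gpu = time.perf_counter() - t0
        print(f"n={n}: cpu={t_cpu:.4f}s gpu={t_gpu:.4f}s")
        if crossover is None and t_gpu < t_cpu:
            crossover = n
    return crossover
```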


Wednesday July 12, 2017 2:30pm - 3:00pm
Bolden 5

2:30pm

BioTeam: Advanced Research IT Solutions for Large-Scale Scientific Discoveries in Life Sciences

The genomic revolution has led to a significant drop in the cost of sequencing entire organisms, including humans. This has led to a better understanding of the basic building blocks of life and disease, and to increased medical knowledge, which in turn has resulted in a plethora of new diagnostic tests and drugs. In particular, advances in medical technologies such as medical imaging, molecular modeling, and therapeutic devices have led to an exponentially growing data deluge in the life sciences. As a result, organizations are spending millions of dollars on IT infrastructure and IT support, both on-premises and in the cloud, to store, analyze, and access this data. Many of these organizations lack a dedicated research IT infrastructure and scientific IT support, and their existing enterprise IT is ill-prepared to deal with scientific workflows, resulting in a large gap between science and IT. This is because IT has traditionally been the group that manages only the business and enterprise systems for an organization, such as desktop support, email, web services, HR, or databases. The consequence of this lack of dedicated research IT within an organization is felt in the research of scientists and labs at multiple levels, from day-to-day IT-related activities, to advanced scientific computing needs for large-scale analytics, to collaboration and data exchange with peers both within and outside of the organization.



Wednesday July 12, 2017 2:30pm - 3:00pm
Bolden 2

2:30pm

Globus: Research Data Management as Service and Platform
Scientists have embraced the use of specialized cloud-hosted services to perform data management operations. Globus offers a suite of data and user management capabilities to the community, encompassing data transfer and sharing, user identity and authorization, and data publication. Globus capabilities are accessible via both a web browser and REST APIs. Web access allows Globus to address the needs of research labs through a software-as-a-service model; the newer REST APIs address the needs of developers of research services, who can now use Globus as a platform, outsourcing complex user and data management tasks to Globus cloud-hosted services. Here we review Globus capabilities and outline how it is being applied as a platform for scientific services.
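As a sketch of the platform usage pattern with the public Globus Python SDK (the client ID and endpoint UUIDs below are placeholders to be replaced with your own):

```python
import globus_sdk

CLIENT_ID = "00000000-0000-0000-0000-000000000000"  # placeholder

# Native-app OAuth2 flow: the user visits a URL and pastes back a code.
auth = globus_sdk.NativeAppAuthClient(CLIENT_ID)
auth.oauth2_start_flow()
print("Login here:", auth.oauth2_get_authorize_url())
tokens = auth.oauth2_exchange_code_for_tokens(input("Auth code: "))
transfer_token = tokens.by_resource_server[
    "transfer.api.globus.org"]["access_token"]

# Outsource the data movement itself to the Globus Transfer service.
tc = globus_sdk.TransferClient(
    authorizer=globus_sdk.AccessTokenAuthorizer(transfer_token))
src, dst = "SRC-ENDPOINT-UUID", "DST-ENDPOINT-UUID"  # placeholders
tdata = globus_sdk.TransferData(tc, src, dst, label="PEARC demo")
tdata.add_item("/path/on/src/data.tar", "/path/on/dst/data.tar")
print("task id:", tc.submit_transfer(tdata)["task_id"])
```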


Wednesday July 12, 2017 2:30pm - 3:00pm
Bolden 6

2:30pm

Modernizing GooFit: A Case Study
The GooFit highly parallel fitting framework is a system to enable fast computation on massive datasets common in high energy physics. The system was primarily a research-level code, difficult to install and with a separate code base for every analysis. Moving this code into a production-ready state as a community code, where developments can be easily incorporated, has presented several challenges. This is a case study of moving disorganized code into a modern software engineering environment.


Wednesday July 12, 2017 2:30pm - 3:00pm
Strand 12

2:30pm

GenApp Integrated with OpenStack Supports Elastic Computing on Jetstream
GenApp is a universal and extensible tool for rapid deployment of applications. GenApp builds fully functioning science gateways and standalone GUI applications from collections of definition files and libraries of code fragments. Among the main features are the minimal technical expertise requirement for the end user and an open-end design ensuring sustainability of generated applications. Because of the conceptual simplicity of use, GenApp is ideally suited to scientists who are not professional developers, to disseminate their theoretical and experimental expertise as embodied in their code to their communities by rapidly deploying advanced applications. GenApp has an open extensible resource execution model. To support efficient elastic cloud computing on NSF Jetstream, GenApp has recently integrated OpenStack as a target resource with optional job-specific XSEDE project accounting.


Wednesday July 12, 2017 2:30pm - 3:00pm
Strand 11

3:00pm

Comet - Tales from the Long Tail. Two Years In, and 10,000 Users Later
The Comet petascale system (ACI #1341698) was put into production as an XSEDE resource in early 2015 with the goal of serving a much larger user community than HPC systems of similar size. The Comet project set an audacious goal of reaching over 10,000 users in its four years of planned operation, a goal that has now been achieved in less than two years, due in large part to the adoption of science gateways and to policies that favor smaller allocations and thus encourage more users. Here we describe our experiences in operating and supporting this system, highlight some of the important science that has been enabled by Comet, and provide some practical lessons we have learned by operating a system designed for the long tail.


Wednesday July 12, 2017 3:00pm - 3:30pm
Bolden 5

3:00pm

HPE: Bridging to Next-Gen Supercomputing

An overview of the architecture and scientific uses of the HPC “Bridges” system which resulted from a partnership between Pittsburgh Supercomputing Center (PSC) and Hewlett-Packard Enterprise (HPE).  Bridges is a uniquely capable resource for empowering new research communities and bringing together HPC and Big Data.



Wednesday July 12, 2017 3:00pm - 3:30pm
Bolden 2

3:00pm

Live Integrated Visualization Environment: An Experiment in Generalized Structured Frameworks for Visualization and Analysis
Many immersive visualization systems require custom coding and specialized hardware and software to function properly. A number of commercial products provide some of this functionality, but lack full immersive tracking and controls. Additionally, these environments have traditionally been limited for real-time data feeds and analysis. The Live Integrated Visualization Environment is a framework developed to address these limitations while allowing best-of-breed integration of commercial products, government software, and open source software. By combining a custom-developed messaging bus with a web service implementation, a dynamic, interactive immersive environment is provided across a number of platforms including CAVEs, touch tables, single-wall displays, and desktops. We provide an architecture discussion, including driver capabilities added to the system to enable quick development of additional data sources with existing visualization applications. We conclude with a discussion of several projects that successfully utilize the framework for real-time big data and geospatial applications across a range of tasks.


Wednesday July 12, 2017 3:00pm - 3:30pm
Strand 12

3:00pm

Portable learning environments for hands-on computational instruction: Using container- and cloud-based technology to teach data science
There is an increasing interest in learning outside of the traditional classroom setting, especially for instruction of computational tools and practices that are challenging to incorporate in the standard curriculum. These atypical learning environments offer new ways for teaching students skills and concepts, particularly when it comes to combining conceptual knowledge with hands-on methods. Advances in cloud computing and containerized environments provide an attractive opportunity to improve the efficiency and ease with which students can learn. This manuscript details recent advances towards using commonly-available cloud computing services and advanced cyberinfrastructure support for improving learning experiences in bootcamp-style events. We cover the benefits (and challenges) of using a server hosted remotely instead of relying on student laptops, discuss the technology that was used in order to make this possible, and give suggestions for how others could implement and improve upon this model for pedagogy and reproducibility.


Wednesday July 12, 2017 3:00pm - 3:30pm
Bolden 6

3:00pm

XSEDE Technology Investigation Service (TIS)
The Evaluating and Enhancing the eXtreme Digital Cyberinfrastructure for Maximum Usability and Science Impact project, known currently as the Technology Investigation Service (TIS), was a collaboration between the University of Illinois at Urbana-Champaign National Center for Supercomputing Applications, the Pittsburgh Supercomputing Center, The University of Texas at Austin Texas Advanced Computing Center, the University of Tennessee National Institute for Computational Sciences, and the University of Virginia, which identified and evaluated potential technologies to close the gap between the XSEDE (Extreme Science and Engineering Discovery Environment, http://www.xsede.org) service offerings and the needs of XSEDE's users. This project was funded by the NSF Division of Advanced Cyberinfrastructure (award ACI 09-46505) in response to the "Technology Audit and Insertion Service" component of the "TeraGrid Phase III: eXtreme Digital Resources for Science and Engineering (XD)" solicitation (NSF 08-571). Over the project lifetime, the two major goals of TIS were: 1) identifying, tracking, evaluating, and making recommendations on new technologies to XSEDE for consideration of adoption, and 2) raising awareness of TIS among XSEDE and other stakeholders to solicit their input on technologies to consider for evaluation. In accomplishing these goals, TIS produced four significant outcomes: the development and deployment of the XSEDE Technology Evaluation Database; the development of the significantly improved XSEDE software search capability; the technology evaluation process; and the evaluations performed, along with their corresponding technology adoption recommendations to XSEDE. This paper highlights the life cycle of the TIS project, including lessons learned and project outcomes.

Speakers

John Towns

Director of Collaborative eScience Programs, National Center for Supercomputing Applications
Director of Collaborative eScience Programs at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. NCSA's Collaborative eScience Directorate works to reduce barriers to the use of cyberinfrastructure so more people in more fields of inquiry can take full advantage of powerful digital resources and services. This unit leads NCSA's participation in the Extreme Science and Engineering Discovery Environment (XSEDE) and the University of Illinois Campus Cluster Program.


Wednesday July 12, 2017 3:00pm - 3:30pm
Strand 11

3:30pm

Afternoon Break
Wednesday July 12, 2017 3:30pm - 4:00pm
Strand Foyer

4:00pm

Best Practices in Science Gateway Job Management
"As the number of science gateways expand, and use campus as well as national resources, campuses too are reaping benefits from gateways while facing challenges. The Ohio Supercomputer Center (OSC) has operated a WebMO service for nearly ten years. The service is used for some research but predominantly for teaching, serving 500 students last academic year and 600 students this year. While widespread impact like this makes a very good story, challenges arise in both job management and user administration.

Despite the success of these organizations, for many, job management remains as much a challenge today as it was when the first gateways surfaced roughly two decades ago. Workflow management, particularly restarting workflows with user intervention to drive a simulation cascade, is still a challenge. Across the board, issues such as resource scarcity, security, co-scheduling, identity and environment management, reporting, data handling and transfer, monitoring, and quota enforcement are no longer exclusively the concern of virtual organizations and resource providers; they now impact any gateway trying to ensure a baseline quality of service for its users.

Is there a silver bullet for successful job management in science gateways? Probably not, but both the process and solution can be improved through awareness of battle-tested best practices, open dialog between resource consumers and resource providers, and leveraging the appropriate open source tooling for your project.

The BOF will be organized as an open forum for community discussion, with the possibility of short presentations about common topic threads and best practices. Questions from audience members will be encouraged, both in the room and over Twitter, in advance of and during the BOF.


Wednesday July 12, 2017 4:00pm - 5:00pm
Bolden 2

4:00pm

Managing the User Environment: Opportunities and Challenges
This BoF is about managing the complexity of the software stack using modules. It is for admins and other support staff who are in the trenches, including those who manage campus and department resources. It is especially for those who might be outgrowing their current approach. Please bring your lessons learned, questions, war stories, and favorite tricks. This will be a place where we discuss our progress and best practices, articulate additional needs, and perhaps identify new opportunities.

Wednesday July 12, 2017 4:00pm - 5:00pm
Bolden 6

4:00pm

Online Course Learning Strategies
Numerous reports over the last 30 years have documented the critical need for a larger and more diverse computational, data-enabled, and HPC workforce in all fields of study across all sectors of society. Many efforts are underway to give people opportunities to learn topics in these fields that are not normally available at their own institutions. These informal learning approaches, however, do not address the basic challenges: such opportunities are not offered at far too many institutions, course curricula are not keeping pace with technological advancements, and far too many students lack the formal education they need to be competitive in the HPC workforce. This BoF will provide a platform for discussing effective online educational strategies and what is needed to scale up and improve on past efforts.


Wednesday July 12, 2017 4:00pm - 5:00pm
Bolden 5

4:00pm

Open OnDemand: Current Status and Future Plans
The Open OnDemand Project is an open-source software project, based on the proven OSC OnDemand platform, to allow HPC centers to install and deploy advanced web and graphical interfaces for their users. The Open OnDemand team is completing the second year of the project and continuously releasing additional features. Upcoming releases will include support for VDI, custom application development, nested web application servers, and workflow features. This BoF will provide a platform for discussion of these upcoming features as well as spur discussion on developing the community of users around Open OnDemand.


Wednesday July 12, 2017 4:00pm - 5:00pm
Strand 11

4:00pm

PEARC Town Hall: Community input for the future of the PEARC Conference Series
All PEARC17 attendees are invited to help inform the direction for the PEARC conference series during this town hall, hosted by the members of the PEARC Steering Committee. As the inaugural conference takes the first steps to bring together a wide range of stakeholders, the Steering Committee is actively seeking input about how the conference can best serve the community's needs. Your feedback at this town hall can help set the tone, theme, and focus for future PEARC conferences.

Speakers
David Hart

User Services Section Manager, National Center for Atmospheric Research
Sergiu Sanielevici

Pittsburgh Supercomputing Center
John Towns

Director of Collaborative eScience Programs, National Center for Supercomputing Applications
Director of Collaborative eScience Programs at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. NCSA's Collaborative eScience Directorate works to reduce barriers to the use of cyberinfrastructure so more people in more fields of inquiry can take full advantage of powerful digital resources and services. This unit leads NCSA's participation in the Extreme Science and Engineering Discovery Environment (XSEDE) and the University of Illinois Campus Cluster Program.


Wednesday July 12, 2017 4:00pm - 5:00pm
Strand 12

5:30pm

Visualization Showcase and Reception
Join us for the final evening of PEARC17 with a reception featuring the Visualization Showcase submissions and live music! You can purchase tickets to have family and friends join you at this event when you register. The Visualization Showcase will feature some of the most computationally intensive simulations that have recently come from HPC systems. Attendees will be able to vote for the most compelling visualization; the winning entry will be recognized at the end of the conference.

The Visualization Showcase entries will be:
  • CESM Wind Speed Magnitude — Matt Rehme, NCAR
  • Visualization of Physical Signatures of Cancer Metastasis — Anne Bowen, Abdul N Malmi-Kakkada, University of Texas, Austin
  • Visualization of NCAR's Wind Forecast Improvement Project 2 — Scott Pearse, Pedro Jimenez, NCAR
  • Pillars of the Mantle: Imaging the Interior of the Earth with Adjoint Tomography — David Pugmire, Ebru Bozdag, ORNL
  • Visualization of Tropical Cyclone-Ocean Interactions — David Bock, NCSA
  • Gonzalo and Sandy — Greg Foss, Fuqing Zhang, TACC
  • Spot the Difference — Greg Foss, Amy McGovern, TACC

Wednesday July 12, 2017 5:30pm - 7:00pm
Empire Foyer
 
Thursday, July 13
 

7:30am

Registration
Conference registration and information

Thursday July 13, 2017 7:30am - 12:00pm
2nd Floor Registration Area (behind escalators)

8:00am

Breakfast
Thursday July 13, 2017 8:00am - 9:00am
Empire AB

9:00am

Simulating Multiphase Flows in Porous Media Using OpenFOAM on Intel Xeon Phi Knights Landing Processors
Multiphase flow in porous media is important in petroleum reservoir research. During oil exploration, gas, liquid, and solid particles may flow through the porous media in the reservoir. A novel solver, named MPPICmultiphaseInterFoam, was developed using OpenFOAM to simulate multiphase flows in the porous media of a reservoir. The solver couples DPM (discrete particle modeling) and VOF (volume of fluid) with CFD (computational fluid dynamics) based on the MP-PIC (multiphase particle-in-cell) method. After validation, the solver was used to simulate multiphase flows in oil and gas reservoirs. Intel Xeon Phi Knights Landing (KNL) processors on an HPC system were employed to carry out the numerical simulations. Performance was optimized through MPI parallel programming and vectorization on the KNL processors. It was found that Intel Xeon Phi Knights Landing processors are well suited to large-scale simulations of multiphase flow in porous media.


Thursday July 13, 2017 9:00am - 9:30am
Bolden 5

9:00am

Apache Airavata Sharing Service: A Tool for Enabling User Collaboration in Science Gateways
Science gateways provide user environments and a set of supporting services that help researchers make effective and enhanced use of a diverse set of computing, storage, and related resources. Gateways provide the services and tools users require to enable their scientific exploration, which includes tasks such as running computer simulations or performing data analysis. Historically, gateways have been constructed to support the workflow of individual users, but collaboration between users has become an increasingly important part of the discovery process. This trend has created a driving need for gateways to support data sharing between users. For example, a chemistry research group may want to run simulations collaboratively, analyze experimental data, or tune parameter studies based on simulation output generated by peers, whether as a default capability or through the explicit creation of sharing privileges. As another example, students in a classroom setting may be required to share their simulation output or data analysis results with the instructor. However, most existing gateways (including the popular XSEDE gateways SEAGrid, UltraScan, CIPRES, and NSG) do not support direct data sharing, so users have to handle these collaborations outside the gateway environment. Given the importance of collaboration in current scientific practice, user collaboration should be a prime consideration in building science gateways. In this work, we present design considerations and the implementation of a generic model that can be used to describe and handle a diverse set of user collaboration use cases that arise in gateways, based on general requirements gathered from the SEAGrid, CIPRES, and NSG gateways. We then describe the integration of this sharing service into these gateways. Though the model and the system were tested and used in the context of science gateways, the concepts are universally applicable to any domain, and the service can support data sharing in a wide variety of use cases.
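As a rough illustration of the kind of generic sharing model the abstract describes, the Python sketch below represents entities, owners, and permission grants to users or groups. The names and API are hypothetical; this is not the Apache Airavata sharing service interface.

    from dataclasses import dataclass, field

    @dataclass
    class Entity:
        """A shareable gateway object, e.g., an experiment or an output file."""
        entity_id: str
        owner: str
        # permission name -> set of user or group ids granted that permission
        grants: dict = field(default_factory=dict)

    def share(entity, principal, permission):
        """Grant a user or group (the principal) a permission on an entity."""
        entity.grants.setdefault(permission, set()).add(principal)

    def can_access(entity, user, user_groups, permission):
        """Owners always have access; others need a direct or group grant."""
        if user == entity.owner:
            return True
        granted = entity.grants.get(permission, set())
        return user in granted or bool(granted & set(user_groups))

    # The classroom use case: a student shares simulation output with the instructor.
    exp = Entity("experiment-42", owner="student-alice")
    share(exp, "instructor-bob", "READ")
    print(can_access(exp, "instructor-bob", [], "READ"))   # True
    print(can_access(exp, "student-carol", [], "READ"))    # False

Group grants (e.g., sharing with a whole research group id) cover the collaborative-simulation use case the abstract mentions without enumerating individual members.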


Thursday July 13, 2017 9:00am - 9:30am
Strand 12

9:00am

A Platform for Computationally Advanced Collaborative AgroInformatics Data Discovery and Analysis
The International Agroinformatics Alliance (IAA) is a coalition of public and private institutions that are cooperating to develop a platform for computationally advanced collaborative analysis of agricultural data. By combining large agricultural data sets with advanced analysis techniques, IAA seeks to catalyze agricultural research, leading to improved agricultural productivity and stability. IAA has constructed a platform that combines Jupyterhub web notebooks for interactive data analysis, relational databases for storage of crop genetic and geospatial data, and the Globus file transfer system for efficient data transfer and authentication. The platform uses a data permissions system that allows users to share data with collaborators. The central platform is located at the Minnesota Supercomputing Institute, at the University of Minnesota, which allows access to the large storage and compute resources required for advanced agroinformatics analysis pipelines.


Thursday July 13, 2017 9:00am - 9:30am
Strand 11

9:00am

Panel: National-Scale Research Computing and Beyond
How might the international community of research computing users and stakeholders benefit from knowledge sharing among national- or international-scale research computing organizations and providers? It is common for large-scale investments in research computing systems, services and support to be guided and funded with government oversight and centralized planning. There are many commonalities, including stakeholder relations, outcomes reporting, long-range strategic planning, and governance. What trends exist currently, and how might information sharing and collaboration among resource providers be beneficial? Is there desire to form a partnership, or to build upon existing relationships? Participants in this panel will include personnel involved in US, Canadian, European, and other large-scope research computing jurisdictions.

Moderators
Gregory Newby

Chief Technology Officer, Compute Canada
A strategic thinker with a passion for enabling diverse scientific, social, and educational opportunities for all people. Devoted to the expansion of human intellect and capability through the use of information and computing technologies.

Speakers
Florian Berberich

Dr. Florian Berberich is the project manager of the PRACE Implementation Phase projects. In October 2015 he became a member of the Board of Directors of PRACE aisbl. He has worked for the PRACE Project Management Office at Forschungszentrum Juelich (JSC) since 2008.
Gergely Sipos

Customer and Technical Outreach Manager, EGI Foundation
I work at the coordinator institute of the EGI e-infrastructure, the largest publicly funded grid and cloud computing infrastructure in the world (600k CPUs at 300 sites). My role is Customer and Technical Outreach Manager of the EGI federation.
John Towns

Director of Collaborative eScience Programs, National Center for Supercomputing Applications
Director of Collaborative eScience Programs at the National Center for Supercomputing Applications (NCSA) at the University of Illinois. NCSA's Collaborative eScience Directorate works to reduce barriers to the use of cyberinfrastructure so more people in more fields of inquiry can take full advantage of powerful digital resources and services. This unit leads NCSA's participation in the Extreme Science and Engineering Discovery Environment (XSEDE) and the University of Illinois Campus Cluster Program.


Thursday July 13, 2017 9:00am - 10:30am
Strand 13

9:00am

Workshop on Trustworthy Scientific Cyberinfrastructure (Part 1)
Open to all PEARC17 attendees.

9-9:30 a.m. — An Update from the NSF Cybersecurity Center of Excellence (CCoE)


The Center for Trustworthy Scientific Cyberinfrastructure (CTSC) has been leading the NSF community since 2012 in understanding and implementing risk-based cybersecurity to maximize trustworthy science in the context of the over seven billion dollars of research infrastructure funded by NSF. As of the beginning of 2016, CTSC has been recognized and funded by NSF as a Cybersecurity Center of Excellence (CCoE) to "provide leadership to the NSF research community in the continuous building and distribution of a body of knowledge on the topic of trustworthy cyberinfrastructure." This talk covers the principles of cybersecurity and identity management to support scientific research developed by CTSC and its work as a center of excellence, including developing a threat model and cybersecurity best practices for science, providing situational awareness to the NSF community, engaging with science projects one-on-one to collaboratively address their cybersecurity challenges, and building a community of practitioners and researchers around cybersecurity for science anchored by an annual cybersecurity summit for NSF cyberinfrastructure. CTSC provides training, best practices, and support across a diverse set of cybersecurity topics including cybersecurity program development, incident response, software assurance, and federated identity management in response to the needs of NSF cyberinfrastructure projects.

9:30-10:30 a.m. — Cybersecurity for Small and Medium Science Projects

Based on CTSC's cybersecurity program development guide, this presentation covers practical information security tasks for small and medium science projects, recognizing that cybersecurity is not a one-size-fits-all endeavor. Some of the topics covered include:
  1. Cybersecurity's relevance to science projects.
  2. The complexity and scope of cybersecurity, and how cybersecurity programs can help you cope with that complexity (and protect your science).
  3. A handful of "must-do" (and doable!) action items.
This session is appropriate for principal investigators, program officers, IT professionals in research and higher education, research facility managers, and security professionals interested in information security approaches tailored to particular communities. It is not a detailed technical training. There will be significant opportunities for Q&A. See also: http://hdl.handle.net/2022/21260



Thursday July 13, 2017 9:00am - 10:30am
Bolden 2

9:00am

Overcoming Bias in the Workplace — Workshop Part 1
Meeting your workforce development goals in HPC will require cultural shifts and a transformation of your recruiting and retention strategies. Overcoming bias is a challenge for any organization. Join our workshop with presenter Kim Stephens of IBM to learn how to overcome unconscious and conscious bias in the workplace.

Thursday July 13, 2017 9:00am - 10:30am
Bolden 6

9:30am

Experiences Porting Scientific Applications to the Intel (KNL) Xeon Phi Platform
This paper presents experiences using Intel's KNL MIC platform on early-access hardware for the upcoming Stampede 2 cluster launching in Summer 2017. We focus on 1) porting existing scientific software and 2) observing the performance of this software. Additionally, we comment on both the ease of use of KNL and its observed performance compared to the previous-generation "Knights Ferry" and "Knights Corner" Xeon Phi MICs. Fortran, C, and C++ applications are chosen from a variety of scientific disciplines, including computational fluid dynamics, numerical linear algebra, uncertainty quantification, finite element methods, and computational chemistry.


Thursday July 13, 2017 9:30am - 10:00am
Bolden 5

9:30am

DesignSafe: Using Elasticsearch to Share and Search Data on a Science Web Portal
DesignSafe is a web portal focused on helping natural hazards engineering researchers conduct research. Natural hazards research spans multiple disciplines and multiple physical locations where experiments take place. Sharing and searching data are imperative when doing research across multiple physical locations. We address these needs by using a distributed database (Elasticsearch) to index important features extracted from the data.
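As a rough sketch of this index-and-search pattern (not DesignSafe's actual schema; the index and field names below are invented), the example uses the official Python Elasticsearch client with its 8.x-style keyword arguments:

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")  # assumed local test instance

    # Index a document of features extracted from an experiment's data
    es.index(index="nh-experiments", id="exp-001", document={
        "title": "Shake table test, 3-story steel frame",
        "facility": "UC San Diego",
        "keywords": ["earthquake", "steel frame", "shake table"],
    })
    es.indices.refresh(index="nh-experiments")

    # Full-text search across the indexed features
    resp = es.search(index="nh-experiments",
                     query={"match": {"keywords": "earthquake"}})
    for hit in resp["hits"]["hits"]:
        print(hit["_id"], hit["_source"]["title"])

Indexing extracted features rather than raw files keeps the search index small while still letting collaborators at different sites discover each other's experiments.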


Thursday July 13, 2017 9:30am - 10:00am
Strand 12

9:30am

Shared research group storage solution with integrated access management
Management of research data storage areas for projects and research groups has become a time-consuming task for research and data support staff. Purdue Research Computing staff have developed a web-based system for quick provisioning of group storage areas on the Research Data Depot shared filesystem that they operate. The system allows technical and non-technical staff to provision Research Data Depot storage allocations and configure data access management for research groups quickly, consistently, and reliably via an intuitive web interface. It also tracks allocation usage of these storage spaces and provides customizable alerting of their users when allocations are nearly consumed.
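A minimal sketch of the kind of threshold-based usage alerting described, assuming hypothetical group, quota, and contact values (this is not Purdue's implementation):

    import smtplib
    from email.message import EmailMessage

    ALERT_THRESHOLD = 0.90  # warn when 90% of the allocation is consumed

    def check_allocation(group, used_tb, quota_tb, contacts):
        """Email the group's contacts if usage exceeds the alert threshold."""
        frac = used_tb / quota_tb
        if frac < ALERT_THRESHOLD:
            return
        msg = EmailMessage()
        msg["Subject"] = f"Storage alert: {group} at {frac:.0%} of allocation"
        msg["From"] = "depot-alerts@example.edu"   # hypothetical address
        msg["To"] = ", ".join(contacts)
        msg.set_content(f"{group} has used {used_tb:.1f} of {quota_tb:.1f} TB.")
        with smtplib.SMTP("localhost") as s:       # assumes a local mail relay
            s.send_message(msg)

    check_allocation("geology-lab", used_tb=9.3, quota_tb=10.0,
                     contacts=["pi@example.edu"])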


Thursday July 13, 2017 9:30am - 10:00am
Strand 11

10:00am

HPC-enabled food-water-energy system simulations: Simulation of Intensively Managed Landscapes

Domain science experts are commonly limited by the computational efficiency of their code and the hardware resources available for executing desired simulations. Here, we detail a collaboration with domain scientists focused on simulating an ensemble of climate and human management decisions that drive environmental (e.g., water quality) and economic (e.g., crop yield) outcomes. Briefly, the domain scientists developed a message-passing-interface implementation to execute the formerly serial code across a number of processors, anticipating significant performance improvement by moving from their desktop machines to a cluster computing environment. The code is too complex to efficiently re-code from scratch, and its shared codebase must continue to function on desktop machines as well as in the parallel implementation. However, inefficiencies in the code caused the Lustre filesystem to bottleneck performance for all users. The domain scientists collaborated with Indiana University's Science Applications and Performance Tuning and High Performance File System teams to address the unforeseen performance limitations. The non-linear process of testing software advances and hardware performance is a model of the failures and successes that can be anticipated in similar applications. Ultimately, through a series of iterative software and hardware advances, the team worked collaboratively to increase the performance of the code, cluster, and file system, enabling more than 100-fold increases in performance. As a result, the domain scientists are able to assess ensembles of climate and human forcing on the model, and the sensitivities of ecologically and economically important outcomes of intensively managed agricultural landscapes.
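As a rough illustration of the ensemble-parallelization pattern described (not the project's actual code; run_member() and the ensemble size are hypothetical stand-ins), formerly serial model runs can be distributed across MPI ranks with mpi4py:

    from mpi4py import MPI

    def run_member(member_id):
        """Hypothetical stand-in for one serial model run (a climate x management scenario)."""
        return {"member": member_id, "water_quality": 0.0, "crop_yield": 0.0}

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    size = comm.Get_size()

    N_MEMBERS = 96  # assumed ensemble size
    # Static round-robin decomposition: rank r runs members r, r+size, r+2*size, ...
    local_results = [run_member(m) for m in range(rank, N_MEMBERS, size)]

    # Collect all results on rank 0 for post-processing
    all_results = comm.gather(local_results, root=0)
    if rank == 0:
        flat = [r for chunk in all_results for r in chunk]
        print(f"collected {len(flat)} ensemble members")

Run with, e.g., mpirun -n 8 python ensemble.py. Note that a decomposition like this also concentrates I/O from every rank onto the shared filesystem, which is one way the kind of Lustre bottleneck described above can arise.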



Thursday July 13, 2017 10:00am - 10:30am
Bolden 5

10:00am

Flexible Enforcement of Multi-factor Authentication with SSH via Linux-PAM for Federated Identity Users
A computational science project with restricted-access data was awarded an allocation by XSEDE in 2016 to use the Bridges supercomputer at the Pittsburgh Supercomputing Center (PSC). As a condition of the license agreement for access to the data, multi-factor authentication (MFA) with XSEDE's Duo MFA service is required, in addition to filesystem access controls, for users of this project to log in to Bridges via SSH. Since not all Bridges users are required to authenticate in this manner, a solution was implemented via Linux-PAM to require XSEDE Duo MFA for SSH login access by specific users, as identified by their local account name or membership in a local group. This paper describes the implementation on Bridges and its extensibility to other systems and environments with similar needs.
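A minimal sketch of how such group-conditional enforcement can be expressed in a PAM stack (this is not PSC's actual configuration; the group name mfa-required is hypothetical), using the standard pam_succeed_if module to skip the Duo module for users outside the group:

    # Fragment of /etc/pam.d/sshd (sketch only, assuming pam_duo is installed).
    # If the user is NOT in the local group "mfa-required", skip the next
    # module in the stack (pam_duo), so only group members get the MFA prompt.
    auth    [success=1 default=ignore]    pam_succeed_if.so quiet user notingroup mfa-required
    auth    required                      pam_duo.so

The success=1 action means "on success, jump over the next one module," which is what makes the Duo requirement conditional rather than global.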


Thursday July 13, 2017 10:00am - 10:30am
Strand 11

10:30am

Morning Break
Thursday July 13, 2017 10:30am - 11:00am
Empire Foyer

11:00am

Sandstone HPC - A Domain-General Gateway for New HPC Users
The complexity of high-performance computing (HPC) resources poses many challenges to new users. A number of science gateways have been developed to increase the productivity of novice users by hiding the underlying infrastructure; however, these solutions tend not to teach HPC skills that transfer easily outside of the gateway. In this paper we introduce a domain-general gateway, Sandstone HPC, that represents the HPC environment more naturally to novice users by abstracting the command-line interface and providing contextual help. We assess the degree to which Sandstone HPC improves upon the usability of the command-line interface by analyzing the results of a usability study conducted in both environments. We also detail how the architecture promotes long-term sustainability and a community-development model.


Thursday July 13, 2017 11:00am - 11:30am
Strand 12

11:00am

Performance Benchmarking of the R Programming Environment on the Stampede 1.5 Supercomputer
We present performance results obtained with a new single-node performance benchmark of the R programming environment on the many-core Xeon Phi Knights Landing and standard Xeon-based compute nodes of the Stampede supercomputer cluster at the Texas Advanced Computing Center. The benchmark package consists of microbenchmarks of linear algebra kernels and machine learning functionality, including clustering and neural network training, from the R distribution. The standard Xeon-based nodes outperformed their Xeon Phi counterparts for matrices of small to medium dimensions, performing approximately twice as fast for most of the linear algebra microbenchmarks. For most of the same microbenchmarks, the Knights Landing compute nodes were competitive with or outperformed the standard Xeon-based nodes for matrices of medium to large dimensions, executing as much as five times faster than the standard Xeon-based nodes. For the clustering and neural network training microbenchmarks, the standard Xeon-based nodes performed up to four times faster than their Xeon Phi counterparts for many large data sets, indicating that commonly used R packages may need to be reengineered to take advantage of existing optimized, scalable kernels.
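For readers unfamiliar with this style of measurement, the sketch below shows the general shape of a dense linear algebra microbenchmark across matrix sizes. It is written in Python/NumPy purely for illustration; the paper itself benchmarks the R environment:

    import time
    import numpy as np

    for n in (512, 2048, 4096):  # small, medium, and larger dimensions
        a = np.random.rand(n, n)
        b = np.random.rand(n, n)
        t0 = time.perf_counter()
        _ = a @ b                # dispatches to a BLAS dgemm, like R's %*%
        dt = time.perf_counter() - t0
        gflops = 2 * n**3 / dt / 1e9
        print(f"n={n}: {dt:.3f} s, {gflops:.1f} GFLOP/s")

Reporting GFLOP/s alongside wall time, as here, is what lets small-, medium-, and large-matrix regimes be compared across node types.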


Thursday July 13, 2017 11:00am - 11:30am
Strand 11

11:00am

Visualization Showcase Panel
The Visualization Showcase panel will demonstrate some of the most computationally intensive simulations that have recently come from HPC systems.  Each guest on the panel will give a short talk about their visualization, the science behind it, and the techniques used to illustrate its observables. An open Q&A session for the panel will follow.
  • CESM Wind Speed Magnitude — Matt Rehme, NCAR
  • Visualization of Physical Signatures of Cancer Metastasis — Anne Bowen, Abdul N Malmi-Kakkada, University of Texas, Austin
  • Visualization of NCAR's Wind Forecast Improvement Project 2 — Scott Pearse, Pedro Jimenez, NCAR
  • Pillars of the Mantle: Imaging the Interior of the Earth with Adjoint Tomography — David Pugmire, Ebru Bozdag, ORNL
  • Visualization of Tropical Cyclone-Ocean Interactions — David Bock, NCSA
  • Gonzalo and Sandy — Greg Foss, Fuqing Zhang, TACC
  • Spot the Difference — Greg Foss, Amy McGovern, TACC


Thursday July 13, 2017 11:00am - 12:30pm
Bolden 5

11:00am

Workshop on Trustworthy Scientific Cyberinfrastructure (Part 2)
Open to all PEARC17 attendees.

11-11:30 a.m. — Security for Science Gateways


Cybersecurity is a key part of sustainability for science gateways. This presentation provides background, motivation, and best practices on this topic, based on materials developed in partnership between the Center for Trustworthy Scientific Cyberinfrastructure (CTSC) and the Science Gateway Community Institute (SGCI). See also: http://hdl.handle.net/2022/21367

11:30 a.m.-12:30 p.m. — Community Forum

The Workshop on Trustworthy Scientific Cyberinfrastructure concludes with a Community Forum where attendees can share challenges and success stories in an informal setting. Representatives of the NSF Cybersecurity Center of Excellence (CCoE) will be on hand to lead the discussion and answer questions.


Thursday July 13, 2017 11:00am - 12:30pm
Bolden 2

11:00am

Student Modeling Day Presentations
The teams from Student Modeling Day will describe their work to all interested attendees.

Thursday July 13, 2017 11:00am - 12:30pm
Strand 13

11:00am

Overcoming Bias in the Workplace — Workshop Part 2
Meeting your workforce development goals in HPC will require cultural shifts and a transformation of your recruiting and retention strategies. Overcoming bias is a challenge for any organization. Join our workshop with presenter Kim Stephens of IBM to learn how to overcome unconscious and conscious bias in the workplace.

Thursday July 13, 2017 11:00am - 12:30pm
Bolden 6

11:30am

COSMIC2: A Science Gateway for Cryo-Electron Microscopy Structure Determination Using the CIPRES Workbench Framework and Globus Services for Terabyte-sized Data Transfer
Structural biology is in the midst of a revolution. Instrumentation and software improvements have allowed for the full realization of cryo-electron microscopy (cryo-EM) as a tool capable of determining atomic structures of protein and macromolecular samples. These advances open the door to solving new structures that were previously unattainable, which will soon make cryo-EM a ubiquitous tool for structural biology worldwide, serving both academic and commercial purposes. However, despite its power, new users of cryo-EM face significant obstacles. One major barrier is the handling of large datasets (10+ terabytes), where new cryo-EM users must learn to interface with the Linux command line while also managing and submitting jobs to high performance computing resources. To address this barrier, we are developing the COSMIC2 Science Gateway as an easy, web-based science gateway that simplifies cryo-EM data analysis using a standardized workflow running on XSEDE's (Extreme Science and Engineering Discovery Environment) supercomputers. This gateway will lower the barrier to high performance computing tools and facilitate the growth of cryo-EM into a routine tool for structural biology. With the support of XSEDE's Extended Collaborative Support Services (ECSS) and the Science Gateway Community Institute's (SGCI) Extended Developer Support (EDS), we have adapted the successful Cyberinfrastructure for Phylogenetic Research (CIPRES) Workbench to the cryo-EM analysis workflow and are in the process of adding Globus Auth and Globus Transfer to enable the transfer of hundreds of gigabytes to several terabytes of data for analysis at the San Diego Supercomputer Center (SDSC).
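As a rough sketch of what driving Globus Transfer from Python looks like (the client ID, endpoint UUIDs, and paths below are placeholders, and a gateway would run Globus Auth inside its web application rather than this interactive native-app flow):

    import globus_sdk

    CLIENT_ID = "YOUR-NATIVE-APP-CLIENT-ID"   # placeholder
    SRC_ENDPOINT = "SOURCE-ENDPOINT-UUID"     # e.g., a lab's data store
    DEST_ENDPOINT = "DEST-ENDPOINT-UUID"      # e.g., an SDSC storage endpoint

    # Interactive native-app login flow to obtain Transfer tokens
    auth_client = globus_sdk.NativeAppAuthClient(CLIENT_ID)
    auth_client.oauth2_start_flow()
    print("Please log in at:", auth_client.oauth2_get_authorize_url())
    tokens = auth_client.oauth2_exchange_code_for_tokens(input("Auth code: "))
    transfer_tokens = tokens.by_resource_server["transfer.api.globus.org"]

    tc = globus_sdk.TransferClient(
        authorizer=globus_sdk.AccessTokenAuthorizer(transfer_tokens["access_token"]))

    # Describe and submit a recursive directory transfer; Globus then manages
    # the multi-terabyte movement (retries, integrity checks) asynchronously.
    tdata = globus_sdk.TransferData(tc, SRC_ENDPOINT, DEST_ENDPOINT,
                                    label="cryo-EM dataset")
    tdata.add_item("/data/session01/", "/scratch/cosmic2/session01/",
                   recursive=True)
    task = tc.submit_transfer(tdata)
    print("Submitted Globus transfer task:", task["task_id"])

Because the transfer runs as an asynchronous, fire-and-forget task on the Globus service, the gateway does not have to stream terabytes through its own web server.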


Thursday July 13, 2017 11:30am - 12:00pm
Strand 12

11:30am

Building Bridges - The System Administration Tools and Techniques used to Deploy Bridges
High-performance computing is continually growing in scope and areas of research. To cover these new areas, HPC has to become more flexible to handle a wide variety of workloads, and as the computing becomes more flexible, the infrastructure becomes more complex to accommodate these varied workloads.

At the Pittsburgh Supercomputing Center, a Level 1 XSEDE Service Provider, we faced this exact problem for Bridges, our new NSF-funded configurable computing resource. This talk will cover the technical decisions that were made for Bridges, why we made them, the tools we chose, and what the users gain. We chose OpenStack Ironic for system installation, OpenStack for managing virtual machines, Puppet for configuration, Slurm for scheduling, and Naemon, Elasticsearch/Logstash/Kibana, and InfluxDB for monitoring and reporting. This software gives users a wide range of ways to do computing at PSC. Additionally, it lets us maintain an even higher level of monitoring and reporting that adapts automatically as systems change functionality.


Thursday July 13, 2017 11:30am - 12:00pm
Strand 11

12:00pm

The Community Software Repository from XSEDE
The Extreme Science and Engineering Discovery Environment (XSEDE) aims to be a connector of cyberinfrastructure (CI) resources, software, and services. By bringing together advanced digital infrastructure, expert support, and training services, XSEDE enables scholars, researchers, and engineers to participate in multidisciplinary collaborations while seamlessly accessing advanced computing resources and sharing data to tackle society’s grand challenges.
To realize this vision, XSEDE must both enable and participate in a software ecosystem, and must collectively engage with software developers, integrators, vendors, users, and funding agencies to gather requirements; develop, share, and deploy software tools; and provide software documentation, training, and support.
To enable community collaboration and accelerate connecting new resources, software, and services into CI, XSEDE recently introduced the Community Software Repository (CSR): a single interface to the tools and information used by XSEDE to plan, document, and organize its software-related services and features.
This paper describes the CSR vision and strategy, current capabilities, future plans, and related XSEDE efforts.


Thursday July 13, 2017 12:00pm - 12:30pm
Strand 12

12:00pm

OpenMP 4 Fortran Modernization of WSM6 for KNL
Parallel code portability in the petascale era requires modifying existing codes to support new architectures with large core counts and SIMD vector units. OpenMP is a well established and increasingly supported vehicle for portable parallelization. As architectures mature and compiler OpenMP implementations evolve, best practices for code modernization change as well. In this paper, we examine the impact of newer OpenMP features (in particular OMP SIMD) on the Intel Xeon Phi Knights Landing (KNL) architecture, applied in optimizing loops in the single moment 6-class microphysics module (WSM6) in the US Navy's NEPTUNE code.
We find that with functioning OMP SIMD constructs, low thread invocation overhead on KNL, and a reduced penalty for unaligned access compared to previous architectures, one can leverage OpenMP 4 to achieve reasonable scalability with relatively minor reorganization of a production physics code.


Thursday July 13, 2017 12:00pm - 12:30pm
Strand 11

12:30pm

Awards Luncheon
We close out PEARC17 with a plated luncheon and the presentation of awards for the best paper in each track, the best student paper and poster, and the best Visualization Showcase entry. Be sure to stick around: it could be you!

Thursday July 13, 2017 12:30pm - 2:00pm
Empire CD

2:00pm

XSEDE All-Staff Meeting
Speakers
Ron Payne

XSEDE Program Manager, NCSA


Thursday July 13, 2017 2:00pm - 4:00pm
Strand 11

3:00pm

XSEDE ECSS Training: Part 2
Closed event. Training session for XSEDE ECSS staff.

Thursday July 13, 2017 3:00pm - 5:00pm
Strand 11