PEARC17 has ended
Tuesday, July 11 • 2:00pm - 2:30pm
Insights into Research Computing Operations using Big Data-Powered Log Analysis

Research computing centers provide researchers with a wide variety of services, including large-scale computing resources, data storage, high-speed interconnects and scientific software repositories, to facilitate continuous competitive research.
Managing these complex resources and services efficiently, and ensuring their fair use by a large number of researchers from different scientific domains, are key to a center's success.
Almost all research centers use monitoring services based on real-time data gathered from systems and services, but they often lack tools for deeper analysis of large volumes of historical logs to identify insightful trends in recurring events. The collected data can be massive, which poses significant challenges for conventional tools in this kind of analysis.
This paper describes a big data pipeline based on Hadoop and Spark technologies, developed in close collaboration between TACC and Georgia Tech. The pipeline can process large volumes of data collected from schedulers using PBSTools, making it possible to run a deep analysis in minutes rather than the hours that conventional tools would require.
Our component-based pipeline design provides the flexibility to plug in different components and promotes data reuse.
Using this data pipeline, we demonstrate the process of formulating several critical operational questions around researcher behavior, systems health, operational aspects and software usage trends, all of which are key factors in determining solutions and strategies for efficient management of research computing centers.

Tuesday July 11, 2017 2:00pm - 2:30pm CDT
Strand 12