John Linford

Austin, Texas, United States
3K followers 500+ connections

Experience

  • NVIDIA

    Austin, Texas, United States

  • -

    Austin, Texas Area

  • -

    Austin, Texas Area

  • -

    Baltimore, Maryland Area

Education

  • Virginia Tech

    Activities and Societies: Virginia Tech Triathlon Team, German Language Club

    Outstanding Ph.D. Dissertation Award. Competed at the 2008 and 2009 USA Collegiate National Triathlon Championship.

Publications

  • Performance Analysis of OpenSHMEM Applications with TAU Commander

    Lecture Notes in Computer Science (LNCS): Special Issue on OpenSHMEM

  • Unstructured-Grid CFD Algorithms on Many-Core Architectures

    ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC’17)

  • Performance Engineering FUN3D at Scale with TAU Commander

    ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16)

  • TAU Commander: An Intuitive Interface for the TAU Performance System

    Announcement of USDOE Scientific and Technical Information

    OSTI ID 1252491

  • Computational and energy efficiency optimizations of the air quality prediction model COSMO-ART

    PASC 2015 conference, Zurich, Switzerland

    POSTER

  • Intuitive Performance Engineering at the Exascale with TAU and TAU Commander

    ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14)

  • Profiling Non-Numeric OpenSHMEM Applications with the TAU Performance System

    Lecture Notes in Computer Science: Special Issue on OpenSHMEM

  • Efficient Parallel Runtime Bounds Checking with the TAU Performance System

    Proceedings of the 2013 IEEE High Performance Extreme Computing Conference (HPEC'13)

    Memory errors, such as an invalid memory access, misaligned allocation, or write to deallocated memory, are among the most difficult problems to debug because popular debugging tools do not fully support state inspection when examining failures. This is particularly true for applications written in a combination of Python, C++, C, and Fortran. We present a tool that can help identify and debug memory errors in a multi-language program at the point of failure. Integrated in the TAU Performance System®, this debugging tool allocates pages of protected memory immediately before and after dynamic memory allocations. Accessing these “guard pages” raises an error signal that causes TAU to capture performance data at the point of failure, store detailed information for each frame in the callstack, and generate a file that may be sent to the developers for analysis. The tool works on parallel programs, providing feedback about every process regardless of whether it experienced the fault, and is useful to both software developers and users experiencing memory error issues as the file output may be exchanged between the user and the development team without disclosing potentially sensitive application data. This paper describes the tool and demonstrates its application to the multi-language CREATE-AV applications Kestrel and Helios. Since those codes are export controlled, we present results from an analogous code written specifically for testing but with structure and content derived from Helios and Kestrel. The analogous performance and debugging data closely match the data obtained from the CREATE-AV codes.

  • Scalable heterogeneous parallelism for atmospheric modeling and simulation

    Journal of Supercomputing

    Heterogeneous multicore chipsets with many levels of parallelism are becoming increasingly common in high-performance computing systems. Effective use of parallelism in these new chipsets constitutes the challenge facing a new generation of large-scale scientific computing applications. This study examines methods for improving the performance of two-dimensional and three-dimensional atmospheric constituent transport simulation on the Cell Broadband Engine Architecture (CBEA). A function offloading approach is used in a 2D transport module, and a vector stream processing approach is used in a 3D transport module. Two methods for transferring non-contiguous data between main memory and accelerator local storage are compared. By leveraging the heterogeneous parallelism of the CBEA, the 3D transport module achieves performance comparable to two nodes of an IBM BlueGene/P, or eight Intel Xeon cores, on a single PowerXCell 8i chip. Module performance on two CBEA systems, an IBM BlueGene/P, and an eight-core shared-memory Intel Xeon workstation is reported.

    Other authors
    • Adrian Sandu
  • Automatic Generation of Multi-Core Chemical Kernels

    IEEE TPDS: Special Issue on High-Performance Computing with Accelerators

    This work presents KPPA (the Kinetics PreProcessor: Accelerated), a general analysis and code generation tool that achieves significantly reduced time-to-solution for chemical kinetics kernels on three multi-core platforms: NVIDIA GPUs using CUDA, the Cell Broadband Engine, and Intel Quad-Core Xeon CPUs. A comparative performance analysis of chemical kernels from WRF-Chem and the Community Multiscale Air Quality Model (CMAQ) is presented for each platform in double and single precision on coarse and fine grids. We introduce the multi-core architecture parameterization KPPA uses to generate a chemical kernel for these platforms and describe a code generation system that produces highly-tuned platform-specific code. Compared to state-of-the-art serial implementations, speedups exceeding 25× are regularly observed, with a maximum observed speedup of 41.1× in single precision.

    Other authors
    • John Michalakes
    • Manish Vachharajani
    • Adrian Sandu

Projects

  • A High Performance Chemical Simulation Preprocessor and Source Code Generator, Phase I

    - Present

    Numerical simulations of chemical kinetics are a critical component of aerospace research, Earth systems research, and energy research. These simulations enable a better understanding of the evolution of chemical species over time in domains as diverse as climate and weather prediction, combustion simulation, and air quality prediction. The time-to-solution in these simulations can be improved by over 30X via computational accelerators like Graphics Processing Units (GPUs) or the Intel Xeon Phi coprocessor, but the state-of-the-art tools for chemical kinetics do not support accelerators. ParaTools will develop a code generation tool called "Kppa" into a production-grade product that translates a high-level description of a chemical reaction network into simulation code that supports computational accelerators to significantly reduce time-to-solution. The generated code will provide the same software interface as existing tools to ensure immediate compatibility with popular codes like GEOS-Chem, WRF-Chem, MCM, etc. Kppa will include an online user productivity environment called "Kppa Cloud" for the development, testing, and benchmarking of chemical simulation codes. Kppa Cloud will enable users to graphically formulate new chemical reaction networks, maintain a library of chemical mechanisms, develop new mechanisms collaboratively, generate simulation code, explore the computational and numerical characteristics of the generated code, and test the generated code for stability and correctness. Kppa will enable supercomputer-level performance on smaller computers with lower costs, lower barriers to entry, and enable the rapid creation of high-performance kinetics simulations. Kppa will build on open source technologies to be backward compatible with the state-of-the-art in modeling and simulation and employ a modular design enabling extensibility to the computer architectures of the future.

  • A High Performance Chemical Simulation Preprocessor and Source Code Generator (NASA SBIR)

    - Present

    Many NASA programs, such as the Global Modeling and Assimilation Office
    (GMAO), use atmospheric models to further the understanding of complex
    Earth systems. NASA's microscale, mesoscale, and tropospheric simulations
    of air quality, weather prediction, and climate change spend 60-90% of
    their computational time in simulations of chemical kinetics.
    KPPA provides a high-level framework for describing chemical
    mechanisms and translating these into numerical simulations of the
    atmosphere by generating highly-optimized code for traditional, multi-core,
    and accelerated computer architectures. KPPA retains compatibility
    with existing models while enabling larger, more descriptive, and
    higher-quality simulations of atmospheric chemical kinetics on smaller,
    cost-effective computers. By combining knowledge of sparsity in the model
    data with parameterizations of the target hardware, KPPA will generate code
    that will not only improve the performance of existing models executing on
    traditional architectures, but will also allow it to target future parallel
    optimization opportunities. NASA atmospheric modeling applications will
    benefit from significantly reduced time-to-solution on supercomputing
    resources without sacrificing solution accuracy. This will translate to
    overall improvements in high-performance numeric simulations of the
    atmosphere for use in climate, weather, and air quality applications.

  • Army SBIR Phase II: An Approach for Parallelizing Legacy CFD Applications

    - Present

    A complete understanding of a flexible helicopter rotor flight envelope
    requires computational resources beyond the reach of most design engineers.
    However, the advent of desktop computers with multi-core CPUs and GPUs makes such
    computations affordable if the software is designed to use multi-core hardware efficiently.
    Phase I demonstrated the feasibility of a re-engineering tool that facilitates software modernization.
    In Phase II, ParaTools and Sukra Helitek jointly offer to develop the software re-engineering tool prototyped in Phase I and to apply Phase I innovations to RotCFD, an Integrated Development Environment for rotors.
    RotCFD is the second generation of the Rot3dc rotorcraft simulation tool developed and commercialized by Sukra Helitek and licensed and used by the Army, NASA, Navy, and the rotorcraft industry. Rot3dc has been successfully used in major rotorcraft initiatives including the V-22 (Osprey), RAH-66 (Comanche) and Sikorsky's Cypher UAV.
    The tools and the technical expertise developed will be invaluable in modernizing and improving
    the performance of application codes in addition to enhancing the current capabilities of RotCFD.
    Such tools are important to maintaining the U.S. government's leadership
    position in the helicopter world, making it possible to produce and validate
    high quality designs at an affordable price.

  • DOE SBIR Phase II: TAU Commander: An Intuitive Interface for the TAU Performance System

    - Present

    Principal investigator responsible for all technical design, reporting and documentation on the project. Coordinate all interactions between project partners to ensure that work proceeds according to contract agreements.

  • PToolsWin

    - Present

    Create and deploy a complete development environment for porting and tuning parallel Linux applications on Microsoft Windows 64-bit with Microsoft MPI for use on Windows Azure or Windows-based clusters.

    Other creators
    • Sameer Shende
  • The TAU Performance System

    - Present

    Develop and debug, implement new features, and improve usability. Also train, tutor, and support TAU users at high performance computing installations worldwide.

  • Operational High Resolution Chemical Kinetics Simulation (NASA SBIR)

    -

    Numerical simulations of chemical kinetics are critical to addressing urgent issues in both the developed and developing world. Ongoing demand for higher resolution models with larger chemical mechanisms drives exponential growth in computational cost: many models spend over 90% of their runtime simulating chemical kinetics. Energy efficiency and renewable energy system research and development depend on simulations involving thousands of chemical species and reactions, but there are no general analysis tools that can handle mechanisms of this size. Simulations of more than a few hundred species or reactions are hand-tuned, ad-hoc solutions that will ultimately become obsolete. ParaTools will address this need by improving its “Kppa” general analysis tool for chemical kinetics to facilitate coupling with high resolution models and to support large chemical mechanisms. Phase I will explore the feasibility of methods for large mechanism support including flux analysis for sub-cell parallelization and mechanism reduction, dynamic mechanism selection based on environmental conditions, and iterative methods for large sparse systems. Phase I will also improve Kppa as a general analysis source code generator by implementing accelerated analysis methods that use many-core and multi-core devices and/or GPUs to reduce mechanism analysis, support for non-Arrhenius reaction rates, and an interface for coupling Kppa-generated code with high resolution models. Phase II will implement large mechanism support based on Phase I findings. Pre-coupled open source model packages containing Kppa-generated source coupled with a multi-physics or flow code will be provided in Phase I to facilitate commercialization through Phase II and beyond. The improved Kppa tool will reduce time-to-solution by combining the latest numerical and algorithmic developments with accelerated computing technology to enable supercomputer-level performance on smaller computers with lower costs.

  • Performance Analysis of GraphBLAS

    -

    Used high-performance analysis tools created by ParaTools, Inc. to assess their usefulness on GraphBLAS, a test code written in C++ with OpenMP and MPI.

    GraphBLAS is a linear algebra library for graph applications such as breadth-first search (BFS), which is commonly used to find the shortest path between two nodes (vertices) in an unweighted graph.

  • Parallel Explicit Solver

    -

    Development and benchmarking.

    Other creators
    • Adrian Sandu
  • ParaTools ThreadSpotter

    -

    ThreadSpotter automatically analyzes application performance, rates performance problems, suggests fixes, and provides insights and statistics to quickly assess and resolve inefficient cache use. ParaTools ThreadSpotter expands these capabilities to support multi-language distributed memory applications and integrates with the TAU Performance System® to analyze data movement across the memory hierarchy on each compute node.

  • TAU Enterprise: Intuitive Performance Problem Identification and Resolution

    -

    The Department of Energy and other federal agencies have made significant investments in high performance software engineering tools, yet these tools still lack advanced problem identification capabilities. At the moment, users must rely heavily on their own experience and intuition to interpret software performance data, identify the root cause of a software performance problem, and ultimately resolve the problem. This limits tool adoption by small companies and independent software vendors, particularly in the manufacturing sector, who lack extensive in-house software engineering experience.
    This project is developing a complete “production grade” software performance engineering product that lowers the barriers to entry for novice users and enhances their ability to mine actionable information from software performance data. The new product presents a simple, intuitive, and systematic user interface that guides users through performance engineering workflows and uses advanced cloud-hosted performance analysis services to offer unprecedented data analysis and problem identification and resolution capabilities.


Honors & Awards

  • Northrop Grumman Cync Program for Cybersecurity

    Northrop Grumman

    The Northrop Grumman and bwtech@UMBC Cync Program is an elite scholarship program that sponsors startup and early-stage companies with promising cybersecurity product ideas. This program is designed for small US and international businesses in all phases of the technology development cycle. The overall intent of the program is to expedite the commercialization of innovative cybersecurity technologies and ideas.
    http://www.bwtechumbc.com/cyber-incubator/northrop-grumman-cync-program/

  • National Defense Science and Engineering Graduate Fellowship

    American Society for Engineering Education

    4% acceptance rate, see https://ndseg.asee.org/

  • Central European Summer Research Institute Graduate Fellowship

    Institute for International Education

    https://www.iie.org/Programs/CESRI

Languages

  • English

    Native or bilingual proficiency

  • German

    Limited working proficiency

  • Spanish

    Elementary proficiency
