“John worked with me at the Computational Science Laboratory at Virginia Tech from 2006 to 2010. He was definitely a very smart student and a gifted researcher and scientist. I was always impressed by his ability to get things done quickly and to find answers to my questions. My labmates were also clearly impressed by his work. He is an outstanding scientist with excellent computer skills.”
Activity
-
This past Wednesday (April 22), I had the privilege of hosting Bryan Catanzaro, VP of Applied Deep Learning Research at NVIDIA, for a special lecture…
Liked by John Linford
-
Agentic AI, physical AI, and AI factories are converging to transform how we discover, design, and deploy intelligence into the physical world. At…
Liked by John Linford
-
Today, we announced our collaboration with Northrop Grumman where we developed a foundational AI infrastructure powered by NVIDIA technology that…
Liked by John Linford
Experience
Education
-
Virginia Tech
-
-
Activities and Societies: Virginia Tech Triathlon Team, German Language Club
Outstanding Ph.D. Dissertation Award. Competed at the 2008 and 2009 USA Collegiate National Triathlon Championship.
-
-
-
Publications
-
Performance Analysis of OpenSHMEM Applications with TAU Commander
Lecture Notes in Computer Science (LNCS): Special Issue on OpenSHMEM
-
Unstructured-Grid CFD Algorithms on Many-Core Architectures
ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC’17)
-
Performance Engineering FUN3D at Scale with TAU Commander
ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC’16)
-
TAU Commander: An Intuitive Interface for the TAU Performance System
Announcement of USDOE Scientific and Technical Information
OSTI ID 1252491
-
Intuitive Performance Engineering at the Exascale with TAU and TAU Commander
ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis (SC’14)
-
Profiling Non-Numeric OpenSHMEM Applications with the TAU Performance System
Lecture Notes in Computer Science: Special Issue on OpenSHMEM
-
Efficient Parallel Runtime Bounds Checking with the TAU Performance System
Proceedings of the 2013 IEEE High Performance Extreme Computing Conference (HPEC'13)
Memory errors, such as an invalid memory access, misaligned allocation, or write to deallocated memory, are among the most difficult problems to debug because popular debugging tools do not fully support state inspection when examining failures. This is particularly true for applications written in a combination of Python, C++, C, and Fortran. We present a tool that can help identify and debug memory errors in a multi-language program at the point of failure. Integrated in the TAU Performance System (R), this debugging tool allocates pages of protected memory immediately before and after dynamic memory allocations. Accessing these “guard pages” raises an error signal that causes TAU to capture performance data at the point of failure, store detailed information for each frame in the callstack, and generate a file that may be sent to the developers for analysis. The tool works on parallel programs, providing feedback about every process regardless of whether it experienced the fault, and is useful to both software developers and users experiencing memory error issues as the file output may be exchanged between the user and the development team without disclosing potentially sensitive application data. This paper describes the tool and demonstrates its application to the multi-language CREATE-AV applications Kestrel and Helios. Since those codes are export controlled, we present results from an analogous code written specifically for testing but with structure and content derived from Helios and Kestrel. The analogous performance and debugging data closely match the data obtained from the CREATE-AV codes.
-
Scalable heterogeneous parallelism for atmospheric modeling and simulation
Journal of Supercomputing
Heterogeneous multicore chipsets with many levels of parallelism are becoming increasingly common in high-performance computing systems. Effective use of parallelism in these new chipsets constitutes the challenge facing a new generation of large scale scientific computing applications. This study examines methods for improving the performance of two-dimensional and three-dimensional atmospheric constituent transport simulation on the Cell Broadband Engine Architecture (CBEA). A function offloading approach is used in a 2D transport module, and a vector stream processing approach is used in a 3D transport module. Two methods for transferring incontiguous data between main memory and accelerator local storage are compared. By leveraging the heterogeneous parallelism of the CBEA, the 3D transport module achieves performance comparable to two nodes of an IBM BlueGene/P, or eight Intel Xeon cores, on a single PowerXCell 8i chip. Module performance on two CBEA systems, an IBM BlueGene/P, and an eight-core shared-memory Intel Xeon workstation are given.
-
Automatic Generation of Multi-Core Chemical Kernels
IEEE TPDS: Special Issue on High-Performance Computing with Accelerators
This work presents KPPA (the Kinetics PreProcessor: Accelerated), a general analysis and code generation tool that achieves significantly reduced time-to-solution for chemical kinetics kernels on three multi-core platforms: NVIDIA GPUs using CUDA, the Cell Broadband Engine, and Intel Quad-Core Xeon CPUs. A comparative performance analysis of chemical kernels from WRF-Chem and the Community Multiscale Air Quality Model (CMAQ) is presented for each platform in double and single precision on coarse and fine grids. We introduce the multi-core architecture parameterization KPPA uses to generate a chemical kernel for these platforms and describe a code generation system that produces highly-tuned platform-specific code. Compared to state-of-the-art serial implementations, speedups exceeding 25× are regularly observed, with a maximum observed speedup of 41.1× in single precision.
Projects
-
A High Performance Chemical Simulation Preprocessor and Source Code Generator, Phase I
- Present
Numerical simulations of chemical kinetics are a critical component of aerospace research, Earth systems research, and energy research. These simulations enable a better understanding of the evolution of chemical species over time in domains as diverse as climate and weather prediction, combustion simulation, and air quality prediction. The time-to-solution in these simulations can be improved by over 30X via computational accelerators like Graphics Processing Units (GPUs) or the Intel Xeon Phi coprocessor, but the state-of-the-art tools for chemical kinetics do not support accelerators. ParaTools will develop a code generation tool called "Kppa" into a production-grade product that translates a high-level description of a chemical reaction network into simulation code that supports computational accelerators to significantly reduce time-to-solution. The generated code will provide the same software interface as existing tools to ensure immediate compatibility with popular codes like GEOS-Chem, WRF-Chem, MCM, etc. Kppa will include an online user productivity environment called "Kppa Cloud" for the development, testing, and benchmarking of chemical simulation codes. Kppa Cloud will enable users to graphically formulate new chemical reaction networks, maintain a library of chemical mechanisms, develop new mechanisms collaboratively, generate simulation code, explore the computational and numerical characteristics of the generated code, and test the generated code for stability and correctness. Kppa will enable supercomputer-level performance on smaller computers with lower costs and lower barriers to entry, and will enable the rapid creation of high-performance kinetics simulations. Kppa will build on open source technologies to be backward compatible with the state-of-the-art in modeling and simulation and employ a modular design enabling extensibility to the computer architectures of the future.
-
A High Performance Chemical Simulation Preprocessor and Source Code Generator (NASA SBIR)
- Present
Many NASA programs, such as the Global Modeling and Assimilation Office
(GMAO), use atmospheric models to further the understanding of complex
Earth systems. NASA's microscale, mesoscale, and tropospheric simulations
of air quality, weather prediction, and climate change spend 60-90% of
their computational time in simulations of chemical kinetics.
KPPA provides a high-level framework for describing chemical
mechanisms and translating these into numerical simulations of the
atmosphere by generating highly-optimized code for traditional, multi-core,
and accelerated computer architectures. KPPA retains compatibility
with existing models while enabling larger, more descriptive, and
higher-quality simulations of atmospheric chemical kinetics on smaller,
cost-effective computers. By combining knowledge of sparsity in the model
data with parameterizations of the target hardware, KPPA will generate code
that will not only improve the performance of existing models executing on
traditional architectures, but will also allow it to target future parallel
optimization opportunities. NASA atmospheric modeling applications will
benefit from significantly reduced time-to-solution on supercomputing
resources without sacrificing solution accuracy. This will translate to
overall improvements in high-performance numeric simulations of the
atmosphere for use in climate, weather, and air quality applications.
-
Army SBIR Phase II: An Approach for Parallelizing Legacy CFD Applications
- Present
A complete understanding of a flexible helicopter rotor flight envelope
requires computational resources beyond the reach of most design engineers.
However, the advent of desktop computers with multi-core CPUs and GPUs makes such
computations affordable if the software is designed to use multi-core hardware efficiently.
Phase I demonstrated the feasibility of a re-engineering tool that facilitates software modernization.
In Phase II, ParaTools and Sukra Helitek jointly offer to develop the software re-engineering tool prototyped in Phase I and to apply Phase I innovations to RotCFD, an Integrated Development Environment for rotors.
RotCFD is the second generation of the Rot3dc rotorcraft simulation tool developed and commercialized by Sukra Helitek and licensed and used by the Army, NASA, Navy, and the rotorcraft industry. Rot3dc has been successfully used in major rotorcraft initiatives including the V-22 (Osprey), RAH-66 (Comanche) and Sikorsky's Cypher UAV.
The tools and the technical expertise developed will be invaluable in modernizing and improving
the performance of application codes in addition to enhancing the current capabilities of RotCFD.
Such tools are important to maintaining the U.S. government's leadership
position in the helicopter world, making it possible to produce and validate
high quality designs at an affordable price.
-
DOE SBIR Phase II: TAU Commander: An Intuitive Interface for the TAU Performance System
- Present
Principal investigator responsible for all technical design, reporting and documentation on the project. Coordinate all interactions between project partners to ensure that work proceeds according to contract agreements.
-
PToolsWin
- Present
Create and deploy a complete development environment for porting and tuning parallel Linux applications on Microsoft Windows 64-bit with Microsoft MPI for use on Windows Azure or Windows-based clusters.
-
The TAU Performance System
- Present
Develop and debug, implement new features, and improve usability. Also train, tutor, and support TAU users at high performance computing installations worldwide.
-
Operational High Resolution Chemical Kinetics Simulation (NASA SBIR)
-
Numerical simulations of chemical kinetics are critical to addressing urgent issues in both the developed and developing world. Ongoing demand for higher resolution models with larger chemical mechanisms drives exponential growth in computational cost: many models spend over 90% of their runtime simulating chemical kinetics. Energy efficiency and renewable energy system research and development depend on simulations involving thousands of chemical species and reactions, but there are no general analysis tools that can handle mechanisms of this size. Simulations of more than a few hundred species or reactions are hand-tuned, ad-hoc solutions that will ultimately become obsolete. ParaTools will address this need by improving its “Kppa” general analysis tool for chemical kinetics to facilitate coupling with high resolution models and to support large chemical mechanisms. Phase I will explore the feasibility of methods for large mechanism support including flux analysis for sub-cell parallelization and mechanism reduction, dynamic mechanism selection based on environmental conditions, and iterative methods for large sparse systems. Phase I will also improve Kppa as a general analysis source code generator by implementing accelerated analysis methods that use many-core and multi-core devices and/or GPUs to reduce mechanism analysis, support for non-Arrhenius reaction rates, and an interface for coupling Kppa-generated code with high resolution models. Phase II will implement large mechanism support based on Phase I findings. Pre-coupled open source model packages containing Kppa-generated source coupled with a multi-physics or flow code will be provided in Phase I to facilitate commercialization through Phase II and beyond. The improved Kppa tool will reduce time-to-solution by combining the latest numerical and algorithmic developments with accelerated computing technology to enable supercomputer-level performance on smaller computers with lower costs.
-
Performance Analysis of GraphBLAS
-
Assesses the usefulness of high-performance analysis tools created by ParaTools, Inc. by applying them to GraphBLAS, a test code written in C++ with OpenMP and MPI.
GraphBLAS is a linear algebra library for solving graph problems such as breadth-first search (BFS), which is commonly used to find the shortest path between two nodes (vertices).
-
ParaTools ThreadSpotter
-
ThreadSpotter automatically analyzes application performance, rates performance problems, suggests fixes, and provides insights and statistics to quickly assess and resolve inefficient cache use. ParaTools ThreadSpotter expands these capabilities to support multi-language distributed memory applications and integrates with the TAU Performance System® to analyze data movement across the memory hierarchy on each compute node.
-
TAU Enterprise: Intuitive Performance Problem Identification and Resolution
-
The Department of Energy and other federal agencies have made significant investments in high performance software engineering tools, yet these tools still lack advanced problem identification capabilities. At the moment, users must rely heavily on their own experience and intuition to interpret software performance data, identify the root cause of a software performance problem, and ultimately resolve the problem. This limits tool adoption by small companies and independent software vendors, particularly in the manufacturing sector, who lack extensive in-house software engineering experience.
This project is developing a complete “production grade” software performance engineering product that lowers the barriers to entry for novice users and enhances their ability to mine actionable information from software performance data. The new product presents a simple, intuitive, and systematic user interface that guides users through performance engineering workflows and uses advanced cloud-hosted performance analysis services to offer unprecedented data analysis and problem identification and resolution capabilities.
Honors & Awards
-
Northrop Grumman Cync Program for Cybersecurity
Northrop Grumman
The Northrop Grumman and bwtech@UMBC Cync Program is an elite scholarship program that sponsors startup and early-stage companies with promising cybersecurity product ideas. This program is designed for small US and international businesses in all phases of the technology development cycle. The overall intent of the program is to expedite the commercialization of innovative cybersecurity technologies and ideas.
http://www.bwtechumbc.com/cyber-incubator/northrop-grumman-cync-program/
-
National Defense Science and Engineering Graduate Fellowship
American Society for Engineering Education
4% acceptance rate, see https://ndseg.asee.org/
-
Central European Summer Research Institute Graduate Fellowship
Institute for International Education
https://www.iie.org/Programs/CESRI
Languages
-
English
Native or bilingual proficiency
-
German
Limited working proficiency
-
Spanish
Elementary proficiency
Recommendations received
5 people have recommended John
More activity by John
-
Calling all #HPC friends! The Cray User Group 2026 #CUG2026 meeting has already sparked some fantastic outcomes. Here’s one I’m especially excited…
Liked by John Linford
-
Enjoyed presenting at Siemens' Simcenter Technology Conference last week. It was a privilege to share ideas on AI in engineering with fellow…
Liked by John Linford
-
SIAM Computational Science and Engineering (CSE27) occurs February 22-26 (2027) in Pittsburgh. The call for participation is now live on the…
Liked by John Linford
-
We are proud to have helped NASA optimize this code and watch it power a pivotal mission like Artemis. We can't wait to see the new engineering this…
Liked by John Linford
-
NASA has publicly released its Launch, Ascent, and Vehicle Aerodynamics (LAVA) software to the US aerospace industry. Originally developed at NASA's…
Liked by John Linford
-
You can’t launch an AI data center. You have to build it in orbit. Today, Flexcompute is…
Liked by John Linford
-
Excited to share the release of the #Arm Neoverse CMN-700 Telemetry Solution. This is our first CMN telemetry solution offering, enabling deeper…
Liked by John Linford