Shivam Mishra
Santa Clara, California, United States
4K followers
500+ connections
About
Senior Software Engineer | Platform & Infrastructure Specialist
Engineering platforms…
Experience
Education
Visvesvaraya Technological University
Projects:
Hand Gesture Recognition using OpenCV-
The goal of this project was to design and build a man-machine interface using a video camera to interpret one-handed gestures (plus others for additional keyboard and mouse control).
The keyboard and mouse are currently the main interfaces between humans and computers.
Humans communicate mainly by vision and sound, so a vision-based man-machine interface would be more intuitive. No physical contact with the input devices is required.
IDE - Visual Studio | Languages/Libraries - C++, OpenCV
3D Maze Game-
3D-Maze is a simple maze constructed in 3D using shader-based OpenGL. The maze is built from meshes and scaling factors, with a background rendered using cube mapping. Navigation through the maze uses a first-person camera: the user rotates the camera with the mouse and moves with the keyboard.
IDE - Visual Studio | Languages - C++, OpenGL, GLSL
Licenses & Certifications
Patents
Application management platform for hyper converged cloud infrastructures
Filed US 2022 039525
Courses
C Programming
C#.Net
Computer Graphics and Visualization using OpenGL
Computer Networks
Data Structures
Database Management Systems (DBMS)
Design and Analysis of Algorithms
Graph Theory
Java
Logic Design
Microprocessors
Object Oriented Programming Using C++
Operating Systems
Web Programming
Projects
Cloud Infrastructure Efficiency Engineering (CIE) – IPP Core Infra, NVIDIA
Led design and development of a cloud-scale observability and diagnostics platform for NVIDIA’s centralized build and test infrastructure, covering more than 5,000 GPU nodes with plans to scale to tens of thousands.
Role and responsibilities
- Drove architecture, implementation, and rollout of a system agent, Java API services, and backend data pipelines used by SREs to maintain infrastructure stability across the company.
- Partnered with SRE teams to convert recurring infrastructure issues into automated detection, faster root-cause analysis, and self-service diagnostics.
System capabilities
- Built a lightweight Golang agent that runs as a background service on each node, collecting rich CPU/GPU, hardware, and software metadata on configurable schedules and pushing it into Elasticsearch for fast indexed search.
- Developed a Java Spring Boot API layer that lets SREs query and compare nodes at scale, including “diff” operations between a single node and a pool, or across arbitrary node sets, to expose configuration and state differences.
- Exposed APIs for before/after snapshots around test runs so teams can diff system state over time and correlate changes with test failures.
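The node-versus-pool "diff" described above can be illustrated with a minimal sketch. It assumes each node is represented as a flat map of attribute names to values and that the pool's baseline for an attribute is its most common value; both are assumptions made for illustration, not details of the actual service. The class name `NodeDiff`, the methods `majorityValue` and `diffAgainstPool`, and the attribute names and version strings are all hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class NodeDiff {

    // Most common value of `key` across the pool, or null if no pool node reports it.
    // Assumption: the pool baseline is a simple majority vote; ties go to the
    // lexicographically smallest value because counts are kept in a TreeMap.
    public static String majorityValue(String key, List<Map<String, String>> pool) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map<String, String> member : pool) {
            String value = member.get(key);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        String best = null;
        int bestCount = 0;
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if (entry.getValue() > bestCount) {
                best = entry.getKey();
                bestCount = entry.getValue();
            }
        }
        return best;
    }

    // Returns attribute -> {node value, pool majority value} for every attribute
    // where the node differs from the pool baseline.
    public static Map<String, String[]> diffAgainstPool(
            Map<String, String> node, List<Map<String, String>> pool) {
        Set<String> keys = new TreeSet<>(node.keySet());
        for (Map<String, String> member : pool) {
            keys.addAll(member.keySet());
        }
        Map<String, String[]> drift = new TreeMap<>();
        for (String key : keys) {
            String expected = majorityValue(key, pool);
            String actual = node.get(key);
            if (!Objects.equals(actual, expected)) {
                drift.put(key, new String[] {actual, expected});
            }
        }
        return drift;
    }

    public static void main(String[] args) {
        // Hypothetical attribute names and versions, for illustration only.
        Map<String, String> node = Map.of("bios", "1.2", "driver", "550.54");
        List<Map<String, String>> pool = List.of(
                Map.of("bios", "1.3", "driver", "550.54"),
                Map.of("bios", "1.3", "driver", "550.54"),
                Map.of("bios", "1.2", "driver", "550.54"));
        diffAgainstPool(node, pool).forEach((key, values) ->
                System.out.println(key + ": node=" + values[0] + " pool=" + values[1]));
        // prints "bios: node=1.2 pool=1.3"
    }
}
```

The same comparison would apply to before/after snapshots of a single node by passing the earlier snapshot as a one-element pool.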
Impact on reliability
- Enabled rapid identification of issues such as mismatched BIOS and driver versions, incompatible package stacks, abnormal GPU temperatures, and problematic CPU/GPU family combinations (for example, specific AMD CPU and Blackwell GPU pairings), as well as subtle version and configuration drifts across fleets.
- Reduced debug time from multi-day investigations to hours or less for many infrastructure-related test failures by providing a single, queryable source of truth for node state, history, and cross-node comparisons.
Tech stack
Technologies: Golang agent, Java Spring Boot APIs, PostgreSQL, Elasticsearch, internal data platforms, and modern observability tooling for logging and metrics.
Centralized Notification Service – NGC / Internal Platforms (NVIDIA)
Built a centralized Notification Service that abstracts the complexity of event-based communication for internal products, providing a single, reusable platform for customized, multi-channel notifications. Microservices across the organization could onboard easily and start emitting notifications without reimplementing delivery logic or preference management.
The service enabled end users to subscribe to specific events and configure how they wanted to be notified (email, Slack, group chat, and other internal channels), with flexible templates to curate the content and reports sent for each notification type. Notification delivery was automated and triggered in real time as events occurred in upstream systems.
Contributed to the architecture, design, and end-to-end implementation, including CI/CD, deployment, and ongoing enhancements, while leading a team of two engineers to deliver the service into production for multiple products. This platform reduced duplication of effort across teams and ensured a consistent, reliable notification experience for internal users.
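As a rough illustration of the subscribe-then-route idea described above, the sketch below keeps per-event subscriptions with a preferred channel and renders one message per subscriber from a template. It is a simplified in-memory model under stated assumptions, not the service's actual design: the class `NotificationRouter`, the `Channel` enum, the `{event}` placeholder, and the event and user names are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class NotificationRouter {

    // Hypothetical channel set; the real service supported more channels.
    public enum Channel { EMAIL, SLACK }

    // eventType -> (user -> preferred channel), kept in subscription order.
    private final Map<String, Map<String, Channel>> subscriptions = new HashMap<>();

    // Registers (or updates) a user's channel preference for one event type.
    public void subscribe(String user, String eventType, Channel channel) {
        subscriptions
                .computeIfAbsent(eventType, key -> new LinkedHashMap<>())
                .put(user, channel);
    }

    // Builds one delivery line per subscriber of the event type.
    // "{event}" in the template is replaced with the event type.
    public List<String> route(String eventType, String template) {
        List<String> deliveries = new ArrayList<>();
        Map<String, Channel> subscribers =
                subscriptions.getOrDefault(eventType, Map.of());
        String message = template.replace("{event}", eventType);
        for (Map.Entry<String, Channel> entry : subscribers.entrySet()) {
            deliveries.add(entry.getValue() + " -> " + entry.getKey() + ": " + message);
        }
        return deliveries;
    }
}
```

Keeping preferences and templates inside the router is what lets an emitting microservice publish only the event type, without knowing who is subscribed or how they are notified.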
CI/CD as a Service
Served as the principal architect and lead developer for an enterprise-grade CI/CD-as-a-Service platform that automated software delivery pipelines across NVIDIA’s internal ecosystem. The platform enabled engineering teams to onboard applications instantly, with automated build, testing, versioning, quality certification, and multi-environment deployment, all governed through a unified workflow.
Led the end-to-end architecture, development, deployment, and production operations of the system, ensuring high availability for mission-critical products. Played a key role in incident management and platform reliability, collaborating with cross-functional teams to maintain seamless operations and rapid recovery.
Mentored junior engineers, established best practices for scalable system design, and drove adoption of modern DevOps principles organization-wide.
The platform became a core productivity enabler, significantly accelerating software delivery cycles and ensuring consistent quality across teams.
Technologies: .NET Core, React, Redux, Node.js, PostgreSQL, Elasticsearch, Material UI, Docker, Kubernetes, Helm, Jenkins, Artifactory, OpenTelemetry
Automated Lifecycle Manager (ALM) – NGC Cloud Group (NVIDIA)
I was part of the architecture and development team for Automated Lifecycle Manager (ALM), a platform automating the complete lifecycle of NVIDIA applications deployed on on-premises Kubernetes clusters (e.g., Metropolis). ALM enabled secure and seamless release distribution, update scheduling, and version management across distributed client environments.
Designed and implemented the Kubernetes ALM Operator to detect new releases, notify connected clusters, and execute automated or scheduled application updates. Built the integrated CI/CD pipeline handling certification, release, and deployment workflows with full traceability and compliance.
Technologies: Go, .NET Core, Kubernetes, Helm, Jenkins, React, PostgreSQL, OpenTelemetry
Metrics [24 7.ai]
Designed and implemented a centralized analytics platform to surface engineering and business metrics from large-scale HDFS data to key stakeholders across the organization, including product management, executive leadership, marketing, revenue optimization, and operations teams.
Built role-based authentication and authorization to provide tailored dashboard views and metric access per stakeholder group, ensuring data governance, security, and relevance for each audience.
Technologies: Node.js, React, Hadoop, Spark, Elasticsearch.
End to End Test Automation Framework [24 7.ai]
Designed an automation framework used to validate the flow of data from the source to the big data platform.
The framework was integrated with TestNG. It used the hooks provided by TestNG to perform operations before and after various events (test, suite, execution). It also had a customizable retry analyzer that could re-run failed tests.
The framework could capture the network requests made for each test and validate them. It was integrated with BrowserStack, which enabled the UI tests to be executed on the operating systems and browsers specified at run time.
Technologies - Java, Selenium, BrowserStack, BrowserMob
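The retry behavior mentioned above can be sketched without the TestNG dependency. In TestNG, this hook is the `IRetryAnalyzer` interface, whose `retry(ITestResult)` method returns true when a failed test should run again. The class below models only that decision with a configurable limit; the name `RetryPolicy`, the method `shouldRetry`, and the per-test-name counting are assumptions for illustration, not the framework's actual implementation.

```java
import java.util.HashMap;
import java.util.Map;

public class RetryPolicy {

    private final int maxRetries;
    // Number of failures seen so far, per test name.
    private final Map<String, Integer> failures = new HashMap<>();

    public RetryPolicy(int maxRetries) {
        this.maxRetries = maxRetries;
    }

    // Call after a test fails. Returns true if the test should be re-run,
    // i.e. it has not yet used up its configured number of retries.
    public boolean shouldRetry(String testName) {
        int failureCount = failures.merge(testName, 1, Integer::sum);
        return failureCount <= maxRetries;
    }
}
```

With `new RetryPolicy(2)`, the first two failures of a given test return true and the third returns false, so each test runs at most three times in total.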
Automation Dashboard [Aurigo Software Technologies]
Automation Dashboard is a web application built to track automation progress. It visualizes metrics such as current automation status, coverage, and weekly progress; triggers automation runs for different products and monitors their status; and publishes the test results.
Technologies Used: ASP.NET MVC, MSSQL, Entity Framework, HTML, CSS, Bootstrap, JavaScript
Automation Framework - Optimus [Aurigo Software Technologies]
Optimus is an automation framework built for writing automated tests for Masterworks. The framework defines a set of standards that must be followed to make the tests robust and maintainable.
Highlights of the framework:
1. Modular - Separate modules for the Executor (responsible for executing tests), Common Utilities (reporting, DB access, emails, helper classes for XML and Excel, etc.), and Test Scripts. To execute tests from another team, just plug in the tests DLL and it is available for execution.
2. Configurable - All configuration is kept separately in XML files. Settings such as the build on which automation has to run, emails, reports, publishing reports to the dashboard, the type of run (smoke, full pass, etc.), the browser to use, and the priorities of tests to execute can be changed by modifying a few values.
3. Reporting - Multiple report types are generated for each automation run: HTML, XML, and text.
4. The framework supports UI testing and API testing.
5. The framework supports module-wise parallel execution of tests. A feature for executing individual tests in parallel is in progress.
We have a separate dashboard for tracking the progress of automation for each team in the organization.
Online Teachers Appraisal System (OTAS) [RNSIT]
OTAS is a web application used for evaluating the performance of professors in the college. Data was collected from students at the end of each semester, and reports were generated for review by the Heads of Departments (HODs) and higher management.
It had a role-based authentication mechanism through which the same application served different users such as students, teachers, Heads of Departments, and the Principal. The pages rendered depended on the user's role.
The website was hosted on a local server in the college.
Technologies Used: ASP.NET MVC, MSSQL, Bootstrap, jQuery, HTML, CSS, Microsoft SSRS for reporting, and Entity Framework
Project Open house Panorama (PROP) [RNSIT]
PROP is a state-level intercollegiate Project Open house Panorama hosted by the Department of Computer Science and Engineering, RNSIT. It facilitates learning, career enhancement, research, and the exploration of ideas in all categories of Information Technology. It gives companies an opportunity to identify individuals with exceptional capabilities to develop functional projects.
Technologies used for the website - HTML, CSS, jQuery, MySQL, PHP
Honors & Awards
Young Innovator of the Year - Aurigo Software Technologies
Spot Award - Aurigo Software Technologies
Languages
English - Native or bilingual proficiency
Hindi - Native or bilingual proficiency
Marathi - Professional working proficiency
Maithili - Full professional proficiency
Recommendations received
5 people have recommended Shivam