VMware intends to acquire Bitfusion. Bitfusion delivers “the industry’s first AI Infrastructure disaggregation platform for GPUs and FPGAs”. Its main product, FlexDirect, “makes GPUs a first class resource that can be abstracted, partitioned, automated and shared much like traditional compute resources”. The idea is to partition GPU accelerators into multiple virtual GPUs of arbitrary size. Virtual machines can then access these virtual GPUs remotely over the network.
The trick is that Bitfusion runs a user-space application in the guest OS of the VM. This application connects to a GPU-accelerated server (a VM or a physical server) and consumes its available GPU resources over the network. The GPU-accelerated server runs a transparent software layer. GPU resources are allocated dynamically, based on actual demand, and when they are no longer used they are released back into the resource pool. Read more about Bitfusion at Bitfusion.io; there are also some good white papers available for download.
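To make the pooling idea concrete, here is a minimal Python sketch of a shared pool from which clients take partial or multiple GPUs and later hand them back. It is a toy model under stated assumptions: the class `GpuPool`, its `allocate`/`release` methods, the percent-based capacity units and the first-fit placement are all hypothetical illustrations, not Bitfusion's API or its actual scheduling algorithm, and the remote network path between client and GPU server is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass
class GpuPool:
    """Toy model of a shared GPU pool (hypothetical, not Bitfusion's API).

    Capacity is tracked in percent of one physical GPU, so 50 means half a
    GPU and 150 means one and a half GPUs spread over two devices.
    """

    free: dict[str, int]  # GPU id -> free capacity in percent (0..100)
    leases: dict[str, list[tuple[str, int]]] = field(default_factory=dict)

    def allocate(self, client: str, amount: int) -> list[tuple[str, int]]:
        """Carve `amount` percent of GPU capacity out of the pool for `client`."""
        if amount <= 0:
            raise ValueError("amount must be positive")
        if sum(self.free.values()) < amount:
            raise RuntimeError("not enough free GPU capacity in the pool")
        grant: list[tuple[str, int]] = []
        remaining = amount
        # First fit: fill GPUs in order until the request is satisfied.
        for gpu, available in list(self.free.items()):
            if remaining == 0:
                break
            take = min(available, remaining)
            if take > 0:
                self.free[gpu] = available - take
                grant.append((gpu, take))
                remaining -= take
        self.leases.setdefault(client, []).extend(grant)
        return grant

    def release(self, client: str) -> None:
        """Return everything `client` holds to the pool once it is done."""
        for gpu, amount in self.leases.pop(client, []):
            self.free[gpu] += amount


pool = GpuPool({"gpu0": 100, "gpu1": 100})
print(pool.allocate("vm-a", 50))   # [('gpu0', 50)]: half of gpu0
print(pool.allocate("vm-b", 120))  # [('gpu0', 50), ('gpu1', 70)]: spans two GPUs
pool.release("vm-a")               # vm-a's half GPU goes back to the pool
print(pool.free)                   # {'gpu0': 50, 'gpu1': 30}
```

First fit was chosen here only because it is the simplest placement policy to read; it says nothing about how Bitfusion itself places requests.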
Meet Bitfusion at VMworld
If you want to learn more about Bitfusion, check out these sessions at VMworld US and Europe:
Elastic AI Infrastructure on vSphere: Virtual GPU and FPGA with Bitfusion (BCA2626BU – VMworld US)
With enterprises’ growing adoption of AI, ML, and analytics workloads, GPUs and FPGAs are becoming integral parts of the infrastructure that delivers the timely results required by business-critical apps. For IT admins, this means satisfying a growing demand for elastic and virtual infrastructure to support more apps and users. Join this session with our technology partner, Bitfusion, to learn how its software turns GPUs and FPGAs into a shared network-attached virtual GPU and FPGA pool, responding in real time to workload demand. With Bitfusion’s software and VMware vSphere, see how to get the best of both worlds: a robust SDDC with ML/AI workloads in VMs or containers dynamically consuming any amount of GPU (partial, multiple, local, remote) while preserving those workloads’ ability to leverage vSphere vMotion and Distributed Resource Scheduler.
Machine Learning with GPU using Cloud Automation Services and Bitfusion (MLA2612BE – VMworld Europe)
Data science and machine learning (ML) are some of the fastest-growing fields today. As organizations use data to drive better decisions, they also seek virtualized infrastructure that can handle these workloads. ML benefits significantly from the parallel processing capabilities offered by modern multicore GPU hardware accelerators. While the ability for VMs to leverage a host GPU has existed since VMware vSphere 6, the complexity involved made it challenging to offer ML containers with GPU as a service. The combination of VMware Cloud Automation and Bitfusion (a third-party solution) provides a compelling platform to offer on-premises ML services with GPU support, deployed and managed from the VMware Cloud. This technical session will include architecture, implementation, and a demonstration of the solution.
Other sessions that might also be of interest if you’re working in the ML/AI space (U = US session, E = Europe session):
- VI Admin’s Guide: Supporting Container-Based Machine Learning with PKS (MLA2036BU/MLA2036BE).
- How GPU-Assisted ML for Medical Research proved to be a Force for Good (HBI1546BU/HBI1546BE).
- Support Machine Learning workloads and GPUs on vSphere (MLA3014WU/MLA3014TE).
- All ML/Analytics sessions at VMworld US and VMworld Europe.