Oh, it's already over after 3 minutes 20 seconds; a very short video.
The main message is about the motivation behind CUDA: people don't want to learn a completely new language, they want to invest as little as possible. So the idea was: just C, but on the GPU. The motivation behind CUDA was to make GPU programming as easy as possible for someone who already knows C.
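To illustrate that point (my own minimal sketch, not from the video): a CUDA program is essentially C with a couple of extensions. The kernel body below is ordinary C; the visible additions are the `__global__` qualifier, the built-in `blockIdx`/`blockDim`/`threadIdx` indices, and the `<<<...>>>` launch syntax.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

// The kernel body is plain C; only the qualifier and the
// built-in thread indices are CUDA-specific.
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main(void) {
    const int n = 1024;
    float hx[1024], hy[1024];
    for (int i = 0; i < n; i++) { hx[i] = 1.0f; hy[i] = 2.0f; }

    float *dx, *dy;
    cudaMalloc(&dx, n * sizeof(float));
    cudaMalloc(&dy, n * sizeof(float));
    cudaMemcpy(dx, hx, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dy, hy, n * sizeof(float), cudaMemcpyHostToDevice);

    // <<<blocks, threads>>> is the other visible extension to C.
    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, dx, dy);

    cudaMemcpy(hy, dy, n * sizeof(float), cudaMemcpyDeviceToHost);
    printf("y[0] = %f\n", hy[0]);  // 3*1 + 2 = 5
    cudaFree(dx);
    cudaFree(dy);
    return 0;
}
```

Everything else (loops, pointers, `printf`) is the C a developer already knew, which is exactly the adoption story the video is telling.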
yep I noticed it right away - watched it twice to confirm -- and came here to post the same thing :)
not sure if it was a slip and corrected in post ... they are Nvidia after all, so they have all the compute they want :).
But if it was indeed corrected in post then they did an excellent job of getting the acoustics / ambient noise perfectly right. Can someone do forensics on the audio track to see any editing artifacts?
PS: this bit is at timestamp 0:12 though
PS2: Channel is "Nvidia Tesla" and as someone commented on youtube he looks a bit like Elon Musk :)
what happened is they took the recording of him saying "2004" at the 00:47 mark and spliced that audio over the 00:07 mark. Strange, but if you listen to 00:47 and then go back to 00:07, it's quite clear it's the same.
Ian Buck's doctoral thesis Stream Computing on Graphics Hardware (2006) [1] and a shorter article about it (2004).
From the 2004 article: "It is also possible that future streaming hardware will share the same memory as the CPU, eliminating the need for data transfer altogether." Unified memory foreseen 16 years before Apple Silicon (and the point of this comparison is to indicate how hard it is to go from a prediction in a paper to mass manufacturing/popularity, not that Apple invented unified memory).
More seriously, people need to stop with the Apple comparisons. Unified memory has been a thing for far longer. Heck, around 2014 AMD had integrated GPUs with not just unified memory but fully unified address spaces with the host, and unified memory itself existed well before that.
Not to mention that mobiles have always been unified archs. It’s just a design decision.
Ian Buck's 2004 prediction is still 10 years before 2014. I did not say Apple invented unified memory, it just got popular with Apple Silicon, and looking at local LLM inference on M1/M2 and the 192 GB of memory M2 Ultra allows, it will surely get more important.
"Unified memory" has been around forever in one form or another, as it's simply a single address space (and physical location) for various independent subsystems. In graphics, it's probably been used since Amiga. (?) This is common in console GPUs which always punched above their weight. The ubiquitous Intel shared memory has been around for ages, although it was not entirely unified (reserved area for GPU, which it cannot escape; zero-copy still possible by allocating inside it and addressing data on CPU).
I obviously did not mean "unified memory" in that sense. In that sense even Apple I in 1976 had "unified memory" [1] [2]. The sense in which I meant it, following the spirit of the above paper/thesis, which no one seems to have read because they were too quick to jump on the bashing bandwagon, was unified memory performing "stream computing", e.g. an Apple Silicon chip running a local large language model. And if you get to run Vicuna 13B or something else on an Intel Tiger Lake or similar, more power to you, and to us if you make it open source.
>Unified memory foreseen 16 years before Apple Silicon.
Honestly, it's good to get some more background information before claiming that Apple invented every innovation since sliced bread.
Microcontrollers, SoCs from various vendors, gaming consoles and Intel CPUs with integrated graphics have also had unified memory since .. forever(?), or at least nearly 30 years, because it was as efficient back then as it is now in terms of silicon and SW usage.
Apple didn't reinvent the wheel in this regard, it was already there as a low hanging fruit.
I did not say Apple invented unified memory, just that Ian Buck's prediction was 16 years before Apple Silicon. Also, in practice, it is not only about unified memory but also about performance. An Intel chip with integrated graphics can boil an egg with the heat it dissipates; the M1/M2 stay cool as if they weren't even running while handling far more workload.
Having unified memory and having much more compute performance per watt are two orthogonal issues. Unified memory already existed before Apple silicon and was already present in most consoles, cheap tablets and smartphones, given how ubiquitous SoCs and Intel chips with iGPUs were.
Ian Buck's thesis is about how GPUs can be used for "stream computing": "In this paper (Buck, 2004), we present Brook for GPUs, a system for general-purpose computation on programmable graphics hardware. Brook extends C to include simple data-parallel constructs, enabling the use of the GPU as a streaming co-processor." That's the whole point: Apple Silicon allows running local large language models (and other ML models/algorithms) in a way, at a price point, and with enough performance that other chips with unified memory don't.
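For flavor, the saxpy kernel in the Brook for GPUs paper looks roughly like this (quoting from memory, so treat it as a sketch of the syntax rather than exact code):

```c
/* Brook extends C: the <> suffix marks stream arguments, and the
   compiler maps this per-element body onto GPU fragment programs. */
kernel void saxpy(float a, float4 x<>, float4 y<>, out float4 result<>) {
    result = a * x + y;
}
```

Streams were declared with a size, e.g. `float4 x<100>;`, and the kernel was then invoked like an ordinary C function over the whole stream. You can see the lineage from this to CUDA's "just C on the GPU" pitch pretty directly.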
There's a difference between CPU-GPU shared memory and unified memory, although not everyone seems to be using "unified" in the same sense.
What Apple appear to have with their M2 chips is shared memory meaning that the CPU and GPU are directly accessing the same memory chips. On the just-announced M2 Ultra chip they are claiming 800GB/sec memory bandwidth, which compares well to the 1TB/sec on a recent NVIDIA card.
Unified memory, at least as NVIDIA use the term, only refers to a unified address space such that the GPU and CPU (located on opposite sides of the PCI bus) can use the same address space to access memory. However, the memory being mapped by this unified address space may be on either side of the PCI bus (i.e. be CPU memory or GPU memory) and may migrate from one side to the other to optimize performance. Given how slow PCI bus transfers are compared to GPU memory bandwidth, the use cases for this are not at all the same as for true shared memory... It's really just a developer convenience feature so you don't have to explicitly orchestrate CPU-GPU memory transfers yourself (which you may be better off doing anyway to maximize performance).
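Concretely, this is CUDA's managed-memory path (a sketch, error handling omitted): one allocation, one pointer, usable from both sides, with the driver migrating pages over the bus on demand rather than you calling `cudaMemcpy`.

```cuda
#include <cuda_runtime.h>
#include <stdio.h>

__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

int main(void) {
    const int n = 1 << 20;
    float *data;
    // One allocation in a single address space, visible to both
    // CPU and GPU; pages migrate across the PCI bus on demand.
    cudaMallocManaged(&data, n * sizeof(float));

    for (int i = 0; i < n; i++) data[i] = (float)i;  // CPU writes
    scale<<<(n + 255) / 256, 256>>>(data, n);        // GPU reads/writes
    cudaDeviceSynchronize();
    printf("data[1] = %f\n", data[1]);               // CPU reads again

    cudaFree(data);
    return 0;
}
```

Convenient, but the migrations still cross the bus, which is why it's not a substitute for memory that is physically shared like Apple's.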
NVIDIA seem to be going in the same direction as Apple here, with their latest designs integrating GPU and CPU on a single module.
Unified memory was a key feature of Silicon Graphics's low end O2 workstation released in 1996 (well before that 2004 date.) It enabled both "unlimited" texture mapping memory and ability to map video streams as texture maps without extra memory moves or copies.
When SGI's viability became questionable, I always thought there might be some value in Apple scooping them up for innovative bits of value like that but that never came to pass. Would be interested to know if it was ever considered/rejected and why.
Quite deserved: CUDA is probably the reason Nvidia became a trillion dollar co.
In his 2004 PhD he says ATI had much better performance than Nvidia... but even if that were still the case, it wouldn't matter, as their tools and drivers are terrible.
Yes, but the point is "future streaming hardware" allowing for "stream computing" as Ian Buck puts it in 2004. The prediction is not: we will have unified memory and we will be able to boil eggs on a chip (such as Intel integrated graphics chip), but that we will have unified memory and be able to run Vicuna 13B locally (such as the M1/M2 chips).
I’ve been wanting to get into graphics card programming for a while and have found the documentation to be extremely difficult to understand. I was wondering if anyone knew of any good tutorials that can help me out here.
They have their own alternatives to CUDA & cuDNN called ROCm & MIOpen, as well as tools (HIP, HIPify) that let you write code that'll run on both NVIDIA/CUDA and AMD cards.
There are also AMD versions of PyTorch and TensorFlow.
The problem is that all of these efforts are not quite 100% there ... there are bugs and incompatibilities that seem to make most people abandon them. It's a shame since the hardware itself seems great.
Another big issue is that AMD does not officially support their CUDA alternative on consumer hardware. Finding a list of supported GPUs is basically impossible so I am not surprised nobody bothers adding ROCm/HIP support to their software.
That is a stark contrast to Nvidia where everything works on even the most entry level GPU, encouraging adoption in third-party software.
Feels like this (and the drivers, as recently rediscovered by geohot) should be their #1 priority to be frank. Or should have been for the last few years but better late than never.
That would be the most bang for the buck they could get to improve their competitiveness (and share price).