As expected, Nvidia unveiled three new GeForce RTX graphics cards at its Gamescom event. We've dealt with plenty of rumors and speculation, but we now know the prices, features, and performance figures – and yes, even the names. Nvidia will share more details about the architecture in the coming days, but those details are under embargo until a later date, probably near September 20, when the RTX 2080 Ti and RTX 2080 officially go on sale. We have plenty of other information to analyze in the meantime, so let's dive in.
Nvidia's previous GeForce architecture was Pascal, which powered everything from the best graphics cards like the GTX 1080 and GTX 1080 Ti down to the entry-level GTX 1050 and GT 1030. Last year Nvidia released the new Volta architecture, which will apparently remain confined to supercomputing and deep learning, because the new Turing architecture appears to beat it in almost every meaningful way. If you just bought a Titan V, that's bad news, but for gamers waiting on new graphics cards, your patience has paid off.
Core specifications and prices for the RTX 20 series graphics cards
There was plenty of speculation and, yes, blatantly wrong guessing about what the Turing architecture would contain. Every 'leak' prior to last week was wrong. Chew on that for a moment. We can make well-informed estimates of what Nvidia and AMD might do with a future architecture, but such conjecture remains unsubstantiated until the official reveal. Nvidia revealed many of Turing's core details at SIGGRAPH, and with the official announcement of the GeForce RTX 20 series we can finally put all the rumors to bed.
Quick disclaimer: I've used the 'reference' specifications for all GPUs in the following table. The 20-series Founders Edition cards carry a higher price, but they come with a 90MHz higher boost clock for Turing, putting them in the same range where factory-overclocked models are likely to land. As for the actual reference cards, we don't know what they'll look like or how widely available they'll be, especially at launch. I suspect we won't see the lower end of the above price ranges until a month or two after the cards start shipping.
Here are the specifications (with some entries still unknown, such as the die size and transistor count for the smaller Turing chip):
For traditional graphics work – what games have used until now – CUDA core counts get a moderate improvement across the board. The 2080 Ti has 21 percent more cores than the GTX 1080 Ti, the RTX 2080 has 15 percent more cores than the GTX 1080, and the RTX 2070 has 20 percent more cores than the GTX 1070. The resulting theoretical TFLOPS show a comparable 13.5 to 18.6 percent improvement – call it 15 percent on average. Here's the important part: these theoretical numbers represent more of a worst-case scenario for Turing.
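Those theoretical figures fall straight out of the usual peak FP32 formula (cores times two FLOPs per clock times boost clock). Here's a quick Python sanity check; the boost clocks are Nvidia's published reference specs, which aren't quoted in this article, so treat them as assumptions:

```python
def tflops(cores, boost_mhz):
    # Peak FP32 throughput: each CUDA core performs one fused
    # multiply-add (2 FLOPs) per clock cycle.
    return cores * 2 * boost_mhz * 1e6 / 1e12

# (new cores, new boost, old cores, old boost) per matchup,
# using reference boost clocks in MHz.
pairs = {
    "2080 Ti vs 1080 Ti": (tflops(4352, 1545), tflops(3584, 1582)),
    "2080 vs 1080":       (tflops(2944, 1710), tflops(2560, 1733)),
    "2070 vs 1070":       (tflops(2304, 1620), tflops(1920, 1683)),
}
for name, (new, old) in pairs.items():
    print(f"{name}: {new:.1f} vs {old:.1f} TFLOPS (+{(new / old - 1) * 100:.1f}%)")
```

Run the numbers and the gains land between roughly 13.5 and 18.6 percent, matching the range above.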
Nvidia has also improved the CUDA cores architecturally this round. One important change is that the CUDA cores can now execute FP32 and INT calculations concurrently. Most graphics work depends on floating-point calculations (eg, 3.14159 * 2.71828), but integer calculations for memory addresses are important as well. It's unclear exactly how this affects graphics performance, but during his GeForce RTX presentation Nvidia CEO Jensen Huang stated that the Turing cores are "1.5 times faster" than the Pascal cores. If that figure is even close to reality, the new RTX 20-series GPUs will be substantially faster than the current 10-series.
The performance improvements don't stop at more and faster CUDA cores. Turing uses 14 GT/s GDDR6 memory in the three parts unveiled so far. That gives the 2080 Ti a modest 27 percent improvement in bandwidth, the 2080 gets a larger 40 percent boost, and the 2070 is catapulted to parity with the 2080 thanks to a 75 percent increase. Every GPU needs a certain amount of memory bandwidth, beyond which faster memory doesn't help that much. Nvidia has traditionally kept its top GPUs reasonably well balanced, but the switch to GDDR6 changes things. I suspect the 2070 doesn't really need all that bandwidth, but having extra certainly won't hurt.
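Those percentages come straight from the standard bandwidth formula: bus width in bytes times transfer rate. A quick sketch to verify, using the bus widths discussed later in this article and the previous generation's memory speeds:

```python
def bandwidth_gbs(bus_bits, gt_per_s):
    # GDDR moves one bit per pin per transfer, so peak bandwidth is
    # the bus width (in bytes) times the transfer rate.
    return bus_bits / 8 * gt_per_s

cards = {
    "RTX 2080 Ti": bandwidth_gbs(352, 14),  # 616 GB/s
    "GTX 1080 Ti": bandwidth_gbs(352, 11),  # 484 GB/s
    "RTX 2080":    bandwidth_gbs(256, 14),  # 448 GB/s
    "GTX 1080":    bandwidth_gbs(256, 10),  # 320 GB/s
    "RTX 2070":    bandwidth_gbs(256, 14),  # 448 GB/s
    "GTX 1070":    bandwidth_gbs(256, 8),   # 256 GB/s
}
print(f"2080 Ti gain: {cards['RTX 2080 Ti'] / cards['GTX 1080 Ti'] - 1:.0%}")
print(f"2080 gain:    {cards['RTX 2080'] / cards['GTX 1080'] - 1:.0%}")
print(f"2070 gain:    {cards['RTX 2070'] / cards['GTX 1070'] - 1:.0%}")
```

The 27/40/75 percent figures above check out, and you can see why the 2070 ends up matching the 2080: same bus width, same memory speed.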
Everything so far represents updates to Nvidia's traditional GPU architecture. What follows are the new additions: the RT and Tensor cores. RT stands for ray tracing, a rendering technique first introduced in 1979 by Turner Whitted. It's probably no coincidence that Whitted joined Nvidia's research division in 2014. The timing fits perfectly with Nvidia making a serious effort to implement real-time ray tracing hardware, and Turing is the first tangible result of those efforts – in a recent blog post, Whitted discussed some of his history with ray tracing and global illumination.
I'll come back to ray tracing in a bit, but the new information from Nvidia is that the RT cores perform the equivalent of about 10 TFLOPS of calculations for every Giga Ray per second. It's important to note that these aren't general-purpose TFLOPS; they're operations specifically designed to accelerate ray tracing calculations. Nvidia says the RT cores are used to compute ray-triangle intersections (where a ray hits a polygon), as well as BVH traversal. That second bit requires a longer explanation.
BVH stands for "bounding volume hierarchy" and is a method for optimizing intersection calculations. Instead of checking rays against individual polygons, objects are encapsulated in larger, simpler volumes. If a ray doesn't intersect the larger volume, no additional effort needs to be spent checking the object it contains. Conversely, when a ray does cross the bounding volume, the next level of the hierarchy is checked, with each level becoming more detailed. In effect, Nvidia is providing hardware that accelerates common functions used in ray tracing, speeding up the calculations by an order of magnitude (or more).
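To illustrate the kind of work the RT cores offload, here's a minimal Python sketch of the standard 'slab' test for a ray against an axis-aligned bounding box – the check performed at each level of a BVH before any actual triangles are touched. This is purely illustrative; Nvidia hasn't disclosed how the hardware implements it:

```python
def ray_hits_box(origin, inv_dir, box_min, box_max):
    """Slab test: does a ray intersect this axis-aligned bounding box?

    inv_dir holds 1/direction per axis, precomputed once per ray
    (assumed non-zero here for brevity).
    """
    t_near, t_far = 0.0, float("inf")
    for o, inv, lo, hi in zip(origin, inv_dir, box_min, box_max):
        t1, t2 = (lo - o) * inv, (hi - o) * inv
        t_near = max(t_near, min(t1, t2))  # latest entry across all slabs
        t_far = min(t_far, max(t1, t2))    # earliest exit across all slabs
    return t_near <= t_far

# A ray from the origin heading toward (1,1,1) hits a box spanning (1..2)^3.
print(ray_hits_box((0, 0, 0), (1, 1, 1), (1, 1, 1), (2, 2, 2)))   # True
# Flip the z direction and the same box is missed entirely.
print(ray_hits_box((0, 0, 0), (1, 1, -1), (1, 1, 1), (2, 2, 2)))  # False
```

When this test fails at a node, the entire subtree beneath it – potentially thousands of triangles – is skipped, which is where the order-of-magnitude savings come from.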
The last major architectural feature in Turing is the inclusion of Tensor cores. These are normally used for machine learning, and you might wonder why they're even useful for gaming. There's future potential for games to use such cores to improve in-game AI, but that seems unlikely for now – especially since, for the next five years or more, a large portion of gamers won't have Tensor cores available. In the near term, these cores can be put to more practical uses.
Nvidia showed some examples of improved image enhancement, where machine learning trained on millions of images can generate a better result with less blockiness and fewer artifacts. Imagine rendering a game at 1080p with a high framerate, then using the Tensor cores to upscale that to a pseudo-4K image without the gigantic performance hit we currently take. It wouldn't necessarily be perfect, but suddenly the idea of 144Hz 4K displays with 'native' 4K content isn't so far-fetched.
Nvidia also discussed a new DLSS algorithm that offers a better anti-aliasing experience than TAA (temporal AA). It's not clear whether the Infiltrator demo uses DLSS, the Tensor cores, or something else, but according to Nvidia, Infiltrator runs at "78 fps" on an RTX 2080 Ti compared to just "30-something" fps on a GTX 1080 Ti – both at 4K.
Turing is manufactured on TSMC's 12nm process
One piece of news that wasn't at all surprising is that Turing GPUs will be manufactured using TSMC's 12nm FinFET process. Later Turing models could potentially be manufactured by Samsung, as was the case with the GTX 1050/1050 Ti and GT 1030 Pascal parts, but the first round of Turing GPUs will come from TSMC.
What does the transition from 16nm to 12nm mean in practice? Multiple sources indicate that TSMC's 12nm is more a refinement and tuning of the existing 16nm node than a true reduction in feature sizes. In that sense, 12nm is more a marketing term than a real die shrink, but two years of process technology optimizations should still help improve clock speed, chip density, and power use – the holy trinity of faster, smaller, and cooler-running chips. TSMC's 12nm FinFET process is also fully mature at this point, with good yields, allowing Nvidia to build a very large GPU design.
The top TU102 Turing design has 18.6 billion transistors and measures 754mm2. (Note that TU102 is what some outlets are calling it – Nvidia hasn't officially named the chips as far as I know. "A rose by any other name" and all that.) That's a huge chip, much bigger than the GP102 used in the GTX 1080 Ti (471mm2 and 11.8 billion transistors). It's almost as big as the GV100 used in the Tesla V100 and Titan V (815mm2), which is essentially the largest size TSMC's current production line can handle.
The TU102 supports up to 4,608 CUDA cores, 576 Tensor cores, and 10 Giga Rays/s, spread across 36 streaming multiprocessors (SMs), with 128 CUDA cores and 16 Tensor cores per SM. As usual, Nvidia can partially disable chips to create lower-tier models – or, more likely, harvest chips that are partially defective. The RTX 2080 Ti uses 34 SMs, giving 4,352 CUDA cores and 544 Tensor cores as far as we can tell. Nvidia didn't give specific RT core counts, but the RTX 2080 Ti is rated at the same top-end 10 Giga Rays/s that Nvidia quotes for the Quadro RTX 6000, so it doesn't appear to have any RT cores disabled.
The second Turing chip is presumably a step down in size, though Nvidia hasn't yet provided specific figures for the TU104. It has a maximum of 24 SMs and will be used in the RTX 2080 and RTX 2070. The 2080 disables only one SM, giving 2,944 CUDA cores and 368 Tensor cores from what we can see. It's also rated at 8 Giga Rays/s, suggesting the RT cores may not be directly integrated into the SMs. The RTX 2070 meanwhile disables six SMs, for 2,304 CUDA cores, 288 Tensor cores, and 6 Giga Rays/s. The die size is probably in the 500-550mm2 range, with around 12-14 billion transistors. More importantly, TU104 costs less to produce, so it can more easily go into $500 parts.
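All of these core counts follow directly from the per-SM figures; a quick sketch (the active SM counts are as reported here, not officially confirmed for every part):

```python
# Turing SM layout: 128 CUDA cores and 16 Tensor cores per SM.
CUDA_PER_SM, TENSOR_PER_SM = 128, 16

def sm_totals(active_sms):
    # Core counts scale linearly with the number of enabled SMs.
    return active_sms * CUDA_PER_SM, active_sms * TENSOR_PER_SM

print(sm_totals(36))  # full TU102: (4608, 576)
print(sm_totals(34))  # RTX 2080 Ti: (4352, 544)
print(sm_totals(23))  # RTX 2080 (full TU104 minus one SM): (2944, 368)
print(sm_totals(18))  # RTX 2070 (TU104 minus six SMs): (2304, 288)
```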
Wrapping up the Turing and GeForce RTX hardware, all of the new GPUs will use GDDR6 memory, and based on the VRAM capacities, Nvidia is using 8Gb chips (while Quadro RTX uses 16Gb chips). The TU102 has a 384-bit interface, with the 2080 Ti disabling one 32-bit channel for a 352-bit interface, while the TU104 has a 256-bit interface. Using 14 GT/s GDDR6 for both the 2070 and 2080 means they end up with the same memory bandwidth, which likely gives the 2070 more bandwidth than it strictly needs. GDDR6 officially supports speeds of 14-16 GT/s, and Micron has demonstrated 18 GT/s modules, so Nvidia is starting at the lower end of the spectrum. We could see faster memory in the future, or on partner cards.
What is ray tracing and is it really that big of a deal?
That's it for the architecture (for now at least), but I promised to return to those RT cores and why they matter. Nvidia is betting heavily on ray tracing with Turing – a technique often referred to as the 'holy grail' of computer graphics. That's because ray tracing can have a huge influence on the way games look. It's a big enough change that Nvidia has dropped the GTX branding on the new 20-series parts (at least the 2070 and above) and shifted to RTX. You could argue it's just marketing, but doing anything close to real-time ray tracing is pretty incredible, and in 10 years we may look back on the introduction of RTX the way we currently look back on the introduction of programmable shaders.
Explaining what ray tracing is, how it works, and why it's better than alternative rendering models is a huge topic. Nvidia and many others have published extensive explanations – this is a good starting point if you want to know more, or watch this series of seven videos about RTX and games. In short, ray tracing requires far more computational work than rasterization, but the resulting images are generally much more accurate than the approximations we're used to seeing. Ray tracing is particularly effective at simulating lighting, including global illumination, spotlights, shadows, ambient occlusion, and more. With RTX, Nvidia is letting developers get much closer to simulating accurate lighting and shadows.
Rather than explaining how ray tracing works, it's better to look at some examples of how it's being used in games. There are currently 11 announced games in development that use Nvidia's RTX ray tracing (and likely others that haven't been announced yet). There are 21 games in total using some part of the new RTX enhancements that Nvidia's Turing architecture offers, and here are some specific examples of games using ray tracing.
This clip from Shadow of the Tomb Raider shows how RTX ray tracing can improve the lighting model. The key elements to note are the point lights (candles) in the foreground and the shadows they create. Adding dynamic point lights can drastically reduce performance with traditional rasterization, and the more point lights you have, the worse it gets. Developers and artists currently spend a lot of time crafting approximations that look pretty good, but there are limits to what can be done. Ray tracing offers a far more accurate representation of how light interacts with the environment.
Here's another clip showing how ray tracing improves the lighting in Shadow of the Tomb Raider, this time with two cone lights and two rectangular area lights. Everything looks good in traditional mode, with shadows changing based on the lights, but the way those shadows come together doesn't really reflect the real world. The RTX lighting, on the other hand, uses physically based modeling of the environment, showing the green and red spotlights blending together, blurring around the edges of shadows, and more.
Another example of ray-traced global illumination comes from Metro Exodus. Here the traditional model illuminates the whole room far more, while the 'correct' ray-traced lighting leaves deep shadows in the corners, bright areas lit by direct lighting, and indirect lighting that keeps some areas clearly visible while others are not. The possibilities this offers artists and level designers are interesting, though I have to say that 'realistic' shadows aren't always more fun.
I got the chance to play the Metro Exodus demo, which let me toggle RTX on and off dynamically. Walking around some dilapidated buildings, the rooms are much darker with RTX lighting. That can create a sense of dread, but it also makes it harder to pick out objects and figure out where you're going and what you can do. Regardless, the look and feel of the Metro world was excellent, and the RTX lighting makes for a completely different experience – this isn't just a subtle tweak that offers slightly different shadows; RTX lighting clearly changes the environment and influences the gameplay.
There is, however, another drawback: RTX carries higher performance requirements. All of the games shown are in alpha or beta states, so plenty can still change, but it's clear that switching on all the fancy RTX effects has a performance impact. I saw periodic stuttering in Shadow of the Tomb Raider, Metro Exodus, and Battlefield V, the three biggest names for RTX right now. The visual difference can be impressive, but if performance is cut in half compared to traditional rendering techniques, many gamers will likely leave the effects off. There's work to be done, and hopefully that work comes mostly as software updates that improve performance without sacrificing quality, rather than requiring us to wait a few hardware generations before these things become practical.
Nvidia's RTX is the shape of the future
If you've followed the graphics industry at all, it's always been clear that the goal was real-time ray tracing, or at least using some elements of ray tracing in a real-time graphics engine. Our graphics chips have come a long way over the past 30 years, with milestones including the 3dfx Voodoo as the first mainstream consumer card capable of compelling 3D graphics, the GeForce 256 as the first GPU to accelerate the transform and lighting process, and AMD's Radeon 9700 Pro as the first fully programmable DirectX 9 GPU. Nvidia's Turing architecture appears to be just as big a change from its predecessors as all of those products were.
Like any transition, this won't necessarily be a clean break with the old and the start of something new. As cool as real-time ray tracing could be, it requires new hardware. It's the proverbial chicken-and-egg problem: software won't support a new feature without the hardware, but building hardware to accelerate something no software currently uses is a big investment. Nvidia has made that investment with RTX and Turing, and only time will tell whether it pays off.
Unfortunately, for the next five years or so we'll have a messy situation where most gamers don't own a card that can do RTX – or even Microsoft's generic DirectX Raytracing. I'm going to talk with some of the developers using RTX for ray tracing to find out how hard it is to add support to a game. Hopefully it isn't too difficult, because most developers will need to keep supporting older products and rasterization techniques.
Even long term, the RTX extensions may not win out – RTX is Nvidia's proprietary technology, so AMD is completely left out for now. Ideally, standards will develop, as happened with Direct3D, and eventually games could support a single API that performs ray tracing on whatever GPU/processor is in a system. We're up to DirectX 11/12 now, so maybe DirectX RT 5.0 will be that standard. But however we get there, real-time ray tracing or some variant of it is the next big thing in PC gaming. Now we just have to wait for the consoles and software to catch up with the hardware.
But how does the hardware actually perform? Stay tuned for our full reviews of the GeForce RTX 2080 Ti and RTX 2080, on or around September 20.