In this article we'll talk about the graphics of the upcoming Grand Theft Auto 6. GTA 6 is an open-world game with a huge virtual environment that players can explore freely, and building a world like that can take even large game studios four to five years. As a result, the time it takes to launch a new game tends to be quite long.
Grand Theft Auto 6 trailer
About Grand Theft Auto 6 Graphics
This is where deep learning algorithms can help reduce development time, by taking over part of the visual design and rendering work that artists currently do by hand. In this article, we’ll look at two recent research papers that could help design and render virtual worlds like the one in GTA.
You can train a neural network to learn the appearance of the various objects or assets that you want to include in the game’s virtual world. You can then feed it a semantic label map describing where these objects are located, and it renders a realistic-looking image.
Researchers at NVIDIA and MIT have published a paper titled “Video-to-Video Synthesis” (vid2vid) that synthesizes high-resolution, temporally consistent video from a sequence of semantic label maps. A special GAN architecture is used to ensure that the synthesized frames look realistic and stay visually consistent from one frame to the next.
GAN architecture used for vid2vid
The generator of the vid2vid network uses not only the current semantic map that you want to convert, but also the semantic maps of the previous frames. It also takes the frames it previously synthesized and combines them with these maps to compute a flow map.
The flow map captures the difference between two consecutive frames, which gives the network the information it needs to synthesize images that are consistent over time.
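The role of the flow map can be illustrated with a toy example. The snippet below is a minimal sketch, not the vid2vid implementation: it assumes a tiny grayscale frame stored as nested lists, an integer per-pixel flow, and a hypothetical mask convention (1 reuses the warped pixel, 0 takes a newly synthesized one). The function names are made up for this illustration.

```python
def warp_frame(prev_frame, flow):
    """Backward-warp: output[y][x] = prev_frame[y + dy][x + dx]."""
    h, w = len(prev_frame), len(prev_frame[0])
    warped = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            dy, dx = flow[y][x]
            # Clamp source coordinates so lookups stay inside the frame.
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            warped[y][x] = prev_frame[sy][sx]
    return warped


def blend(warped, synthesized, mask):
    """Per-pixel mix: mask=1 keeps the warped pixel, mask=0 uses the new one."""
    return [
        [m * wp + (1.0 - m) * sp for wp, sp, m in zip(w_row, s_row, m_row)]
        for w_row, s_row, m_row in zip(warped, synthesized, mask)
    ]


# Previous output frame with a bright vertical stripe in the middle column.
prev_frame = [[0.0, 1.0, 0.0],
              [0.0, 1.0, 0.0]]
# Every pixel looks one step to the left, so the stripe moves one pixel right.
flow = [[(0, -1)] * 3 for _ in range(2)]
warped = warp_frame(prev_frame, flow)

# Assumed stand-in for freshly generated pixels, and a mask that trusts the
# warp in the first two columns only.
synthesized = [[0.5] * 3 for _ in range(2)]
mask = [[1.0, 1.0, 0.0] for _ in range(2)]
next_frame = blend(warped, synthesized, mask)
print(warped)
print(next_frame)
```

Reusing warped pixels wherever the flow is reliable is what keeps consecutive frames consistent, while the mask lets the network paint new content where the warp cannot explain the scene.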
On the discriminator side, an image discriminator controls the quality of each output frame, and a video discriminator is added to check whether the sequence of synthesized frames makes sense according to the flow map. As a result, there is hardly any flicker between frames.
It also uses a progressive growing approach that starts by perfecting the low-resolution output and then uses that knowledge to work up step by step to high-resolution output. See the amazing results of this network in the picture below.
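The coarse-to-fine idea boils down to a schedule of training resolutions. Here is a small sketch of such a schedule; the final size and the number of stages are assumptions chosen for illustration, and `resolution_schedule` is a hypothetical helper, not part of any released code.

```python
def resolution_schedule(final_size, stages):
    """List training resolutions from coarsest to finest, doubling each stage."""
    width, height = final_size
    sizes = []
    for stage in range(stages):
        # The first stage is downscaled the most; the last one is full size.
        scale = 2 ** (stages - 1 - stage)
        sizes.append((width // scale, height // scale))
    return sizes


schedule = resolution_schedule((2048, 1024), stages=3)
for width, height in schedule:
    print(f"train at {width}x{height}, then carry the weights to the next stage")
```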
Problems with a GAN-based Approach
The visual quality of the vid2vid GAN network is impressive, but there are practical problems if you really want to use it in your games. It’s worth noting that GTA has a day and night cycle that changes the look of the virtual world. In addition, weather effects such as rain and fog make this world look completely different.
In other words, a neural network that is meant to render the graphics of the virtual world should be able to do so in the various visual styles that correspond to lighting and weather effects. However, generating diverse images is problematic for GANs because of mode collapse.
Mode collapse in GANs such as vid2vid
Imagine the training images as points in a high-dimensional space. In the figure above, they are simply represented in two dimensions. Some of these points represent daytime images and some represent night images. Now, when we start training an unconditional GAN, we first feed random noise through the generator to produce a set of random images.
The training process then essentially pushes these fake images towards the training images to make them look real. Nothing forces every training image to attract a fake image, though, so some training images are left out and never used. As a result, the generator ends up producing only one kind of image as training goes on.
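The collapse described above can be mimicked with a deliberately simplified one-dimensional toy. This is a caricature of the intuition, not real GAN training: there is no discriminator, images are single numbers, and the values for the "day" and "night" training points are made up. Each generated sample simply moves towards whichever training point is nearest to it.

```python
def nearest(point, candidates):
    """Return the candidate closest to point."""
    return min(candidates, key=lambda c: abs(c - point))


def pull_samples_to_data(samples, data, step=0.5, iters=20):
    """Every generated sample moves towards its nearest training point."""
    samples = list(samples)
    for _ in range(iters):
        samples = [s + step * (nearest(s, data) - s) for s in samples]
    return samples


def covered(data, samples, tol=0.5):
    """Training points that have at least one generated sample within tol."""
    return [d for d in data if any(abs(d - s) <= tol for s in samples)]


day_image, night_image = 0.0, 10.0      # toy stand-ins for two visual modes
data = [day_image, night_image]
samples = [2.0, 3.0, 4.0]               # initial outputs, all nearer to "day"

final = pull_samples_to_data(samples, data)
print(covered(data, final))             # only the day point ends up covered
```

Because every sample starts out closer to the day point, nothing ever moves towards the night point, which is the situation where part of the training set goes unused.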
Therefore, GANs suffer from mode collapse, and images created in this way cannot be visually diverse. This led me to a research paper that solves this problem using Implicit Maximum Likelihood Estimation.
Diverse Image Synthesis in GTA 6
Researchers at Berkeley have published a paper titled “Diverse Image Synthesis from Semantic Layouts via Conditional IMLE” to address the above-mentioned problem with GAN-based training of networks such as vid2vid. Rather than focusing on improving the quality of the output frames, they focus on synthesizing diverse images from the same semantic map.
In other words, unlike a GAN, which in practice produces only one output for a given semantic label map, this method can render the same scene under a variety of lighting or weather conditions.
The paper shows how to use Implicit Maximum Likelihood Estimation, or IMLE, to achieve this. Let’s try to understand why IMLE is better than a GAN in this particular use case.
IMLE first selects a training image and then pulls the closest randomly generated image towards it. This is the reverse of how it works in a GAN. Then it selects another training image and pulls another random image towards that one. The process is repeated until all the training images have been covered.
This means that both daytime and night images get covered in the course of training, so the generator is trained to produce images of various styles.
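The reversed direction can be sketched in the same kind of one-dimensional toy. This is a simplification under stated assumptions: images are single numbers, the generated samples are treated as free values that get nudged directly (a real IMLE setup updates generator weights and draws fresh samples), and the data values are made up. The key difference is that the loop runs over the training points, and each one pulls its nearest generated sample.

```python
def imle_step(samples, data, step=0.5):
    """For every training point, pull its nearest generated sample towards it."""
    samples = list(samples)
    for d in data:
        i = min(range(len(samples)), key=lambda k: abs(samples[k] - d))
        samples[i] += step * (d - samples[i])
    return samples


def covered(data, samples, tol=0.5):
    """Training points that have at least one generated sample within tol."""
    return [d for d in data if any(abs(d - s) <= tol for s in samples)]


day_image, night_image = 0.0, 10.0      # toy stand-ins for two visual modes
data = [day_image, night_image]
samples = [2.0, 3.0, 4.0]               # initial outputs, all nearer to "day"

for _ in range(20):
    samples = imle_step(samples, data)

print(covered(data, samples))           # both the day and the night point
```

Since every training point gets to claim a sample, the night point is no longer ignored even though all samples started out closer to the day point.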
What we’ve described so far is the unconditional case of IMLE, which starts from a random noise image rather than a semantic label map, but the training process remains the same in both cases. When using semantic maps, only the input encoding changes, so let’s look at that.
The conditional case of IMLE
The input here is a semantic label map, not a real image. Random noise channels are added to the input encoding, and these are used to control the visual style of the network’s output. Instead of using an RGB semantic label image as input, the map is split into separate channels.
Each channel corresponds to one object type on the map. Now comes the part of this paper that I personally found most interesting: additional noise input channels control the style of the output. For one random noise image on this channel, the output follows a fixed style, such as a daytime look.
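The input encoding described here can be sketched in a few lines. This is only an illustration under assumptions: the class list, the map size, and the number of noise channels are hypothetical, and the helper names are made up rather than taken from the paper's code.

```python
import random


def one_hot_channels(label_map, num_classes):
    """Turn an HxW map of class ids into num_classes binary HxW channels."""
    return [
        [[1.0 if cls == c else 0.0 for cls in row] for row in label_map]
        for c in range(num_classes)
    ]


def noise_channels(height, width, count, seed):
    """Draw `count` HxW channels of Gaussian noise from a seeded generator."""
    rng = random.Random(seed)
    return [
        [[rng.gauss(0.0, 1.0) for _ in range(width)] for _ in range(height)]
        for _ in range(count)
    ]


def build_input(label_map, num_classes, noise):
    """Stack the one-hot label channels and the noise channels."""
    return one_hot_channels(label_map, num_classes) + noise


CLASSES = ["road", "car", "sky"]        # hypothetical object types
label_map = [[2, 2, 2],                 # top row: sky
             [0, 1, 0]]                 # bottom row: road, car, road

noise = noise_channels(height=2, width=3, count=2, seed=0)
net_input = build_input(label_map, len(CLASSES), noise)
print(len(net_input), "channels")       # 3 label channels + 2 noise channels
```

Keeping the layout fixed and swapping only the `noise` argument is what would let one network render the same scene in different styles.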
Changing this channel to another random noise image produces a different style, such as a night look. And by interpolating between these two random images, you can actually control the time of day in the output image. This is really cool!
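Interpolating between two noise images could look roughly like this. The sketch assumes plain linear interpolation and tiny hand-written noise images; the names `z_day` and `z_night` are hypothetical labels, since in practice you only find out which style a noise image gives by looking at the rendered output.

```python
def lerp_noise(z_a, z_b, t):
    """Linear blend of two noise images: t=0 gives z_a, t=1 gives z_b."""
    return [
        [(1.0 - t) * a + t * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(z_a, z_b)
    ]


z_day = [[0.0, 2.0], [4.0, -2.0]]      # noise image assumed to give a day look
z_night = [[1.0, 0.0], [0.0, 2.0]]     # noise image assumed to give a night look

# Five noise images sweeping from the first style to the second.
steps = 5
sweep = [lerp_noise(z_day, z_night, i / (steps - 1)) for i in range(steps)]
for i, z in enumerate(sweep):
    print(i, z)
```

Feeding each element of `sweep` into the noise channel, together with the same semantic map, would give a sequence of frames that gradually shifts from one style to the other.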
Using this AI to render GTA 5
I tried to reproduce this effect on a short clip from GTA 5. I used an image segmentation network to get the game’s semantic labels and then ran them through the IMLE-trained network. The results are fascinating, considering that the same generator network produces both the daytime and the night version of the GTA footage!
Grand Theft Auto 6 system requirements
Minimum:
- Operating system: Windows 10 64-bit
- Processor: Intel Core i5-4460 3.2GHz / AMD FX-8350
- Graphics: AMD Radeon R9 390 or NVIDIA GeForce GTX 970 4GB
- VRAM: 4 GB
- System Memory: 8 GB RAM
- Storage Space: 100 GB Hard Drive Space
- DirectX 12 compatible graphics card
Recommended:
- Operating system: Windows 10 64-bit
- Processor: Intel Core i7-8700K 6-Core 3.7GHz / AMD Ryzen R7 1700X
- Graphics: AMD Radeon RX Vega 64 Liquid 8GB or NVIDIA GeForce GTX 1080 Ti
- VRAM: 8 GB
- System Memory: 16 GB RAM
- Storage Space: 55 GB Hard Drive Space
Final words about Grand Theft Auto 6 Graphics
Between the two papers we looked at today, vid2vid and IMLE-based image synthesis, you can see how far AI-based graphics rendering has come. There are still a few hurdles to clear before we can start experimenting with this new AI-based graphics technology in real games.
About a decade from now, it’s expected that Grand Theft Auto will have AI-based asset rendering to help shorten game development time. The future of game development is exciting!