Hi all, I’m interested in learning more about OpenCV and image recognition, so I decided to work through a project originally created by Sentdex, who runs pythonprogramming.net. As a disclaimer, I modified this code myself, but the original code came from Stack Overflow, Sentdex, and Google. I just thought this was a super cool way to check out what OpenCV has to offer. Eventually I plan on pulling data from this and running it through a convolutional neural network to hopefully create an AI that can control a vehicle and avoid obstacles in Grand Theft Auto V. For starters, though, we need to make sure we can actually interact with the game at all. This post focuses on capturing the game’s visual output, which we will later process and use to send driving inputs back to the game. So let’s get started.

This project is not beginner friendly. If this is one of your first Python projects, you should be able to follow along, but you probably won’t understand what’s going on. I recommend working through some of my beginner Python projects first, then coming back to this one when you’re comfortable with the basics.

Also note that you will need a system that can run GTA V on high settings or better. If you can run the game on high, then you shouldn’t have a problem running two simultaneous 800×600 windows. Also make sure you are getting at least 30 FPS; the reason will become clear later.

  • Alright, so first of all you will obviously need to get Grand Theft Auto V. This method should work on pretty much any game, I’m just using GTA V. So go ahead and buy/download the game. I think it’s like $20 on Steam if you don’t already have it. Then come back when it’s done downloading. (It can take a couple of hours depending on your internet connection.)
  • Sweet, once you have the game downloaded, go ahead and run it. I think you have to play through the first 10 minutes or so before you can access the open world. Once you do that, go into the settings and scale the window down to 800×600. This is just so we can have multiple windows open on the same screen without your computer crashing. Once you have that all set up (and unless you have multiple monitors), I recommend organizing your screen like this:
  • Next you’re going to need to create a directory, call it whatever you want (mine’s called ‘GTAV’), and create a new Python file in it titled something like ‘drive.py’.
  • Cool, now thanks to ‘Renan V. Novas’ over on Stack Overflow, we can use the code he posted there with a few modifications.
  • import numpy as np
    from PIL import ImageGrab
    import cv2
    import time
    
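    def screen_record():
        last_time = time.time()
        while True:
            # 800x600 windowed mode; the 40 px top offset leaves room for the title bar
            printscreen = np.array(ImageGrab.grab(bbox=(0, 40, 800, 640)))
            print('loop took {} seconds'.format(time.time() - last_time))
            last_time = time.time()
            # PIL returns RGB, but cv2.imshow expects BGR
            cv2.imshow('window', cv2.cvtColor(printscreen, cv2.COLOR_RGB2BGR))
            # Press 'q' in the preview window to quit
            if cv2.waitKey(25) & 0xFF == ord('q'):
                cv2.destroyAllWindows()
                break
    
    # Without this call the script only defines the function and then exits
    screen_record()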

    If you want to change the size of the playback window or run the game at a higher resolution, just modify the GTA settings and change bbox=(0,40,800,640) to match. The four values are (left, top, right, bottom), so for a game window x pixels wide and y pixels tall, use bbox=(0,40,x,y+40).
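
  • If you don’t feel like doing the bbox arithmetic by hand, here is a minimal sketch of a helper that builds the tuple for you. The name capture_bbox is made up for this post, and the 40 px default for the top offset is just the value from the bbox above (I’m assuming it covers the title bar), so adjust it for your own setup.

```python
def capture_bbox(width, height, left=0, top=40):
    """Return a (left, top, right, bottom) box for PIL's ImageGrab.grab.

    The default top offset of 40 px is copied from the bbox used in this
    post and is assumed to cover the window's title bar.
    """
    return (left, top, left + width, top + height)


if __name__ == "__main__":
    # 800x600 game window in the top-left corner of the screen
    print(capture_bbox(800, 600))
    # Same idea for a 1280x720 window
    print(capture_bbox(1280, 720))
```

    You would then call ImageGrab.grab(bbox=capture_bbox(800, 600)) instead of typing the four numbers yourself.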

  • When we run this, it should look something like this:
  • And you should notice the terminal in the bottom left printing how long each loop took, which tells us our FPS. It runs at about 10 frames per second (roughly 0.1 seconds per loop), which is actually pretty good, considering the whole 800×600 frame is copied into an array and every pixel has its red and blue channels swapped ten times every second. Also, OpenCV runs mostly on the CPU, so it’s pretty impressive that the CPU is able to keep up with that load. (This is why I mentioned earlier that you will want to make sure your system can handle running the game at 30-ish FPS or better.)
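  • Since the script prints seconds per loop rather than FPS directly, here is a small sketch of how you could turn those loop times into a smoothed frame rate. FpsCounter is a hypothetical helper I made up for illustration, not part of the original code, and the window of 30 loops is an arbitrary choice.

```python
from collections import deque


class FpsCounter:
    """Average frame rate over the most recent loop times (hypothetical helper)."""

    def __init__(self, window=30):
        # Only the last `window` loop times are kept
        self.loop_times = deque(maxlen=window)

    def add(self, loop_seconds):
        self.loop_times.append(loop_seconds)

    def fps(self):
        total = sum(self.loop_times)
        # Avoid dividing by zero before any time has been recorded
        return len(self.loop_times) / total if total > 0 else 0.0


if __name__ == "__main__":
    counter = FpsCounter(window=3)
    for seconds in (0.1, 0.1, 0.1):
        counter.add(seconds)
    # Three loops of 0.1 s each works out to roughly 10 FPS
    print(round(counter.fps(), 1))
```

    In the capture loop you would call counter.add(time.time() - last_time) where the script currently prints the loop time.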
  • Next we are going to pass this screen capture through a Canny edge detector so we can pick up the strongest lines in the image. So go ahead and update your code with the following:
  • import numpy as np
    from PIL import ImageGrab
    import cv2
    import time
    
    
    def process_img(original_image):
        # PIL gives us RGB, so convert from RGB (not BGR) to grayscale
        processed_img = cv2.cvtColor(original_image, cv2.COLOR_RGB2GRAY)
        # Gradients above threshold2 are edges; those below threshold1 are dropped
        processed_img = cv2.Canny(processed_img, threshold1=200, threshold2=300)
        return processed_img
    
    last_time = time.time()
    while True:
        screen = np.array(ImageGrab.grab(bbox=(0, 40, 800, 640)))
        new_screen = process_img(screen)
        print('loop took {} seconds'.format(time.time() - last_time))
        last_time = time.time()
        # Edge view in one window, original capture in the other
        cv2.imshow('window', new_screen)
        cv2.imshow('window2', cv2.cvtColor(screen, cv2.COLOR_RGB2BGR))
        if cv2.waitKey(25) & 0xFF == ord('q'):
            cv2.destroyAllWindows()
            break
  • You can see we created a function called process_img that takes in our screen capture, converts it to grayscale, and uses cv2.Canny to return the Canny edge representation of our screen. When you run the code it should look something like this:
  • With the images next to each other, you can see what the Canny function picks out of the original image. I tried playing the game just looking at the processed image and it was tough…

  • Close up:
  • I was curious as to how the Canny edge detection algorithm actually works, so after further investigation I found that the image is first passed through a Gaussian blur filter, which smooths the image and reduces noise and granularity. Next, the intensity gradient is computed at every pixel, which isolates the locations where brightness changes sharply, and the direction of each gradient is rounded into one of four categories (horizontal, vertical, and the two diagonals). Finally, only pixels that are local maxima along their gradient direction are kept, and the two thresholds decide which of those count as edges.

    I don’t know why I just wrote that..
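
  • For anyone who does want to see the “four categories” step spelled out, here is a minimal sketch in plain Python. It is my own illustration of the idea, not OpenCV’s actual implementation, and the function name quantize_gradient_direction is made up. Given the horizontal and vertical gradient at a pixel (gx and gy), it rounds the gradient direction to the nearest of 0, 45, 90, or 135 degrees.

```python
import math


def quantize_gradient_direction(gx, gy):
    """Round a gradient direction to 0, 45, 90, or 135 degrees.

    Directions 180 degrees apart describe the same edge orientation,
    so the angle is first folded into the range [0, 180).
    """
    angle = math.degrees(math.atan2(gy, gx)) % 180.0
    # Shift by half a bin (22.5) so each category is centered on its angle,
    # then wrap the last half-bin (near 180) back around to 0
    return (int((angle + 22.5) // 45) % 4) * 45


if __name__ == "__main__":
    for gx, gy in [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0)]:
        magnitude = math.hypot(gx, gy)
        print(gx, gy, round(magnitude, 3), quantize_gradient_direction(gx, gy))
```

    The magnitude printed alongside is the gradient strength that ends up being compared against the two Canny thresholds.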

  • But anyway, I’m going to end this tutorial here for now. Be on the lookout for part 2 soon! We will actually learn to control the car in GTA using Python. Yeah, it’s gonna be sick.