Hi all, I am interested in learning more about OpenCV and image recognition, so I decided to work through a project originally created by Sentdex, who runs pythonprogramming.net. As a disclaimer, this code was modified by yours truly, but the original code was sourced from Stack Overflow, Sentdex, and Google. I just thought this was a super cool way to check out what OpenCV has to offer. Eventually I plan on pulling data from this and running it through a convolutional neural network, to hopefully create an AI that can control a vehicle and avoid obstacles in Grand Theft Auto V. For starters, though, we need to make sure we can actually interact with the game at all. This post will focus on obtaining visual output from the game, which we will later process and use to send driving control inputs back to the game. So let's get started.
This project is not beginner friendly. If this is one of your first Python projects, you should be able to follow along, but you probably won't understand what's going on. I recommend working through some of my beginner Python projects first, then coming back to this one when you're comfortable with the basics.
Also note that you will need a system that can run GTA V on at least high settings. If you can run the game on high, you shouldn't have a problem running two simultaneous 800×600 windows. Make sure you are getting at least 30 FPS; the reason will become clear later.
import numpy as np
from PIL import ImageGrab
import cv2
import time

def screen_record():
    last_time = time.time()
    while(True):
        # grab the game window (800x600 windowed mode, y starts at 40 to skip the title bar)
        printscreen = np.array(ImageGrab.grab(bbox=(0,40,800,640)))
        print('loop took {} seconds'.format(time.time()-last_time))
        last_time = time.time()
        # ImageGrab returns RGB, but cv2.imshow expects BGR, so swap the channels
        cv2.imshow('window', cv2.cvtColor(printscreen, cv2.COLOR_RGB2BGR))
        # press 'q' with the playback window focused to quit
        if cv2.waitKey(25) & 0xFF == ord('q'):
            cv2.destroyAllWindows()
            break

screen_record()
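As a quick sanity check on that 30 FPS target from earlier, you can convert the printed loop time into frames per second. Here is a tiny sketch of that arithmetic; the fps_from_loop_time name is just mine for illustration, not part of the original code:

def fps_from_loop_time(loop_seconds):
    # a loop that takes ~0.033 seconds per frame is running at roughly 30 FPS
    return 1.0 / loop_seconds

print(fps_from_loop_time(0.033))  # ~30.3 FPS
print(fps_from_loop_time(0.1))    # 10.0 FPS -- too slow for smooth driving input

If your printed loop times are sitting well above 0.033 seconds, drop the game's graphics settings before moving on.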
If you want to change the size of the playback screen, or you want to run the game at a higher resolution, just modify the GTA settings and change

bbox=(0,40,800,640)

to whatever resolution you want: bbox=(0,40,x,y).
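For example, if you ran the game in a 1280×720 window anchored at the top-left corner (assuming the same 40-pixel title bar offset), the grab would look like this:

import numpy as np
from PIL import ImageGrab

# 1280x720 windowed mode: x spans 0-1280, y spans 40-760 (the extra 40 skips the title bar)
printscreen = np.array(ImageGrab.grab(bbox=(0,40,1280,760)))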
import numpy as np
from PIL import ImageGrab
import cv2
import time

def process_img(original_image):
    # ImageGrab frames are RGB, so convert to grayscale before edge detection
    processed_img = cv2.cvtColor(original_image, cv2.COLOR_RGB2GRAY)
    processed_img = cv2.Canny(processed_img, threshold1=200, threshold2=300)
    return processed_img

last_time = time.time()
while(True):
    screen = np.array(ImageGrab.grab(bbox=(0,40,800,640)))
    new_screen = process_img(screen)
    print('loop took {} seconds'.format(time.time()-last_time))
    last_time = time.time()
    # show the edge-detected frame and the raw frame side by side
    cv2.imshow('window', new_screen)
    cv2.imshow('window2', cv2.cvtColor(screen, cv2.COLOR_RGB2BGR))
    if cv2.waitKey(25) & 0xFF == ord('q'):
        cv2.destroyAllWindows()
        break
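If the edge map looks too noisy or too sparse for your scene, the two Canny thresholds are the knobs to turn: gradients above threshold2 always count as edges, gradients below threshold1 never do, and anything in between survives only if it connects to a strong edge. Here is a hedged variant with looser thresholds; the 100/200 pair is just an illustrative starting point, not a tuned value, and the process_img_loose name is mine:

import cv2

def process_img_loose(original_image):
    # lower thresholds pick up fainter road markings, at the cost of more noise
    gray = cv2.cvtColor(original_image, cv2.COLOR_RGB2GRAY)
    return cv2.Canny(gray, threshold1=100, threshold2=200)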
With the images next to each other, you can see what the Canny function picks out of the original image. I tried playing the game while looking only at the processed image, and it was tough…
I was curious as to how the Canny edge detection algorithm actually works, so after further investigation I found that the image is first passed through a Gaussian blur filter, G(x,y) = (1/(2πσ²)) · e^(−(x² + y²)/(2σ²)), which smooths the image, reducing noise and granularity. Sobel filters then estimate the intensity gradient at each pixel, isolating the locations in the image where the color changes sharply. Finally, the gradient direction, θ = arctan(Gy/Gx), is quantized into one of four categories (horizontal, vertical, and the two diagonals) depending on its angle, so edges can be thinned along their direction.
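To make that concrete, here is a rough sketch of those first stages using OpenCV building blocks. This is not what cv2.Canny does internally line for line (the real implementation also performs non-maximum suppression and hysteresis thresholding), and the kernel size and sigma are assumptions of mine, but it shows the blur, gradient, and four-way angle binning described above:

import numpy as np
import cv2

def canny_stages(gray):
    # 1. Gaussian blur to suppress noise before differentiating
    blurred = cv2.GaussianBlur(gray, (5, 5), sigmaX=1.4)
    # 2. Sobel derivatives approximate the intensity gradient
    gx = cv2.Sobel(blurred, cv2.CV_64F, 1, 0, ksize=3)
    gy = cv2.Sobel(blurred, cv2.CV_64F, 0, 1, ksize=3)
    magnitude = np.hypot(gx, gy)
    # 3. gradient direction, quantized into four bins:
    #    0 = horizontal, 1 = 45 degrees, 2 = vertical, 3 = 135 degrees
    theta = np.arctan2(gy, gx)                    # range -pi..pi
    angle = (np.degrees(theta) + 180.0) % 180.0   # fold into 0..180
    bins = (((angle + 22.5) // 45).astype(int)) % 4
    return magnitude, bins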
I don't know why I just wrote that...