Welcome to part 8 of the Python Plays: GTA V tutorial series. After the initial release, I got tons of great ideas from all of you, along with some very useful code submissions either in the comments or by a pull request on the Github page. Thank you to everyone for contributing.
First, before we move on, let's talk about some of the changes that we've made.
Quite a few of you suggested monitoring the lane slope to determine how sharply we should turn and how fast we should be going. This is a fantastic idea. I am not sure I want to implement it just yet, since it might conflict with future code, but it is certainly something to consider for the future.
Next, the most common suggestion has been PID (proportional-integral-derivative) control. I know nothing about this. I understand that we want granular motion; I just have absolutely no idea how we'd go about actually implementing it. If you have some ideas, submit a pull request on the Github Project Page, or share a gist/text dump...etc.
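Just to sketch the concept (this is purely illustrative and not wired into the project; the class name and the idea of using the lane slopes as the error signal are my own assumptions), a PID controller keeps a running error term and turns it into a smooth correction, rather than an all-or-nothing keypress:

import time

class PID:
    """Minimal PID controller: error in, correction out."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0
        self.prev_time = time.time()

    def update(self, error):
        now = time.time()
        dt = max(now - self.prev_time, 1e-6)
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        self.prev_time = now
        return self.kp * error + self.ki * self.integral + self.kd * derivative

# hypothetical usage: treat the average lane slope as the error, then map
# the correction to how long we hold A or D each frame
steering_pid = PID(kp=0.8, ki=0.05, kd=0.2)
# correction = steering_pid.update((m1 + m2) / 2)

How that correction would actually translate into keypresses (or a virtual controller) is exactly the open question, so treat this as a starting point for pull requests rather than a solution.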
Another idea someone had, which will be helpful for anyone who has struggled with bad FPS, either before or moving forward, is that you can use a game mod to slow down the in-game speed/time. This is possible with Alexander Blade's Native Trainer + Enhanced Native Trainer, which you can get from gta5-mods.com. See the video for more information on this if you like. This isn't meant to be a GTA V mods tutorial, but I do want to bring your attention to these trainers, mainly because they make it super easy to create the settings we want for our agent. I personally have been using these mods to keep the time of day where I want it, keep the skies clear, and remove traffic. Later, I can use them to simulate harsher environments to test our agent's limits as well.
Okay, great, now on to the immediate changes we will be making! "Frannecklp" submitted a method for taking quicker screenshots using the pywin32 library. With this, we double our FPS across the board, which is a very useful change. I am separating this out into its own file, 'grabscreen.py', containing:
# Done by Frannecklp

import cv2
import numpy as np
import win32gui, win32ui, win32con, win32api

def grab_screen(region=None):
    # handle to the entire desktop, so we can grab any on-screen region
    hwin = win32gui.GetDesktopWindow()

    if region:
        left, top, x2, y2 = region
        width = x2 - left + 1
        height = y2 - top + 1
    else:
        width = win32api.GetSystemMetrics(win32con.SM_CXVIRTUALSCREEN)
        height = win32api.GetSystemMetrics(win32con.SM_CYVIRTUALSCREEN)
        left = win32api.GetSystemMetrics(win32con.SM_XVIRTUALSCREEN)
        top = win32api.GetSystemMetrics(win32con.SM_YVIRTUALSCREEN)

    # copy the requested screen region into an in-memory bitmap
    hwindc = win32gui.GetWindowDC(hwin)
    srcdc = win32ui.CreateDCFromHandle(hwindc)
    memdc = srcdc.CreateCompatibleDC()
    bmp = win32ui.CreateBitmap()
    bmp.CreateCompatibleBitmap(srcdc, width, height)
    memdc.SelectObject(bmp)
    memdc.BitBlt((0, 0), (width, height), srcdc, (left, top), win32con.SRCCOPY)

    # convert the raw BGRA bitmap bytes into a numpy image array
    # (on newer NumPy versions, np.frombuffer does the same job without a deprecation warning)
    signedIntsArray = bmp.GetBitmapBits(True)
    img = np.fromstring(signedIntsArray, dtype='uint8')
    img.shape = (height, width, 4)

    # release the GDI handles we created
    srcdc.DeleteDC()
    memdc.DeleteDC()
    win32gui.ReleaseDC(hwin, hwindc)
    win32gui.DeleteObject(bmp.GetHandle())

    return cv2.cvtColor(img, cv2.COLOR_BGRA2RGB)
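If you want to sanity-check the speedup on your own machine, a quick-and-dirty timing loop like the following (just an illustration, not part of the project) is enough. It grabs the same region the main script uses below; compare the printed FPS against whatever capture method you were using before:

import time
from grabscreen import grab_screen

def benchmark(n=100):
    # time n captures of the same region the main loop grabs
    start = time.time()
    for _ in range(n):
        grab_screen(region=(0, 40, 800, 640))
    elapsed = time.time() - start
    print('{:.1f} FPS over {} frames'.format(n / elapsed, n))

benchmark()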
I've also separated out directkeys.py as before, and I've moved the lane-drawing function into its own file, draw_lanes.py (with hopes that someone will improve it). With grabscreen.py, directkeys.py, and draw_lanes.py in place, the main script is now:
import numpy as np
import cv2
import time
import pyautogui
from directkeys import PressKey, ReleaseKey, W, A, S, D
from draw_lanes import draw_lanes
from grabscreen import grab_screen


def roi(img, vertices):
    # blank mask:
    mask = np.zeros_like(img)
    # fill the pixels inside the polygon defined by "vertices" with the fill color
    cv2.fillPoly(mask, vertices, 255)
    # return the image only where the mask pixels are nonzero
    masked = cv2.bitwise_and(img, mask)
    return masked


def process_img(image):
    original_image = image
    # convert to gray
    processed_img = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # edge detection
    processed_img = cv2.Canny(processed_img, threshold1=200, threshold2=300)
    processed_img = cv2.GaussianBlur(processed_img, (5,5), 0)

    vertices = np.array([[10,500],[10,300],[300,200],[500,200],[800,300],[800,500]], np.int32)
    processed_img = roi(processed_img, [vertices])

    # more info: http://docs.opencv.org/3.0-beta/doc/py_tutorials/py_imgproc/py_houghlines/py_houghlines.html
    # rho, theta, threshold, output placeholder, minLineLength, maxLineGap:
    lines = cv2.HoughLinesP(processed_img, 1, np.pi/180, 180, np.array([]), 20, 15)
    m1 = 0
    m2 = 0
    try:
        l1, l2, m1, m2 = draw_lanes(original_image, lines)
        cv2.line(original_image, (l1[0], l1[1]), (l1[2], l1[3]), [0,255,0], 30)
        cv2.line(original_image, (l2[0], l2[1]), (l2[2], l2[3]), [0,255,0], 30)
    except Exception as e:
        print(str(e))
        pass
    try:
        for coords in lines:
            coords = coords[0]
            try:
                cv2.line(processed_img, (coords[0], coords[1]), (coords[2], coords[3]), [255,0,0], 3)
            except Exception as e:
                print(str(e))
    except Exception as e:
        pass

    return processed_img, original_image, m1, m2


def straight():
    PressKey(W)
    ReleaseKey(A)
    ReleaseKey(D)

def left():
    # quick tap of A, making sure W and D are released
    PressKey(A)
    ReleaseKey(W)
    ReleaseKey(D)
    ReleaseKey(A)

def right():
    # quick tap of D, making sure A and W are released
    PressKey(D)
    ReleaseKey(A)
    ReleaseKey(W)
    ReleaseKey(D)

def slow_ya_roll():
    ReleaseKey(W)
    ReleaseKey(A)
    ReleaseKey(D)


def main():
    # countdown so you have time to click into the game window
    for i in list(range(4))[::-1]:
        print(i+1)
        time.sleep(1)

    last_time = time.time()
    while True:
        screen = grab_screen(region=(0,40,800,640))
        print('Frame took {} seconds'.format(time.time()-last_time))
        last_time = time.time()
        new_screen, original_image, m1, m2 = process_img(screen)
        #cv2.imshow('window', new_screen)
        cv2.imshow('window2', cv2.cvtColor(original_image, cv2.COLOR_BGR2RGB))

        # steer based on the signs of the two lane slopes; otherwise just go straight
        if m1 < 0 and m2 < 0:
            right()
        elif m1 > 0 and m2 > 0:
            left()
        else:
            straight()

        #cv2.imshow('window',cv2.cvtColor(screen, cv2.COLOR_BGR2RGB))
        if cv2.waitKey(25) & 0xFF == ord('q'):
            cv2.destroyAllWindows()
            break

main()
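One note: draw_lanes.py itself isn't reproduced here; it's simply the draw_lanes function from the previous part moved into its own file. If you just want something with the same interface so the script runs while you write (or improve) the real thing, a crude stand-in like the following works. It only grabs the two longest Hough segments and their raw slopes, so don't expect it to steer as well as the real lane averaging:

import numpy as np

def draw_lanes(img, lines):
    # crude stand-in: pick the two longest Hough segments as the "lanes"
    segs = sorted((seg[0] for seg in lines),
                  key=lambda s: (s[2]-s[0])**2 + (s[3]-s[1])**2, reverse=True)
    l1 = [int(v) for v in segs[0]]
    l2 = [int(v) for v in segs[1]]

    def slope(s):
        dx = s[2] - s[0]
        return (s[3] - s[1]) / dx if dx != 0 else 0.0

    # returns two (x1, y1, x2, y2) lines and their slopes, matching the
    # l1, l2, m1, m2 unpacking in process_img
    return l1, l2, slope(l1), slope(l2)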
With that, let's talk about moving forward. The plan so far has been to first come up with some basic driving rules and, hopefully, detect whether or not we're between two lanes. What we ended up with was actually a half-baked algorithm that can drive, which is more than I was expecting to get in such short order, but I'll take it. The next scheduled step is to begin training a neural network to play.
Why did we start with the basic rules? Well, a neural network takes a lot of data to train successfully. I mean a LOT. Even then, it's likely going to need some failsafes to stop it from driving off a cliff.
Okay, so now what?
...well, we need data. This is going to be a neural network: it's going to take some inputs and produce an output. Our input will be the screen data. We could send in just the ROI we were using before, but I'm thinking of sending in the entire screen.
What is the output?
The output is the actual driving commands. As stated above, it would be ideal to have a PID system, but, for now, we've got the "all or nothing" setup. So, basically, the input to the network is screen data, and the output is a keypress: W, A, and so on. We'll break this down more when we get there, but first we need the most important thing in all of this: DATA!
We need features and labels! To get them, we need to record ourselves playing GTA V, saving the screen data (the pixels) as the features and our keypresses as the labels, which will be the output targets when training the network.
We'll be doing this in the next tutorial.
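Just to make the features-and-labels idea concrete before then, here's a rough illustration; it is not the code we'll actually use next time, and the file name, the [A, W, D] one-hot layout, the 80x60 resize, and the GetAsyncKeyState key check are all placeholder choices of mine:

import numpy as np
import cv2
import win32api
from grabscreen import grab_screen

def keys_to_output():
    # one-hot [A, W, D] based on which key is currently held down
    output = [0, 0, 0]
    if win32api.GetAsyncKeyState(ord('A')):
        output[0] = 1
    elif win32api.GetAsyncKeyState(ord('D')):
        output[2] = 1
    else:
        output[1] = 1
    return output

training_data = []

# rough sketch only: stop it with Ctrl+C (a proper pause key comes later)
while True:
    screen = grab_screen(region=(0, 40, 800, 640))
    # shrink the frame so the eventual network input stays manageable
    screen = cv2.cvtColor(screen, cv2.COLOR_RGB2GRAY)
    screen = cv2.resize(screen, (80, 60))
    training_data.append([screen, keys_to_output()])

    if len(training_data) % 500 == 0:
        print(len(training_data))
        np.save('training_data.npy', np.array(training_data, dtype=object))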