Python Programming Tutorials

Acquiring a Vehicle for the Agent - Python Plays GTA V

Hello and welcome to another object detection tutorial. In this tutorial, we're going to use the TensorFlow Object Detection API to acquire a vehicle for our agent if we don't have one. In our previous tutorial, we sorted out which vehicle we want to approach, but we still need the code to actually approach the car, which will live in a function called determine_movement.

In order to head towards a vehicle, we need to control both the keyboard and the mouse. To do this, we're going to bring in one more script from the GTA / self-driving car series GitHub: keys.py. This script, among other things, allows us to control the mouse and press keys. In some environments, you could just use something like PyAutoGUI, but in our environment we require direct inputs. Grab keys.py above, and make sure it's in your working directory before continuing.

Back in our main vehicle detection script, let's add two new imports, and create the Keys object that the rest of the code refers to as keys:

import keys as k
import time

keys = k.Keys({})

Now we're going to build our determine_movement function. This function's purpose is to "look" at a specific vehicle. Our code already knows the relative location of the nearest vehicle in our vision, but we need to use this to calculate where to move the mouse so that we are looking at the car. Our function begins with:

def determine_movement(mid_x, mid_y,width=1280, height=705):
  
  x_move = 0.5-mid_x
  y_move = 0.5-mid_y

The mid_x and mid_y values are *relative*. They are not pixel values; they come back as fractions of the frame (percentages, if you like), which is how the neural network model returns data to us. So, still in relative form, we can calculate what movement is required by subtracting the object's middle point from the middle point of our screen (0.5, 0.5). If x_move is greater than 0, we need to move left; if it's less than 0, we move right. For y_move, greater than 0 means we move up, and less than 0 means we move down.
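To make the sign convention concrete, here is a minimal sketch that computes the relative offsets for a sample detection and names the resulting look direction. The helper names relative_offset and describe are made up for this illustration and are not part of the tutorial's script.

```python
def relative_offset(mid_x, mid_y):
    """Return (x_move, y_move) as fractions of the frame, measured from the centre."""
    x_move = 0.5 - mid_x
    y_move = 0.5 - mid_y
    return x_move, y_move


def describe(x_move, y_move):
    """Translate the sign of each offset into a look direction (hypothetical helper)."""
    horizontal = "left" if x_move > 0 else "right" if x_move < 0 else "centred"
    vertical = "up" if y_move > 0 else "down" if y_move < 0 else "centred"
    return horizontal, vertical


# A vehicle whose box is centred in the upper-left quarter of the frame.
x_move, y_move = relative_offset(0.25, 0.25)
print(x_move, y_move)            # 0.25 0.25
print(describe(x_move, y_move))  # ('left', 'up')
```

Since image coordinates start at the top-left corner, a small mid_y means the vehicle is high in the frame, which is why a positive y_move means looking up.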

Knowing the direction, we then need to determine by how much. Again, x_move and y_move are still in relative form, and we need to translate them to pixel numbers, which is why we're also passing width and height values into our function. I've set the defaults to match what I use for the grabscreen. I display a 1280x720 window which, along with the title bar, leaves about 1280x705 of actual game. Depending on your operating system and settings, your title bar may vary in size. If you can see the title bar when you visualize the window, adjust the values accordingly. So, to get pixel moves, we just need to multiply x_move or y_move by the width or height respectively.

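To move the mouse, we use: keys.keys_worker.SendInput(keys.keys_worker.Mouse(0x0001, X_COORD, Y_COORD)). The 0x0001 flag is the Windows MOUSEEVENTF_MOVE flag, so the two values are treated as a relative movement rather than an absolute screen position. We first divide x_move and y_move by 0.5, which rescales them to the range -1 to 1, and we multiply by -1 because a negative x offset moves the mouse left. So then we can do the following to get our agent to look at the closest car: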

  hm_x = x_move/0.5
  hm_y = y_move/0.5
  keys.keys_worker.SendInput(keys.keys_worker.Mouse(0x0001, -1*int(hm_x*width), -1*int(hm_y*height)))

That's it, so now our full function is:

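def determine_movement(mid_x, mid_y,width=1280, height=705):
  # Offset of the vehicle from the screen centre, as a fraction of the frame
  x_move = 0.5-mid_x
  y_move = 0.5-mid_y
  # Rescale so a vehicle at the very edge of the frame gives -1 or 1
  hm_x = x_move/0.5
  hm_y = y_move/0.5
  # 0x0001 = relative mouse move; negate so a positive offset looks left/up
  keys.keys_worker.SendInput(keys.keys_worker.Mouse(0x0001, -1*int(hm_x*width), -1*int(hm_y*height)))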
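If you want to check the numbers without sending any input to the game, the arithmetic can be pulled out into a pure function. This is only a sketch: mouse_offset is a hypothetical name, and it returns the two values that determine_movement would pass to Mouse instead of sending them.

```python
def mouse_offset(mid_x, mid_y, width=1280, height=705):
    """Return the (dx, dy) relative mouse move for a detection centred at (mid_x, mid_y)."""
    x_move = 0.5 - mid_x
    y_move = 0.5 - mid_y
    hm_x = x_move / 0.5  # rescale to the range -1..1
    hm_y = y_move / 0.5
    # Same expressions as in determine_movement, minus the SendInput call.
    return -1 * int(hm_x * width), -1 * int(hm_y * height)


print(mouse_offset(0.75, 0.5))   # (640, 0): vehicle to the right, move mouse right
print(mouse_offset(0.25, 0.25))  # (-640, -352): vehicle up and to the left
print(mouse_offset(0.5, 0.5))    # (0, 0): already centred
```

How far the camera actually turns for a given pixel offset likely depends on the in-game mouse sensitivity, so it is reasonable to treat width and height here as values to tune rather than exact conversions.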

Okay, that'll do. Now, let's add in the rest of the code necessary. First, after our two with statements, right before the while True loop, let's add:

    stolen = False

So it should look like:

with detection_graph.as_default():
  with tf.Session(graph=detection_graph, config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    stolen = False
    while True:

Next, inside the loop, once we've picked the closest vehicle, we check whether we've already stolen a vehicle and, if not, look towards our choice:

      if len(vehicle_dict) > 0:
        closest = sorted(vehicle_dict.keys())[0]
        vehicle_choice = vehicle_dict[closest]
        print('CHOICE:',vehicle_choice)
        if not stolen:
          determine_movement(mid_x = vehicle_choice[0], mid_y = vehicle_choice[1], width=1280, height=705)

Note that some of that code above is from the previous tutorial. Okay, now, let's add some logic for actually stealing a vehicle, if one is close enough to us:

          if closest < 0.1:
            keys.directKey("w", keys.key_release)
            keys.directKey("f")
            time.sleep(0.05)          
            keys.directKey("f", keys.key_release)
            stolen = True
          else:
            keys.directKey("w")

That'll probably work.
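The selection and decision steps can also be tried in isolation, without the game running. The sketch below mirrors the logic above as a pure function; choose_action and the action names "walk", "enter" and "none" are invented for this example, and the sample dictionary is made-up data in the same shape as vehicle_dict (approximate distance mapped to [mid_x, mid_y, score]).

```python
def choose_action(vehicle_dict, stolen, steal_distance=0.1):
    """Return (target, action) for one frame, mirroring the tutorial's if/else."""
    if stolen or not vehicle_dict:
        return None, "none"
    # min() picks the same key as sorted(vehicle_dict.keys())[0]
    closest = min(vehicle_dict)
    target = vehicle_dict[closest]
    if closest < steal_distance:
        return target, "enter"   # release W, tap F
    return target, "walk"        # hold W and keep approaching


# Made-up detections: apx_distance -> [mid_x, mid_y, score]
sample = {0.42: [0.3, 0.6, 0.91], 0.07: [0.55, 0.5, 0.88]}

print(choose_action(sample, stolen=False))  # ([0.55, 0.5, 0.88], 'enter')
print(choose_action(sample, stolen=True))   # (None, 'none')
```

One thing this makes easy to see: because the dictionary is keyed by the rounded distance, two vehicles with the same rounded distance would share a key, and only the last one stored would be kept.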

Full code up to this point:

# coding: utf-8
# # Object Detection Demo
# License: Apache License 2.0 (https://github.com/tensorflow/models/blob/master/LICENSE)
# source: https://github.com/tensorflow/models
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile
from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image
from grabscreen import grab_screen
import cv2
import keys as k
import time

keys = k.Keys({})


# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")


# ## Object detection imports
# Here are the imports from the object detection module.

from utils import label_map_util
from utils import visualization_utils as vis_util


# # Model preparation 
# What model to download.
MODEL_NAME = 'ssd_mobilenet_v1_coco_11_06_2017'
MODEL_FILE = MODEL_NAME + '.tar.gz'
DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_CKPT = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'mscoco_label_map.pbtxt')

NUM_CLASSES = 90

# ## Download Model
opener = urllib.request.URLopener()
opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
  file_name = os.path.basename(file.name)
  if 'frozen_inference_graph.pb' in file_name:
    tar_file.extract(file, os.getcwd())


# ## Load a (frozen) Tensorflow model into memory.
detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_CKPT, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')


# ## Loading label map
# Label maps map indices to category names, so that when our convolution network predicts `5`, we know that this corresponds to `airplane`.  Here we use internal utility functions, but anything that returns a dictionary mapping integers to appropriate string labels would be fine
label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)


# ## Helper code
def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)


def determine_movement(mid_x, mid_y,width=1280, height=705):
  x_move = 0.5-mid_x
  y_move = 0.5-mid_y
  hm_x = x_move/0.5
  hm_y = y_move/0.5
  keys.keys_worker.SendInput(keys.keys_worker.Mouse(0x0001, -1*int(hm_x*width), -1*int(hm_y*height)))


# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)
gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.70)

with detection_graph.as_default():
  with tf.Session(graph=detection_graph, config=tf.ConfigProto(gpu_options=gpu_options)) as sess:
    stolen = False
    while True:
      #screen = cv2.resize(grab_screen(region=(0,40,1280,745)), (WIDTH,HEIGHT))
      screen = cv2.resize(grab_screen(region=(0,40,1280,745)), (800,450))
      image_np = cv2.cvtColor(screen, cv2.COLOR_BGR2RGB)
      # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
      image_np_expanded = np.expand_dims(image_np, axis=0)
      image_tensor = detection_graph.get_tensor_by_name('image_tensor:0')
      # Each box represents a part of the image where a particular object was detected.
      boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
      # Each score represents the level of confidence for each of the objects.
      # Score is shown on the result image, together with the class label.
      scores = detection_graph.get_tensor_by_name('detection_scores:0')
      classes = detection_graph.get_tensor_by_name('detection_classes:0')
      num_detections = detection_graph.get_tensor_by_name('num_detections:0')
      # Actual detection.
      (boxes, scores, classes, num_detections) = sess.run(
          [boxes, scores, classes, num_detections],
          feed_dict={image_tensor: image_np_expanded})
      # Visualization of the results of a detection.
      vis_util.visualize_boxes_and_labels_on_image_array(
          image_np,
          np.squeeze(boxes),
          np.squeeze(classes).astype(np.int32),
          np.squeeze(scores),
          category_index,
          use_normalized_coordinates=True,
          line_thickness=8)
      
      vehicle_dict = {}

      for i,b in enumerate(boxes[0]):
        #                 car                    bus                  truck
        if classes[0][i] == 3 or classes[0][i] == 6 or classes[0][i] == 8:
          if scores[0][i] >= 0.5:
            mid_x = (boxes[0][i][1]+boxes[0][i][3])/2
            mid_y = (boxes[0][i][0]+boxes[0][i][2])/2
            apx_distance = round(((1 - (boxes[0][i][3] - boxes[0][i][1]))**4),3)
            cv2.putText(image_np, '{}'.format(apx_distance), (int(mid_x*800),int(mid_y*450)), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255,255,255), 2)

            '''
            if apx_distance <=0.5:
              if mid_x > 0.3 and mid_x < 0.7:
                cv2.putText(image_np, 'WARNING!!!', (50,50), cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0,0,255), 3)
            '''

            vehicle_dict[apx_distance] = [mid_x, mid_y, scores[0][i]]

      if len(vehicle_dict) > 0:
        closest = sorted(vehicle_dict.keys())[0]
        vehicle_choice = vehicle_dict[closest]
        print('CHOICE:',vehicle_choice)
        if not stolen:
          determine_movement(mid_x = vehicle_choice[0], mid_y = vehicle_choice[1], width=1280, height=705)
          if closest < 0.1:
            keys.directKey("w", keys.key_release)
            keys.directKey("f")
            time.sleep(0.05)          
            keys.directKey("f", keys.key_release)
            stolen = True
          else:
            keys.directKey("w")

      cv2.imshow('window',image_np)
      if cv2.waitKey(25) & 0xFF == ord('q'):
          cv2.destroyAllWindows()
          break

With this, we have some decent-ish code to steal a car, but it's not very robust. If, for whatever reason, we fail to actually steal the car after pressing F (maybe the car was driving by too fast), we still think we stole a vehicle. It would be better to have some sort of visual check that tells us whether we're actually driving a car, and to set the stolen variable from that. At that point, we might as well rename stolen to in_car or something like that.

I also personally would prefer it if the mouse moved a bit more smoothly.
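One possible way to get smoother motion, offered here only as a sketch and not as what the series ends up doing, is to split each relative move into several smaller moves and send them one after another with a short pause in between. The split_move helper and its steps parameter are hypothetical.

```python
def split_move(dx, dy, steps=10):
    """Split a relative move into `steps` smaller moves that add up to (dx, dy)."""
    moves = []
    done_x = done_y = 0
    for i in range(1, steps + 1):
        # Where we should be after step i, rounded to whole pixels.
        target_x = round(dx * i / steps)
        target_y = round(dy * i / steps)
        moves.append((target_x - done_x, target_y - done_y))
        done_x, done_y = target_x, target_y
    return moves


moves = split_move(640, -352, steps=10)
print(len(moves))                 # 10
print(sum(m[0] for m in moves))   # 640
print(sum(m[1] for m in moves))   # -352

# In the game loop, each (step_x, step_y) would be sent with the same call the
# tutorial uses, e.g. Mouse(0x0001, step_x, step_y), followed by a brief time.sleep.
```

Tracking the running total rather than dividing once keeps rounding errors from accumulating, so the small moves always sum to the original offset.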
