Preliminary VIME Improvement Results on Swimmer Gather Domain

Preamble

In this post, we ran 2 experiments with 10 random seeds across 2000 iterations in the Swimmer Gather domain. We used the code from Rein Houthooft's VIME-TRPO algorithm for the csv files associated with the vime_1, ..., vime_10 dataframes. We then modified the "intrinsic reward" function using an analogous function from financial risk theory called Entropic Value at Risk (EVaR) to generate the csv files associated with the mime_1, ..., mime_10 dataframes.

Five graphs are presented. The first graph shows each of the 10 randomly seeded executions of the VIME-TRPO algorithm. Note that almost half of the trajectories fail to get above 0.3, which represents 30% of the available reward. The second graph shows the average of all 10 randomly seeded executions of the VIME-TRPO algorithm; the average performance lies between 0.3 and 0.4.

The third graph shows each of the 10 randomly seeded executions of our modified VIME-TRPO algorithm. Note that none of the trajectories are below 0.3 in performance after 500 iterations. The fourth graph then shows that the average performance of all 10 randomly seeded executions of the modified VIME-TRPO algorithm is between 0.6 and 0.7.

The fifth graph shows the average results of the VIME-TRPO algorithm in blue and the average results of our modified VIME-TRPO algorithm in orange. This constitutes a near doubling of average performance, but it must be emphasized that the gain comes mainly from the improved robustness of the learning algorithm that results from using EVaR.
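For readers unfamiliar with EVaR: for a random variable X with moment-generating function M_X(z) = E[exp(zX)], the EVaR at confidence level 1 - alpha is the infimum over z > 0 of (1/z) ln(M_X(z) / alpha). The sketch below estimates this quantity from samples with a simple grid search over z. It illustrates the risk measure only; how EVaR enters the modified intrinsic reward is not shown here, and the function name, the search grid, and the synthetic samples are all assumptions made for illustration.

```python
import numpy as np

def empirical_evar(samples, alpha=0.05, z_grid=None):
    """Estimate EVaR at level 1 - alpha from samples by grid search over z > 0."""
    x = np.asarray(samples, dtype=float)
    if z_grid is None:
        z_grid = np.logspace(-3, 2, 500)  # assumed search range for z
    zx = np.outer(z_grid, x)              # shape (len(z_grid), len(x))
    # Log of the empirical moment-generating function, via log-sum-exp for stability.
    shift = zx.max(axis=1, keepdims=True)
    log_mgf = shift[:, 0] + np.log(np.mean(np.exp(zx - shift), axis=1))
    candidates = (log_mgf - np.log(alpha)) / z_grid
    return float(candidates.min())

# Synthetic samples, used only to exercise the function.
rng = np.random.default_rng(0)
demo_samples = rng.normal(loc=1.0, scale=0.5, size=1000)
print(empirical_evar(demo_samples, alpha=0.05))
```

By construction the estimate is never below the sample mean, and it grows as alpha shrinks, which is what makes EVaR a conservative summary of a distribution.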

Import Experiment Data

import numpy as np
import matplotlib
import pandas as pd

mime_1 = pd.read_csv ('trpo-expl_2018_12_15_08_50_56_0001/progress.csv')
mime_2 = pd.read_csv ('trpo-expl_2018_12_15_08_50_56_0002/progress.csv')
mime_3 = pd.read_csv ('trpo-expl_2018_12_15_08_50_56_0003/progress.csv')
mime_4 = pd.read_csv ('trpo-expl_2018_12_15_08_50_56_0004/progress.csv')
mime_5 = pd.read_csv ('trpo-expl_2018_12_15_08_50_56_0005/progress.csv')
mime_6 = pd.read_csv ('trpo-expl_2018_12_15_08_50_56_0006/progress.csv')
mime_7 = pd.read_csv ('trpo-expl_2018_12_15_08_50_56_0007/progress.csv')
mime_8 = pd.read_csv ('trpo-expl_2018_12_15_08_50_56_0008/progress.csv')
mime_9 = pd.read_csv ('trpo-expl_2018_12_15_08_50_56_0009/progress.csv')
mime_10 = pd.read_csv ('trpo-expl_2018_12_15_08_50_56_0010/progress.csv')

vime_1 = pd.read_csv ('./trpo-expl_2019_01_22_12_56_49_0001/progress.csv')
vime_2 = pd.read_csv ('./trpo-expl_2019_01_22_12_56_49_0002/progress.csv')
vime_3 = pd.read_csv ('./trpo-expl_2019_01_22_12_56_49_0003/progress.csv')
vime_4 = pd.read_csv ('./trpo-expl_2019_01_22_12_56_49_0004/progress.csv')
vime_5 = pd.read_csv ('./trpo-expl_2019_01_22_12_56_49_0005/progress.csv')
vime_6 = pd.read_csv ('./trpo-expl_2019_01_22_12_56_49_0006/progress.csv')
vime_7 = pd.read_csv ('./trpo-expl_2019_01_22_12_56_49_0007/progress.csv')
vime_8 = pd.read_csv ('./trpo-expl_2019_01_22_12_56_49_0008/progress.csv')
vime_9 = pd.read_csv ('./trpo-expl_2019_01_22_12_56_49_0009/progress.csv')
vime_10 = pd.read_csv ('./trpo-expl_2019_01_22_12_56_49_0010/progress.csv')

List of Fields

The following prints the index and name of each variable:
for i, name in enumerate(vime_1.columns[:20]):
    print(i, name)

We will use StdReturn (normalized reward), which is column 13 in the list below, for performance comparison.
0 MaxReturn
1 LossAfter
2 BNN_DynModelSqLossAfter
3 BNN_DynModelSqLossBefore
4 AverageReturn
5 Expl_MaxKL
6 Iteration
7 AverageDiscountedReturn
8 MinReturn
9 Expl_MinKL
10 dLoss
11 Entropy
12 AveragePolicyStd
13 StdReturn
14 Perplexity
15 MeanKL
16 ExplainedVariance
17 Expl_MeanKL
18 NumTrajs
19 Expl_StdKL
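Positional indexing such as columns[13] breaks silently if the logger ever writes its columns in a different order, so selecting the column by name may be safer. A minimal sketch, using a small synthetic frame as a stand-in for one progress.csv (the values are made up for illustration):

```python
import pandas as pd

# Synthetic stand-in for one progress.csv; not real experiment data.
demo = pd.DataFrame({
    "Iteration": [0, 1, 2],
    "AverageReturn": [0.10, 0.20, 0.30],
    "StdReturn": [0.05, 0.04, 0.06],
})

by_name = demo["StdReturn"]
by_position = demo[demo.columns[2]]  # StdReturn is column 2 in this toy frame
print(by_name.equals(by_position))
```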

VIME Plots

import matplotlib.pyplot as plt

for run in [vime_1, vime_2, vime_3, vime_4, vime_5, vime_6, vime_7, vime_8, vime_9, vime_10]:
    run[run.columns[13]].plot(kind='line')
plt.show()  # called once so that all 10 runs are drawn on the same figure
[Figure 1: StdReturn of each of the 10 seeded VIME-TRPO runs]

VIME Average Plot

vime_11 = sum(run[run.columns[13]] for run in [vime_1, vime_2, vime_3, vime_4, vime_5, vime_6, vime_7, vime_8, vime_9, vime_10]) / 10
vime_11.plot(kind='line')
plt.show()
[Figure 2: average StdReturn over the 10 VIME-TRPO runs]

VIME Star Plots

import matplotlib.pyplot as plt

for run in [mime_1, mime_2, mime_3, mime_4, mime_5, mime_6, mime_7, mime_8, mime_9, mime_10]:
    run[run.columns[13]].plot(kind='line')
plt.show()  # called once so that all 10 runs are drawn on the same figure
[Figure 3: StdReturn of each of the 10 seeded runs of the modified VIME-TRPO algorithm]

VIME Star Average Plot

mime_11 = sum(run[run.columns[13]] for run in [mime_1, mime_2, mime_3, mime_4, mime_5, mime_6, mime_7, mime_8, mime_9, mime_10]) / 10
mime_11.plot(kind='line')
plt.show()
[Figure 4: average StdReturn over the 10 runs of the modified VIME-TRPO algorithm]

VIME and VIME Star Average Plots on Same Figure

vime_11 = sum(run[run.columns[13]] for run in [vime_1, vime_2, vime_3, vime_4, vime_5, vime_6, vime_7, vime_8, vime_9, vime_10]) / 10
vime_11.plot(kind='line', label='VIME')
mime_11 = sum(run[run.columns[13]] for run in [mime_1, mime_2, mime_3, mime_4, mime_5, mime_6, mime_7, mime_8, mime_9, mime_10]) / 10
mime_11.plot(kind='line', label='VIME Star')
plt.legend()
plt.show()  # called once so that both averages share one figure
[Figure 5: average StdReturn of VIME (blue) and VIME Star (orange)]
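Because the improvement is attributed to robustness across seeds, the spread between runs is as informative as the mean. The sketch below computes both per iteration with pd.concat. The helper name summarize_runs is hypothetical, and the synthetic series stand in for the ten StdReturn columns; they are not experiment data.

```python
import numpy as np
import pandas as pd

def summarize_runs(series_list):
    """Stack per-seed series column-wise; return per-iteration mean and std."""
    stacked = pd.concat(series_list, axis=1, ignore_index=True)
    return pd.DataFrame({
        "mean": stacked.mean(axis=1),
        "std": stacked.std(axis=1),  # sample standard deviation across seeds
    })

# Synthetic stand-ins for the per-seed performance columns.
rng = np.random.default_rng(0)
fake_runs = [pd.Series(rng.random(5)) for _ in range(10)]
summary = summarize_runs(fake_runs)
print(summary)
```

With the real data, the list would hold the same column from each of the ten dataframes, and plt.fill_between over mean minus std and mean plus std would draw a band around the average curve.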

Urbana Housing Inspection Data

Scope.md

Scope

We will compare the distribution of Urbana housing inspection grades with the corresponding distribution for Urbana residents who are receiving General or Homeless Assistance support.

Information Already Visualized and Data Availability

As this link shows, we already have a distribution of housing inspection grades for the City of Urbana. The default view is for building inspections between March 20 of 2007 and February 7 of 2019. Over that time range:

  • 153 inspections (8% of inspections) resulted in A’s
  • 1,518 inspections (83% of inspections) resulted in B’s
  • 143 inspections (8% of inspections) resulted in C’s
  • 9 inspections (0% of inspections) resulted in D’s
  • 3 inspections (0% of inspections) resulted in F’s
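The percentages above follow directly from the counts. As a quick check (the total is simply the sum of the five counts listed):

```python
counts = {"A": 153, "B": 1518, "C": 143, "D": 9, "F": 3}
total = sum(counts.values())

# Share of each grade, rounded to the nearest whole percent.
shares = {grade: round(100 * n / total) for grade, n in counts.items()}
print(total, shares)
```

The D and F shares round to 0% because together they account for only 12 of the 1,826 inspections.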

API Service Used

We have access to data using the SODA API developed by Socrata, which appears to be part of the Open Data Network. Documentation of the API is here. I had pre-emptively gotten an app token for us to use in the event that we want up to 1000 API requests per “rolling hour period.” Our token is “Y6VDhjTt1iHtj2RQmX82shXZ7” and ideally this would be used to automatically aggregate public data.
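SODA accepts an app token in the X-App-Token request header. A minimal sketch of building such a request follows; the helper name build_soda_request and the placeholder token are assumptions, and nothing is sent over the network here.

```python
from urllib.parse import urlencode

BASE_URL = "http://data.urbanaillinois.us/resource/2tkj-9e9d.json"

def build_soda_request(app_token, limit=50000):
    """Return (url, headers) for a SODA query that carries an app token."""
    # safe="$" keeps the leading "$" of SODA parameters readable in the URL.
    query = urlencode({"$limit": limit}, safe="$")
    headers = {"X-App-Token": app_token}
    return f"{BASE_URL}?{query}", headers

# "YOUR_APP_TOKEN" is a placeholder; substitute the real token at run time.
url, headers = build_soda_request("YOUR_APP_TOKEN")
print(url)
```

The pair can then be passed along as requests.get(url, headers=headers). Reading the token from an environment variable rather than hard-coding it keeps it out of shared notebooks.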

List of Parameters

The parameters are separated by commas and are organized by the type of information provided.

  • Building Inspection Time: “expiration date”, “inspection_date”, “inspection_year”
  • Grade: “grade”
  • License Status: “license_status”
  • Location: “mappable_address”, “mappable_address_address”, “mappable_address_city”, “mappable_address_state”, “parcel_number”, “property_adress”
  • Other: “:@computed_region_29jt_857g”, “:@computed_region_3h3r_sq6z”
import requests
import json

urbanaBuildingGrades = requests.get("http://data.urbanaillinois.us/resource/2tkj-9e9d.json?$limit=50000")
urbanaBuildingGrades_data = urbanaBuildingGrades.json()
print(urbanaBuildingGrades_data)
print(urbanaBuildingGrades_data[0:9])
print(urbanaBuildingGrades_data[0]["grade"])
print(type(urbanaBuildingGrades_data[0]["grade"]))

Checkpoint

The code below retrieves all building inspections and counts the number of A, B, C, D, and F grades.

import requests
import json
import numpy as np

urbanaBuildingGrades = requests.get("http://data.urbanaillinois.us/resource/2tkj-9e9d.json?$limit=50000")
urbanaBuildingGrades_data = urbanaBuildingGrades.json()

ourGrades = np.empty([len(urbanaBuildingGrades_data),1], dtype = "str")
a = 0
b = 0
c = 0
d = 0
f = 0
for k in range(len(urbanaBuildingGrades_data)):
    temporaryGrade = urbanaBuildingGrades_data[k]["grade"]
    ourGrades[k] = temporaryGrade[6]
    if ourGrades[k] == "A":
        a+=1
    elif ourGrades[k] == "B":
        b+=1
    elif ourGrades[k] == "C":
        c+=1
    elif ourGrades[k] == "D":
        d+=1
    elif ourGrades[k] == "F":
        f+=1
print("a",a,"b",b,"c",c,"d",d,"f",f)
a 153 b 1518 c 143 d 9 f 3

Next Checkpoint

The following code counts the grades of building inspections from 2018 only.

import requests
import json
import numpy as np

urbanaBuildingGrades = requests.get("http://data.urbanaillinois.us/resource/2tkj-9e9d.json?$limit=50000")
urbanaBuildingGrades_data = urbanaBuildingGrades.json()

ourGrades = np.empty([len(urbanaBuildingGrades_data),1], dtype = "str")
a = 0
b = 0
c = 0
d = 0
f = 0
for k in range(len(urbanaBuildingGrades_data)):
    if urbanaBuildingGrades_data[k]["inspection_year"] == "2018":
        temporaryGrade = urbanaBuildingGrades_data[k]["grade"]
        ourGrades[k] = temporaryGrade[6]
        if ourGrades[k] == "A":
            a+=1
        elif ourGrades[k] == "B":
            b+=1
        elif ourGrades[k] == "C":
            c+=1
        elif ourGrades[k] == "D":
            d+=1
        elif ourGrades[k] == "F":
            f+=1
print("a",a,"b",b,"c",c,"d",d,"f",f)
a 24 b 177 c 20 d 0 f 0
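The two checkpoints differ only in the year filter, so the tally can be written once with collections.Counter. This is a sketch under stated assumptions: the helper name count_grades is hypothetical, the grade letter is taken from index 6 of the grade string as in the code above, and the "Class A" style records are made-up examples of a string with that shape, not real inspection data.

```python
from collections import Counter

def count_grades(records, year=None, letter_index=6):
    """Tally grade letters, optionally restricted to one inspection year."""
    counts = Counter()
    for record in records:
        if year is not None and record.get("inspection_year") != str(year):
            continue
        grade = record.get("grade", "")
        if len(grade) > letter_index:  # skip missing or short grade strings
            counts[grade[letter_index]] += 1
    return counts

# Made-up records in an assumed format, used only to exercise the function.
sample = [
    {"grade": "Class A", "inspection_year": "2018"},
    {"grade": "Class B", "inspection_year": "2018"},
    {"grade": "Class A", "inspection_year": "2017"},
    {"inspection_year": "2018"},  # record with no grade
]
print(count_grades(sample))
print(count_grades(sample, year=2018))
```

Using record.get avoids a KeyError when a record lacks a grade or an inspection year.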

EIEIO Algorithm Code

How does Old McDonald know what's on his farm?
  • EIEIO
Jokes aside, this algorithm was inspired by the Fog-of-War (FoW) functional proposed in my first paper, which sought to bias exploration toward areas in proportion to the time since they had last been visited. In formulating a data-driven analog to the FoW functional, we extended the FoW with EIEIO so that heterogeneity in the nonstationary environment is taken into account. Since the first paper on EIEIO was published, we have published a chapter (Chapter 2) in Springer's Handling Uncertainty and Networked Structure in Robot Control in collaboration with Lucian Busoniu and Levente Tamas, which becomes available in 2016.

My code for EIEIO, as well as code by our compatriots for the textbook, is available here.

MATLAB Email Notification of Error

I've written a MATLAB function called "sendToGmailFromGmailError" which automatically sends an email from a user-controlled gmail account to another gmail account. The function is meant to be called within a MATLAB "try/catch" structure. In the event that the code in the "try" section encounters an error, the "sendToGmailFromGmailError" function will notify you or a member of your team that your simulation has encountered an error. Moreover, the email contains the MATLAB error message so that you can see what went wrong.

MATLAB Email Notification of Completion

I've written a MATLAB function called "sendToGmailFromGmail" which automatically sends an email from a user-controlled gmail account to another gmail account. The function is meant to be called at the end of a simulation so as to notify you or a member of your team that the simulation has finished running. That way, when lengthy programs are executing, students do not have to wait around for them to finish.