Paulito Palmes

Unlike online RL, where agents must interact with a real environment, offline RL works much like a typical machine learning workflow. Given a dataset, offline RL extracts the state, action, reward, and terminal columns and uses them to optimize the policy Q. Wrapping offline RL into the AutoMLPipeline workflow makes it trivial to search, using symbolic workflow manipulation, for the preprocessing elements and combinations of them that improve the learned policy.

Within the AutoMLPipeline workflow, candidate preprocessing elements and their combinations are compared by cross-validation: the dataset is split into training and testing sets several times, and the average accumulated discounted reward (return) of the resulting policy Q is used as the score. This talk will demonstrate how to set up the offline RL pipeline to preprocess the dataset and learn the optimal policy Q, and how to incorporate a parallel search strategy to find the optimal workflow.
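
To make the scoring quantity concrete, here is a minimal Python sketch of the accumulated discounted reward (return) averaged over held-out folds. It is an illustration only, not AutoMLPipeline code: the function names, the interleaved fold assignment, and the discount factor `gamma` are assumptions, and it scores the rewards logged in the dataset rather than the rewards of a learned policy.

```python
from statistics import mean


def split_episodes(rewards, terminals):
    """Group a flat reward column into episodes using the terminal column."""
    episodes, current = [], []
    for reward, done in zip(rewards, terminals):
        current.append(reward)
        if done:
            episodes.append(current)
            current = []
    if current:  # keep a trailing episode that has no terminal flag
        episodes.append(current)
    return episodes


def discounted_return(rewards, gamma=0.99):
    """Return sum over t of gamma**t * r_t, accumulated from the last step back."""
    total = 0.0
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


def kfold_mean_return(episodes, k=2, gamma=0.99):
    """Average over k folds of the mean discounted return on held-out episodes.

    Assumption: folds are formed by interleaving episodes (fold i holds
    episodes i, i+k, i+2k, ...), and there are at least k episodes.
    """
    fold_scores = []
    for fold in range(k):
        held_out = episodes[fold::k]
        fold_scores.append(mean(discounted_return(ep, gamma) for ep in held_out))
    return mean(fold_scores)
```

In an actual pipeline, the held-out score would come from evaluating the policy learned on the training folds; the fold averaging stays the same.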

Paulito Palmes

Talks:

Wrapping Up Offline RL as part of AutoMLPipeline Workflow
