This post is a step-by-step guide to deploying the Document Processing Extension (DPE) in Docker on Ubuntu.

I used Ubuntu 22.04.3 LTS, but you can use any other Linux distribution that supports Docker.

First, install the required components on the Ubuntu machine if they are not already installed.

  • OpenSSL
    • sudo apt install openssl
  • Python3
    • sudo apt install python3
    • sudo apt install python3-distutils
  • Docker with swarm enabled
    • Add Docker’s official GPG key
      • sudo apt-get update
      • sudo apt-get install ca-certificates curl gnupg
      • sudo install -m 0755 -d /etc/apt/keyrings
      • curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
      • sudo chmod a+r /etc/apt/keyrings/docker.gpg
    • Add the repository to Apt sources
      • echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
      • sudo apt-get update
    • Install Docker
      • sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
    • Create a swarm
      • docker swarm init
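
Before moving on, it can help to confirm that everything from the list above is actually in place. The following is a minimal sketch and not part of the DPE tooling: missing_tools and swarm_state are hypothetical helper names, and the check assumes the commands are called openssl, python3 and docker on your system.

```shell
#!/bin/sh
# Hypothetical helper: print every command name from the arguments
# that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
    done
}

# Hypothetical helper: print the swarm state of the local Docker engine
# ("active" once "docker swarm init" has succeeded).
swarm_state() {
    docker info --format '{{.Swarm.LocalNodeState}}'
}

# Example usage:
#   missing_tools openssl python3 docker
#   swarm_state
```

If missing_tools prints nothing, all three commands were found. Depending on how Docker is set up, swarm_state may need to be run with sudo.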

Now we can continue with the DPE deployment. You need to download the latest deployment scripts from https://github.com/IBM/cloud-pak/blob/master/repo/case/ibm-dp-extension/1.0.1/ibm-dp-extension-1.0.1.tgz

Extract the file, then either copy dpedeploy-23.0.1-IF003.tar.xz from …/ibm-dp-extension/inventory/adpOperator/files/deploy to a different location and extract it there, or extract it in place in the deploy folder.
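
Because the full path depends on where you unpack the package, one option is to search for the archive instead of typing the path. This is a minimal sketch: find_dpedeploy_archive is a hypothetical helper, and the directory names in the usage comments are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print the path of the first dpedeploy-*.tar.xz
# archive found below the directory given as the first argument.
# Returns a non-zero status when no archive is found.
find_dpedeploy_archive() {
    archive=$(find "$1" -type f -name 'dpedeploy-*.tar.xz' | head -n 1)
    [ -n "$archive" ] && printf '%s\n' "$archive"
}

# Example usage (directory names are placeholders):
#   mkdir -p case dpe
#   tar -xzf ibm-dp-extension-1.0.1.tgz -C case
#   tar -xJf "$(find_dpedeploy_archive case)" -C dpe
```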

Now run dpedeploy to pull the DPE image.

Accept the license and enter the required credentials. The username is "cp" and the password is your secret key, which you create at https://myibm.ibm.com/products-services/containerlibrary

Enter a custom Docker swarm stack name or keep the default, ibm_dpe. You can use the built-in PostgreSQL database or connect to a remote PostgreSQL or Db2 database. The built-in database is fine for demos and testing, but for production I recommend a remote database system.

If you plan to use OCR Engine 2, please refer to the system requirements for DPE at https://www.ibm.com/docs/en/datacap/9.1.9?topic=extension-installing-document-processing

Check the stack status with "docker stack services ibm_dpe".
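
The REPLICAS column of that command shows running versus desired tasks, for example 1/1. Instead of reading the table by eye, the output can be filtered. This is a minimal sketch: not_ready_services is a hypothetical helper, and it assumes the default stack name ibm_dpe and replica values of the form running/desired.

```shell
#!/bin/sh
# Hypothetical helper: read "NAME REPLICAS" lines (for example
# "ibm_dpe_spbackend 1/1") from standard input and print the names of
# services whose running task count differs from the desired count.
not_ready_services() {
    awk '{
        split($2, counts, "/")
        if (counts[1] != counts[2]) print $1
    }'
}

# Example usage:
#   docker stack services ibm_dpe --format '{{.Name}} {{.Replicas}}' | not_ready_services
```

An empty result means every service reports as many running tasks as it wants.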

When all services are running, you can access the DPE UI via the URL provided by the deployment. In my case https://dpe

If you plan to use the built-in PostgreSQL database, you will also need to perform the following steps.

  • docker cp $(docker ps -f name=ibm_dpe_spbackend --quiet):/data-org/db_sample_data/pg_imports.tar.xz ./

This copies the sample data from the spbackend service container to your local file system.
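
The docker cp command above relies on docker ps returning exactly one container ID. If the service is not running yet, or more than one matching container runs on this node, the command fails with a confusing error. This is a minimal sketch of a guard; single_id is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: read container IDs from standard input, one per
# line, and print the ID only when exactly one was given.
single_id() {
    ids=$(cat)
    # Count the non-empty lines.
    count=$(printf '%s\n' "$ids" | awk 'NF { n++ } END { print n + 0 }')
    if [ "$count" -eq 1 ]; then
        printf '%s\n' "$ids"
    else
        echo "expected exactly one container, found $count" >&2
        return 1
    fi
}

# Example usage:
#   id=$(docker ps -f name=ibm_dpe_spbackend --quiet | single_id) &&
#       docker cp "$id":/data-org/db_sample_data/pg_imports.tar.xz ./
```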

Copy pg_imports.tar.xz into the ibm_assets/managed_db_scripts/PG folder and extract it there.

  • cp pg_imports.tar.xz ibm_assets/managed_db_scripts/PG/
  • tar xvf ibm_assets/managed_db_scripts/PG/pg_imports.tar.xz -C ibm_assets/managed_db_scripts/PG/

Verify that the "imports" folder was created.

Now you can start creating your DPE projects.