OCR email attachments and store them on SharePoint
Microsoft Flow is a great tool to automate a lot of your daily, repetitive tasks. I stumbled on such a task at one of my clients. They receive an email with an attachment in a shared mailbox. One of the employees downloads the attachment and stores it in a SharePoint folder. Nothing really special, and easy to create with Microsoft Flow. But wait, there is more. When I talked with the manager, he mentioned that it would be great if the documents were OCR’d before they are stored.
Now it gets interesting: can we download the attachment, send it through an OCR solution, get the file back and store it in SharePoint? Yes, we can. Most online OCR solutions are quite expensive, but I came across ElasticOCR. They charge reasonable prices and have a Microsoft Flow connector :D.
The process is not really fast. It can sometimes take 15 minutes or more to process a document, but hey, it’s automated, so you don’t have to wait for it. Below I will walk you through setting up the flow. I added some Try – Catch style handling to deal with some of the errors.
- Step 1 – Start a Free Trial to get a license key
- Step 2 – Create a new flow
- Step 3 – Process each attachment
- Step 4 – Check if the attachment is a PDF file
- Step 5 – Create the OCR Job
- Step 6 – Retrieve the OCR’d File
- Step 7 – Handle the completed job
- Step 8 – Catch the errors
Step 1 – Start a Free Trial to get a license key
Before we can get started we need a license key and an App ID. To get these, register for a 14-day trial at https://portal.elasticocr.com/trial. You will receive an email with the details.
Step 2 – Create a new flow
Once you have the necessary license details, we can start with a new, empty flow. The trigger we are going to add is the Outlook trigger When a new email arrives. Select the appropriate mailbox and folder. When an email arrives in the mailbox, we want to get the email based on its message ID, and it must include an attachment.
Step 3 – Process each attachment
An email can have multiple attachments, so we add a new step and select More > Add an apply to each step. Select the output Attachments from the previous step.
Step 4 – Check if the attachment is a PDF file
We want to process only PDF files, so we check whether each attachment is a PDF file before we continue. Add a Condition and fill in the following details:
- Content-type from the Get Email
Now we continue only on the If Yes side. If the attachment file is not a PDF file, then we simply do nothing.
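For readers who think in code, the Apply to each loop from Step 3 plus this condition amount to a filter over the attachments. Below is a minimal Python sketch of that idea, not something Flow generates. The Attachment class and its field names are made up for illustration, and comparing against the standard application/pdf media type is an assumption about what the condition checks.

```python
from dataclasses import dataclass


@dataclass
class Attachment:
    """Hypothetical stand-in for one email attachment."""
    name: str
    content_type: str
    content: bytes


def is_pdf(attachment: Attachment) -> bool:
    # Drop parameters such as "; name=scan.pdf" and ignore case before comparing.
    media_type = attachment.content_type.split(";")[0].strip().lower()
    return media_type == "application/pdf"


def pdf_attachments(attachments: list[Attachment]) -> list[Attachment]:
    # Equivalent of the If Yes side: anything that is not a PDF is skipped.
    return [a for a in attachments if is_pdf(a)]
```

Checking the content type rather than the file extension mirrors the condition above; a file named report.pdf with a different content type would be skipped by this sketch.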
Step 5 – Create the OCR Job
Add an Action in the If Yes side and search for ElasticOCR as the connector. Then select the action Create a Job with a file. It will ask you to fill in a name for the connection (make one up), the App ID, and the License ID.
After you have filled in the license information, the action screen will update and we can fill in the data we need for the flow.
- Filename – Select Name (attachment name)
- File Data – Click on See More in the Get Email part and select Content
- Metadata – leave empty
Step 6 – Retrieve the OCR’d File
The next step is to retrieve the file. It may take some time to process it, so we use a Do Until loop to check every 5 minutes whether the job is completed. We also want to catch the error if something goes wrong, for example if the job takes too long or simply fails. This way we can notify the users that something went wrong.
To create this we are going to use a scope. A scope bundles a group of actions, and based on its outcome we can continue with the necessary steps.
- Click on More and select Scope
- Inside the Scope, click again on More and select Do Until
- For value in the Do Until we pick status of Create a job with a file and we want the status to contain available
- Create an Action and search for ElasticOCR and pick Retrieve a job
- Fill in the job ID
- Add another action and search Delay and pick Schedule – Delay
- Set the count to 5 and the unit to Minute
We just created a loop that will retrieve the job status every 5 minutes until the job status contains Available. At that point the Do Until loop exits.
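If it helps to see the loop as code, here is a rough Python sketch of the same polling pattern. It is only an illustration under assumptions: get_status is a hypothetical callable standing in for the Retrieve a job action, the limit of 12 checks is an arbitrary example value and not Flow's own limit, and the case-insensitive match is my choice rather than documented Flow behaviour.

```python
import time
from typing import Callable


def wait_until_available(
    get_status: Callable[[], str],
    interval_seconds: float = 300,
    max_checks: int = 12,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_status until it contains "available"; return the number of checks made."""
    for check in range(1, max_checks + 1):
        status = get_status()  # stands in for the Retrieve a job action
        if "available" in status.lower():
            return check
        sleep(interval_seconds)  # stands in for the Delay action (5 minutes)
    # Comparable to the Do Until loop ending in a timeout.
    raise TimeoutError(f"job not available after {max_checks} checks")
```

Unlike the flow, this sketch skips the delay once the status is available. Passing sleep as a parameter makes it possible to try the function without actually waiting 5 minutes.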
Step 7 – Handle the completed job
The Do Until loop will either complete or fail with a timeout. We add a new scope after the first scope to handle the completed files. Select More (inside the If Yes branch) and select Scope. Click on the menu (the three little dots in the brown bar of Scope 2) and select Configure run after.
We only want to run this scope after the first scope succeeded, so we tick is successful and then click Done. (In the same menu you can also rename Scope 2 to something like Download file.)
Step 7.1 – Download the OCR’d file
- Add a new action inside Scope 2 and search for HTTP. Choose the HTTP – HTTP action.
- For the method, we select GET, because we want to get/retrieve the file from a URL.
- For the URI you select See more in the Retrieve Job section and select Download URL
Update: You can now also use the Download a job action from ElasticOCR itself!
Step 7.2 – Store the file on SharePoint
- Create a new action below the HTTP action and select SharePoint as the connector
- Select Create File (SharePoint will sign in if you haven’t used SharePoint before in Flow)
- Add the site address and select the folder where you want to store the file
- For the File Name, we select the Filename from the ElasticOCR Retrieve job action
- We get the content of the file from the HTTP action. Select See More in the HTTP action and choose Body
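The download-then-store pair above can be pictured as the following Python sketch. It is a simplified illustration and makes several assumptions: a local folder stands in for the SharePoint library (no SharePoint API is called), and fetch_bytes and download_and_store are hypothetical helper names, not part of Flow or ElasticOCR.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlopen


def fetch_bytes(url: str) -> bytes:
    # Plain HTTP GET, like the HTTP action; the response body is the file.
    with urlopen(url) as response:
        return response.read()


def download_and_store(
    download_url: str,
    filename: str,
    target_folder: Path,
    fetch: Callable[[str], bytes] = fetch_bytes,
) -> Path:
    """Fetch the OCR'd file and write it under target_folder (SharePoint stand-in)."""
    body = fetch(download_url)
    target_folder.mkdir(parents=True, exist_ok=True)
    # Keep only the final path component so the name cannot escape the folder.
    target = target_folder / Path(filename).name
    target.write_bytes(body)
    return target
```

The fetch parameter exists so the function can be tried without a real download URL, for example with fetch=lambda url: b"%PDF-1.7".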
Step 7.3 – Send an email notification
You can send the user an email notification that the job is completed. Add the SharePoint path to the body of the email to point the user to where the file is stored.
Step 7.4 – Complete the Job
Once the file is stored on SharePoint, the last step is to complete the job. Add an action below the Send email action and search for ElasticOCR. Select the Job Id as the value to mark the job as complete.
Step 8 – Catch the errors
The part of the flow that processes and retrieves the file is complete. But it’s always a good idea to do some error handling in your flows. The most likely error, in this case, is a timeout in the Do Until loop. We can add a new scope to catch that timeout and send an email saying that the file wasn’t processed.
Step 8.1 – Add a parallel branch
- Hover over the arrow between the two scopes, a small plus mark will appear
- Select Add Parallel Branch > Add Scope. A new scope will appear next to Scope 2
- Go to the menu of the new scope (Scope 3), select Configure run after and choose has timed out
This way the scope will run if the Do Until loop takes too long.
Step 8.2 – Add an action
When we have a timeout we can simply send an email to notify the user that the job failed.
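The two run-after settings (Scope 2 after success, Scope 3 after a timeout) behave much like a try/except around the first scope. A minimal Python sketch of that analogy follows; the function and parameter names are hypothetical and only meant to show the branching, not how Flow implements it.

```python
from typing import Callable


def run_with_handlers(
    wait_for_job: Callable[[], None],
    on_success: Callable[[], None],
    on_timeout: Callable[[], None],
) -> str:
    try:
        wait_for_job()  # Scope 1: the Do Until loop
    except TimeoutError:
        on_timeout()  # Scope 3: runs only when Scope 1 timed out
        return "timed out"
    # Scope 2 sits outside the try block, so an error raised while handling
    # the completed job is not mistaken for a timeout of the loop.
    on_success()
    return "succeeded"
```

Here on_timeout would send the failure email, and on_success would download the file, store it and complete the job.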
I hope these steps helped you get started with Flow and the use of an OCR tool. If you have any questions or suggestions, just let me know.