How to Read Text from an Image in PHP

To read or extract the text from an image or a scanned document in PHP, you need OCR (Optical Character Recognition), a technique for pulling text out of image-based data.
In this article I’ll explain how to read text from an image using the Tesseract OCR engine.

Step 1: Install Tesseract OCR

First, install Tesseract OCR on your server. Tesseract is a powerful open-source OCR engine that works on both Linux and Windows.

Linux Installation (Debian/Ubuntu):

sudo apt update
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev

Windows Installation:

1. Download the Tesseract installer.
2. Install Tesseract on your system.
3. Add the Tesseract installation path to your system’s PATH environment variable.
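Before moving on, it can help to confirm that PHP is able to find and run the Tesseract binary. The following is a minimal sketch, assuming tesseract is on the PATH of the user that PHP runs as and that exec() is not disabled on your server. The helper name tesseractVersion is made up for this example.

```php
<?php
// Hypothetical helper: returns the first line printed by `tesseract --version`,
// or null if the binary could not be run.
function tesseractVersion(string $binary = 'tesseract'): ?string
{
    $output = [];
    $exitCode = 1;

    // 2>&1 merges stderr into stdout so that error messages are captured too.
    exec(escapeshellarg($binary) . ' --version 2>&1', $output, $exitCode);

    if ($exitCode !== 0 || count($output) === 0) {
        return null;
    }

    return trim($output[0]);
}

$version = tesseractVersion();
echo $version === null
    ? "Tesseract was not found. Check the installation and PATH.\n"
    : "Found: " . $version . "\n";
```

If this reports that Tesseract was not found even though it works in your terminal, the web server user probably has a different PATH. In that case, pass the full path to the executable instead.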

Step 2: Write PHP Code to Read Text from an Image

Once Tesseract is installed, you can write a simple PHP script to read text from an image. Here’s the PHP code:
<?php
// Path to the Tesseract executable (adjust it to match your installation)
$tesseractPath = '/usr/bin/tesseract'; // Linux/Ubuntu
// $tesseractPath = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'; // Windows

// Path to the image file
$imagePath = 'image.png';

// Base name of the output file (Tesseract appends ".txt" to it)
$outputFile = 'output';

// Build the command. Every part is escaped, including the executable path,
// so a path containing spaces (such as the Windows one) still works.
$command = escapeshellarg($tesseractPath) . ' ' . escapeshellarg($imagePath) . ' ' . escapeshellarg($outputFile);

// Execute the command
exec($command, $output, $returnVar);

// Check whether Tesseract ran successfully
if ($returnVar === 0) {
    // Read the content of the output file
    $outputText = file_get_contents($outputFile . '.txt');
    echo "Extracted text:\n" . $outputText;
} else {
    echo "Error executing Tesseract.";
}
?>

Tesseract Path: Set this to the location of the Tesseract executable on your system. On Linux, it’s usually /usr/bin/tesseract. On Windows, it might be something like C:\Program Files\Tesseract-OCR\tesseract.exe (the backslashes are doubled in the code because they are written inside a PHP string).
Image Path: This is where you set the path to the image you want to process. In this example, the image is named “image.png”.
Output File: The $outputFile variable holds the base name of the output file. Tesseract appends “.txt” to it, so the text extracted from the image is saved to output.txt.
Executing the Command: The PHP exec() function runs the Tesseract command. Tesseract extracts the text from the image and writes it to the output file.
Reading the Output: After running the command, the extracted text is read from the .txt file and displayed.
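The steps described above can also be wrapped in a reusable function. The sketch below is one possible variation, not the only way to do it: it passes "stdout" as the output base so that Tesseract prints the text instead of writing a .txt file, and it uses the -l option to pick the language. The function names buildOcrCommand and readTextFromImage are hypothetical, and 'eng' assumes the English language data is installed.

```php
<?php
// Hypothetical helper: builds the command line for one OCR run.
function buildOcrCommand(string $binary, string $imagePath, string $lang = 'eng'): string
{
    // "stdout" as the output base makes Tesseract print the text
    // instead of writing it to a file.
    return escapeshellarg($binary) . ' ' . escapeshellarg($imagePath)
        . ' stdout -l ' . escapeshellarg($lang);
}

// Hypothetical helper: runs Tesseract and returns the recognized text.
function readTextFromImage(string $imagePath, string $binary = 'tesseract', string $lang = 'eng'): string
{
    if (!is_readable($imagePath)) {
        throw new RuntimeException("Image not readable: $imagePath");
    }

    $output = [];
    $exitCode = 1;
    exec(buildOcrCommand($binary, $imagePath, $lang), $output, $exitCode);

    if ($exitCode !== 0) {
        throw new RuntimeException("Tesseract exited with code $exitCode");
    }

    // exec() returns the output as an array of lines; join them back together.
    return implode("\n", $output);
}

try {
    echo "Extracted text:\n" . readTextFromImage('image.png') . "\n";
} catch (RuntimeException $e) {
    echo "OCR failed: " . $e->getMessage() . "\n";
}
```

Printing to stdout avoids leaving output files on disk, which may be convenient when several requests run OCR at the same time. Writing to a file, as in the original script, works just as well for a single image.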

Step 3: Example Image

You can try this with a sample image (e.g., image.png) placed in the same directory as your PHP script. You can adjust the file path as necessary.
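A relative path such as image.png is resolved against the current working directory, which is not always the directory of the script (for example, when the script is started from another folder on the command line). As a small sketch, assuming the image sits next to the script, the path can be built from __DIR__ and checked before running OCR. The helper name resolveImagePath is made up for this example.

```php
<?php
// Hypothetical helper: joins a file name onto a base directory and
// returns the full path, or null if no readable file exists there.
function resolveImagePath(string $fileName, string $baseDir): ?string
{
    $path = rtrim($baseDir, '/\\') . DIRECTORY_SEPARATOR . $fileName;

    return (is_file($path) && is_readable($path)) ? $path : null;
}

// __DIR__ is the directory that contains this script.
$imagePath = resolveImagePath('image.png', __DIR__);

if ($imagePath === null) {
    echo "image.png was not found next to the script.\n";
} else {
    echo "Using image: " . $imagePath . "\n";
}
```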

Step 4: Running the Script

To run this script:

1. Upload your image and PHP script to your server.
2. Run the script from your browser or command line.
3. The extracted text will be printed on the page. If something goes wrong, an error message is displayed instead.
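Because the script can run either in a browser or on the command line, the output may need slightly different handling in each case: a browser collapses line breaks and interprets any "<" characters in the recognized text as HTML. The following is a rough sketch of one way to deal with that. The helper name formatOcrOutput is hypothetical, and the sample text stands in for real OCR output.

```php
<?php
// Hypothetical helper: formats OCR text for the environment the script runs in.
function formatOcrOutput(string $text, string $sapi = PHP_SAPI): string
{
    if ($sapi === 'cli') {
        // On the command line, plain text is fine.
        return $text . "\n";
    }

    // In a browser, escape HTML special characters and use <pre>
    // so that line breaks in the recognized text stay visible.
    return '<pre>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</pre>';
}

// Sample text standing in for the result of an OCR run.
echo formatOcrOutput("Line one\nLine <two>");
```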

There you go. It is pretty easy to install Tesseract OCR and integrate it into your PHP web applications.
