In this article I’ll explain how to read text from an image using OCR. I’ll use “Tesseract” OCR for this purpose.
Step 1: Install Tesseract OCR
First, install Tesseract OCR on your server. Tesseract is an open-source OCR engine that runs on both Linux and Windows.
Linux Installation (Debian/Ubuntu):
sudo apt update
sudo apt install tesseract-ocr
sudo apt install libtesseract-dev
Windows Installation:
1. Download the Tesseract installer.
2. Install Tesseract on your system.
3. Add the Tesseract installation path to your system’s PATH environment variable.
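After installing, it can help to confirm that PHP can actually run Tesseract before going further. Here is a minimal sketch, assuming the executable is reachable as tesseract on your PATH. The helper name tesseractVersion() is hypothetical and only for illustration.

```php
<?php
// Hypothetical helper: returns the first line printed by `tesseract --version`,
// or null if the executable could not be run.
function tesseractVersion(string $tesseractPath = 'tesseract'): ?string
{
    $output = [];
    $exitCode = 1;

    // 2>&1 merges stderr into stdout, because some builds print the version to stderr
    exec(escapeshellarg($tesseractPath) . ' --version 2>&1', $output, $exitCode);

    if ($exitCode !== 0 || count($output) === 0) {
        return null;
    }
    return trim($output[0]);
}

$version = tesseractVersion();
echo $version !== null
    ? "Found: $version\n"
    : "Tesseract was not found. Check the installation and your PATH.\n";
```

If this prints the "not found" message, pass the full path to the executable instead of relying on PATH.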
Step 2: Write PHP Code to Read Text from an Image
Once Tesseract is installed, you can write a simple PHP script to read text from an image. Here’s the PHP code:
<?php
// Path to the Tesseract executable (adjust it to match your installation)
$tesseractPath = '/usr/bin/tesseract'; // Linux/Ubuntu
// $tesseractPath = 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'; // Windows

// Path to the image file
$imagePath = 'image.png';

// Output base name (Tesseract appends ".txt" to it)
$outputFile = 'output';

// Command to run Tesseract; every part is escaped so paths containing spaces work
$command = escapeshellarg($tesseractPath) . ' ' . escapeshellarg($imagePath) . ' ' . escapeshellarg($outputFile);

// Execute the command
exec($command, $output, $returnVar);

// Check whether Tesseract ran successfully
if ($returnVar === 0) {
    // Read the content of the output file
    $outputText = file_get_contents($outputFile . '.txt');
    echo "Extracted text:\n" . $outputText;
} else {
    echo "Error executing Tesseract.";
}
?>
Tesseract Path: Set this to the location of the Tesseract executable on your system. On Linux, it’s usually /usr/bin/tesseract. On Windows, it might be something like C:\Program Files\Tesseract-OCR\tesseract.exe (the backslashes are doubled in the PHP string).
Image Path: This is where you set the path to the image you want to process. In this example, the image is named “image.png”.
Output File: The $outputFile variable holds the base name of the output file. Tesseract appends “.txt” to it and saves the text extracted from the image there.
Executing the Command: The PHP exec() function runs the Tesseract command. Tesseract extracts the text from the image and stores it in the output file, and exec() puts the command’s exit code in $returnVar (0 means success).
Reading the Output: After running the command, the extracted text is read from the .txt file and displayed.
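The pieces described above can also be wrapped in a reusable function. The following is only a sketch under a few assumptions: the default paths are the example paths from this article, the function names defaultTesseractPath() and readTextFromImage() are hypothetical, and the output is written to a temporary file so that parallel requests do not overwrite each other.

```php
<?php
// Hypothetical helper: example executable path per OS (adjust to your installation).
function defaultTesseractPath(): string
{
    return PHP_OS_FAMILY === 'Windows'
        ? 'C:\\Program Files\\Tesseract-OCR\\tesseract.exe'
        : '/usr/bin/tesseract';
}

// Hypothetical helper: runs Tesseract on an image and returns the extracted text.
// Throws a RuntimeException if the image is unreadable or Tesseract fails.
function readTextFromImage(string $imagePath, ?string $tesseractPath = null): string
{
    if (!is_readable($imagePath)) {
        throw new RuntimeException("Image not readable: $imagePath");
    }
    $tesseractPath = $tesseractPath ?? defaultTesseractPath();

    // Unique output base name in the temp directory; Tesseract appends ".txt"
    $outputBase = tempnam(sys_get_temp_dir(), 'ocr');
    if ($outputBase === false) {
        throw new RuntimeException('Could not create a temporary file.');
    }

    // 2>&1 captures Tesseract's error messages in $output
    $command = escapeshellarg($tesseractPath) . ' '
        . escapeshellarg($imagePath) . ' '
        . escapeshellarg($outputBase) . ' 2>&1';

    $output = [];
    $exitCode = 1;
    exec($command, $output, $exitCode);

    $textFile = $outputBase . '.txt';
    $text = ($exitCode === 0 && is_file($textFile))
        ? file_get_contents($textFile)
        : false;

    // Clean up the temporary files whether or not OCR succeeded
    @unlink($outputBase);
    if (is_file($textFile)) {
        unlink($textFile);
    }

    if ($text === false) {
        throw new RuntimeException("Tesseract failed:\n" . implode("\n", $output));
    }
    return $text;
}

// Usage: read image.png from the same directory as this script
try {
    echo "Extracted text:\n" . readTextFromImage(__DIR__ . '/image.png');
} catch (RuntimeException $e) {
    echo $e->getMessage() . "\n";
}
```

Throwing an exception lets the calling code decide how to report a failure, and the message includes whatever Tesseract printed, which is usually enough to spot a wrong path or an unsupported image.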
Step 3: Example Image
You can try this with a sample image (e.g., image.png) placed in the same directory as your PHP script. Adjust the file path as necessary.
Step 4: Running the Script
To run this script:
1. Upload your image and PHP script to your server.
2. Run the script from your browser or command line.
3. The extracted text is printed on the page. If Tesseract fails, the script prints an error message instead.
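When the script runs in a browser, the extracted text ends up inside an HTML page, so line breaks collapse and characters such as < or & may be read as markup. One possible way to handle this is sketched below. The helper name renderOcrResult() is hypothetical, and the sample string only stands in for real OCR output.

```php
<?php
// Hypothetical helper: wraps OCR output for safe display in an HTML page.
function renderOcrResult(string $text): string
{
    // Escape characters such as < and & so the text is not interpreted as markup,
    // and use <pre> so the line breaks produced by Tesseract are preserved.
    return '<pre>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . '</pre>';
}

// Example with a made-up string standing in for real OCR output
echo renderOcrResult("Total: 5 < 10 & rising\nLine two");
```

From the command line this step is unnecessary, since the terminal prints the text as-is.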
There you go. It is pretty easy to install Tesseract OCR and integrate it into your PHP web applications.