
KTP-OCR in Python using Pytesseract

KTP-OCR is an open-source Python package that aims to be a production-grade extractor for the KTP (the Indonesian identity card). The goal is to extract as much information as possible while keeping that information intact. As an example, we first upload a photo of a KTP like this:

After you upload the photo, the system reads the image and produces a result like this:

  • PROVINSI DAERAH ISTIMEWA YOGYAKARTA KABUPATEN SLEMAN
  • NIK : 34711140209790001
  • Nama :RIYANTO. SE T
  • empat/Tgl Lahir : GROBOGAN. 02-09-1979
  • Jenis Kelamin : LAKI-LAKI
  • Gol Darah : 0
  • Alamat PRM PURI DOMAS D-3. SEMPU RTRW 1001 1024
  • Kel/Desa : WEDOMARTANI! Kecamatan : NGEMPLAK
  • Agama “ISLAM
  • Status Bean KAWIN SLEMAN
  • Pekerjaan : PEDAGANG 05-06-2012
  • Kewarganegaraan: WNI HI
  • Berlaku Hingga :02-09-2017 NIA
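Raw lines like these still need to be turned into labelled fields before they are useful. The following is a minimal sketch of that step, not part of the KTP-OCR package itself: the `FIELD_LABELS` list and the `parse_ktp_lines` helper are hypothetical names, and the labels are simply copied from the sample output above.

```python
import re

# Hypothetical list of labels, copied from the sample output above.
FIELD_LABELS = [
    "NIK",
    "Nama",
    "Jenis Kelamin",
    "Gol Darah",
    "Pekerjaan",
    "Kewarganegaraan",
    "Berlaku Hingga",
]


def parse_ktp_lines(lines):
    """Map each known label to the text that follows it on the same line."""
    fields = {}
    for line in lines:
        for label in FIELD_LABELS:
            # The label must start the line; the colon is optional because
            # OCR sometimes drops or misplaces it.
            pattern = rf"\s*{re.escape(label)}\s*:?\s*(.*)"
            match = re.match(pattern, line, flags=re.IGNORECASE)
            if match:
                fields[label] = match.group(1).strip()
                break
    return fields


sample = [
    "NIK : 34711140209790001",
    "Nama :RIYANTO. SE T",
    "Jenis Kelamin : LAKI-LAKI",
    "Kewarganegaraan: WNI HI",
]
fields = parse_ktp_lines(sample)
print(fields)
```

Lines whose label was damaged by OCR (such as "empat/Tgl Lahir" in the output above) are skipped by this sketch; handling them would need fuzzier matching.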

The core of the OCR (optical character recognition) step is python-tesseract (pytesseract), an OCR tool for Python. That is, it recognizes and “reads” the text embedded in images. Let's go to the code.

First of all, run these commands:

!sudo apt install tesseract-ocr
!pip install pytesseract
!sudo apt-get install tesseract-ocr-ind

We use Google Colab to run the code, which makes the setup easier. After that, import the libraries we need:

import cv2
import numpy as np
import pytesseract
import matplotlib.pyplot as plt
from PIL import Image

After importing the libraries, you can upload the photo with this code:

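from google.colab import files
uploaded = files.upload()  # dict mapping each filename to its content as bytes
# decode the first uploaded file into a BGR image array that OpenCV can process
img = cv2.imdecode(np.frombuffer(next(iter(uploaded.values())), np.uint8), cv2.IMREAD_COLOR)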

After that, just run this code and you will get the result:

## (1) Convert the image to grayscale
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

## (2) Threshold: cap pixel values above 127 at 127
th, threshed = cv2.threshold(gray, 127, 255, cv2.THRESH_TRUNC)

## (3) Detect text using the Indonesian language data
result = pytesseract.image_to_string(threshed, lang="ind")

## (4) Normalize common OCR mistakes, line by line
for word in result.split("\n"):
  # the misread separator ”— stands for a colon
  word = word.replace("”—", ":")

  # normalize NIK: the number contains only digits, so fix look-alike characters
  if "NIK" in word:
    word = word.replace("D", "0").replace("?", "7")

  print(word)
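The NIK clean-up in the loop above can also be written as a small function that checks the result. This is only a sketch under a few assumptions: `normalize_nik` and `DIGIT_FIXES` are hypothetical names, the substitution table contains just the two fixes used above, and the length check relies on a NIK being 16 digits long.

```python
import re

# Hypothetical substitution table with the two fixes used in the loop above.
# It can be extended with other look-alike characters if they show up.
DIGIT_FIXES = str.maketrans({"D": "0", "?": "7"})


def normalize_nik(line):
    """Return the 16-digit NIK found in an OCR line, or None if it is invalid."""
    # Only look at the text after the label, so "NIK" itself is never altered.
    _, _, value = line.partition("NIK")
    value = value.lstrip(" :").translate(DIGIT_FIXES)
    # Drop anything that is still not a digit (spaces, stray punctuation).
    digits = re.sub(r"\D", "", value)
    return digits if len(digits) == 16 else None


fixed = normalize_nik("NIK : 347111402D97?001")
rejected = normalize_nik("NIK : 34711140209790001")
print(fixed, rejected)
```

The second call uses the NIK from the sample output earlier in this post. It has 17 digits, so the function returns None, which is a useful signal that the OCR result for that line needs a manual check.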
