KTP-OCR in Python using Pytesseract
KTP-OCR is an open-source Python package that aims to be a production-grade extractor for the KTP (the Indonesian identity card). The goal of the package is to extract as much information as possible while preserving the integrity of that information. For example, we first upload a photo like this:
After you upload the photo, the system reads the image and produces a result like this:
- PROVINSI DAERAH ISTIMEWA YOGYAKARTA KABUPATEN SLEMAN
- NIK : 34711140209790001
- Nama :RIYANTO. SE T
- empat/Tgl Lahir : GROBOGAN. 02-09-1979
- Jenis Kelamin : LAKI-LAKI
- Gol Darah : 0
- Alamat PRM PURI DOMAS D-3. SEMPU RTRW 1001 1024
- Kel/Desa : WEDOMARTANI! Kecamatan : NGEMPLAK
- Agama “ISLAM
- Status Bean KAWIN SLEMAN
- Pekerjaan : PEDAGANG 05-06-2012
- Kewarganegaraan: WNI HI
- Berlaku Hingga :02-09-2017 NIA
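The sample result above shows why OCR output should be checked before it is trusted: a NIK has 16 digits, with digits 7 to 12 encoding the birth date as DDMMYY (the day is increased by 40 for women), yet the NIK read above has 17 characters. The sketch below is one possible sanity check and is not part of KTP-OCR; the function name `check_nik` and its return format are assumptions made for illustration.

```python
import re
from typing import List, Optional


def check_nik(nik: str, birth_date: Optional[str] = None) -> List[str]:
    """Return a list of problems found in an OCR-read NIK (empty list = plausible).

    `birth_date` is optional and expected as DD-MM-YYYY, the format printed on the card.
    This helper is hypothetical; it only checks structure, not whether the NIK exists.
    """
    problems = []

    # A NIK is exactly 16 digits.
    if not re.fullmatch(r"\d{16}", nik):
        problems.append(f"expected 16 digits, got {len(nik)} characters")
        return problems

    if birth_date is not None:
        day, month, year = birth_date.split("-")
        encoded_day = int(nik[6:8])
        if encoded_day > 40:
            # The day is stored as day + 40 for women.
            encoded_day -= 40
        encoded = f"{encoded_day:02d}{nik[8:10]}{nik[10:12]}"
        if encoded != day + month + year[-2:]:
            problems.append("birth date does not match digits 7-12 of the NIK")

    return problems


# The NIK exactly as it appears in the sample output above.
print(check_nik("34711140209790001", "02-09-1979"))
# → ['expected 16 digits, got 17 characters']
```

A check like this does not repair the value; it only flags results that need a second look or a re-scan.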
The core of the OCR (optical character recognition) step is python-tesseract (pytesseract), an OCR tool for Python. That is, it recognizes and “reads” the text embedded in images. Let's go to the code.
First of all, run these commands:
!sudo apt install tesseract-ocr
!pip install pytesseract
!sudo apt-get install tesseract-ocr-ind
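Before going further, it can help to confirm that the Tesseract binary and the Indonesian language data were actually installed. The sketch below is an optional check, not part of KTP-OCR: the function name `installed_tesseract_languages` is an assumption, and it relies only on the `tesseract --list-langs` command-line option. Some Tesseract versions print the list to stderr, so both streams are read.

```python
import shutil
import subprocess
from typing import List


def installed_tesseract_languages() -> List[str]:
    """Return the language codes reported by `tesseract --list-langs`.

    Returns an empty list when the tesseract binary is not on PATH.
    """
    exe = shutil.which("tesseract")
    if exe is None:
        return []

    proc = subprocess.run([exe, "--list-langs"], capture_output=True, text=True)
    lines = (proc.stdout + proc.stderr).splitlines()

    # Skip blank lines and the "List of available languages ..." header line.
    return [ln.strip() for ln in lines if ln.strip() and not ln.startswith("List of")]


langs = installed_tesseract_languages()
print("tesseract found:", bool(langs))
print("Indonesian data (ind) installed:", "ind" in langs)
```

If `ind` is missing, the later call with `lang="ind"` will fail, so it is worth rerunning the install commands first.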
We use Google Colab to run the code, which makes the setup easier. After that, import the libraries we need:
import cv2
import numpy as np
import pytesseract
import matplotlib.pyplot as plt
from PIL import Image
After importing the libraries, you can upload the photo with this code:
from google.colab import files
# files.upload() returns a dict that maps each uploaded filename to its raw bytes
img = files.upload()
After that, run this code and you will get the result:
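# (1) Read the image: img is the dict returned by files.upload(),
# so decode the bytes of the first uploaded file into an OpenCV image
img_bytes = next(iter(img.values()))
image = cv2.imdecode(np.frombuffer(img_bytes, np.uint8), cv2.IMREAD_COLOR)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# (2) Threshold: THRESH_TRUNC caps every pixel brighter than 127 at 127
th, threshed = cv2.threshold(gray, 127, 255, cv2.THRESH_TRUNC)

# (3) Detect: run Tesseract with the Indonesian language data
result = pytesseract.image_to_string(threshed, lang="ind")

# (4) Normalize the OCR output line by line
for word in result.split("\n"):
    if "”—" in word:
        word = word.replace("”—", ":")

    # Normalize NIK: replace characters that are often misread in place of digits
    if "NIK" in word:
        if "D" in word:
            word = word.replace("D", "0")
        if "?" in word:
            word = word.replace("?", "7")

    print(word)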
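The normalization step above prints corrected lines, but downstream code usually wants fields it can look up by name. The sketch below shows one way to turn the OCR text into a dictionary while applying the same two character fixes (`D` to `0`, `?` to `7`) only to the NIK value. The names `DIGIT_FIXES`, `split_field`, `normalize_nik`, and `parse_ktp` are assumptions made for illustration and are not part of KTP-OCR, and the sample text is a made-up OCR result, not real output.

```python
import re

# Look-alike characters taken from the normalization step above; extend as needed.
DIGIT_FIXES = str.maketrans({"D": "0", "?": "7"})


def split_field(line):
    """Split 'Label : value' into (label, value); return None if there is no colon."""
    label, sep, value = line.partition(":")
    if not sep:
        return None
    return label.strip(), value.strip()


def normalize_nik(value):
    """Fix digit look-alikes, then drop every character that is not a digit."""
    return re.sub(r"\D", "", value.translate(DIGIT_FIXES))


def parse_ktp(text):
    """Build a {label: value} dict from OCR text, one 'Label : value' pair per line."""
    fields = {}
    for line in text.splitlines():
        # Tesseract sometimes reads the colon as these two characters.
        pair = split_field(line.replace("”—", ":"))
        if pair is None:
            continue
        label, value = pair
        if label.upper() == "NIK":
            value = normalize_nik(value)
        fields[label] = value
    return fields


# Hypothetical OCR text with typical misreads, for illustration only.
sample = "PROVINSI DAERAH ISTIMEWA YOGYAKARTA\nNIK ”— 34?11D0209790001\nNama :RIYANTO. SE T"
print(parse_ktp(sample))
```

Restricting the character fixes to the NIK value keeps names and addresses that legitimately contain the letter `D` untouched. Lines without a colon, such as the province header, are skipped by this sketch and would need their own handling.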
Hi Mr. Firhan,
How can blurry KTP images be handled? Is there another method?
Thank you