Posts tagged 'Machine Learning'

Template matching with OpenCV

For structured (bounding-box-based) text extraction, it is imperative that the received image and the target image are properly aligned and at the same scale. OpenCV is a great image processing library with a ton of features.

To align the source and template images, the following steps are required:

  • First convert images to …

Improving Quality of Text Extraction

I have been working on ML projects that require image preprocessing and text extraction. Several preprocessing steps help improve the quality of text extraction; they are listed below. We use OpenCV for the preprocessing and tesseract-ocr for text extraction.

Image preprocessing

  • Rescaling …

Apache Zeppelin Notebooks

Apache Zeppelin provides a web UI where you can iteratively build Spark scripts in Scala, Python, etc. (with autocomplete support), run Spark SQL queries against Hive or other stores, and visualize the results of a query or of Spark DataFrames. This is somewhat akin to what IPython notebooks do for Python …

Machine learning with Apache Spark, Scala and Hive

Apache Spark has an advanced DAG execution engine and supports in-memory computation. In-memory computation combined with DAG execution leads to far better performance than running MapReduce jobs. In this post, I will show an example of using linear regression with Apache Spark. The dataset is NYC-Yellow …

Why you should use square root of Gini Index

In this post I will explain why you should use the square root of the Gini index while building decision tree classification models. In decision trees, we know that at every node we need to choose a feature that provides the best split, i.e. the feature that reduces the child nodes' …
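
To make the quantities concrete, here is a minimal sketch of scoring one candidate split. It assumes that "Gini index" means the Gini impurity 1 - Σ p_k² and that the square root is applied to each child's impurity before the size-weighted average is taken; the excerpt does not show the post's exact formulation, so the function names and the toy labels are hypothetical.

```python
from collections import Counter
from math import sqrt


def gini_impurity(labels):
    """Gini impurity 1 - sum(p_k^2) for a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def weighted_child_impurity(left, right, transform=lambda g: g):
    """Size-weighted average of the (optionally transformed) child impurities."""
    n = len(left) + len(right)
    return (len(left) * transform(gini_impurity(left))
            + len(right) * transform(gini_impurity(right))) / n


# Hypothetical toy split: each list holds the class labels in one child node.
left = ["a", "a", "a", "b"]
right = ["b", "b", "b", "b"]

plain = weighted_child_impurity(left, right)                    # plain Gini
rooted = weighted_child_impurity(left, right, transform=sqrt)   # assumed sqrt variant

print(f"weighted Gini: {plain:.4f}, weighted sqrt(Gini): {rooted:.4f}")
```

Under either score, the split with the lowest value would be preferred; the two scores can rank candidate splits differently, which is presumably what the post goes on to discuss.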