Contact

Skype: prof.tuhin
Ring: +91 - 925.067.4214
E-mail: dr@tuhin.ai

Data Engineering

Data is growing exponentially. As Business needs constantly change, Organizations need an Agile and Flexible Data Foundation that enables and fast-tracks their Business.

Data Extraction

Data Harmonization

Data Governance

Data Visualization

Define, Design and Develop an Enterprise-Grade Data Platform that handles Data of any size, shape and speed.

We provide Big Data and Analytics services that strengthen your Business Foundation and open new possibilities for accelerated growth.

  • Discovery & Assessment of Clients’ Short-term Objectives and Long-term Vision
  • Architecture & Design Recommendations and Implementation Support
  • Recommendation, Evaluation and Selection of Technologies, Tools and Products
  • Driving Proofs of Concept to support informed decisions on client-specific business cases
  • Focused approach to address Business Problems
  • Listen, Consult & Implement
  • Manage the entire ecosystem of Big Data & Analytics

We help our customers discover Data that accelerates their business growth in existing and new dimensions. We bring the capabilities and expertise to build a Data Platform.

Data Strategy
We enable our customers to create a cutting-edge Big Data and Cloud Platform, Agile Analytics and a Transformative Data Culture.
Data Architecture
Harness a robust, secure and flexible Big Data Architecture that promotes the use of high-quality, relevant and accessible data.
Data Management
Build Data Lakes, Warehouses and Lakehouses on modern platforms for structured, semi-structured and unstructured data, ensuring a high level of Data Quality.
Data Framework
Create a Big Data Framework that includes Data Policies, Data Practices and Enterprise Processes to manage the Data life cycle of an Organization.
Data Instrumentation
We help our clients cut through the clutter of software & tools and choose the best fit for their business needs, saving both money and time.
Data Migration
Focus on Modernization through an innovative Data Migration process from Legacy systems to a Cloud-based Big Data Platform.
Data Governance
Establish a formal Data Governance Program with guidance on Composition, Awareness and Charter to create a highly capable Governance Team.

Data Engineering Framework

Framework Components (Some market leaders)

Cloud Platform

AWS

Azure

Google Cloud

Databricks

Cloud Foundry

Qubole

OpenStack

Data Integration

Amazon API Gateway

AWS Lambda

Spark Streaming

Talend

Marmaray

Apache Gobblin

Data Storage

Apache Hive

Redshift

Cassandra

HBase

MongoDB

Neo4j

Data Processing

AWS Glue

Spark

Flink

Elasticsearch

Pentaho

OpenRefine

DataCleaner

Analytics & Reporting

Tableau

Amazon QuickSight

Splunk

Looker

Spotfire

Revolution R

Zeppelin

Principles of Technology Choice

Separation of the storage and compute layers.
Low-Cost, Scalable and Highly Available Big Data Storage System.
Scalable & Cost-Effective Compute that can meet on-demand spikes and steady growth.
Interoperable Query/Access Engine that works with a variety of Data Access Technologies.
Storage and Access that support Multi-Tenancy while sharing the same Infrastructure.
Technologies that abide by Data Security, Data Privacy and Data Governance Mechanisms.
File Formats chosen for the best performance and low disk usage.
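One reason the choice of file format matters is column pruning. The sketch below is a simplified illustration, not how any real format encodes data. It stores the same hypothetical sales rows once row by row and once column by column, using JSON per row or per column. It then shows that summing one column only needs to read that column's bytes in the columnar layout. Real columnar formats such as Parquet or ORC use binary encodings and per-column compression, and those are not modelled here.

```python
import json

# Hypothetical sales rows; the schema is an illustrative assumption.
rows = [
    {"order_id": i, "region": ["north", "south"][i % 2], "amount": i * 10}
    for i in range(1000)
]

def to_row_layout(rows):
    # One serialized record per row: any query reads every field of every row.
    return [json.dumps(r).encode() for r in rows]

def to_column_layout(rows):
    # One serialized chunk per column, similar in spirit to columnar file formats.
    columns = {name: [r[name] for r in rows] for name in rows[0]}
    return {name: json.dumps(values).encode() for name, values in columns.items()}

row_store = to_row_layout(rows)
col_store = to_column_layout(rows)

# Query: total amount. The row layout scans all bytes; the column layout reads one chunk.
bytes_row = sum(len(b) for b in row_store)
bytes_col = len(col_store["amount"])
total_row = sum(json.loads(b)["amount"] for b in row_store)
total_col = sum(json.loads(col_store["amount"]))

print(f"row layout read {bytes_row} bytes, column layout read {bytes_col} bytes")
print(total_row, total_col)
```

Both layouts return the same total. The columnar one touches only the `amount` chunk, which is why analytical engines favour columnar formats for wide tables.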

Data Architecture

  • Data warehouses have a long history of serving business intelligence applications. However, they are expensive for handling semi-structured and unstructured data.
  • Data lakes then emerged to handle semi-structured and unstructured data in a variety of formats on cheap storage. Their big drawback is that they do not support transactions.
  • Data lakehouse is a new paradigm that addresses the drawbacks of both data lakes and data warehouses and combines their best features.
  • Lakehouses work by implementing data structures and data management features similar to those of a data warehouse directly on top of the low-cost storage used for data lakes.
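A lakehouse can add transactions to plain file storage with an ordered commit log. The toy sketch below illustrates that idea. `ToyTable` and its on-disk layout are hypothetical and are not the format of any real table format. The sketch makes these assumptions:

* Data is written to immutable files that stay invisible until a log entry references them.
* A commit publishes the next numbered log entry using exclusive-create mode (`"x"`), which is atomic on a local filesystem.
* When two writers race for the same version, one of them gets `FileExistsError` instead of silently overwriting the other.

Object stores need an equivalent put-if-absent guarantee or external coordination, which is not shown.

```python
import json
import os
import tempfile
import uuid

class ToyTable:
    """Hypothetical table: immutable data files plus an ordered commit log in _log/."""

    def __init__(self, root: str) -> None:
        self.root = root
        self.log_dir = os.path.join(root, "_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def _versions(self) -> list[int]:
        return sorted(int(name.split(".")[0]) for name in os.listdir(self.log_dir))

    def latest_version(self) -> int:
        versions = self._versions()
        return versions[-1] if versions else -1

    def commit(self, rows: list[dict], read_version: int) -> int:
        # 1. Write rows to a uniquely named data file; unreferenced files are ignored.
        data_name = f"part-{uuid.uuid4().hex}.json"
        with open(os.path.join(self.root, data_name), "w") as f:
            json.dump(rows, f)
        # 2. Atomically claim the next version. If another writer already committed
        #    it, open(..., "x") raises FileExistsError and this commit is rejected.
        version = read_version + 1
        with open(os.path.join(self.log_dir, f"{version:05d}.json"), "x") as f:
            json.dump({"add": data_name}, f)
        return version

    def read(self) -> list[dict]:
        # Replay the log in order: only committed data files are part of the table.
        rows: list[dict] = []
        for v in self._versions():
            with open(os.path.join(self.log_dir, f"{v:05d}.json")) as f:
                entry = json.load(f)
            with open(os.path.join(self.root, entry["add"])) as f:
                rows.extend(json.load(f))
        return rows

with tempfile.TemporaryDirectory() as root:
    table = ToyTable(root)
    v0 = table.commit([{"id": 1}], read_version=table.latest_version())
    table.commit([{"id": 2}], read_version=v0)
    try:
        table.commit([{"id": 3}], read_version=v0)  # stale writer: version 1 is taken
    except FileExistsError:
        print("conflict detected, commit rejected")
    print(table.read())
```

The rejected writer leaves behind an orphaned data file. Readers never see it because no log entry references it. Real systems clean such files up separately.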

Data Quality

  • Completeness - Is all necessary data available and accessible?
  • Consistency - How consistent is the data across the different systems that hold instances of it?
  • Validity - Measures whether a value conforms to a pre-defined standard.
  • Accuracy - How correctly does the data reflect the real-world entities or events it describes?
  • Uniqueness - A discrete measure of duplication of identified data items.
  • Timeliness - The time between when data is EXPECTED and when it is made AVAILABLE.
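Several of these dimensions can be computed as simple ratios. The sketch below makes some assumptions: the records, field names and e-mail pattern are hypothetical, and each score is the share of rows or keys that pass the check. It covers completeness, validity, uniqueness, cross-system consistency and timeliness lag. Accuracy is left out because it needs a trusted reference source, such as a verified master record, to compare against.

```python
import re
from datetime import datetime, timedelta

def completeness(rows, required):
    """Share of rows in which every required field is present and non-empty."""
    if not rows:
        return 1.0
    return sum(all(r.get(f) not in (None, "") for f in required) for r in rows) / len(rows)

def validity(rows, field, pattern):
    """Share of non-empty values of `field` that fully match a pre-defined pattern."""
    values = [r[field] for r in rows if r.get(field) not in (None, "")]
    if not values:
        return 1.0
    return sum(bool(re.fullmatch(pattern, v)) for v in values) / len(values)

def uniqueness(rows, key):
    """Distinct key values divided by total rows (1.0 means no duplicates)."""
    if not rows:
        return 1.0
    return len({r.get(key) for r in rows}) / len(rows)

def consistency(system_a, system_b, key, field):
    """Share of keys present in both systems whose `field` values agree.
    Duplicate keys within one system: the last occurrence wins."""
    a = {r[key]: r.get(field) for r in system_a}
    b = {r[key]: r.get(field) for r in system_b}
    shared = a.keys() & b.keys()
    if not shared:
        return 1.0
    return sum(a[k] == b[k] for k in shared) / len(shared)

def timeliness_lag(expected, available):
    """Delay between when data was EXPECTED and when it was made AVAILABLE."""
    return available - expected

# Hypothetical records from a CRM and a billing system.
EMAIL_PATTERN = r"[^@\s]+@[^@\s]+\.[a-z]+"
crm = [
    {"id": 1, "email": "asha@example.com", "country": "IN"},
    {"id": 2, "email": "not-an-email", "country": "IN"},
    {"id": 2, "email": "ravi@example.com", "country": ""},
    {"id": 3, "email": "", "country": "US"},
]
billing = [{"id": 1, "country": "IN"}, {"id": 2, "country": "US"}, {"id": 3, "country": "US"}]

print(f"completeness: {completeness(crm, ['email', 'country']):.2f}")
print(f"validity:     {validity(crm, 'email', EMAIL_PATTERN):.2f}")
print(f"uniqueness:   {uniqueness(crm, 'id'):.2f}")
print(f"consistency:  {consistency(crm, billing, 'id', 'country'):.2f}")
print("timeliness lag:", timeliness_lag(datetime(2024, 1, 1, 6, 0), datetime(2024, 1, 1, 7, 30)))
```

In practice, checks like these would run inside the processing pipeline and be tracked against agreed thresholds. The pattern and thresholds are business decisions rather than fixed rules.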

Data Privacy and Security