Translate functional requirements into technical design.
Participate in all aspects of the Big Data solution delivery life cycle, including analysis, design, development, testing, production deployment, and support.
Develop standardized practices for delivering new products and capabilities using Big Data technologies, including data acquisition, transformation, and analysis.
Define and develop client-specific best practices around data management within a Hadoop environment.
Recommend design alternatives for data ingestion, processing and provisioning layers
Design and develop data ingestion programs to process large data sets in batch mode using Hive, Pig and Sqoop
Develop data ingestion programs to ingest real-time data from live sources using Apache Kafka, Spark Streaming and related technologies
Work in large teams developing and delivering solutions to support large-scale data management platforms following Agile methodology
Monitor data ingestion processes end to end and optimize the overall data processing lead times
Develop test scenarios and test scripts to validate data loaded into the Hadoop platform
1+ years of hands-on experience using Hadoop (preferably Hadoop 2 with YARN), MapReduce, Pig, Hive, Sqoop, and HBase
Strong understanding of the Hadoop ecosystem, including setting up a Hadoop cluster, with knowledge of cluster sizing, monitoring, storage design, and encryption at rest and in motion.
Experience in scheduling Hadoop jobs using Oozie workflows and Falcon
Proven experience in implementing security in the Hadoop ecosystem using Kerberos, Sentry, Knox and Ranger.
2+ years of strong experience in object-oriented programming in Java, including optimizing memory usage and JVM configuration in distributed programming environments
Experience developing REST APIs using standard frameworks
Exposure to distributed column-oriented databases in the Hadoop ecosystem, such as HBase.
Strong knowledge of UNIX operating system concepts and shell scripting
Production experience in Apache Spark using Spark SQL and Spark Streaming, or in Apache Storm
Exposure to different NoSQL databases within the Hadoop ecosystem
Exposure to public, private, and hybrid cloud platforms such as AWS, Azure and Google Cloud