Responsible for implementation and ongoing administration of Hadoop infrastructure.
Aligning with the systems engineering team to propose and deploy new hardware and software environments required for Hadoop and to expand existing environments.
Working with data delivery teams to set up new Hadoop users. This job includes creating Linux accounts, setting up Kerberos principals, and testing HDFS, Hive, Pig, and MapReduce access for the new users.
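As a rough sketch of those onboarding steps, the outline below shows the typical command sequence (the user name, group, realm, and paths are hypothetical). It runs in dry-run mode by default, printing each command rather than executing it, since useradd, kadmin, and hdfs require root/admin rights on a live cluster:

```shell
#!/bin/sh
# Hypothetical sketch of onboarding a new Hadoop user.
# DRY_RUN=1 (the default) prints each command instead of executing it.
DRY_RUN=${DRY_RUN:-1}
NEW_USER="jdoe"        # hypothetical user name
REALM="EXAMPLE.COM"    # hypothetical Kerberos realm

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# 1. Linux account on the edge node
run useradd -m -G hadoop "$NEW_USER"
# 2. Kerberos principal for the new user
run kadmin -q "addprinc -randkey ${NEW_USER}@${REALM}"
# 3. HDFS home directory with correct ownership
run hdfs dfs -mkdir -p "/user/${NEW_USER}"
run hdfs dfs -chown "${NEW_USER}:hadoop" "/user/${NEW_USER}"
# 4. Smoke-test access: HDFS listing and a trivial Hive query
run hdfs dfs -ls "/user/${NEW_USER}"
run hive -e "SELECT 1"
```

Pig and MapReduce access would be smoke-tested the same way, with a trivial job submitted as the new user.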
Cluster maintenance, including the addition and removal of nodes, using tools such as Cloudera Manager Enterprise.
Performance tuning of Hadoop clusters and Hadoop MapReduce routines.
Screening Hadoop cluster job performance and carrying out capacity planning.
Monitor Hadoop cluster connectivity and security.
Manage and review Hadoop log files.
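To make the log-review duty concrete, here is a minimal self-contained sketch that counts and surfaces ERROR/WARN lines. The sample log is generated inline; on a real cluster the script would point at the daemon logs (e.g. under /var/log/hadoop, though the exact path and Log4j format vary by distribution):

```shell
#!/bin/sh
# Minimal log-review sketch: count and surface ERROR/WARN entries.
# A generated sample log keeps the script self-contained; on a real
# cluster, LOG would point at the Hadoop daemon log files instead.
LOG=$(mktemp)
cat > "$LOG" <<'EOF'
2016-01-01 10:00:00 INFO  namenode: startup complete
2016-01-01 10:05:12 WARN  datanode: slow block report
2016-01-01 10:07:43 ERROR datanode: disk failure on /data/3
EOF

errors=$(grep -c ' ERROR ' "$LOG")
warnings=$(grep -c ' WARN ' "$LOG")
echo "errors=$errors warnings=$warnings"
grep ' ERROR ' "$LOG"
```

In practice a script like this would run from cron and feed its counts into the monitoring stack (Nagios, Ganglia, etc.) rather than printing them.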
File system management and monitoring.
HDFS support and maintenance.
Diligently teaming with the infrastructure, network, database, application, and business intelligence teams to guarantee high data quality and availability.
Collaborating with application teams to install operating system and Hadoop updates, patches, and version upgrades when required.
Point of contact for vendor escalations.
At least two years' experience managing and supporting large-scale production Hadoop environments on any of the major Hadoop distributions (Apache, Teradata, Hortonworks, Cloudera, MapR, IBM BigInsights, Pivotal HD).
At least three years' experience with a scripting language (Linux shell, SQL, Python); should be proficient in shell scripting.
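As an illustration of the kind of shell scripting the role calls for, the sketch below alerts when HDFS usage crosses a threshold. The usage figure is a hard-coded sample value; on a real cluster it would be parsed from a tool such as `hdfs dfsadmin -report` (whose exact output format varies by version):

```shell
#!/bin/sh
# Capacity-alert sketch: flag HDFS usage above a threshold.
# used_pct is a hypothetical sample value; in production it would
# be parsed from cluster reporting tools rather than hard-coded.
THRESHOLD=80
used_pct=86   # hypothetical sample value

if [ "$used_pct" -ge "$THRESHOLD" ]; then
  status="ALERT: HDFS ${used_pct}% used (threshold ${THRESHOLD}%)"
else
  status="OK: HDFS ${used_pct}% used"
fi
echo "$status"
```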
At least one year's experience with Hadoop monitoring tools (e.g., Nagios, Ganglia, Cloudera Manager, Ambari).
Product knowledge of Hadoop distributions such as Cloudera, Hortonworks, Pivotal HD (Greenplum), or MapR.
Experience with several of the following is highly desirable:
o High availability, backup and restore (BAR), and disaster recovery (DR) strategies and principles.
o Hadoop software installation and upgrades
o Proficiency in Hive internals (including HCatalog), Sqoop, Pig, Oozie, and Flume/Kafka.
o Development or administration experience with NoSQL technologies such as HBase, MongoDB, Cassandra, Accumulo, etc.
o Development or administration experience with web or cloud platforms such as Amazon S3, EC2, Redshift, Rackspace, OpenShift, etc.
o Development/scripting experience with configuration management and provisioning tools, e.g., Puppet or Chef.
o Web/Application Server & SOA administration (Tomcat, JBoss, etc.)
o Development, implementation, or deployment experience on the Hadoop ecosystem (HDFS, MapReduce, Hive, HBase).
o Analysis and optimization of workloads, performance monitoring and tuning, and automation.
o Addressing challenges of query execution across a distributed database platform on modern hardware architectures
o Articulating and discussing the principles of performance tuning, workload management and/or capacity planning
o Defining standards and developing and implementing best practices to manage and support data platforms.
Experience with any one of the following will be an added advantage:
o Hadoop integration with large scale distributed data platforms like Teradata, Teradata Aster, Vertica, Greenplum, Netezza, DB2, Oracle, etc.
o Java, Python, Perl, Ruby, C, or web-related development.
o Knowledge of Business Intelligence and/or Data Integration (ETL) operations delivery techniques, processes and methodologies
A good understanding of systems analysis and design.
Ability to communicate effectively with business users about technical topics.
Great problem-solving skills.