From:                              route@monster.com

Sent:                               Friday, May 06, 2016 2:14 PM

To:                                   hg@apeironinc.com

Subject:                          Please review this candidate for: Cloud

 

This resume has been forwarded to you at the request of Monster User xapeix03

Srinivas B 

Last updated:  03/30/16

Job Title:  Not specified

Company:  Not specified

Rating:  Not Rated

Screening score:  Not specified

Status:  Resume Received


Philadelphia, PA  19093
US


 

 

RESUME

  

Resume Headline: Srinivas B - Big Data Architect

Resume Value: 88w56ph62hnx7iz4   

  

 

Srinivas B – Big Data DevOps Architect

Professional Summary

·              A Bachelor of Technology (Honors) graduate of the Indian Institute of Technology, Kharagpur, India, with over 11 years of experience in programming and in software architecture, design and development; skilled in cloud and distributed computing, data storage and analysis, and the testing and deployment of large-scale software systems and applications, with a keen emphasis on efficiency, elegance, extensibility and scalability.

·              6+ years’ experience with tools in the Hadoop ecosystem, including Pig, Hive, Impala, Hadoop HDFS, Flume, HBase, Hadoop MapReduce, Sqoop, Oozie, ZooKeeper and Apache Hue.

·              Worked extensively with CDH3, CDH4 and CDH5. Used Cloudera Manager and Ambari to administer, maintain and troubleshoot a cluster.

·              Extensive experience with all types of databases, from embedded stores to large-scale data lakes, including Derby, etcd, SQLite, MySQL, PostgreSQL, Cassandra, HBase, MongoDB and ElasticSearch.

·              Hands-on experience installing, configuring, administering, debugging and troubleshooting Hadoop, ElasticSearch, Kafka, Spark, MySQL and Cassandra clusters.

·              Worked across multiple teams to deliver scalable, multipurpose technology platforms for company-wide initiatives.

·              Implemented Apache NiFi processors to move data processing and ETL to the edge of the network in order to reduce inefficiencies of data storage and transport and concentrate on just the data needed for analysis.

·              Set up, configured and implemented an Nginx webserver using Docker containers for agile, DevOps-style deployments.

·              Created a custom Docker registry for the team to collaborate on Docker images using Git SCM.

·              Configured a 10-node Hadoop cluster using SequenceIQ and OpenStack Heat.

·              Keen emphasis on automation in every aspect of the architecture from infrastructure orchestration to configuration management to application build and deployment.

·              Imported the Apache Mahout machine learning libraries to write advanced data mining and statistical procedures like filtering, clustering and classification to extend the capabilities of the MapReduce framework.

·              Used Java classes, methods and Pig scripts from the Apache DataFu framework to implement some of the more complicated statistical procedures, such as quantiles, sampling, and set and bag operations.

·              Extensive experience in JVM performance tuning, including heap sizing, GC thresholds/cycles and memory management.

·              Extracted data from traditional databases like Teradata, SQL Server, Oracle and Siebel into HDFS for processing with the Hadoop framework, and returned the processed results to those databases for further analysis and reporting.

·              Loaded and extracted data from HDFS, wrote HIVE queries and Pig Scripts, defined Oozie workflows and stored and queried data from HBase using Apache Hue, the interactive web interface for the Hadoop framework.

·              Worked extensively with Cloud based tools like Amazon Redshift to warehouse, maintain and analyze data using traditional business intelligence tools.

·              Designed, developed and implemented an encryption/decryption program using AWS KMS’s envelope encryption framework.

·              Deployed VMs in AWS, GCP, Azure, DigitalOcean and OpenStack using Terraform’s (by Hashicorp) providers.

·              Configured Linux network bridges, VxLANs and virtual switches/routers.

·              Set up and configured VyOS and OpenWRT virtual routers and Open vSwitch virtual switches to take full advantage of datacenter and rack awareness, so that traffic is routed among the worker nodes in the most efficient fashion; for example, two datanodes/nodemanagers on the same physical node can have extremely high data transfer rates between them.

·              Used the bridge-utils package (brctl) to modify virtual bridge configurations for Docker containers.

·              Installed updates and prerequisites, configured services, formatted and mounted volumes, and changed system parameters (sysctl) using Ansible playbooks.

·              Used Resilient Distributed Datasets (RDDs) to manipulate data, perform light analytics and create visualizations using Apache Spark’s high-performance distributed computing framework (a brief sketch follows this summary).

·              Expertise in analyzing, managing, reviewing and troubleshooting Hadoop log files.

·              Implemented multiple regressions, hypothesis testing and p-value estimation using R, Python and Scala.

·              Used the Breeze library for sparse numerical computation in Scala.

·              Performed principal component analysis to find out independent relationships between regressors and the variable being estimated.

·              Involved in implementing the Presto querying engine for ad-hoc analytic queries against data in HDFS and Swift.

·              Involved in conducting benchmarks for various SQL frameworks including Spark SQL, Hive, Presto and Drill.

·              Configured Swift proxy servers to interface with Ceph to provide a scalable object store.

·              Configured Apache Sqoop to run SQL queries to import data from databases to HDFS.

·              Experience in importing, manipulating and exporting data using Sqoop from HDFS to RDBMS systems like MySQL and SQL Server especially where the relational data size was hundreds of gigabytes.

·              Extensive experience in writing Pig Scripts to analyze, summarize, aggregate, group and partition data.

·              Created UDFs to implement functionality not available in Pig. Used UDFs from Piggybank UDF Repository.

·              Set up and configured SQL Server 2014 for storing IIS 8 transaction logs and server configurations.

·              Performed price comparisons between various cloud platforms to determine the feasibility of running our application on public cloud vs private cloud.

·              Setup Ambari Metrics in embedded mode to collect aggregated metrics for all HDP components onto one large disk.

·              Highly experienced in writing HiveQL queries for both managed and external tables. Wrote multiple UDFs and stored procedures for regular maintenance and analysis.

·              Extended my HIVE skills to Apache Impala, which copies the relevant data into main memory (RAM) before running the query, speeding up execution by as much as a factor of 100 (in-memory data processing). Processed click data with Impala to measure the response rate of email marketing campaigns.

·              Used Apache Flume to ingest data from various sources like log files and Relational Databases to HDFS using multiple sources and channels and wrote the data to sinks, one at a time. Managed the Flume instances across the project.

·              Good understanding of NoSQL databases, with hands-on experience writing applications on NoSQL databases like Cassandra and MongoDB.

·              Implemented pattern-matching and string search using case classes and regular expressions in Scala.

·              Excellent communication, interpersonal and problem-solving skills; a team player.
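As a brief illustration of the RDD-based data manipulation and light analytics mentioned in this summary, below is a minimal PySpark sketch. The input path and record layout (a space-delimited log with a status code in the fourth field) are hypothetical assumptions for illustration only.

    # Minimal PySpark RDD sketch: load raw log lines, filter, and aggregate.
    # The input path and record layout are hypothetical placeholders.
    from pyspark import SparkContext

    sc = SparkContext(appName="log-light-analytics")

    lines = sc.textFile("hdfs:///data/raw/access_logs/*.log")

    # Keep only well-formed records and count occurrences per status code.
    status_counts = (
        lines.map(lambda line: line.split())
             .filter(lambda fields: len(fields) > 3)
             .map(lambda fields: (fields[3], 1))   # assume the 4th field is a status code
             .reduceByKey(lambda a, b: a + b)
    )

    for status, count in status_counts.collect():
        print(status, count)

    sc.stop()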

 

 

 

Technical Skills:

Platform Infrastructure:  OpenStack, Ansible, Terraform, Puppet, Chef, Mesos, Marathon, Chronos, Ceph, AWS S3, AWS EC2, AWS KMS, GCP Dataproc, GCP Dataflow

Tools in the Hadoop Ecosystem:  HDFS, MapReduce, Hive, Pig, Sqoop, Flume, Oozie, Avro, DataStax Cassandra, Apache Cassandra, Apache YARN, HBase, ZooKeeper, Chukwa, Cloudera CDH3, CDH4, Apache Whirr, Apache BigTop, Apache Solr, Apache Nutch, Apache Lucene, Apache Sentry, Apache Spark, Spark SQL, Spark MLlib, Microsoft HDInsight, Hortonworks Ambari, AWS, Amazon EC2, S3, HiveQL, Pig Latin, Apache Drill, Apache Zeppelin, Google Cloud Container Engine, Kubernetes, Google Cloud Dataproc, Google Cloud Dataflow

Languages:  C/C++, Scala, JavaScript, JAQL, Clojure, Java, R, Python, T-SQL & PL/SQL

Analysis and Reporting Tools:  Microsoft SSRS 2012/2014, Microsoft SSAS 2014, Splunk, Tableau, Pentaho, Data Mining

Predictive Analytics:  R, Stata, SPSS, MATLAB, machine learning libraries in Mahout and Spark, H2O, Skytree Data Platform

Java Technologies and Frameworks:  Struts, Spring, Hibernate, J2EE, JDBC, multi-threading, JSP, Servlets, JSF, SOAP, XML, XSLT, JSON, MessagePack and DTD; Scala-based frameworks like Akka and Play

Other Technologies:  Maven, Microsoft Office, Ubuntu, Red Hat Linux (RHEL), OpenStack Cloud Computing Framework, Jenkins, GitHub, PL/SQL Developer, Log4j, CVS, Git Stash and IntelliJ IDEA

Professional Work Experience

Comcast Corp, Philadelphia, PA

Big Data Architect

Jan 2015 - Present

Understanding customer pain points as and when they occur, and providing quick, timely and professional redress, goes a long way toward ensuring customer loyalty, reducing churn and enhancing the overall customer experience. My role in this project was twofold: first, to design, build, test, benchmark and deploy a big data stack to ingest, store and provide an API for all customer event data; second, to provide a large-scale data lake to store, warehouse and analyze Comcast infrastructure telemetry.

Roles and Responsibilities

1)    Customer Timeline/Smart Connect

·  Responsible for the design, development, testing and implementation of a 40-node, 10 TB Hadoop cluster as a datastore for storing and providing access to 50+ customer event datasets. Three such clusters were built for DR purposes, and read access was provided through VIPs in each DC fronted by a GSLB.

·  Setup and configured HTTPS on the Customer Timeline UI with SSL certificates issued by COMODO and integrated with companywide AD and Kerberos authentication systems.

·  Gathered requirements from 20+ teams including the amount of data, the kind of data, frequency of generation, ageing policy and sensitivity of data w.r.t. extant company policy.

·  Involved in rationalizing the various data sources and creating a streamlined, scalable and extensible process to ingest a variety of data sources based on a standardized schema and configuration variables.

·  Involved in modeling, creating and updating the schema for HBase to ingest the data.

·  Set up, configured and tuned Apache NiFi to collect data from the edge, transform it to meet schema requirements, and finally ingest it into HBase.

·  Configured and tuned NiFi w.r.t. number of threads, number of parallel processors, disk throughput and security (Access control and authorization).

·  Conducted performance tests on various combinations of RAM and CPU to determine optimal instance sizing for various components.

·  Wrote bash and Python scripts to provision and pre-configure OpenStack Nova instances and Cinder volumes for HDP install.

·  Set up REST gateways to enable HTTP based access to the data.

·  Wrote Python scripts to automate performance testing/benchmarking of the Read API using the Locust framework (a brief sketch follows this list).

·  Set up and configured topics with replication and partitions to store customer event data in Apache Kafka.

·  Tuned Kafka’s settings including message buffer size, timeout and retries to ensure that the message broker performed reliably and with low latency even with sudden spikes in data.

·  Set up and configured Kafka Mirror Maker to enable synchronization between enterprise and satellite Kafka.

·  Configured Kafka log-retention based on size of topic on disk to ensure maximum utilization of available disk.

·  Planned and implemented a 5 node ElasticSearch cluster with a Kibana dashboard to record metrics for ingestion data.

·  Planned and implemented active-active replication across 3 physical datacenters and seamless failover to ensure that the application is highly available.

·  Designed and implemented a MySQL distributed cluster of 10 nodes for analytic use cases (It is very inefficient to make queries conditioned on non-row keys to HBase).

·  Developed, tested and deployed an AWS Key Management System based encryption solution in Java to encrypt PII/PCI data and warehouse it to comply with company security policy.

·  Setup and configured a master build server with Maven and Jenkins for DM/CI and integrated it to GitHub Enterprise to enable automatic build on commit.

·  Wrote RFPs and performed the vendor evaluation/selection process for hiring a 12 member offshore operations team to provide 24/7 operations support for both Customer Timeline and Smart Connect.

·  Co-ordinated onboarding of the team, familiarizing them with the platform, training them to take over our infrastructure and applications. Worked with various SMEs in Comcast to ensure that all the relevant information is available to and understood by the offshore ops team.

·  Handed off infrastructure to relevant operational support teams.

·  Set up and configured IIS 8 webservers behind a VIP and GSLB and helped deploy a .NET WebAPI.

·  Set up and implemented a 5-node Consul cluster for controlling datacenter and prod/soak enable/disable for the WebAPI.

·  Created, configured and implemented a 3 node FluentD cluster for collecting real-time events and buffering them for ingest into ElasticSearch.

·  Setup, configured and deployed four 10 node ElasticSearch clusters and created indexes for ingesting real-time customer event data.

·  Used Java NIO libraries to write asynchronous code based on Futures and Callbacks to enable real-time ingest of customer event data.

·  Used Apache Flink 0.9.0 for a streaming analytics POC to stream data from a Kafka topic, filter it by status code, aggregate it by time and store the output in an Avro file on HDFS.

·  Setup and configured GossipingPropertyFileSnitch on a 6 node Cassandra cluster with 2 DCs where NetworkTopologyStrategy was used with varying replications to different workloads.

·  Designed and validated a CQL data model with 2 tables and 3 views to ingest WebAPI logs into Cassandra and view metrics about them on a Grafana dashboard and created Materialized Views to query non-primary key columns.
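As referenced in the Locust bullet above, below is a minimal sketch of the kind of Python/Locust load test used to benchmark a read API. The endpoint path, host and customer-id range are hypothetical placeholders, not details of the actual API.

    # Minimal Locust sketch for benchmarking a read API.
    # The endpoint path and the customer-id range are hypothetical placeholders.
    import random
    from locust import HttpUser, task, between

    class ReadApiUser(HttpUser):
        wait_time = between(0.1, 0.5)   # simulated think time between requests

        @task
        def get_customer_events(self):
            customer_id = random.randint(1, 1_000_000)
            # name=... groups all ids under one row in Locust's statistics
            self.client.get(f"/api/v1/customers/{customer_id}/events",
                            name="/api/v1/customers/[id]/events")

It would be launched with something like: locust -f read_api_locustfile.py --host https://<read-api-host>.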

2)    Infrastructure Analytics

·  Was responsible for the design, implementation, benchmarking and operational ownership of three clusters of 50, 120 and 183 nodes, with 4 TB, 30 TB and 340 TB of effective HDFS storage respectively.

·  Extensively used devops automation tools like Terraform and Ansible to orchestrate and configure VMs for HDP install.

·  Conducted a POC to integrate the Ambari and OpenStack REST APIs to enable one click deploys of Hadoop clusters.

·  Planned and implemented processes for periodic ingest of infrastructure telemetry into Swift, the OpenStack object store. The data amounts to approximately 2 TB per day of CDN logs, IP video logs and CDR logs.

·  Involved in capacity planning, hardware estimation and allocation for the various big data initiatives.

·  Installed and configured Hortonworks Data Platform 2.2.x, 2.3.x and 2.4.x on the three clusters respectively.

·  Performed express upgrade of the 120 node cluster from HDP 2.2.6.0 to HDP 2.3.4.0 in one day.

·  Designed, architected and built a 30 node cluster with all ephemeral disks to ensure node locality for data-science workloads where there is a large amount of iterative processing and high I/O.

·  Integrated OpenStack Swift into HDP to enable querying of Swift containers using Hadoop clients and running MapReduce/Spark jobs against them.

·  Developed Scala code leveraging Spark SQL to execute JOINs and aggregations on raw logs covering up to 3 months of data (a brief sketch follows this list).

·  Created encryption zones for certain directories to store sensitive data on HDFS. It gets transparently encrypted and decrypted as it is written and read.

·  Configured special high bandwidth, high IOPS SolidFire disks for Hadoop components like Namenodes, Zookeepers, JournalNodes and Kafka broker disks.

·  Setup and configured Flafka (Flume and Kafka) to read from a topic as messages came through and ingest them into HDFS in Avro format.

·  Implemented the FairScheduler with pre-emption on YARN to accommodate varying workloads without the inefficiencies of static, quota-based allocation.

·  Configured encrypted shuffle/sort in Hadoop using Kerberos to store SSL keys.

·  Developed Python and Shell scripts to perform aggregations on simulation outputs.

·  Onboarded new users to the clusters, including provisioning their accounts, helping them understand the cluster and its capacity, and ensuring that the given use case was the right fit for the cluster.

·  Tuned Linux kernel properties including swap, file descriptors, TCP timeouts/buffers and XFS block sizes to improve performance of cluster.

·  Tuned Spark and MapReduce jobs w.r.t. container memory utilization, input file sizes, compression, intermediate data placement and compression, and writing of final output to HDFS/Swift.

·  Implemented and productionized the Swift S3 API so that AWS CLI, Java and Python clients can be used to work with Swift containers.

·  Integrated periodic Hadoop jobs with Swift’s auth tokens so that a large number of files can be accessed without making one call per file to the Keystone auth endpoint.

·  Used the performance numbers from an ensemble of workloads to re-design the sizes of VMs to rectify shortfalls in memory and CPU.

·  Installed and deployed the H2O machine learning framework on the Hadoop cluster.

·  Installed and configured RStudio in the same environment to ensure that R jobs could be run on Spark that was available on the cluster.

·  Setup cron jobs to rebalance the cluster once a week to ensure evenly distributed data.

·  Trained and transferred ownership of the cluster to the operations team.

·  Conducted a POC on using Netty ByteBuf and async channels to improve I/O performance for encrypting and enriching IP telemetry.

·  Setup and configured Tachyon, an in-memory filesystem with both HDFS and Swift as underFSes to realize a tiered-storage architecture for SLA based self-service analytics.

·  Setup processes in NiFi for converting JSON data into Avro and ingesting the same into HBase, Cassandra and ElasticSearch.

·  Used R and Spark’s MLlib (from Scala) to forecast cloud infrastructure demand using multivariate linear regression and logistic regression. The amount of data per team, the growth rate of data and ageing policies (in number of days) were the independent variables.
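As referenced in the Spark SQL bullet above, which describes Scala code, below is an equivalent PySpark sketch of the JOIN/aggregation pattern over raw logs. The paths, column names and join key are hypothetical assumptions, not the actual schemas.

    # PySpark sketch of a JOIN plus aggregation over raw logs.
    # Paths, column names and the join key are hypothetical placeholders.
    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    spark = SparkSession.builder.appName("cdn-log-aggregation").getOrCreate()

    logs = spark.read.json("hdfs:///data/raw/cdn_logs/2016-0[1-3]/*")   # ~3 months of logs
    devices = spark.read.parquet("hdfs:///warehouse/device_inventory")

    daily = (
        logs.join(devices, on="device_id", how="left")
            .groupBy(F.to_date("event_ts").alias("day"), "region")
            .agg(F.count("*").alias("requests"),
                 F.sum("bytes_sent").alias("bytes_sent"))
    )

    daily.write.mode("overwrite").parquet("hdfs:///warehouse/cdn_daily_agg")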

Environment: Hortonworks Data Platform 2.2.x, 2.3.x, 2.4.x, Cloudera Manager, Cloudera Director, Apache Hadoop 2.7.1, Spark 1.6.0, Scala 2.11, Flink, Avro, H2O, Ambari 2.2.1, Terraform, Ansible 2.0, Consul, Zookeeper, cloud-init, Bash, Python 2&3, Java 8, Hive, Pig, HBase, Cassandra, ElasticSearch, SQLite, MySQL.

American Express, New York, NY  

Sr. Cassandra Developer/Administrator

May 2012 – Dec 2014

Consumer banking (credit cards/prepaid cards) is a stable but important vertical in any banking corporation. My project at American Express was to use the vast amounts of consumer data to understand consumer behavior so that products and services could be better tailored to consumer needs. My role was to generate aggregations and groupings of consumer banking data to help create new and innovative financial products.

Roles and Responsibilities:

·              Responsible for building scalable distributed data solutions using Datastax Cassandra.

·              Involved in business requirement gathering and proof of concept creation.

·              Created data models in CQL for customer data.

·              Involved in Hardware installation and capacity planning for cluster setup.

·              Involved in the hardware decisions like CPU, RAM and disk types and quantities.

·              Used the Spark – Cassandra Connector to load data to and from Cassandra.

·              Set up and configured Nginx as a reverse proxy to ingest data from external sources into Cassandra through a custom-designed web service.

·              Worked with the Data architect and the Linux admin team to set up, configure, initialize and troubleshoot an experimental cluster of 12 nodes with 3 TB of RAM and 60 TB of disk space.

·              Ran many performance tests using the Cassandra-stress tool in order to measure and improve the read and write performance of the cluster.

·              Wrote Java code to query Cassandra using both the QueryBuilder API and the PreparedStatements API (a brief sketch of the prepared-statement pattern follows this list).

·              Wrote and modified YAML scripts to set the configuration properties like node addresses, replication factors, client storage space, memTable size and flush times etc. 

·              Used the Datastax Opscenter for maintenance operations and Keyspace and table management.

·              Loaded and transformed large sets of structured, semi structured and unstructured data in various formats like text, zip, XML, YAML and JSON.

·              Created data-models for customer data using the Cassandra Query Language.

·              Used collections like lists, sets and maps to create data models highly optimized for reads and writes.

·              Created User defined types to store specialized data structures in Cassandra.

·              Developed PIG UDFs for manipulating the data and extracting useful information according to Business Requirements and implemented them using the Datastax Pig functionality.

·              Responsible for creating Hive tables based on business requirements

·              Implemented advanced procedures like text analytics and processing using the in-memory computing capabilities like Spark.

·              Used Scala case classes to implement pattern matching using Regular expressions.

·              Enhanced and optimized production Spark code to aggregate, group and run data mining tasks using the Spark framework.

·              Implemented clustering algorithms in Mahout to cluster consumers by location of purchase and general category of purchase, in order to create specialized and targeted credit and foreign exchange products.

·              Implemented a distributed messaging queue to integrate with Cassandra using Apache Kafka and ZooKeeper.

·              Used R and RStudio to run various statistical procedures on both time-series and cross-sectional data like regression, density estimation, polynomial estimation, smoothing splines and hypothesis testing.

·              Involved in a POC to implement a failsafe distributed data storage and computation system using Apache YARN.

·              Used Clojure scripting in web development for real-time dashboards.

·              Involved in the implementation of a POC using the OpenStack Cloud Computing Framework.

·              Tuned and recorded the performance of Cassandra clusters by altering JVM parameters like -Xmx and -Xms. Changed garbage collection cycles to place them in tune with backups/compactions so as to mitigate disk contention.

·              Queried and analyzed data from Datastax Cassandra for quick searching, sorting and grouping.

·              Implemented Partitioning, Dynamic Partitions and Buckets in HIVE for efficient data access.

·              Participated in NoSQL database integration and implementation.

·              Exported the analyzed data into relational databases using Sqoop for visualization and to generate reports.

·              Gathered the business requirements from the Business Partners and Subject Matter Experts like Data Scientists.
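As referenced in the Cassandra query bullet above, which describes the Java driver's QueryBuilder and PreparedStatements APIs, below is an equivalent sketch using the DataStax Python driver (cassandra-driver). The keyspace, table, contact points and replication settings are hypothetical placeholders.

    # Sketch of a CQL data model with prepared-statement writes and reads.
    # Keyspace, table, contact points and datacenter name are hypothetical.
    import uuid
    from datetime import datetime, timedelta
    from decimal import Decimal
    from cassandra.cluster import Cluster

    cluster = Cluster(["10.0.0.11", "10.0.0.12"])
    session = cluster.connect()

    session.execute("""
        CREATE KEYSPACE IF NOT EXISTS consumer
        WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3}
    """)
    session.execute("""
        CREATE TABLE IF NOT EXISTS consumer.purchases_by_customer (
            customer_id uuid,
            purchase_ts timestamp,
            merchant    text,
            amount      decimal,
            categories  set<text>,
            PRIMARY KEY (customer_id, purchase_ts)
        ) WITH CLUSTERING ORDER BY (purchase_ts DESC)
    """)

    # Prepared statements are parsed once and reused for every execution.
    insert = session.prepare("""
        INSERT INTO consumer.purchases_by_customer
            (customer_id, purchase_ts, merchant, amount, categories)
        VALUES (?, ?, ?, ?, ?)
    """)
    cid = uuid.uuid4()
    session.execute(insert, (cid, datetime.utcnow(), "ACME Travel",
                             Decimal("125.40"), {"travel"}))

    # Reads are keyed on the partition key and range-restricted on the clustering column.
    select = session.prepare("""
        SELECT purchase_ts, merchant, amount
        FROM consumer.purchases_by_customer
        WHERE customer_id = ? AND purchase_ts >= ?
    """)
    for row in session.execute(select, (cid, datetime.utcnow() - timedelta(days=90))):
        print(row.purchase_ts, row.merchant, row.amount)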

Environment: Apache Hadoop 2.2.0, Cloudera 4.5, HDP 1.3, Apache Kafka, Cassandra, MapReduce, Spark, Hive 0.12, Pig 0.11, HBase, Linux, XML.

Dell Inc. – Round Rock, Texas

Hadoop Engineer/Developer

Oct 2010 – April 2012

Dell has a large base of enterprise customers, mainly companies purchasing servers, storage and networking solutions. Our team was responsible for storing the clickstream data on Dell’s enterprise product websites. We used Hadoop to store and process the data and run both predictive and descriptive analytics on the collected data.

Roles and Responsibilities:

·              Configured the Hadoop cluster in Local (Standalone), Pseudo-Distributed and Fully-Distributed modes.

·              Responsible for building scalable distributed data solutions using Hadoop.

·              Imported data using Sqoop to load data from MySQL to HDFS on regular basis from various sources.

·              Wrote HIVE queries for aggregating the data and extracting useful information sorted by volume and grouped by vendor and product.

·              Worked closely with the functional team to gather and understand business requirements, determine feasibility, and convert them into technical tasks in the Design Documents.

·              Worked closely with business team to gather requirements and add new support features.

·              Implemented Partitioning, Dynamic Partitions and Buckets in HIVE for more efficient data access.

·              Involved in NoSQL (Datastax Cassandra) database design, integration and implementation.

·              Wrote queries to create, alter, insert and delete elements from lists, sets and maps in Datastax Cassandra.

·              Created indices for conditioned search in Datastax Cassandra.

·              Implemented custom JOINs using Spark SQL to create tables containing the records of items or vendors blacklisted for defaulting on payments.

·              Created use cases and test cases for each of the queries before shipping the final production code for validation by the support and maintenance team.

·              Involved in creating a POC for light analytics using Clojure scripts.

·              Wrote and implemented Hadoop MapReduce programs in Ruby using Hadoop Streaming.

·              Exported the analyzed data into Teradata using Sqoop for visualization and to generate reports to be further processed by business intelligence tools.

·              Wrote a technical paper and created slideshow outlining the project and showing how Cassandra can be potentially used to improve performance.

·              Used the machine learning libraries of Mahout to perform advanced statistical procedures like clustering and classification to determine the probability of payment default.

·              Ran logistic regression in Python and Scala using the in-memory distributed computing framework of Apache Spark (a brief sketch follows this list).

·              Held weekly meetings with technical collaborators and actively participated in code review sessions with senior and junior developers.
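As referenced in the logistic regression bullet above, below is a minimal PySpark sketch of estimating payment-default probability. The input path, feature columns and label column are hypothetical assumptions.

    # PySpark sketch of a logistic regression for payment-default probability.
    # The input path, feature columns and label column are hypothetical placeholders.
    from pyspark.sql import SparkSession, functions as F
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.classification import LogisticRegression

    spark = SparkSession.builder.appName("default-risk").getOrCreate()

    orders = (spark.read.parquet("hdfs:///warehouse/vendor_orders")
                   .withColumn("defaulted", F.col("defaulted").cast("double")))  # 0/1 label

    assembler = VectorAssembler(
        inputCols=["order_total", "days_past_due", "prior_defaults"],
        outputCol="features",
    )
    train, test = assembler.transform(orders).randomSplit([0.8, 0.2], seed=42)

    model = LogisticRegression(labelCol="defaulted", featuresCol="features").fit(train)
    scored = model.transform(test)   # adds probability and prediction columns
    scored.select("defaulted", "probability", "prediction").show(10, truncate=False)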

Environment: Apache Hadoop 2.3.0, Hive 0.12, Hortonworks Data Platform, Teradata, Mahout, Cassandra, Ubuntu

Connecticut State Dept., East Hartford, CT

Hadoop Developer   

Oct 2009 – Sep 2010

The education department which oversees the Connecticut public school system has, over the years, collected a large amount of data on student activities and performance like test scores in math, reading and writing, attendance, physical and extra-curricular activities. The Hadoop framework is well-suited to extract useful information from this data in order to improve the curriculum, train teachers and get a better allocation of the system’s scarce resources. There were more than half a million students in the system and the total data size was approximately 750 GB. I was involved with the team that analyzed, aggregated and attempted to find patterns in this data. The results were communicated to the Office of the Commissioner of Education to aid them in decision making.

Roles and Responsibilities:

·              Installed and configured Apache Hadoop to test the maintenance of log files in Hadoop cluster.

·              Installed and configured Hive, Pig, Sqoop, and Oozie on the Hadoop cluster.

·              Installed Oozie Workflow engine to run multiple Hive and Pig Jobs.

·              Setup and benchmarked Hadoop/HBase clusters for internal use.

·              Extracted data from databases like SQL Server and Oracle into HDFS for processing using Pig and Hive.

·              Performed optimization on Pig scripts and Hive queries to increase efficiency and add new features to existing code.

·              Performed statistical analysis using Splunk.

·              Developed Java MapReduce programs for the analysis of sample log files stored in the cluster (a brief streaming-equivalent sketch follows this list).

·              Developed Simple to complex Map/Reduce Jobs using Hive and Pig.

·              Developed Map Reduce Programs for data analysis and data cleaning.

·              Stored and retrieved data from data-warehouses using Amazon Redshift.

·              Developed PIG Latin scripts for the analysis of semi structured data.

·              Used Hive and created Hive tables and involved in data loading and writing Hive UDFs.

·              Used Sqoop to import data into HDFS and Hive from other data systems.

·              Generated aggregations and groups and visualizations using Tableau.

·              Continuously monitored and managed the Hadoop cluster using Cloudera Manager.

·              Migrated ETL processes from Oracle to Hive to evaluate ease of data manipulation.

·              Conducted some unit testing for the development team within the sandbox environment.

·              Developed Hive queries to process the data for visualizing and reporting.
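As referenced in the MapReduce bullet above, which describes Java MapReduce programs, below is a Hadoop Streaming equivalent in Python of a simple log-analysis job; the space-delimited layout with a severity level in the first field is a hypothetical assumption.

    #!/usr/bin/env python
    # mapper.py -- emits (severity, 1) for each well-formed log line.
    import sys

    for line in sys.stdin:
        fields = line.split()
        if fields:                       # skip blank/malformed lines (basic cleaning)
            print(f"{fields[0]}\t1")

    #!/usr/bin/env python
    # reducer.py -- sums the counts emitted by mapper.py for each severity key.
    import sys

    current_key, count = None, 0
    for line in sys.stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{count}")
            current_key, count = key, 0
        count += int(value)
    if current_key is not None:
        print(f"{current_key}\t{count}")

These two scripts would be submitted with the streaming jar, e.g.: hadoop jar hadoop-streaming.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input <logs> -output <out>.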

Environment: Apache Hadoop, Cloudera Manager, CDH2, CDH3, CentOS, Java, MapReduce, Apache Hama, Eclipse Indigo, Hive, Sqoop, Oozie and SQL.

InfoTech India Pvt Ltd, Pune, India

Sr. Software Test Engineer

June 2008 – Sept 2009

Project Management System (PMS): a web-based project management and client/company management application. PMS provides functionality such as adding users, companies, projects and tasks, and helps track activities like tasks, contacts, project progress and ticketing. Users can also watch forum topics or even whole forums.

Roles and Responsibilities:

·              Tested web based Project Management Application designed to facilitate monitoring different project activities such as tasks, contacts, project progress, ticketing etc.

·              Analyzed System Specifications, designed, developed and executed Test Cases

·              Performed Extensive Manual Testing for all the functionalities in the application

·              Involved in various types of process evaluations during each phase of the software development life cycle, including review, walkthrough and hands-on system testing

·              Performed task allocation and prepared the Traceability Matrix for Test Case Status, Peer Review Sheets, Bug Tracking Report, and status updates

·              Executed test cases, submitted bugs and tracked them using Test Director

Environment: Windows 98/2000/XP, PHP, SQL server 2000, Internet Explorer, Mozilla Firefox, IIS, MS-Office

ICICI Finance, Mumbai, India

Java Software Developer

          July 2007 ‐ May 2008

ICICI Finance uses Real Time Application (RCA) for its loans and finance division. This Multi‐tier architecture application facilitates settlement between various merchant Systems and Transaction Processing units at ICICI.

Roles and Responsibilities:

·              Involved in gathering and analyzing system requirements.

·              Designed the application using Front Controller, Service Controller, MVC, Factory, Data Access Object, and Service Locator.

·              Developed the web application using Struts Framework.

·              Developed the entire application based on the Struts framework and configured struts-config.xml and web.xml.

·              Created Tiles definitions, struts-config files and resource bundles using the Struts framework.

·              Implemented the validation framework, creating validation.xml, and used validation-rules.xml.

·              Developed Classes in Eclipse for Java using various APIs.

·              Designed, developed and deployed necessary stored procedures, Functions, views in Oracle using TOAD.

·              Developed JUnit test cases.

Environment: UNIX Shell scripting, Core Java, Struts, Eclipse, J2EE, JBoss Application Server and Oracle, JSP, JavaScript, JDBC, Servlets, Unified Modeling Language, Toad, JUnit.

Fujitsu India Pvt Ltd, Gurgaon, India

Java/ J2EE Developer

July 2005 – June 2007

Fujitsu was creating a customer care website for its Asia-Pacific customers for its memory products. I was involved with the Java and Web development team to execute this project.

Roles and Responsibilities:

·              Involved in System Analysis and Design methodology as well as Object Oriented Design and development using OOA/OOD methodology to capture and model business requirements.

·              Proficient in object-oriented design using UML and Rational Rose.

·              Created Technical Design Documentation (TDD) based on the Business Specifications.

·              Created JSP pages with Struts Tags and JSTL.

·              Developed UI using HTML, JavaScript, CSS and JSP for interactive cross browser functionality and complex user interface.

·              Implemented the web‐based application following the MVC II architecture using Struts framework.

·              Used XML DOM API for parsing XML.

·              Developed scripts for the automation of production tasks using Perl and UNIX shell scripts.

·              Used ANT for compilation and building JAR, WAR and EAR files.

·              Used JUnit for the unit testing of various modules.

·              Coordinated the project with other development teams, system managers and the webmaster, and fostered a good working environment.

Environment: Java, J2EE, JSP, JavaScript, MVC, Servlet, Struts, PL/SQL, XML, UML, JUnit, ANT, Perl, UNIX.

Certifications:

IBM Big Data University training certificate in Hadoop, Hive, Pig, MapReduce and HDFS Data Transfer

Datastax certified Cassandra Developer/Administrator

Education:

Bachelor of Technology (Honors) in Electronics and Electrical Communication Engineering – Indian Institute of Technology, Kharagpur, India

 

 



Experience


 

Job Title:  Big Data Architect

Company:  Renee Systems Inc

Experience:  - Present

 

Additional Info


 

Current Career Level:

Manager (Manager/Supervisor of Staff)

Work Status:

US - I am authorized to work in this country for my present employer only.

 

 

Target Job:

Target Job Title:

Big Data Architect

 

Target Company:

Company Size:

Occupation:

Project/Program Management

·         IT Project Management

 

Target Locations:

Selected Locations:

US-PA-Philadelphia