From: route@monster.com
Sent: Monday, September 28, 2015 1:00 PM
To: hg@apeironinc.com
Subject: Please review this candidate for: Talend
This resume has been forwarded to you at the request of Monster User xapeix03
Benjamin Kim
520 S. Burnside Ave. #7D, Los Angeles, CA 90036
(818) 635-2900

Education

University of California, Los Angeles, Los Angeles, CA, 1994
Bachelor of Science, Microbiology and Molecular Genetics

Certifications

Cloudera Admin

Technical Skills

Data Systems: Cloudera CDH [Flume-NG, HBase, HDFS, Hive, Hue, Impala, MapReduce, YARN, Oozie, Sqoop, Zookeeper, Spark], Amazon AWS [S3, EC2, Elastic MapReduce], Google Cloud [Cloud Storage, BigQuery].
Applications: Pentaho, Talend, Puppet, and Ganglia.
OS: Mac OS X, RedHat Enterprise Linux, CentOS Linux, and Ubuntu Linux.
Scripting: Bash, Hive QL, PL/SQL, and Scala.

Professional Summary

Big Data Manager with 13+ years of Data Platform experience, including:
· Advocating close-to-real-time processes using the tools available within CDH, such as Flume-NG, HBase, HBase Coprocessors, Impala, and the upcoming Spark integration, to automate online feedback mechanisms.
· Blueprinting, documenting, and supervising the implementation of big data infrastructure using open source technologies from Cloudera's ecosystem of software included in CDH4, such as Cloudera Manager, Hadoop, Hue, Beeswax, Oozie, Pig, Hive, Impala, and HBase, to process hundreds of terabytes of information.
· Experience with project management, requirements documentation, client relations, ETL workflow tools, and software development lifecycle methodologies including waterfall, agile, scrum, and kanban.
· Strong understanding of Amazon AWS as an IaaS for S3 storage, Hadoop/Hive for analytics on EMR, and PostgreSQL for the dashboard in MicroStrategy.
· Responsible for overseeing the batch data workflow processes created using Spring Batch, Hive QL against Hive and Impala, Java against HBase, and direct shell commands; coordinated efforts with Operations and Infrastructure to verify that the cluster nodes are configured and performing in accordance with user expectations.
· Troubleshooting issues in the Hadoop ecosystem by triaging all issue tickets; consulted and assisted in researching both configuration- and implementation-related issues, including code reviews of Hive QL and Java; and forwarded any remaining blockers to experts within the company, online, or at Cloudera.
· Led and facilitated a proof-of-concept ETL solution to populate HDFS using the open source data integration tool from Talend.

Experience

Adconion, Santa Monica, CA
09/2012 – Present
Big Data Manager
· Pushing the company to transition to close-to-real-time processes using the tools available within CDH, such as Flume-NG, HBase, HBase Coprocessors, Impala, and the upcoming Spark integration; this system can unify and simplify all data pipelines and enable the company to automate a feedback mechanism that continuously configures campaign parameters using online analytics.
· Blueprinted, documented, and supervised the implementation of the big data infrastructure using open source technologies from Cloudera's ecosystem of software included in CDH4.1, such as Cloudera Manager, Hadoop, Hue, Beeswax, Oozie, Pig, Hive, Impala, and HBase, to process hundreds of terabytes of information.
· Responsible for overseeing the batch data workflow processes created using Spring Batch, Hive QL against Hive and Impala, Java against HBase, and direct shell commands; coordinated efforts with Operations and Infrastructure to verify that the cluster nodes are configured and performing in accordance with user expectations.
· Oversaw the troubleshooting and escalation of issues during the installation and configuration process; consulted and assisted in researching hardware-setup issues; and forwarded any remaining blocking problems to experts in online groups or directly at Cloudera.
· Facilitated a proof of concept for an open source data integration solution from Talend and evaluated alternative solutions such as the Python-based Celery and the Java-based Spring Batch.
· Participated in and documented the details of the data ingestion system, from the high-level framework to the technical implementation, and consequently helped design the inclusion of reporting workflows and machine learning behavior algorithms in a modular fashion.
· Coordinated vendor meetings and presentations between the team and Talend for their Enterprise Data Integration Suite, Cloudera for their Cloudera Manager tool, and Radoop for their data analytics, mining, and machine learning tool.
· Pushed for an initiative to move to real-time computational analysis using Storm, with Kafka as the data messaging bus.

Odesus, Los Angeles, CA
06/2012 – 07/2012
Client: Fox Digital Media, Playa Vista, CA
Big Data Architect (contract)
· Investigated data extraction methods and solutions to funnel the data exports from a multitude of third-party analytics vendors into a centralized data repository; the team decided on Bash with some Python, plus cron with Jenkins, to schedule, coordinate, and monitor the extractions.
· Because the project was a POC with budgetary constraints, Amazon AWS was commissioned as an IaaS for S3 storage, Hadoop/HBase/Hive for analytics on EMR, and MySQL for the dashboard on RDS.
· Reviewed the data workflow processes contained in the Java MapReduce, Hive QL, and Python code.
· Google Analytics was seen as a more capable, cost-effective solution, but legal and political concerns halted further trials.
· Other trial efforts using Google's services included Cloud Storage as a data repository and BigQuery for analytics.
· Touched on changing the current SDLC methodology from agile to kanban.

Level Studios, El Segundo, CA
02/2012 – 06/2012
Client: Activision, Santa Monica, CA
Data Solutions Architect (contract)
· Analyzed, designed, and implemented an automated method of data extraction for reports, ad campaigns, and analytics to measure effectiveness.
· The data sources involved were a combination of Hadoop, Oracle, and Infobright.
· An initial investigation into using the Talend DI tool to simplify and automate the process was implemented but ultimately abandoned.
· Bash scripts and cron jobs were sufficient for the client's needs.

Accenture, El Segundo, CA
2011 – 2012
Data Platform Architect
· Planned, designed, and oversaw the implementation of the big data infrastructure using open source technologies such as Hadoop, Flume, Sqoop, Pig, Hive, and HBase to process petabytes of information.
· Led and facilitated a proof-of-concept ETL solution to populate HDFS using the open source data integration tool from Pentaho.
· Oversaw the troubleshooting and escalation of issues in Hadoop, HBase, Hive, and Flume by triaging Map/Reduce and Flume Source/Sink coding tickets; consulted and assisted in researching configuration-related issues; and forwarded any remaining blocking problems to experts within the company, online, or at Cloudera.
Languages

Language    Proficiency Level
English     Fluent
Italian     Beginner
Japanese    Intermediate
Korean      Advanced
Spanish     Intermediate