
Vikram Vepuri
- Hadoop / Spark Developer
- Duluth, GA
Summary:
· Around 8 years of professional experience in the IT industry, covering development, implementation and configuration of Hadoop ecosystem components in Linux environments, development and maintenance of applications using Java/J2EE, and strategic methods for deploying Big Data technologies to efficiently meet Big Data processing requirements.
· 4 years of experience as a Hadoop Developer with sound knowledge of Hadoop ecosystem technologies.
· Hands-on experience in developing and deploying enterprise applications using major components in the Hadoop ecosystem such as Hadoop 2.x, YARN, Hive, Pig, MapReduce, HBase, Flume, Sqoop, Spark, Storm, Kafka, Oozie and Zookeeper.
· Experience in installation, configuration, supporting and managing Hadoop clusters using Hortonworks and Cloudera (CDH3, CDH4) distributions on Amazon Web Services (AWS).
· Excellent Programming skills at a higher level of abstraction using Scala and Spark.
· Good understanding of real-time data processing using Spark.
· Hands-on experience importing and exporting data between databases such as MySQL, Oracle and Teradata and HDFS using Sqoop.
· Strong experience working with real-time streaming applications and batch-style large-scale distributed computing applications using tools like Spark Streaming, Kafka, Flume, MapReduce and Hive.
· Involved in implementing and integrating NoSQL databases such as HBase and Apache Cassandra.
· Managed and scheduled batch jobs on a Hadoop cluster using Oozie.
· Experience in managing and reviewing Hadoop Log files.
· Used Zookeeper to provide coordination services to the cluster.
· Experienced using Sqoop to import data into HDFS from RDBMS and vice-versa.
· Experience with and understanding of Spark and Storm.
· Hands-on experience extracting data from log files and copying it into HDFS using Flume.
· Experience in analysing data using Hive, Pig Latin, and custom MR programs in Java.
· Hands on experience in Analysis, Design, Coding and Testing phases of Software Development Life Cycle (SDLC).
· Experience with multiple databases and tools, SQL analytical functions, Oracle PL/SQL and DB2.
· Experience in working with Amazon Web Services EC2 instances and S3 buckets.
· Worked on different file formats like Avro, Parquet, RC file format, JSON format.
· Involved in writing Python scripts to build a disaster recovery process for data currently being processed into the data center by providing its current static location.
· Experience in ingesting data into Cassandra and consuming the ingested data from Cassandra to HDFS.
· Used Apache NiFi for loading PDF documents from Microsoft SharePoint to HDFS.
· Used the Avro serialization technique to serialize data and handle schema evolution (a brief sketch follows this summary).
· Experience in designing and coding web applications using Core Java and web technologies (JSP, Servlets and JDBC), with a full understanding of the J2EE technology stack, including Java frameworks such as Spring and ORM frameworks (Hibernate).
· Experience in designing the User Interfaces using HTML, CSS, JavaScript and JSP.
· Developed web applications in the open-source Java framework Spring, utilizing the Spring MVC framework.
· Experienced in front-end development using Ext JS, jQuery, JavaScript, HTML, Ajax and CSS.
· Good interpersonal and communication skills, strong problem-solving skills, the ability to explore and adapt to new technologies with ease, and a good team member.
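The following is a minimal, illustrative Scala sketch of the Avro schema-evolution pattern referenced above (using the standard Apache Avro API): a record written with an older writer schema is read back with a newer reader schema that adds a defaulted field. The record and field names (Customer, id, name, email) are hypothetical placeholders, not taken from any project described here.

    import java.io.ByteArrayOutputStream
    import org.apache.avro.Schema
    import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
    import org.apache.avro.io.{DecoderFactory, EncoderFactory}

    object AvroEvolutionSketch {
      // Writer schema: the version the data was originally serialized with.
      val writerSchema: Schema = new Schema.Parser().parse(
        """{"type":"record","name":"Customer","fields":[
          |{"name":"id","type":"string"},
          |{"name":"name","type":"string"}]}""".stripMargin)

      // Reader schema: adds an optional "email" field with a default, the usual
      // backward-compatible evolution step.
      val readerSchema: Schema = new Schema.Parser().parse(
        """{"type":"record","name":"Customer","fields":[
          |{"name":"id","type":"string"},
          |{"name":"name","type":"string"},
          |{"name":"email","type":["null","string"],"default":null}]}""".stripMargin)

      def main(args: Array[String]): Unit = {
        // Build and serialize a record under the writer schema.
        val record = new GenericData.Record(writerSchema)
        record.put("id", "c-001")
        record.put("name", "Sample Customer")
        val out = new ByteArrayOutputStream()
        val encoder = EncoderFactory.get().binaryEncoder(out, null)
        new GenericDatumWriter[GenericRecord](writerSchema).write(record, encoder)
        encoder.flush()

        // Deserialize with both schemas; Avro resolves the missing field to its default.
        val decoder = DecoderFactory.get().binaryDecoder(out.toByteArray, null)
        val evolved = new GenericDatumReader[GenericRecord](writerSchema, readerSchema)
          .read(null, decoder)
        println(evolved.get("email")) // null, supplied by the reader schema's default
      }
    }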
Professional Experience:
NCR Corporation, Duluth, GA Mar 2017 to Present
Hadoop/Spark Developer
Responsibilities:
· Wrote Sqoop scripts to import and export data between RDBMS and HDFS/Hive, and handled incremental loading of customer and transaction information dynamically.
· Involved in creating Hive tables, loading them with data and writing Hive queries that run internally as MapReduce jobs.
· Used Hive to analyse the partitioned and bucketed data and compute various metrics for reporting on the dashboard.
· Worked on data serialization formats for converting complex objects into byte sequences using Avro, Parquet and CSV formats.
· Leveraged Hive queries to create ORC tables.
· Created views from Hive tables on top of data residing in the data lake.
· Worked on setting up Kafka for streaming data and monitoring for the Kafka Cluster.
· Involved in designing and creating data ingest pipelines using technologies such as Apache Storm and Kafka.
· Used the Oozie workflow engine to manage interdependent Hadoop jobs and to automate several types of Hadoop jobs such as Java MapReduce, Hive and Sqoop, as well as system-specific jobs.
· Implemented the Kerberos security authentication protocol for the existing cluster.
· Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
· Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive (a brief sketch of this pattern follows the Environment line below).
· Explored Spark to improve the performance and optimization of existing Hadoop algorithms using Spark Context, Spark SQL, Data Frames, pair RDDs and Spark on YARN.
· Experienced with batch processing of data sources using Apache Spark and Elasticsearch.
· Experienced with NoSQL databases like HBase, MongoDB and Cassandra.
· Developed Kafka producer and consumers, Spark and Hadoop MapReduce jobs.
· Imported data from different sources such as HDFS and HBase into Spark RDDs.
· Involved in converting MapReduce programs into Spark transformations using Spark RDDs in Scala.
· Wrote complex SQL to pull data from the Teradata EDW and create Ad-Hoc reports for key business personnel within the organization.
· Integrated HiveServer2 with Tableau using the Hortonworks Hive ODBC driver, for auto-generation of Hive queries for non-technical business users.
· Used the Oozie workflow scheduler to manage Hadoop jobs with control flows.
· Exported the aggregated data onto Oracle using Sqoop for reporting on the Tableau dashboard.
Environment: Hadoop, HDFS, Spark, MapReduce, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, Python, Java, SQL Scripting and Linux Shell Scripting, Hortonworks.
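A minimal, illustrative Scala sketch of the Spark-over-Hive reporting pattern described above, assuming a Spark 2.x-style SparkSession with Hive support; the database, table and column names (sales_txn, reporting.txn_metrics_orc, txn_date, region, amount) are hypothetical placeholders, not from this engagement.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.{col, count, lit, sum}

    object TxnMetricsSketch {
      def main(args: Array[String]): Unit = {
        // Hive support lets Spark SQL read the partitioned/bucketed tables created by the ingest jobs.
        val spark = SparkSession.builder()
          .appName("TxnMetricsSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Read a date-partitioned Hive table; filtering on the partition column prunes partitions.
        val txns = spark.table("sales_txn").where(col("txn_date") === "2017-03-01")

        // Compute dashboard-style metrics per region.
        val metrics = txns.groupBy("region")
          .agg(count(lit(1)).as("txn_count"), sum("amount").as("total_amount"))

        // Persist the aggregates as an ORC table for downstream Hive/Tableau reporting.
        metrics.write.mode("overwrite").format("orc").saveAsTable("reporting.txn_metrics_orc")

        spark.stop()
      }
    }

Writing the aggregates back as an ORC table keeps the reporting layer consistent with the ORC tables mentioned in the responsibilities above.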
Blue Cross Blue Shield, Detroit, MI Aug 2015 to Feb 2017
Hadoop Developer
Responsibilities:
· Developed MapReduce programs in Java for data extraction, transformation and aggregation across multiple file formats, including XML, JSON and others.
· Transferred streaming data and data from different data sources into HDFS and NoSQL databases using Apache Flume.
· Experienced in working with various data sources such as Teradata and Oracle; successfully loaded files from Teradata into HDFS and loaded the data from HDFS into Hive.
· Involved in loading and transforming large sets of structured, semi structured and unstructured data from relational databases into HDFS using Sqoop imports.
· Experience in importing and exporting data using Sqoop between HDFS and relational databases.
· Used Hive optimization techniques during joins and best practices in writing Hive scripts.
· Used the JSON and Avro SerDes packaged with Hive for serialization and deserialization, to parse the contents of streamed log data, and implemented custom Hive UDFs.
· Developed a process for batch ingestion of CSV files and Sqoop loads from different sources, and generated views on the data sources using shell scripting and Python.
· Designed and developed Pig Latin scripts and Pig command-line transformations for data joins and custom processing of MapReduce outputs.
· Worked extensively on Hadoop components such as HDFS, JobTracker, TaskTracker, NameNode, DataNode, YARN, Spark and MapReduce programming.
· Used Kafka and Flume to build a robust, fault-tolerant data ingestion pipeline for transporting streaming web log data into HDFS.
· Wrote Hive UDFs per requirements and handled different schemas and XML data.
· Used RDDs to perform transformations on datasets as well as actions like count and reduce.
· Experienced in using Kafka as a data pipeline between JMS and Spark Streaming Applications.
· Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
· Implemented advanced procedures like text analytics and processing, using the in-memory computing capabilities of Spark.
· Used Spark Streaming APIs to perform necessary transformations and actions on the fly for building the common learner data model, which gets data from Kafka in near real time and persists it into Cassandra (a brief sketch of this pattern follows the Environment line below).
· Wrote multiple UDF programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON, CSV and other compressed file formats.
· Hands-on experience in exporting results into relational databases using Sqoop for visualization and to generate reports for the BI team.
Environment: Hadoop, HDFS, Spark, MapReduce, Pig, Hive, Sqoop, Kafka, HBase, Oozie, Flume, Scala, Python, Java, SQL Scripting and Linux Shell Scripting, Cloudera, Cloudera Manager.
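A minimal, illustrative Scala sketch of the Kafka-to-Spark-Streaming-to-Cassandra pattern referenced above, using the Spark 1.x-era direct Kafka stream and the DataStax spark-cassandra-connector (both assumed to be on the classpath). The broker address, topic, keyspace, table and field names (broker1:9092, learner-events, learner_ks, learner_events, learnerId/eventType/ts) and the comma-separated message layout are hypothetical placeholders.

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils
    import com.datastax.spark.connector.streaming._

    object LearnerEventStreamSketch {
      // Maps to an assumed Cassandra table with columns learner_id, event_type, ts.
      case class LearnerEvent(learnerId: String, eventType: String, ts: Long)

      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("LearnerEventStreamSketch")
          .set("spark.cassandra.connection.host", "cassandra-host") // placeholder host
        val ssc = new StreamingContext(conf, Seconds(10))

        // Receiver-less (direct) Kafka stream of (key, value) pairs.
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092") // placeholder broker
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, Set("learner-events"))

        // Parse each value as "learnerId,eventType,epochMillis" and persist to Cassandra.
        stream.map(_._2.split(","))
          .filter(_.length == 3)
          .map(f => LearnerEvent(f(0), f(1), f(2).toLong))
          .saveToCassandra("learner_ks", "learner_events")

        ssc.start()
        ssc.awaitTermination()
      }
    }

The direct stream is used here because it tracks Kafka offsets itself rather than relying on receivers, which keeps the ingestion path simpler and fault tolerant.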
Sprint, Overland Park, KS September 2014 to July 2015
Hadoop Developer
Responsibilities:
· Designed the control tables / job tables in HBase and MySQL. Created external Hive tables on HBase.
· Experience in developing batch processing framework to ingest data into HDFS, Hive and HBase.
· Worked on Hive and Pig extensively to analyse network data.
· Automated data pulls from SQL Server into the Hadoop ecosystem via Sqoop.
· Tuned Hive and Pig job performance parameters along with native MapReduce parameters to avoid excessive disk spills, and enabled temp-file compression between jobs in the data pipeline to handle production-size data in a multi-tenant cluster environment.
· Designed workflows and coordinators in Oozie to automate and parallelize Hive and Pig jobs on an Apache Hadoop environment from Hortonworks (HDP 2.2).
· Hands-on experience writing complex Hive queries against external Hive tables dynamically partitioned on date, which store a rolling time window of user viewing history.
· Delivered Hadoop migration strategy, roadmap and technology fitment.
· Designed and implemented HBase tables and Hive UDFs with complete ownership of the data.
· Worked collaboratively with different teams to smoothly move the project to production.
· Built process automation for various jobs using Oozie.
· Used Apache Kafka for importing real time network log data into HDFS.
· POCs on moving existing Hive / Pig Latin jobs to Spark.
· Deployed and configured Flume agents to stream log events into HDFS for analysis.
· Loaded data into Hive tables using Hive HQL with deduplication and windowing (illustrated in the sketch after this section's Environment line).
· Generated ad-hoc reports using Hive to validate customer viewing history and debug issues in production.
· Worked on HCatalog, which allows Pig and MapReduce to take advantage of the SerDe data format transformation definitions written for Hive.
· Worked on connecting Tableau to Hive data and on using Spark as the execution engine for Tableau instead of MapReduce.
· Worked with multiple input formats such as text, key-value and sequence file input formats.
· Installed and configured various components of Hadoop ecosystem and maintained their integrity.
· Planned hardware and software installation for the production cluster and communicated with multiple teams to get it done.
· Designed, configured and managed the backup and disaster recovery for HDFS data.
· Experience in collecting metrics for Hadoop clusters using Ambari.
· Worked with BI teams in generating the reports in Tableau.
· Worked on loading source data to HDFS by writing java code.
· Merged small files and loaded them into HDFS using Java code; tracking history for the merged files is maintained in HBase.
Environment: HDFS, MapReduce, Spark, Pig, Hive, HBase, Flume, Sqoop.
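A minimal, illustrative sketch of the windowed de-duplication load referenced above. The original job was expressed in Hive HQL; the same query is shown here from Scala through Spark SQL for consistency with the other sketches. The table and column names (viewing_history_stg, viewing_history, customer_id, content_id, watch_ts, view_date) are hypothetical placeholders.

    import org.apache.spark.sql.SparkSession

    object ViewingHistoryDedupSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("ViewingHistoryDedupSketch")
          .enableHiveSupport()
          .getOrCreate()

        // Allow dynamic partition inserts into the date-partitioned target table.
        spark.sql("SET hive.exec.dynamic.partition=true")
        spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")

        // Keep only the latest record per (customer_id, content_id) using a window
        // function, then load the de-duplicated rows into the partitioned table.
        spark.sql("""
          INSERT OVERWRITE TABLE viewing_history PARTITION (view_date)
          SELECT customer_id, content_id, watch_ts, view_date
          FROM (
            SELECT customer_id, content_id, watch_ts, view_date,
                   ROW_NUMBER() OVER (PARTITION BY customer_id, content_id
                                      ORDER BY watch_ts DESC) AS rn
            FROM viewing_history_stg
          ) t
          WHERE rn = 1
        """)

        spark.stop()
      }
    }

ROW_NUMBER() over a per-key window keeps exactly one row per (customer_id, content_id), which is the standard HQL de-duplication idiom.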
Charles Schwab, San Francisco, CA July 2012 to June 2014
Hadoop Developer
Responsibilities:
· Experience in administering, installing, upgrading and managing CDH3, Pig, Hive and HBase.
· Worked on the architecture and implementation of the product platform as well as all data transfer, storage and processing from the data center to the Hadoop file system.
· Experienced in defining job flows.
· Implemented CDH3 Hadoop cluster on CentOS.
· Worked on installing the cluster, commissioning and decommissioning of DataNodes, NameNode recovery, capacity planning and slot configuration.
· Wrote custom MapReduce jobs for data processing in Java.
· Imported and exported data into HDFS and Hive using Sqoop.
· Responsible for managing data coming from different sources.
· Supported MapReduce programs running on the cluster.
· Involved in loading data from UNIX file system to HDFS.
· Created Hive tables to store data in HDFS, loading data and writing Hive queries that run internally as MapReduce jobs.
· Used Flume to channel data from different sources into HDFS.
· Created HBase tables to store variable data formats of PII data coming from different portfolios.
· Implemented best income logic using Pig scripts. Wrote custom Pig UDF to analyse data.
· Load and transform large sets of structured, semi structured and unstructured data.
· Cluster coordination services through Zookeeper.
· Exported the analysed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
Environment: Hadoop 1.0.0, MapReduce, Hive, HBase, Flume, Pig, Zookeeper, Java, ETL, SQL, CentOS
Euroclear, Hyderabad, India June 2009 to April 2012
Java Developer
Responsibilities:
· Involved in coding of JSP pages for the presentation of data on the View layer in MVC architecture.
· Used J2EE design patterns like Factory Methods, MVC, and Singleton Pattern that made modules and code more organized, flexible and readable for future upgrades.
· Worked with JavaScript to perform client-side form validations.
· Used Struts tag libraries as well as the Struts Tiles framework.
· Used JDBC to access the database with the Oracle thin (Type 4) driver for application optimization and efficiency.
· Client-side validation done using JavaScript.
· Used the Data Access Object (DAO) pattern to make the application more flexible for future and legacy databases.
· Actively involved in tuning SQL queries for better performance.
· Worked with XML to store and read exception messages through DOM.
· Wrote generic functions to call Oracle stored procedures, triggers, functions.
Environment: JDK, J2EE, UML, Servlet, JSP, JDBC, Struts, XHTML, JavaScript, MVC, XML, XML Schema, Tomcat, Eclipse.
Education:
· Bachelor of Engineering, SRM University, Chennai, India.
Additional Information
Technical Skills:
· Big Data: Apache Hadoop, HDFS, MapReduce, Hive, Pig, Oozie, Sqoop, Spark, Cloudera Manager and Hortonworks.
· Databases: MySQL, Oracle, SQL Server, HBase
· IDEs: Eclipse, NetBeans
· Languages: C, Java, Pig Latin, UNIX shell scripting, Python
· Scripting Languages: HTML, CSS, JavaScript, DHTML, XML, jQuery
· Web Technologies: HTML, XML, JavaScript, jQuery
· Web/Application Servers: Apache Tomcat, WebLogic