
Srini Hadoop
- Hadoop Developer
- Duluth, GA
Summary of Qualifications:
§ Around 7 years of professional experience in the IT industry, involved in developing, implementing, configuring, and testing Hadoop ecosystem components and maintaining various web-based applications.
§ Around 4 years of IT industry experience ranging from data warehousing to Big Data tools such as Hadoop and Spark.
§ Developed and delivered Big Data solutions for 4 different clients using tools such as Hadoop, Spark, Hive, HBase, SQL, Pig, Talend, Kafka, Flume, Sqoop, Oozie, and Azkaban.
§ Good at conceptualizing and building solutions quickly; recently developed a Data Lake using a pub-sub architecture. Developed a pipeline using Java, Kafka, and Python to load data from a JMS server into Hive, with automatic ingestion and quality audits of the data into the RAW layer of the Data Lake (a sketch of this pipeline follows this summary).
§ Hands-on experience with Hadoop ecosystem components such as HDFS, MapReduce, YARN, Pig, Hive, Spark SQL, HBase, Oozie, ZooKeeper, Sqoop, Flume, Impala, Kafka, and Storm.
§ Excellent understanding/knowledge of Hadoop distributed system architecture and design principles.
§ Experience in analyzing data using Hive, Pig Latin, HBase, and custom MapReduce programs.
§ Excellent knowledge of distributed storage (HDFS) and distributed processing frameworks (MapReduce, YARN).
§ Hands-on experience in developing MapReduce programs according to requirements.
§ Hands-on experience in performing data cleaning and pre-processing using Java and the Talend data preparation tool.
§ Hands-on data warehousing experience with Extraction, Transformation, and Loading (ETL) processes using Talend Open Studio for Data Integration and Big Data.
§ Expertise in optimizing traffic across the network using Combiners, joining datasets with multiple schemas using Joins, and organizing data using Partitioners and Buckets.
§ Expertise with NoSQL databases such as HBase.
§ Expertise in using the Talend tool for ETL purposes (data migration, cleansing).
§ Expertise in working with different kinds of data files such as XML, JSON, and databases.
§ Hands-on experience in importing and exporting data between HDFS and databases such as MySQL, Oracle, Teradata, and DB2 using Sqoop.
§ Experience in working with different file formats and compression techniques in Hadoop.
§ Used Pig as an ETL tool for transformations, event joins, filtering, and some pre-aggregation.
§ Strong command of Hive and Pig core functionality, writing Pig Latin UDFs in Java and using various UDFs from Piggybank and other sources.
§ Good knowledge of database connectivity (JDBC) for databases like Oracle, DB2, SQL Server, MySQL.
§ Worked extensively on different Hadoop distributions like CDH and Hortonworks
§ Strong experience working with real-time streaming applications and batch-style large-scale distributed computing applications using tools such as Spark Streaming, Kafka, Flume, MapReduce, and Hive.
§ Expertise in extending Hive and Pig core functionalities by writing custom User Defined Functions (UDF)
§ Proficient in using IDEs such as Eclipse, MyEclipse, and NetBeans.
§ Hands-on experience with job workflow scheduling and monitoring tools such as Oozie and ZooKeeper.
§ Sound knowledge in programming Spark using Scala
§ Excellent Programming skills at a higher level of abstraction using Scala and Spark
§ Good understanding in processing of real-time data using Spark
§ Hands-on experience with the build management tools Maven and Ant.
§ Experience in all phases of the software development life cycle.
§ Extensive experience in using Oracle, SQL Server, DB2 and MySQL databases.
§ Extensive experience developing software using Agile methodologies throughout the development process.
§ Supported development, testing, and operations teams during new system deployments.
§ Very good knowledge of SAP BI (ETL) and data warehouse tools.
§ Good experience creating real-time data streaming solutions using Apache Spark/Spark Streaming, Apache Storm, Kafka, and Flume.
§ Involved in Hadoop testing.
§ Hands-on experience with UNIX scripting.
§ Responsible for designing and implementing ETL process using Talend to load data from different sources.
§ Experience with cloud configuration in Amazon Web Services (AWS).
§ Good knowledge of web services.
§ Experience in reporting using SSRS, SAP BO, and Tableau.
§ Knowledge of SSIS, SSRS, SSAS, and SAP BI.
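Below is a minimal, illustrative sketch of the pub-sub ingestion pipeline referenced in the summary, assuming the kafka-python and hdfs (WebHDFS) client libraries; the topic name, broker, landing path, and audit rule are hypothetical placeholders rather than the actual project code.

```python
# Illustrative sketch (assumed libraries: kafka-python, hdfs). Consumes events
# published to Kafka from a JMS source, applies a basic quality audit, and
# lands passing records in the HDFS directory backing the RAW-layer Hive table.
import json

from hdfs import InsecureClient      # WebHDFS client (assumed)
from kafka import KafkaConsumer      # kafka-python client (assumed)

consumer = KafkaConsumer(
    "jms.events.raw",                          # hypothetical topic name
    bootstrap_servers="broker:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
hdfs_client = InsecureClient("http://namenode:50070", user="etl")

# Hypothetical path backing an external Hive table in the RAW layer;
# the file is assumed to already exist so that appends succeed.
RAW_PATH = "/datalake/raw/events/events.json"


def passes_audit(event):
    """Basic quality audit: required fields must be present and non-empty."""
    return all(event.get(field) for field in ("event_id", "event_ts", "payload"))


for message in consumer:
    event = message.value
    if not passes_audit(event):
        continue  # rejected records could instead be routed to an audit topic
    with hdfs_client.write(RAW_PATH, encoding="utf-8", append=True) as writer:
        writer.write(json.dumps(event) + "\n")
```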
Technical Skills:
Big Data Technologies/Hadoop Ecosystem: HDFS, MapReduce, YARN, Pig, Hive, HBase, Sqoop, Flume, Oozie, ZooKeeper, Cassandra, Kafka, Scala, Spark, Storm
Programming Languages: Java, SQL, PL/SQL, Python, Scala, Linux shell scripting
Web Services: MVC, SOAP, REST
RDBMS: Oracle 10g, MySQL, SQL Server, DB2
NoSQL: HBase, Cassandra
Databases: Oracle, Teradata, DB2, MS Azure
ETL Tools: Talend, SSIS
Tools Used: Eclipse, PuTTY, Pentaho, MS Office, Crystal Reports, Falcon, Ranger
Development Strategies: Agile, Waterfall, Test-Driven Development
Professional Experience:
NCR Corporation – Duluth, GA November 2016 – Present
Role: Hadoop Developer
Description: The client's business is in the financial industry, running and supporting production transactions with ATMs all over the world; it is a subsidiary of the telecom industry giant AT&T.
Responsibilities:
§ Worked on converting Hive/SQL queries into Spark transformations using Spark RDDs, Python, and Scala (an illustrative sketch follows at the end of this section).
§ Developed multiple POCs using PySpark, deployed them on the YARN cluster, and compared the performance of Spark with Hive and SQL/Teradata.
§ Developed code to read multiple data formats on HDFS using PySpark.
§ Analyzed SQL scripts and designed solutions to implement them using PySpark.
§ Utilized capabilities of Tableau such as Data extracts, Data blending, Forecasting, Dashboard actions and table calculations to build dashboards.
§ Developed extremely complex dashboards using Tableau table calculations, quick filters, context filters, hierarchies, parameters and action filters.
§ Published customized interactive reports and dashboards, report scheduling using Tableau server.
Environment: Hadoop frameworks, HDP 2.5.3, Kafka, Hive, Sqoop, Python, Spark, shell scripting, Oracle SONIC JMS, Java 8.0 & 7.0, Eclipse, Tableau.
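As a minimal illustration of the Hive-to-Spark conversion work above, the sketch below rewrites a simple Hive aggregation as Spark DataFrame and RDD transformations; the table and column names (atm_txns, region, amount, status) are hypothetical placeholders, not the actual client schema.

```python
# Minimal PySpark sketch: rewriting a Hive aggregation query as Spark
# transformations. Table and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("hive-to-spark-example")
         .enableHiveSupport()
         .getOrCreate())

# Equivalent Hive query:
#   SELECT region, COUNT(*) AS txn_count, SUM(amount) AS total_amount
#   FROM atm_txns WHERE status = 'SETTLED' GROUP BY region;
txns = spark.table("atm_txns")

summary = (txns
           .filter(F.col("status") == "SETTLED")
           .groupBy("region")
           .agg(F.count("*").alias("txn_count"),
                F.sum("amount").alias("total_amount")))

# The same logic expressed with RDD transformations instead of DataFrames.
summary_rdd = (txns.filter(F.col("status") == "SETTLED").rdd
               .map(lambda row: (row["region"], (1, row["amount"])))
               .reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1])))

summary.show()
```

The DataFrame version keeps the Hive query's declarative shape, while the RDD version shows the equivalent lower-level transformation chain.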
GEORGIA PACIFIC-ATLANTA September 2015 – November 2016
Role: Hadoop Developer
Description: Georgia-Pacific LLC is an American pulp and paper company based in Atlanta, Georgia, and is one of the world's leading manufacturers and distributors of forest products, building products, and related chemicals.
Responsibilities:
§ Played a key role in a team of 3 migrating the existing RDBMS system to Hadoop.
§ Developed a data pipeline using Kafka, Sqoop, Hive, and Java MapReduce to ingest customer behavioral data and financial histories into HDFS for analysis.
§ Used Pig to perform transformations, event joins, bot-traffic filtering, and some pre-aggregations before storing the data on HDFS.
§ Used Hive to analyze the partitioned and bucketed data and compute various metrics for reporting.
§ Good experience in developing Hive DDLs to create, alter, and drop Hive tables.
§ Involved in developing Hive UDFs for needed functionality that is not available out of the box in Apache Hive.
§ Implemented Kafka-Storm topologies capable of handling and channeling high-volume data streams, and integrated the Storm topologies to filter and process that data across multiple clusters for complex event processing.
§ Along with the infrastructure team, involved in designing and developing a Kafka- and Storm-based data pipeline.
§ Developed Sqoop scripts for importing and exporting data into HDFS and Hive.
§ Responsible for processing ingested raw data using Kafka and Hive.
§ Developed Pig scripts for change data capture and delta record processing between newly arrived data and data already existing in HDFS (an illustrative sketch follows at the end of this section).
§ Involved in pivoting HDFS data from rows to columns and columns to rows.
§ Involved in emitting processed data from Hadoop to relational databases or external file systems using Sqoop, HDFS get, or copyToLocal.
§ Involved in developing Shell scripts to orchestrate execution of all other scripts (Pig, Hive, and MapReduce) and move the data files within and outside of HDFS.
§ Held a couple of workshops on Spark, RDDs, and Spark Streaming.
§ Discussed implementation-level details of concurrent programming in Spark using Python with message passing.
Environment: Hadoop, YARN, Hive, Pig, HBase, Oozie, Sqoop, Flume, Spark, Hortonworks.
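As an illustrative sketch of the change data capture and delta-record logic described above (originally implemented in Pig), the following PySpark snippet keeps only the latest record per key when merging newly arrived data with data already in HDFS; the paths, the key column id, and the timestamp column updated_at are hypothetical.

```python
# Minimal PySpark sketch of CDC / delta-record merging. Paths and column
# names (id, updated_at) are hypothetical placeholders.
from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("cdc-delta-example").getOrCreate()

existing = spark.read.parquet("/data/raw/customers/current")   # already in HDFS
incoming = spark.read.parquet("/data/raw/customers/incoming")  # newly arrived

# Union both sets and keep only the latest record per key.
merged = existing.unionByName(incoming)
latest = Window.partitionBy("id").orderBy(F.col("updated_at").desc())

deduped = (merged
           .withColumn("rn", F.row_number().over(latest))
           .filter(F.col("rn") == 1)
           .drop("rn"))

deduped.write.mode("overwrite").parquet("/data/raw/customers/merged")
```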
Hills physicians–San Ramon, CA February 2014 – September 2015
Role: Hadoop Developer
Description: Hill Physicians is a healthcare insurance company that delivers some of the most advanced and comprehensive healthcare in Northern California through a network of primary care physicians.
Responsibilities:
§ Developed and wrote MapReduce jobs.
§ Ingested data into HDFS using Sqoop and scheduled an incremental load to HDFS.
§ Developed Pig scripts for transforming data, making extensive use of event joins, filtering, and pre-aggregations.
§ Developed Pig UDFs for needed functionality that is not available out of the box in Apache Pig.
§ Analyzed partitioned and bucketed data using Hive and computed various metrics for reporting.
§ Good experience in developing Hive DDLs to create, alter, and drop Hive tables.
§ Developed Hive UDFs.
§ Implemented Kafka for streaming data, and filtered and processed the data.
§ Developed a data pipeline using Flume, Sqoop, and Pig to extract data from weblogs and store it in HDFS.
§ Developed Shell scripts for scheduling and automating the job flow.
§ Developed a workflow using Oozie to automate the tasks of loading the data into HDFS.
§ Developed MapReduce jobs to calculate the total data usage by commercial routers in different locations, and developed MapReduce programs for data sorting in HDFS (an illustrative sketch follows at the end of this section).
§ Monitored the cluster (jobs and performance) and fine-tuned it when necessary using tools such as Cloudera Manager and Ambari.
§ Load-balanced ETL processes and performed database performance tuning for ETL processing tools.
§ Loaded the data from Teradata to HDFS using Teradata Hadoop connectors.
§ Optimized Hive queries to extract the customer information from HDFS or HBase.
§ Developed Pig Latin scripts to aggregate the business clients' log files, visualized using Kibana.
§ Performed data scrubbing and processing with Oozie for workflow automation and coordination.
Environment: Hadoop, MapReduce, YARN, Hive, Pig, HBase, Oozie, Sqoop, Storm, Flume, Core Java, Cloudera, HDFS, Eclipse.
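A minimal Hadoop Streaming sketch of the usage-per-location aggregation described above; the original jobs were written in Core Java MapReduce, and the tab-separated input layout (router_id, location, bytes_used) is assumed here purely for illustration.

```python
#!/usr/bin/env python
# Minimal Hadoop Streaming sketch of the "total usage per location" job.
# The original jobs were Core Java MapReduce; the tab-separated input layout
# (router_id, location, bytes_used) is assumed.
# Run as:  usage_job.py map  (mapper)  or  usage_job.py reduce  (reducer)
import sys


def mapper():
    for line in sys.stdin:
        parts = line.rstrip("\n").split("\t")
        if len(parts) < 3:
            continue  # skip malformed records
        _, location, bytes_used = parts[:3]
        print("%s\t%s" % (location, bytes_used))


def reducer():
    # Hadoop Streaming delivers mapper output sorted by key (location).
    current_location, total = None, 0
    for line in sys.stdin:
        location, bytes_used = line.rstrip("\n").split("\t")
        if location != current_location:
            if current_location is not None:
                print("%s\t%d" % (current_location, total))
            current_location, total = location, 0
        total += int(bytes_used)
    if current_location is not None:
        print("%s\t%d" % (current_location, total))


if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

The script would be submitted via the standard hadoop-streaming jar, passing "usage_job.py map" as the mapper and "usage_job.py reduce" as the reducer.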
Client – CHASE BANK -Hyderabad, India January 2011– December 2013
Role: Java/SQL Developer
Responsibilities:
§ Interacted with the team on analysis, design, and development of the database using ER diagrams, normalization, and relational database concepts.
§ Involved in the design, development, and testing of the system.
§ Developed SQL Server stored procedures and tuned SQL queries (using indexes and execution plans).
§ Developed user-defined functions and created views.
§ Created triggers to maintain referential integrity.
§ Implemented exception handling.
§ Worked on client requirements and wrote complex SQL queries to generate Crystal Reports.
§ Created and automated regular jobs.
§ Tuned and optimized SQL queries using execution plans and Profiler.
§ Rebuilt indexes and tables as part of performance tuning exercises.
§ Involved in performing database backup and recovery.
§ Worked on documentation using MS Word.
Environment: SQL Server 7.0/2000, SQL, T-SQL, BCP, Visual Basic 6.0/5.0, Crystal Reports 7/4.5, Java, J2EE, JDBC, EJB, JSP, EL, JSTL, JUnit, XML.