
Ronak Jain
Lead Spark/Scala Developer | Lead Hadoop Developer
Plano, TX
SUMMARY:
• 11+ years of experience in Information Technology, including 5+ years in Big Data and the Hadoop ecosystem.
• Hands-on experience developing Hadoop architectures on Windows and Linux platforms.
• Good understanding of Hadoop architecture.
• Experience with Hadoop distributions such as Cloudera and Hortonworks.
• Hands-on experience with major Hadoop ecosystem components, including HDFS, MapReduce, Hive, Pig, Spark, HBase, and Sqoop, plus working knowledge of Flume and Talend.
• Set up standards and processes for Hadoop-based application design and implementation.
• Experienced in developing MapReduce programs in Java on Apache Hadoop for working with big data.
• Experience optimizing MapReduce algorithms using mappers, reducers, combiners, and partitioners to deliver the best results for large datasets.
• Worked on Spark SQL and DataFrames for faster execution of Hive queries using SQLContext (a minimal sketch follows this summary).
• Experience in developing programs in Spark using Python to compare the performance of Spark with Hive and SQL/Oracle.
• Extensive knowledge of Spark SQL development.
• Experience analyzing data using HiveQL and Pig Latin, and extending Hive and Pig core functionality with custom UDFs.
• Excellent knowledge of different RDBMS like Teradata, Oracle 11g and SQL Server.
• Teradata performance tuning, identifying and resolving performance bottlenecks; experience in SQL performance tuning and in table structure and index design for better query performance.
• Hands-on experience writing MapReduce jobs on the Hadoop ecosystem using Pig Latin and creating Pig scripts to carry out essential data operations and tasks.
• Experience designing, developing, and implementing connectivity products that allow efficient data exchange between the core database engine and the Hadoop ecosystem.
• Experience importing and exporting data between HDFS and relational database systems/mainframes using Sqoop.
• Experience ingesting structured and unstructured data, including streaming data, into HDFS from legacy systems using Flume.
• Expertise working with different types of data, including semi-structured and unstructured data.
• Worked on NoSQL databases including HBase, MongoDB.
• Experience processing different file formats such as XML, JSON, and SequenceFile.
• Experience working with machine learning workflows and devising machine learning algorithms using Python.
• Good knowledge of AWS services such as EMR and EC2, which provide fast and efficient processing of big data.
• Experience creating business intelligence solutions and designing ETL workflows using Tableau.
• Experience with Agile engineering practices, Scrum, Test-Driven Development, and Waterfall methodologies.
• Good knowledge of Apache Kafka, including configuring producers and consumers.
• Hands-on experience in Object-Oriented Analysis and Design (OOAD) and software development using UML.
• Exposure to Java development projects.
• Used the Spark-TS library for analyzing large-scale time series data sets.
• Hands-on experience in database design, using PL/SQL to write stored procedures, functions, and triggers, and strong experience writing complex queries in Oracle, DB2, and MySQL.
• Good working experience on different operating systems, including UNIX/Linux, Apple macOS, and Windows.
• Experience working both independently and collaboratively to solve problems and deliver high-quality results in a fast-paced, unstructured environment.
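Illustrative sketch (not part of the original work history): a minimal Scala example of executing a Hive query through Spark SQL, as referenced in the Spark SQL/DataFrames bullet above. It assumes the Spark 1.x HiveContext API; the database, table, and column names (sample_db.transactions, customer_id, amount) are hypothetical placeholders.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext
    import org.apache.spark.sql.functions.sum

    object HiveQueryViaSparkSql {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("hive-via-spark-sql"))
        val hiveContext = new HiveContext(sc) // SQLContext with Hive metastore support

        // Run an existing HiveQL statement through the Spark SQL engine
        val byQuery = hiveContext.sql(
          "SELECT customer_id, SUM(amount) AS total FROM sample_db.transactions GROUP BY customer_id")

        // The same aggregation expressed with the DataFrame API
        val byDataFrame = hiveContext.table("sample_db.transactions")
          .groupBy("customer_id")
          .agg(sum("amount").as("total"))

        byQuery.show(20)
        byDataFrame.show(20)
        sc.stop()
      }
    }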
Education: Bachelor’s in Computer Science
TECHNOLOGY AND TOOLS:
• Big Data Technologies: Hadoop, HDFS, Hive, MapReduce, Pig, Sqoop, Flume, Oozie, Kafka, Spark, HBase
• Programming Languages: Java (5, 6, 7), Python, Scala
• Databases/RDBMS: MySQL, SQL/PL-SQL, MS SQL Server 2005, Oracle 9i/10g/11g, Teradata
• Scripting/Web Languages: JavaScript, HTML5, CSS3, XML, SQL, Shell
• NoSQL/Search: Cassandra, HBase, Elasticsearch
• Operating Systems: Linux, Windows XP/7/8
• Software Life Cycles: SDLC, Waterfall and Agile models
• Office Tools: MS Office, MS Project, Risk Analysis tools, Visio
• Utilities/Tools: Eclipse, Tomcat, NetBeans, JUnit, SQL, SVN, Log4j, SOAP UI, ANT, Maven, Automation, MRUnit
• Cloud Platforms: Amazon EC2
PROFESSIONAL EXPERIENCE:
Intuit Inc., Plano, TX, Oct 2016 – Present
Role: Lead Spark/Scala Developer
Responsibilities:
• Participated in the sprint review meetings and explained the technical changes to the clients.
• Used the Spark Streaming and Spark SQL APIs to process the files.
• Developed Spark scripts using Scala shell commands as per the requirements.
• Processed schema-oriented and non-schema-oriented data using Scala and Spark.
• Designed and developed a system to collect data from multiple portals using Kafka and then process it using Spark.
• Designed and developed automated processes using shell scripting for data movement and purging.
• Used the Spark API over Cloudera Hadoop YARN to perform analytics on data in Hive.
• Developed Scala scripts and UDFs using DataFrames/SQL/Datasets and RDD/MapReduce in Spark 1.6 for data aggregation, queries, and writing data back into the OLTP system through Sqoop.
• Worked on Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
• Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, managing and reviewing data backups and Hadoop log files.
• Involved in data acquisition, pre-processing, and exploration for a telecommunications project.
• In the pre-processing phase, used Spark to filter out missing data.
• Used Flume, Sqoop, Hadoop, Spark, and Oozie for building data pipelines.
• Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs.
• Used Spark and Spark SQL with the Python API to read Parquet data and create tables in Hive.
• Used a variety of AWS computing and networking services to meet application needs.
• Involved in understanding the existing application and transformations built using Ab Initio and Teradata.
• Extensively involved in developing RESTful APIs using the JSON library of the Play framework.
• Used the Scala collections framework to store and process complex consumer information.
• Used Scala functional programming concepts to develop business logic.
• Developed automated workflows in Bedrock and Talend for monitoring the landing zone for files and ingesting them into HDFS.
• Implemented Apache NiFi processors for the end-to-end ETL process of extracting, transforming, and loading data files.
• Improved Apache NiFi cluster performance by distributing data flow to multiple nodes with Remote Process Groups (RPG).
• Developed optimal strategies for distributing the ITCM log data over the cluster, importing and exporting the stored log data into HDFS and Hive using Apache NiFi.
• Imported data into HDFS from various SQL databases and files using Sqoop, and from streaming systems into the big data lake using Storm.
• Collected and aggregated large amounts of log data using Apache Flume and staged data in HDFS for further analysis.
• Imported and exported data between relational database systems and HDFS/Hive using Sqoop.
• Worked on real-time data streamed from Kafka, processed it using Spark, and stored the results in the HDFS cluster using Python (see the sketch at the end of this section).
• Managed and reviewed Hadoop log files.
• Ran Hadoop streaming jobs to process terabytes of XML-format data.
• Analyzed large-scale time series data sets using the Spark-TS library, which provides both Scala and Python APIs for manipulating and modeling time series data on top of Spark.
• Supported MapReduce programs running on the cluster.
• Provided cluster coordination services through ZooKeeper.
• Involved in loading data from UNIX file system to HDFS.
• Installed and configured Hive and wrote Hive UDFs.
• Integrated Apache Storm with Kafka to perform web analytics and to move clickstream data from Kafka to HDFS.
• Involved in creating Hive tables, loading them with data, and writing Hive queries that run internally as MapReduce jobs.
• Automated all jobs for pulling data from the FTP server and loading it into Hive tables using Oozie workflows.
Environment: HDFS, MapReduce, Sqoop, Oozie, Pig, Teradata, Hive, NiFi, HBase, Flume, Linux, Java, Eclipse, Cassandra, Cloudera Hadoop distribution, PL/SQL, UNIX shell scripting
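Illustrative sketch (hedged, not taken from the project code): a minimal Scala outline of the Kafka-to-Spark-Streaming-to-HDFS flow referenced in the bullets above, using the Spark 1.6 direct-stream API for Kafka 0.8. The broker addresses, topic name, and HDFS landing path are hypothetical placeholders.

    import kafka.serializer.StringDecoder
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.kafka.KafkaUtils

    object KafkaToHdfs {
      def main(args: Array[String]): Unit = {
        val ssc = new StreamingContext(new SparkConf().setAppName("kafka-to-hdfs"), Seconds(30))

        // Hypothetical broker list and topic
        val kafkaParams = Map("metadata.broker.list" -> "broker1:9092,broker2:9092")
        val topics = Set("portal-events")

        // Direct (receiver-less) stream: one RDD partition per Kafka partition
        val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
          ssc, kafkaParams, topics)

        // Keep only the message payload and land each non-empty micro-batch in HDFS
        stream.map(_._2).foreachRDD { (rdd, time) =>
          if (!rdd.isEmpty()) rdd.saveAsTextFile(s"hdfs:///data/landing/portal/${time.milliseconds}")
        }

        ssc.start()
        ssc.awaitTermination()
      }
    }

The direct approach is shown because it maps Kafka partitions straight to RDD partitions without separate receivers; the original system may have used a different integration.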
Comcast, Philadelphia, PA, Mar 2015 – Sep 2016
Role: Lead Hadoop Developer
Responsibilities:
• Involved in various stages of Software Development Life Cycle (SDLC) during application development.
• Involved in installing and configuring various Hadoop components such as Pig, Hive, Sqoop, Flume, Oozie.
• Used Sqoop as the data ingestion tool to import and export data between RDBMS and HDFS/Hive.
• Channeled log data collected from the web servers into HDFS using Flume and Spark Streaming.
• Processed the data with Spark, aggregating and calculating statistical values using various transformations and actions (see the sketch at the end of this section).
• Large data sets were analyzed using Pig scripts and Hive queries.
• Implemented bucketing in Hive, and designed managed and external tables to enhance performance.
• Developed Spark scripts using Scala shell commands as per the requirements.
• Processed schema-oriented and non-schema-oriented data using Scala and Spark.
• Designed and developed a system to collect data from multiple portals using Kafka and then process it using Spark.
• Involved in developing Pig scripts to transform raw data into data that is useful to gain business insights.
• Used Sqoop to export the analyzed data for visualization and generation of reports delivered to the BI team.
• Ingested flat files received via the ECG FTP tool and data received from Sqoop into the UHG data lake (Hive and HBase) using Data Fabric functionality.
• Extensively used SQL in analyzing, testing, and prototyping the data solutions in Teradata.
• Worked on Snappy compression for Avro and Parquet files.
• Configured MySQL database to store Hive metadata.
• Oozie workflow engine was installed to run multiple Hive and Pig Jobs.
• Developed Spark code using Scala and Spark-SQL/Streaming for faster testing and processing of data.
• Migrated ETL processes from RDBMS to Hive to enable easier data manipulation.
• Implemented test scripts to support test-driven development and continuous integration.
• Collected and aggregated large amounts of data from different sources such as COSMA (CSX Onboard System Management Agent), BOMR (Back Office Message Router), ITCM (Interoperable Train Control Messaging), and onboard mobile and network devices from the PTC (Positive Train Control) network using Apache NiFi, and stored the data in HDFS for analysis.
• Supported QA engineers in understanding, troubleshooting, and testing.
• Developed Spark jobs using Scala in test environment for faster data processing and used Spark SQL for querying.
• Mentored the analyst and test teams in writing Hive queries.
• Provided cluster coordination services through ZooKeeper.
Environment: Hadoop, HDFS, MapReduce, Hive, Pig, NiFi, HBase, Sqoop, Spark, Oozie, ZooKeeper, RDBMS, MySQL, CSV.
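Illustrative sketch (hedged, not taken from the project code): a minimal Scala example of the kind of Spark aggregation and Snappy-compressed Parquet output described in the bullets above, assuming a Spark 1.x SQLContext. The log layout, field names, and HDFS paths are hypothetical placeholders, and well-formed input rows are assumed.

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.SQLContext
    import org.apache.spark.sql.functions._

    // Hypothetical record layout for Flume-landed web-server logs
    case class AccessLog(host: String, status: Int, bytes: Long)

    object LogStats {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("log-stats"))
        val sqlContext = new SQLContext(sc)
        import sqlContext.implicits._

        // Parse tab-delimited log lines landed in HDFS by Flume (path is illustrative)
        val logs = sc.textFile("hdfs:///data/raw/weblogs/*")
          .map(_.split("\t"))
          .filter(_.length >= 3)
          .map(f => AccessLog(f(0), f(1).toInt, f(2).toLong))
          .toDF()

        // Aggregate per-host statistics with DataFrame transformations and actions
        val stats = logs.groupBy("host")
          .agg(count(lit(1)).as("requests"),
               avg("bytes").as("avg_bytes"),
               max("bytes").as("max_bytes"))

        // Store results as Snappy-compressed Parquet for downstream Hive/BI use
        sqlContext.setConf("spark.sql.parquet.compression.codec", "snappy")
        stats.write.mode("overwrite").parquet("/data/curated/weblog_stats")

        sc.stop()
      }
    }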
Verizon, Dallas, Jan 2014 – Feb 2015
Role: Sr. Big Data Developer
Responsibilities:
• Worked with the BI team on Big Data Hadoop cluster implementation and data integration in developing large-scale system software.
• Installed, configured, and maintained Apache Hadoop clusters for application development, along with Hadoop tools such as Hive, Pig, HBase, Flume, Oozie, ZooKeeper, and Sqoop.
• Responsible for building scalable distributed data solutions using Hadoop.
• Performed performance tuning and troubleshooting of MapReduce jobs by analyzing and reviewing Hadoop log files.
• Developed several custom user-defined functions (UDFs) in Hive and Pig using Java.
• Installed and configured Hadoop MapReduce and HDFS, and developed multiple MapReduce jobs in Java for data cleaning and pre-processing.
• Collected logs from the physical machines and the OpenStack controller and integrated them into HDFS using Flume.
• Migrated an existing on-premises application to AWS.
• Ran Hadoop streaming jobs to process terabytes of XML-format data.
• Migrated a MongoDB sharded/replica cluster from one data center to another without downtime.