
Basavaiah Hadoop
- Hadoop Developer
- Atlanta, GA
- Member Since Jun 14, 2023
Basavaiah
Professional Summary:
Hadoop Developer with 8+ years of professional IT experience, including 3+ years of Big Data experience in ingesting, transforming and loading data to BI/analytical systems, data modeling, querying, processing, analysis, and implementing enterprise-level systems spanning Big Data and data integration.
· Experienced in using various Hadoop ecosystem components such as HDFS, MapReduce, Hive, Impala, ZooKeeper, Oozie, HBase, Sqoop, Pig, Kafka, Spark and Flume for data storage and analysis.
· Experienced in Hadoop architecture and its daemons (NameNode, DataNode, ResourceManager, ApplicationMaster, NodeManager, TaskTracker, JobTracker), as well as single-node and multi-node cluster configurations.
· Experienced with Spark, performing various actions and transformations on large data sets.
· Hands-on experience in executing batch jobs on data streams through Spark Streaming using RDDs and DataFrames.
· Hands-on experience with partitioning and bucketing in Hive; designed both managed and external tables in Hive to optimize performance.
· Proficient in using Apache Sqoop to import and export data between HDFS/Hive and relational databases.
· Experience using HBase tables to load large sets of structured, semi-structured and unstructured data coming from UNIX, NoSQL and a variety of portfolios.
· Hands-on experience with NoSQL databases such as HBase, Cassandra and MongoDB.
· Expertise in relational databases like Oracle, MySQL and SQL Server.
· Experienced in using Flume and Kafka to load the log data from multiple sources into HDFS.
· Experienced in job workflow scheduling and monitoring tools like Oozie and ZooKeeper.
· Expertise in creating/managing database objects like Tables, Views, Indexes, Procedures, Triggers, Functions.
· Extensive knowledge in creating PL/SQL stored procedures, packages, functions, cursors, etc. against Oracle (9i, 10g, 11g) and MySQL Server.
· Experienced in analyzing data using HiveQL, Pig Latin and custom MapReduce programs in Java.
· Expert in creating Pig and Hive UDFs in Java to analyze data efficiently.
· Imported data using Sqoop to load data from MySQL to S3 buckets on a regular basis.
· Experience in developing pipelines that ingest data from various sources and process it with Hive and Pig.
· Experienced in writing Storm topologies to accept events from Kafka producers and emit them into Cassandra.
· Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
· Experience in UNIX Shell scripting.
· Performed ETL on different data formats such as JSON and CSV files and converted them to Parquet/ORC while loading to final tables.
· Knowledge of reporting tools like Tableau for analytics on data in the cloud.
· Experienced in source control repositories via CVS, SVN and GitHub.
· Experienced in Database development, ETL, OLAP, OLTP.
· Good knowledge of various data interchange and representation formats such as JSON, XML, Avro, ORC and Parquet.
· Strong experience in handling web servers like Tomcat and application servers like WebLogic, WebSphere and JBoss.
· Extensive experience in Java and J2EE technologies like Servlets, JSP, JDBC, XML and HTML.
· Worked with Big Data distributions like Cloudera (CDH3 and CDH4) with Cloudera Manager, Hortonworks Ambari and MapR.
· Experience with both Waterfall and Agile software development methodologies.
· Ability to adapt to evolving technology, strong sense of responsibility and accomplishment.
· Willing to update my knowledge and learn new skills according to business requirements.
Technical Skills:
| Hadoop/Big Data Technologies | HDFS, MapReduce, Spark, Hive, Hue, Impala, Pig, Sqoop, Flume, Kafka, Oozie, ZooKeeper, Ambari, Storm. |
| Hadoop Distribution Systems | Cloudera, Hortonworks, MapR. |
| Programming Languages | C, C++, Java, Python, Scala, Bash/Shell Scripting, PL/SQL. |
| Frameworks | Spring, Hibernate, Struts, JSF, JMS, EJB. |
| Web Technologies | HTML, XML, CSS, JavaScript, JSON, AJAX, jQuery, Bootstrap. |
| Databases | Oracle, MySQL, DB2, Teradata, SQL Server. |
| Operating Systems | UNIX (OS X, Solaris), Windows, Linux (CentOS, Fedora, Red Hat). |
| Build and IDE Tools | Maven, Eclipse, NetBeans, Toad, Informatica, Tableau, Cognos, Talend. |
| NoSQL Databases | HBase, Cassandra, MongoDB. |
| Web/Application Server | Apache Tomcat, WebSphere, WebLogic, JBoss. |
| Methodologies | Agile, Waterfall. |
| Version Control | SVN, CVS, Git. |
Professional Experience:
Hadoop Developer 2016 Apr - Till Date
T-Mobile, Atlanta, GA
Responsibilities:
· Installed, configured and maintained Apache Hadoop clusters for application development, along with Hadoop tools like Hive, Pig, HBase, ZooKeeper and Sqoop.
· Used Sqoop to import data into HDFS and Hive from multiple data systems.
· Involved in the process of Cassandra data modelling and building efficient data structures.
· Assisted the project manager in problem solving with Big Data technologies for integration of Hive with HBase and Sqoop with HBase.
· Involved in developing Shell scripts to orchestrate execution of all other scripts and move the data files within and outside of HDFS.
· Designed, configured, implemented and monitored Kafka clusters and connectors.
· Implemented proofs of concept (POCs) using Kafka, Storm and HBase for processing streaming data.
· Involved in converting Cassandra/Hive/SQL queries into Spark transformations using Spark RDDs.
· Used Spark Streaming APIs to perform the necessary transformations and actions on data received from Kafka and persisted the results into a Cassandra database.
· Developed multiple POCs using Spark, deployed them on the YARN cluster, and compared the performance of Spark with Cassandra and SQL.
· Created a POC to store server log data in Cassandra to identify system alert metrics.
· Loaded data into Spark RDDs and performed in-memory computation to generate the output response.
· Migrated complex MapReduce programs to in-memory Spark processing using transformations and actions.
· Created partitioned and bucketed Hive tables in Parquet file format with Snappy compression, then loaded data into the Parquet Hive tables from Avro Hive tables.
· Designed and developed Pig scripts with Java UDFs to implement business logic for transforming the ingested data.
· Implemented custom interceptors for Flume to filter data and defined channel selectors to multiplex the data into different sinks.
· Recovered from node failures and troubleshot common Hadoop cluster issues.
· Scripted Hadoop package installation and configuration to support fully automated deployments.
· Developed a data pipeline using Kafka, Spark and Hive to ingest, transform and analyze data.
· Worked on Talend Open Studio and Talend Integration Suite. Developed and designed ETL Jobs using Talend Integration Suite in Talend 5.2.2.
· Exported the aggregated data to Oracle using Sqoop for reporting on the Tableau dashboard.
· Involved in QA support activities, test data creation and unit testing activities.
· Tuned performance using partitioning and bucketing of Impala tables.
· Involved in the design, development and testing phases of the software development life cycle.
· Performed Hadoop installation, updates, patches and version upgrades when required.
· Utilized Agile Scrum methodology to help manage and organize the team, with regular code review sessions.
· Held weekly meetings with technical collaborators and participated actively in code review sessions with senior and junior developers.
Environment: Hadoop, HDFS, Hive, Pig, Oozie, Java, Cloudera, Cassandra, Oracle 10g, 11g, Flume, Kafka, Impala, Scala, Spark, Sqoop.
Hadoop Developer
Anthem Inc, Atlanta, GA 2015 Sep - 2016 Mar
Responsibilities:
· Built scalable distributed data solutions using the Hadoop ecosystem.
· Loaded and transformed large sets of structured, semi-structured and unstructured data using Hadoop/Big Data concepts.
· Loaded datasets from two sources, Oracle and MySQL, to HDFS and Hive respectively on a daily basis.
· Installed and configured Hive on the Hadoop cluster.
· Used the HBase Java API to populate an operational HBase table with key-value data.
· Developed multiple MapReduce jobs in Java for data cleaning and preprocessing.
· Developed and ran MapReduce jobs on YARN and Hadoop clusters to produce daily and monthly reports as per users' needs.
· Scheduled and managed jobs on a Hadoop cluster using Oozie workflows.
· Developed multiple MapReduce programs in Java for data extraction, transformation and aggregation from multiple file formats including XML, JSON and CSV.
· Imported data using Sqoop to load data from MySQL to HDFS on a regular basis.
· Integrated Apache Storm with Kafka to perform web analytics. Uploaded clickstream data from Kafka to HDFS, HBase and Hive by integrating with Storm.
· Implemented fine-tuning mechanisms such as indexing, partitioning and bucketing to tune the Teradata/Hive databases, which helped business users fetch reports more efficiently.
· Worked with NoSQL databases such as MongoDB, creating MongoDB collections to load large sets of semi-structured data.
· Developed Pig UDFs to pre-process the data for analysis.
· Designed and developed Pig Latin scripts to process data in batches for trend analysis.
· Developed Hive scripts to meet analysts' requirements.
· Developed complex MapReduce streaming jobs in Java that were implemented using Hive and Pig.
· Used the data wrangling tool Trifacta to clean data; performed transformations using Hive and MapReduce; loaded data into HDFS; extracted data from Postgres into HDFS using Sqoop; and validated infrastructure readiness on H2O and Ambari components.
· Worked with various compression codecs and file formats such as Snappy, Parquet, Avro and text.
· Collected log data from web servers and ingested it into HDFS using Flume.
· Handled importing of data from various data sources, performed transformations using Hive and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
· Automated and scheduled Sqoop jobs using UNIX shell scripts.
· Analyzed the data by performing Hive queries (HiveQL) and running Pig Latin scripts to study customer behavior.
· Developed data cleansing techniques and UDFs using Pig scripts, HiveQL and MapReduce.
· Continuously monitored and managed the Hadoop Cluster using Cloudera Manager.
· Involved in designing and developing Kafka- and Storm-based data processing with the infrastructure team.
Environment: HDFS, Pig, Pig Latin, Storm, Kafka, Eclipse, Hive, MapReduce, Java, Avro, Sqoop, Linux, Cloudera, Big Data, MongoDB, JSON, XML and CSV.
Hadoop Developer 2013 Mar - 2015 Aug
Bank of America, Charlotte, NC
Responsibilities:
· Analyzed requirements and established development capabilities to support future opportunities.
· Shared data with teams that analyze and prepare reports on risk management.
· Handled importing of data from various data sources, performed transformations using Pig and MapReduce, loaded data into HDFS, and extracted data from MySQL into HDFS using Sqoop.
· Transferred the analyzed data to existing relational databases using Sqoop, making it available to the BI team for visualization and report generation.
· Loaded and transformed large sets of structured, semi-structured and unstructured data and analyzed them by running Hive queries and Pig scripts.
· Developed MapReduce programs and Hive queries to analyze sales patterns and the customer satisfaction index over the data present in various relational database tables.
· Developed Hive queries to process the data and generate the data cubes for visualization.
· Applied various performance optimizations such as using the distributed cache for small datasets, partitioning and bucketing in Hive, and map-side joins.
· Used Flume to channel data from different sources to HDFS. Created HBase tables to store variable data formats of PII data coming from different portfolios.
· Exported the analyzed data to the relational databases using Sqoop for visualization and to generate reports for the BI team.
· Involved in end-to-end implementation of ETL logic.
· Developed Pig scripts to parse the raw data, populate staging tables and store the refined data in partitioned DB2 tables for business analysis.
· Managed and reviewed Hadoop log files. Tested and reported defects following Agile methodology.
· Coordinated effectively with the offshore team and managed project deliverables on time.
· Worked on QA support activities, test data creation and Unit testing activities.
· Developed Oozie workflows, scheduled through a scheduler on a monthly basis.
· Created Pig Latin scripts to sort, group, join and filter enterprise-wide data.
· Prepared the BRD (Business Requirement Document) and obtained approval from business users.
· Performed system analysis to provide effective solutions.
· Prepared the Technical Design Document (TDD).
· Responsible for cluster maintenance, adding and removing cluster nodes, cluster monitoring and troubleshooting, and managing and reviewing data backups and Hadoop log files.
Environment: Hadoop, HBase, MapReduce, Hive, Pig, Sqoop, SQL, Oozie, Linux, UNIX.
Java/J2EE Developer 2011 Dec - 2013 Feb
Sonata Software Ltd - India
Responsibilities:
· Involved in SCRUM sprint planning and daily standup meetings throughout the process of development.
· Used OO techniques such as UML methodology (use cases, sequence diagrams and activity diagrams) and developed class diagrams that depict the code's design and its compliance with the functional requirements.
· Developed the web tier using the Spring MVC framework. Used Spring for dependency injection and integrated Spring with the Hibernate ORM framework.
· Used HTML, jQuery, JSP, JSF and AJAX in the presentation tier. Developed business delegates and service locators to communicate with the Enterprise Java Beans (EJB).
· Developed and consumed REST web services using the Jersey framework.
· Made extensive use of J2EE design patterns like Session Facade, Business Delegate, Service Locator and Command delegate.
· Used the Hibernate ORM (Object Relational Mapping) framework to interact with the database and avoid manual result set handling. Developed Hibernate configuration files, mapping files and mapping classes.
· Performed data transformations using XSLT and developed SOAP web services using Apache CXF.
· Used Maven to build and deploy the application onto WebSphere Application Server.
· Also involved in migrating the application from WebSphere to the JBoss application server.
· Used Eclipse IDE for development and SVN for Version Control.
Environment: JDK, WebSphere, JBoss, Spring, Hibernate ORM, HTML, XML, JSF, JSP, AJAX, JDBC, XSLT, JavaScript, SOAP, REST, SoapUI, JMS, SVN, JUnit, EasyMock, jQuery, Maven, Jenkins.
Java Developer
Infinite Computer Solutions - India 2010 Oct - 2011 Nov
Responsibilities:
· Designed the application using the J2EE design patterns such as Session Façade, Business Delegate, Service Locator, Value Object and Singleton.
· Developed the presentation tier with HTML and JSPs using the Struts 1.1 framework. Used AJAX for faster page rendering.
· Developed the middle tier using EJB stateless session beans and Java servlets.
· Used entity beans to access data from the Oracle 9i database.
· Worked on Hibernate for data persistence.
· Deployed the application on various Tomcat server instances using PuTTY.
· Used TOAD for PL/SQL in the Oracle database to write queries, functions, stored procedures and triggers.
· Worked on JSP, Servlets, HTML, CSS, JavaScript, JSON, jQuery and AJAX for the Vault web-based project and the EDFP application.
· Prepared high- and low-level design documents for the business modules for future reference and updates.
· Deployed the application in JBoss Application Server in developm