About Me
I am a data engineer experienced in building batch ETL pipelines with Spark, Hadoop, and AirFlow. I started my career in 2010 responding to malware incidents, a field in which I have 8 years of experience, and I also worked as an evangelist for incident prevention.
I contributed to the development of an ETL pipeline for antivirus products that collected data from 25 million active endpoints and supported AI model training and product integration. I also led the development of an automation project for discovering neglected data and of ETL pipelines for the incident response team’s big data platform. I am currently responsible for developing, operating, and maintaining an ETL pipeline platform that collects and transforms the data needed for analysis and use in gaming services.
I believe big data technology should be accessible to everyone, so I share methods for building and integrating these technologies on my personal blog. Defining and solving problems on my own has taught me deep focus and helped me grow. I improve work efficiency by automating repetitive tasks and optimizing processes. Having started in data engineering later in my career, I am determined to work intensively with a wide range of data and technologies, and I aim to grow into a technical leader in the field.
Experience
Data Engineering Team (Jan 2024 ~ Present)
Senior Developer (Data Engineer)
Development, operation, and maintenance of an ETL platform that collects and transforms database data loaded in game services
- Management and operation of the ETL process that collects and transforms game database data
- Development of a data transformation service based on Spring Boot, Vue.js 3, Zeppelin, and Spark
Response Team (Feb 2021 ~ Present)
Project Leader (Data Engineer)
Leading the entire process of work automation and data projects
- Reduced data processing time by 90% by building an in-house Hadoop platform and ETL pipeline
- Automation of public information collection on incident information sharing sites
AI Team (Feb 2018 ~ Jan 2020)
Senior Developer (Data Engineer)
Participated in ETL pipeline development based on Spark, Hadoop, and AirFlow
- 70% reduction in processing time for 15 billion behavioral data records per month
- 50% improvement in label and latest status update performance for 7 billion files
- Contributed generated data that was used for AI model training and integrated into the V3, MDS, and EDR products
ASEC Response Team (Nov 2010 ~ Jan 2018)
Senior Analyst / Evangelist (Incident Analysis/Response)
Analysis of malware incidents, response to major security issues, and activities for incident prevention
- Selected as a premium representative employee (Maeil Business Newspaper - Dec 2014)
- System analysis of malware-based cyber attacks
- Publishing reports for incident prevention (over 30 times) and presenting trends (over 40 times)
- V3 product-based malware response and ticket processing (24/7)
Projects
Development of a notebook-based interactive service for data transformation and analysis, and integration with the ETL platform
Project Leader (1 in total) / Responsible for planning, design, implementation, development, and operations.
May 2024 - Present / Data Engineering Team
Redeveloped a Livy-based batch service as a Zeppelin-based interactive service
- H/A configuration of the Zeppelin service, building a token-based authentication system using Nginx
- Management of Zeppelin notebooks, design and development of a Vue.js 3-based UI integrated with the ETL platform backend
- Development of a Spring Boot-based backend service for integrating and managing Zeppelin, UI, and the ETL platform
Used Skills Spring Boot, Vue.js 3, Zeppelin, Spark, Java, JavaScript, Scala
Leading the development of ransomware incident statistics site crawling
Project Leaders (2 in total) / Responsible for planning, design, implementation, code review, and mentoring
Mar 2023 - Oct 2023 (7 months) / Response Team
Fully automated the previously manual generation and mailing of ransomware incident statistics
- Documenting architecture design, implementation, integration, and automation methods
- Leading the development of a CLI-based client tool for collecting statistical information
- Leading the development of visualization of weekly and monthly statistics and the regular mailing functionality
Used Skills AirFlow, PostgreSQL, Superset, Python, Selenium, BeautifulSoup
Leading the construction of a big data processing platform for the incident response department
Project Leaders and Data Engineers (3 in total) / Responsible for planning, design, implementation, development, and operations
Jan 2023 - Sep 2023 (9 months) / Response Team
Building a big data platform capable of processing large volumes of antivirus reports and EDR event logs that previously relied on manual work
- Responsible for platform construction, CLI-based client tool, and pipeline automation development
- Reduced data processing time of existing reports by 90% (based on PoC)
- Documented architecture design, implementation, integration, and automation methods to share learning and growth experiences
- Enabled non-development departments to secure their own automation capabilities and big data processing environment
Used Skills Hadoop, Spark, Ambari, AirFlow, Livy, Bigtop, MinIO, Docker, VirtualBox, Linux, Scala, Python
Establishing a Basic Framework for Project Management and Progress (Making High Performers)
Project Leaders (2 in total) / Responsible for planning and operations
Aug 2022 - Sep 2022 (2 months) / Response Team
Methodology development and guide document creation to enhance project execution capabilities for the response team
- Organized step-by-step processes for service and development tasks and created 10 custom templates
- Used as team training materials for project management
Used Skills Jira, Confluence, Waterfall, SW Engineering
Building an Automatic Malicious Indicator Collection Platform (vx-underground)
Project Leader and Data Engineer (1 in total) / Responsible for planning, design, implementation, development, and operations
May 2022 - Nov 2022 (8 months) / Response Team
Built and documented an automatic data collection system for external data using Python and AirFlow
- Automatically collected malware and report files published on vx-underground
- Developed crawlers and downloaders, and created an automated execution environment using AirFlow
- Applied the same architecture to crawl RansomDB statistical information
Used Skills AirFlow, PostgreSQL, MinIO, Docker, Python
Building the Basic Research and Development Environment for the Response Team (Python, Docker, Scala)
Project Leader and Data Engineer (1 in total) / Responsible for planning, design, implementation, development, and operations
Oct 2021 - Jan 2022, Jul 2023 - Aug 2023 (Total 6 months) / Response Team
Provided a standard development environment for development projects in non-development departments
- Built configuration management, build, and deployment environment based on software engineering principles
- Set up Docker-based GitLab, Jenkins, Nexus, and configured Docker, Python, and Scala build agents
- Minimized trial and error by documenting review procedures and usage instructions
- Applied version control to the code and deliverables used in the work
Used Skills GitLab, Jenkins, Nexus, Docker, VirtualBox, Python, VSCode, Scala, sbt, IntelliJ, SW Engineering
Development of AI-based Phishing URL and Email Detection Technology
Data Engineer (Total 8) / Responsible for developing eml file labeling model
May 2020 - Jan 2021 (9 months) / AI Team
Analyzed over 6,000 original phishing emails (eml) in detail for misclassification analysis
- Designed phishing email eml data and built pipelines
- Conducted labeling model experiments and misclassification analysis using Snorkel and PySpark
- The phishing email detection model was integrated into the threat pre-detection product’s MTA module
Used Skills Spark, PySpark, AirFlow, MinIO, PostgreSQL, Scala, Python, Jupyter, Snorkel, Pandas, Linux, PyCharm, VSCode, Malware Analysis, Jira, Confluence
Development of Malware Detection Technology Based on Graph DB (w. Bitnine)
Data Engineer (Total 4) / Responsible for data modeling and graph DB data validation
Oct 2018 - Dec 2018 (3 months) / AI Team
Performed data modeling and validation for graph analysis of behavior data collected from user devices
- Graph data modeling and data transformation of behavior data
- Validated query capability of created graph data for intrusion scenario analysis
- Managed Jira and Confluence on a cloud platform for collaboration with external vendors
Used Skills Spark, Hadoop, PostgreSQL, AgensGraph, Cypher, Zeppelin, Jupyter, Bitbucket, Bamboo, Jira, Confluence, Graph Modeling, Agile, Open Innovation
Establishment of Source Data Acquisition and Management System for AI Technology R&D
Data Engineer (Total 8) / Responsible for Spark Application and AirFlow ETL pipeline development
Mar 2018 - Apr 2019, May 2019 - Apr 2020 (2 years 2 months) / AI Team
Achieved over 60% performance improvement through the development and optimization of Spark and AirFlow-based ETL pipelines
- Optimized RDD implementation and tuning for processing 15 billion behavior data entries monthly, reducing processing time by 70%
- Improved the performance of Kafka event-driven label updates for 7 billion analyzed files by 50%
- Integrated distributed data for MSA features for the first time within the company
- Generated data was used for model training and integrated into V3, MDS, EDR products
- Gained cloud-based data processing automation experience using AWS EMR and S3
Used Skills Spark, Hadoop, AirFlow, Livy, Ambari, AWS EMR, Zeppelin, Jupyter, AWS S3, MinIO, IntelliJ, VSCode, Bitbucket, Bamboo, Artifactory, Scala, sbt, Python, Docker, Linux, Agile, SW Engineering, Test-driven (Unit Test), Architecture, Data Modeling, Jira, Confluence
Skills
These are tools that have been used or are currently being used in actual work.
Platform & Storage
- Spark, PySpark, Hadoop, AWS EMR
- HDFS, MinIO, AWS S3
- AirFlow, Livy, Zeppelin
- Ambari, Bigtop
- PostgreSQL
DevOps & Language
- GitLab, Bitbucket
- Jenkins, Bamboo
- Nexus, Artifactory
- Scala (Spark), Python, Java, JavaScript
Tools
- Docker, sbt, VirtualBox, Linux
- Zeppelin, Jupyter Notebook, JupyterLab, Superset
- IntelliJ, PyCharm, VSCode
- Snorkel, Pandas, Selenium, BeautifulSoup
- Jira, Confluence
- AgensGraph, Cypher
- Spring Boot, Vue.js 3
Methodology
- Waterfall, Agile
- SW Engineering, Test-driven (Unit Test)
- Data Modeling, Graph Modeling
Soft Skills
Voluntary planning and application to improve the department's work environment
- Integration of department work processes and creation of manuals
- Planning standard curriculum and producing training materials for new hires’ job competencies
- System establishment, process application, document creation to improve development work
- Building a Hadoop-based Spark platform and utilizing Zeppelin to analyze large-scale incident logs
Leadership and mentoring for job competency improvement and shared growth
- Responsible for job training and mentoring of new employees and new team members
- Leading team studies related to big data, development, and threat intelligence
- Creating and sharing setup guides and usage manuals for each system
Writing technical documentation and sharing know-how
- Evangelist activities to prevent security incidents (about 30 reports, 50 presentations)
- Descriptive research for threat intelligence standards and analysis methodology
- Technology research on open source platform technology for threat intelligence and incident response
- Creating and sharing documents on how to use big data and automation platforms (about 200 technical postings)
Education
Regular
- Mar 2006 – Feb 2011 : M.S. in Computer Science, Yonsei University Graduate School (Visual Communication)
- Mar 2006 - Feb 2009 : Leave of absence due to military service
- Mar 2002 – Feb 2006 : B.A. in Computer Engineering, Yonsei University Wonju Campus
- Mar 1999 – Feb 2002 : Graduated from Wonju High School
Data Engineering
- Apr 28 2021 - Apr 30 2021 : Building and Utilizing ELK Integrated Log System for IT Professionals (Inflearn)
- Apr 28 2018 - Jul 7 2018 : Deep Learning-Image Recognition 4th (Fast Campus)
- Mar 13 2018 - Mar 20 2018 : Exploratory Data Analysis (Coursera)
Backend Development
- Dec 19 2023 - Jan 8 2024 : Introduction to Java by Kim Young-han (Java First Steps with Code) - Inflearn
- Jan 9 2024 - Jan 16 2024 : Practical Java by Kim Young-han (Basics) - Inflearn
- Jan 17 2024 - Jan 22 2024 : Introduction to Spring (Learning Spring Boot, Web MVC, DB Access Techniques with Code) - Inflearn
- Jan 23 2024 - Feb 15 2024 : Core Principles of Spring (Basics) - Inflearn
- Feb 20 2024 - Mar 7 2024 : Java ORM Standard JPA Programming (Basics) - Inflearn
- Mar 7 2024 - Mar 11 2024 : Basic HTTP Web Knowledge for All Developers - Inflearn
- Mar 12 2024 - Apr 20 2024 : Spring MVC Part 1 - Core Web Development Techniques - Inflearn
- Apr 20 2024 - May 3 2024 : Spring MVC Part 2 - Practical Web Development Techniques - Inflearn
- Jan 2 2025 - Present : Spring DB Part 1 - Core Principles of Data Access - Inflearn
Incident Analysis and Response
- Jun 21 2021 - Jun 25 2021 : From introduction to use of Windows application vulnerability analysis - Inflearn
- Apr 23 2021 - Apr 28 2021 : Malware analysis intermediate course (analysis by type) - Inflearn
- Apr 15 2021 - Apr 23 2021 : Introduction to Windows malware analysis course - Inflearn
- Nov 12 2012 - Nov 16 2012 : (Skill-up) DDoS, reverse engineering, and incident analysis process (KISA)
Communications
곰탱푸닷컴 (https://www.bearpooh.com)
- A personal blog where I share research, building, operation, integration methods, troubleshooting, etc., related to big data and intelligence technologies.
- Average daily page views (PV) of 1,000 ~ 1,300, monthly active users (MAU) of 17,000, monthly PV of 25,000.
- Over 520,000 cumulative PV from 200+ technical posts in the last 2 years (as of September 2023).
Presentations
Data Engineering
- N/A
Incident Response
- Sep 24 / Oct 12 / Jul 24 / Sep 18 2023 : Cybersecurity Training Center Incident Trends Lecture (Key Security Threats in 2023 from Past Incidents / Daejeon)
- Jun 13 / Jul 18 / Oct 17 / Nov 28 2022 : Cybersecurity Training Center Incident Trends Lecture (Recent Incident Trends with Case Studies / Daejeon)
- Oct 20 2017 : Korea Defense Industry Association Malware Analysis Training
- Sep 21 2017 : Seoul Women’s University Information Security Course “Introduction to Analysis Tools”
- Jun 13 / Jun 27 2017 : AhnLab ISF SQUARE 2017 Security Trends Presentation (Ongoing Information Security Threats / Seoul, Daejeon)
- Apr 25 : Icheon City Hall Employee Information Security Training (Information Security is Not Difficult - Threat Trends and Countermeasures)
- Apr 22 / Aug 04 / Sep 09 2017 : Seoul Women’s University Information Security Gifted Education Center Security Training (Malware Overview and Safe PC Usage)
- Oct 27 2016 : Samsung Dream Center AhnLab Tour Malware Trend Presentation (Ongoing Ransomware Threats and Countermeasures)
- Jun 21 2016 : AhnLab ISF SQUARE 2016 in Daejeon Ransomware Trend Presentation (Security Trends Seen Through Ransomware)
- Apr 23 / Jul 09 / Sep 24 2016 : Seoul Women’s University Information Security Gifted Education Center Security Training (Malware Overview and Safe PC Usage)
- Jan 11 2016 : Seoul Women’s University Information Security Women’s Workforce Development Camp Special Lecture (Recent Malware Trends and Countermeasures)
- Oct 20 / Oct 27 2015 : AhnLab EP Technical Support Department Malware Analysis Lecture
- Aug 27 2015 : AhnLab ISF SQUARE 2015 in Jeonju Security Trends Presentation (Recent Malware Trends and Countermeasures / Jeonju, HeadIT)
- Feb 03 2015 : Yeongnam University Computer Science Department Visit Event Presentation (Hiring Process and Recent Security Trends)
- Dec 10 2014 : Gyeonggi-do Information and Computer Subject Research Association (Our Stance on Online Banking Incidents)
- Oct 18 / Nov 01 / Nov 15 / Dec 13 2014 : Seoul Women’s University Information Security Gifted Education Center Security Training (Malware Overview and Safe PC Usage)
- Sep 19 2014 : AhnLab Partner Security Training (Malware Trends)
- Sep 11 2014 : Sogang University Graduate School of Information and Communication Security Training (Hacking and Incident Response Overview - Incident Cases, Security Trends)
- Jun 27 2014 : SBS Seoul Broadcasting Employees Information Security Training (Security Trends and Incident Analysis)
- May 22 2014 : Goyang Foreign Language High School Company Tour Security Special Lecture
- Apr 28 2014 : Yonsei University Computer Information and Communication Engineering Career Special Lecture
- Apr 16 2014 : Hanyang University Graduate School Information Security Training (Security Trends and Incident Analysis)
- Apr 16 2014 : Sogang University Graduate School of Information and Communication Security Training (Hacking and Incident Response Overview - Incident Cases, Security Trends)
- Mar 10 / Mar 13 2014 : AhnLab Security Service Department Intern Training (Security Trends and Incident Analysis)
- Mar 05 2014 : Sogang University Graduate School of Information and Communication (Malware and Vulnerability Analysis - Security Threat Elements, Trends)
- Feb 25 2014 : Yonsei University Graduate School Department of Computer Science BK21 Workshop Seminar Special Lecture
- Oct 18 2013 : Science and Engineering High School Company Tour Career Special Lecture (A Day in the Life of a Security Expert)
- Sep 05 2013 : Sogang University Graduate School of Information and Communication Security Training (Hacking and Incident Response Overview - Incident Cases, Security Trends)
- Jun 25 ~ 28 2013 : Korea Securities Depository Information Security Training (Malware Analysis Methods)
- Mar 13 2013 : Sogang University Graduate School of Information and Communication Security Special Lecture (Cyber Security in Everyday Life Through the Drama Ghost)
- Mar 12 2013 : AhnLab Security Service Department Intern Training (Security Trends and Incident Analysis)
- Feb 27 2013 : Ahn Cheol-Soo Foundation Information Security Special Lecture (Safe PC Usage)
- Dec 18 2012 : Science and Engineering High School Company Tour Career Special Lecture (A Day in the Life of a Security Expert)
- Nov 08 2012 : Pyeongchon High School Company Tour Information Security Special Lecture (Cyber Security in Everyday Life Through the Drama Ghost)
- Oct 09 2012 : AhnLab Security Service Department Intern Training (Incident Analysis Using Ahn Reports)
- Jul 26 2012 : Pohang University of Science and Technology Faculty Information Security Special Lecture (Cyber Security in Everyday Life Through the Drama Ghost)
Open Source
- N/A
Interview
- Dec 18 2014 : Maeil Business Newspaper AhnLab Premium Employee Interview
- Sep 13 2013 : AhnLab 2013 Fall Recruitment Promotional Video Interview
- Jan 11 2011 : AhnLab Security World, Interview on Most Impressive Interview Question by New Employees
Article
Data Engineering
- N/A
Incident Response
- Sep 06 2017 : AhnLab Security Letter No. 689, PC Boot Failure Ransomware, Attacker’s Mistake or Warning?
- Jul 13 2017 : AhnLab Security Letter No. 681, A New Unknown Shifr Ransomware… Caution!
- Jul 03 2017 : Monthly 安, Dissecting the Ransomware Storm That Swept the First Half of 2017
- Feb 07 2017 : Monthly 安, Ransomware Targeting Critical Infrastructure, How Has It Changed?
- Feb 06 2017 : Ransomware Response Center, Ransomware Targeting Critical Infrastructure, How Has It Changed?
- Jan 02 2017 : Monthly 安, Bitter and Meticulous: New Security Threats in 2017
- Dec 05 2016 : Monthly 安, Security Threats of 2016, Evolving Amid Survival of the Fittest!
- Nov 03 2016 : AhnLab Security Letter No. 646, Ransomware, Targeting the U.S. Elections?
- Oct 13 2016 : AhnLab Security Letter No. 643, Ransomware Inspired by ‘Mid’?
- Aug 02 2016 : Monthly 安, The Keyword of the First Half of 2016: Ransomware and Targeted Attacks! What’s Next in the Second Half?
- Jul 13 2016 : AhnLab Security Letter No. 631, The Return of the Locky Ransomware… Be Cautious!
- Jul 07 2016 : AhnLab Security Letter No. 630, Ransomware Also Says “Show Me the Money”?!
- Jul 04 2016 : Monthly 安, Ransomware Also Says “$how me the MONEY!”
- Mar 25 2016 : AhnLab Press Release, Locky Ransomware Variant Victim Prevention ‘User Precautions’ (Original Article)
- Oct 08 2015 : AhnLab Security Letter No. 593, Privacy Exposure in Daily Life, Unnoticed
ETC
Military Service
- Aug 07 2014 : Promoted to Reserve Captain, ROK Army (Reserve Officer Promotion System)
- Sep 05 2013 : Award for Contribution to the 3879th Unit, 3rd Battalion Mobilization Training (Seongnam Mayor’s Citation) - No. 1623
- Jun 30 2008 : Discharged as a Second Lieutenant in the Republic of Korea Army
- Dec 11 2007 : Award for Contribution to the 2007 National Defense Training (3323rd Unit Commander - Battalion Commander Citation)
- Jun 19 2007 : Award for Contribution to the 2007 Garrison Defense Training (1st Army Headquarters Command - Regiment Commander Citation)
- Mar 01 2006 : Commissioned as a Second Lieutenant in the Republic of Korea Army (ROTC Class 44, Yonsei University 1071 Military Corps)
Architecture
Diagram of the overall system configuration built for the Response Team starting in 2021