Sangwoo Shim

Data Engineer

About Me

I am a data engineer with 6 years of experience developing ETL batch pipelines. I started my career in malware incident response, where I gained 8 years of experience, and I have also worked as an evangelist for incident prevention. Currently, I am leading a project to make use of underutilized data by building a big data platform for the response team. I want to work with a wide range of data and technologies and keep growing through hands-on immersion.

Experience

AhnLab

Data Engineer
Senior Analyst

Nov 2010 - Present

www.ahnlab.com

Response Team (Feb 2021 ~ Present)

Project Leader (Data Engineer)
Leading the entire process of work automation and data projects

  • Reduced data processing time by 90% by building an in-house Hadoop platform and ETL pipeline
  • Automated the collection of public information from incident information sharing sites

AI Team (Feb 2018 ~ Jan 2021)

Senior Developer (Data Engineer)
Participated in ETL pipeline development based on Spark, Hadoop, and AirFlow

  • 70% reduction in processing time for 15 billion behavioral data records per month
  • 50% improvement in label and latest-status update performance for 7 billion files
  • Contributed to the use of the generated data for AI model training and for loading into the V3, MDS, and EDR products

ASEC Response Team (Nov 2010 ~ Jan 2018)

Senior Analyst / Evangelist (Incident Analysis/Response)
Analysis of malware incidents, response to major security issues, and activities for incident prevention

  • Selected as a premium representative employee (Maeil Business Newspaper - Dec 2014)
  • System analysis of malware-based cyber attacks
  • Publishing reports for incident prevention (over 30 times) and presenting trends (over 40 times)
  • V3 product-based malware response and ticket processing (24/7)

Projects

Leading the development of a crawler for ransomware incident statistics sites

Project Leader

Mar 2023 - Oct 2023 (8 months) / Response Team

100% automation of the previously manual generation and mailing of ransomware incident status statistics

Used Skills
AirFlow, PostgreSQL, Superset, Python, Selenium, BeautifulSoup

Leading the construction of a big data processing platform for the incident response department

Project Leader and Data Engineer

Jan 2023 - Sep 2023 (9 months) / Response Team

Reduced processing time by 90% by building a Hadoop platform, configuring an AirFlow pipeline, and developing automation

Used Skills
Hadoop, Spark, Ambari, AirFlow, Livy, Bigtop, MinIO, Docker, VirtualBox, Linux, Scala, Python

Leading the development of automatic collection of malware and report disclosure sites

Project Leader and Data Engineer

May 2022 - Dec 2022 (8 months) / Response Team

100% automation of the collection of analysis reports and malware for major attack groups released by security vendors

Used Skills
AirFlow, PostgreSQL, MinIO, Docker, Python

Leading the construction of SW development environments for incubation of automation development

Project Leader and Data Engineer

Oct 2021 - Jan 2022, Jul 2023 - Aug 2023 (6 months) / Response Team

Applied history management and integrated output management by configuring a standard environment and establishing processes

Used Skills
GitLab, Jenkins, Nexus, Docker, VirtualBox, Python, VSCode, Scala, sbt, IntelliJ, SW Engineering

Participating in the development of AI-based phishing URL and email detection technology

Data Engineer

May 2020 - Jan 2021 (9 months) / AI Team

Data conversion, ETL, label creation, and direct analysis of over 6,000 original email files

Used Skills
Spark, PySpark, AirFlow, MinIO, PostgreSQL, Scala, Python, Jupyter, Snorkel, Pandas, Linux, PyCharm, VSCode, Malware Analysis, Jira, Confluence

Participating in building and developing a data ETL pipeline for AI model development

Data Engineer

May 2020 - Jan 2021 (9 months) / AI Team

70% reduction in processing time for 15 billion behavioral data records per month
50% improvement in integrated performance of analysis event data for 7 billion files

Used Skills
Spark, Hadoop, AirFlow, Livy, Ambari, AWS EMR, Zeppelin, Jupyter, AWS S3, MinIO, IntelliJ, VSCode, Bitbucket, Bamboo, Artifactory, Scala, sbt, Python, Docker, Linux, Agile, SW Engineering, Test-driven (Unit Test), Architecture, Data Modeling, Jira, Confluence

Skill

These are tools that I have used or am currently using in actual work.

Platform & Storage

  • Spark, PySpark, Hadoop, AWS EMR
  • HDFS, MinIO, AWS S3
  • AirFlow, Livy
  • Ambari, Bigtop
  • PostgreSQL

DevOps & Language

  • GitLab, Bitbucket
  • Jenkins, Bamboo
  • Nexus, Artifactory
  • Scala (Spark), Python

Tool

  • Docker, sbt, VirtualBox, Linux
  • Zeppelin, Jupyter Notebook, JupyterLab, Superset
  • IntelliJ, PyCharm, VSCode
  • Snorkel, Pandas, Selenium, BeautifulSoup
  • Jira, Confluence
  • AgensGraph, Cypher

Methodology

  • Waterfall, Agile
  • SW Engineering, Test-driven (Unit Test)
  • Data Modeling, Graph Modeling

Soft Skill

Voluntary planning and application to improve the department's work environment

  • Integration of department work processes and creation of manuals
  • Planning standard curriculum and producing training materials for new hires’ job competencies
  • System establishment, process application, document creation to improve development work
  • Building a Hadoop-based Spark platform and utilizing Zeppelin to analyze large-scale incident logs

Leadership and mentoring for job competency improvement and shared growth

  • Responsible for job training and mentoring of new employees and new team members
  • Leading team studies related to big data, development, and threat intelligence
  • Creating and sharing setup guides and usage manuals for each system

Writing technical documentation and sharing know-how

  • Evangelist activities to prevent security incidents (about 30 reports, 50 presentations)
  • Descriptive research for threat intelligence standards and analysis methodology
  • Technology research on open source platform technology for threat intelligence and incident response
  • Creating and sharing documents on how to use big data and automation platforms (about 200 technical postings)

Education

Regular

  • Mar 2006 – Feb 2011 : M.S. in Computer Science, Yonsei University Graduate School (Visual Communication)
    • Mar 2006 - Feb 2009 : Leave of absence due to military service
  • Mar 2002 – Feb 2006 : B.A. in Computer Engineering, Yonsei University Wonju Campus
  • Mar 1999 – Feb 2002 : Graduated from Wonju High School

Data Engineering

  • Apr 28 2021 - Apr 30 2021 : Building and Utilizing ELK Integrated Log System for IT Professionals (Inflearn)
  • Apr 28 2018 - Jul 7 2018 : Deep Learning-Image Recognition 4th (Fast Campus)
  • Mar 13 2018 - Mar 20 2018 : Exploratory Data Analysis (Coursera)

Incident Analysis and Response

  • Jun 21 2021 - Jun 25 2021 : From introduction to use of Windows application vulnerability analysis (Inflearn)
  • Apr 23 2021 - Apr 28 2021 : Malware analysis intermediate course (analysis by type) (Inflearn)
  • Apr 15 2021 - Apr 23 2021 : Introduction to Windows malware analysis course (Inflearn)
  • Nov 12 2012 - Nov 16 2012 : (Skill-up) DDoS, reverse engineering, and incident analysis process (KISA)

ETC

Military Service

  • Aug 07 2014 : Promoted to Reserve Captain, ROK Army (Reserve Officer Promotion System)
  • Mar 01 2006 - Jun 30 2008 : Served and was discharged as First Lieutenant, ROK Army (ROTC 44th)

Architecture

Diagram of the overall system configuration built for the response team starting in 2021
