Journey to Computer Vision

Why Computer Vision

My Lifelong Ambition

I have been studying data science since 2017, when I entered Soochow University to major in Big Data Management. From freshman to junior year, I built up my skills in programming, data collection, data mining, exploratory data analysis (EDA), and data analysis through many development projects and internships. Despite all that practical experience, no project truly absorbed me until I worked on a face analysis project for Cathay Life Insurance in the summer of my junior year.

The project aimed to predict a person's BMI from a face image. I had a lot of fun building the image pipeline and designing how to extract and compute features from the face images. To me, working with image data is much more interesting than processing numeric or text data.

Over the next three years, I hope to improve my computer vision skills and move beyond static images. My short-term goal is to build a system that can edit a video automatically, and I will start with vlogs on YouTube as my data source.

With the rapid development of networks and technology, our lives are full of information. That information is no longer just structured data; it also includes unstructured data such as video and audio. I believe the proportion of unstructured data, especially video and film, will increase dramatically in the future, and I hope to contribute to innovation in data science and computer vision in my future career.

Current Achievement

Face Analysis Project, Junior Year (2020)

The Way to Extraordinary

Mathematics

2020 Jun - 2020 Dec

Linear algebra matters for computer vision because images are stored as arrays, and expressing an operation as array or matrix arithmetic is usually much more efficient than processing pixels one at a time. Many programming packages are built on array operations. Calculus and trigonometry are also important for computer vision: they play vital parts in computing depth, color, and light in image data.
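As a small illustration of the point about array operations, the sketch below converts an RGB image to grayscale twice: once with an explicit pixel loop and once as a single matrix-vector product. It assumes NumPy is available; the random image and the function names are made up for the example, and the weights are the standard ITU-R BT.601 luma coefficients.

```python
import numpy as np

# A made-up 4x6 RGB image with values in [0, 1); a real image would be loaded from disk.
rng = np.random.default_rng(0)
image = rng.random((4, 6, 3))

# ITU-R BT.601 luma weights for the red, green, and blue channels.
weights = np.array([0.299, 0.587, 0.114])


def to_gray_loop(img, w):
    """Grayscale conversion written pixel by pixel."""
    height, width, _ = img.shape
    gray = np.empty((height, width))
    for row in range(height):
        for col in range(width):
            gray[row, col] = sum(img[row, col, c] * w[c] for c in range(3))
    return gray


def to_gray_matmul(img, w):
    """The same conversion as one matrix-vector product over the channel axis."""
    return img @ w


gray_loop = to_gray_loop(image, weights)
gray_matmul = to_gray_matmul(image, weights)
print(np.allclose(gray_loop, gray_matmul))
```

Both functions compute the same result; the second hands the whole computation to optimized array code, which is the efficiency benefit described above.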

Be Familiar With Algorithms

2021 Jan - 2021 Jul

CNN & RNN - A Convolutional Neural Network (CNN) is good at processing spatially structured data such as images, for example in image recognition. A Recurrent Neural Network (RNN) is suited to sequential data such as time series and text, for example analyzing the content of an article. In my opinion, both algorithms can be used together in video analysis: a video is a time series of images, and the people and sounds in it shape the main content of each video.
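One way to picture the combination is a toy pipeline in which a convolution step summarizes each frame and a recurrent step folds those summaries together over time. The sketch below uses only NumPy with random, untrained weights; the frame size, filter count, and function names are assumptions for illustration, not a working video model.

```python
import numpy as np


def conv2d_valid(image, kernel):
    """Slide the kernel over the image without padding.

    This is cross-correlation, which is what deep learning libraries
    usually call convolution.
    """
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out


def frame_features(frame, kernels):
    """CNN-like step: convolve, apply ReLU, then average each response map."""
    return np.array([np.maximum(conv2d_valid(frame, k), 0).mean() for k in kernels])


def rnn_over_frames(features, w_in, w_rec):
    """RNN-like step: update one hidden state frame by frame."""
    hidden = np.zeros(w_rec.shape[0])
    for x in features:
        hidden = np.tanh(w_in @ x + w_rec @ hidden)
    return hidden


rng = np.random.default_rng(0)
video = rng.random((5, 16, 16))            # 5 made-up grayscale frames, 16x16 each
kernels = rng.standard_normal((3, 3, 3))   # 3 untrained 3x3 filters
w_in = rng.standard_normal((4, 3)) * 0.1   # input-to-hidden weights (hidden size 4)
w_rec = rng.standard_normal((4, 4)) * 0.1  # hidden-to-hidden weights

features = np.array([frame_features(frame, kernels) for frame in video])
summary = rnn_over_frames(features, w_in, w_rec)
print(features.shape, summary.shape)
```

In a real system both parts would be trained, and the per-frame features would come from a deep network rather than three random filters.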

LSTM - Long Short-Term Memory

YOLO – You Only Look Once

Module Development Project - Face & Sound

2021 Jul - 2021 Dec

My short-term goal is to build a system that can edit a Vlog video automatically.

The key element of a vlog is the person. Therefore, I am going to build a module that recognizes the people who appear in the video and maps their faces to their voices.

The module will separate the human voice from the background music. The Short-Time Fourier Transform (STFT) and an RNN will be the basis for analyzing the human voice, and a CNN will be the basis for face recognition.
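To make the Short-Time Fourier Transform step concrete, the sketch below computes a magnitude spectrogram with NumPy only. The frame length, hop size, sample rate, and the synthetic 440 Hz tone are all assumptions chosen for the example; real input would be the audio track of a video, and the resulting spectrogram frames are the kind of sequence an RNN could consume.

```python
import numpy as np


def stft_magnitude(signal, frame_len=1024, hop=256):
    """Split the signal into overlapping Hann-windowed frames and take |FFT| of each."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop:i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (time frames, frequency bins)


sample_rate = 16000                        # assumed sampling rate in Hz
t = np.arange(sample_rate) / sample_rate   # one second of timestamps
tone = np.sin(2 * np.pi * 440 * t)         # synthetic stand-in for a voice recording

spectrogram = stft_magnitude(tone)
freqs = np.fft.rfftfreq(1024, d=1 / sample_rate)
peak_hz = freqs[spectrogram.mean(axis=0).argmax()]
print(spectrogram.shape, peak_hz)
```

The strongest frequency bin lands within one bin width (about 15.6 Hz here) of the 440 Hz tone, which is the resolution limit set by the chosen frame length.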

Module Development Project - Video SLICE Cutter

2022 Jan - 2022 Dec

The main idea of this module is to cut the entire video into SLICEs of different types and analyze the attributes of each SLICE. In my opinion, a vlog contains four classes of SLICE.

1. CHARACTER – a clip that focuses on a character in the video.

2. OBJECT – a clip that focuses on objects, such as food or products.

3. CHARACTER & OBJECT – a clip that focuses on the interaction between people and objects.

4. SCENERY – a clip that focuses on scenery. In other words, this type of SLICE contains few or no people or objects.
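The four classes above could be assigned by a simple rule once a detector has reported how often people and objects appear in a clip. The sketch below is only one possible rule: the inputs (the fraction of frames containing a person or a prominent object), the 0.5 threshold, and all names are hypothetical and would need tuning against real vlog data.

```python
from enum import Enum


class SliceType(Enum):
    CHARACTER = "CHARACTER"
    OBJECT = "OBJECT"
    CHARACTER_OBJECT = "CHARACTER_OBJECT"
    SCENERY = "SCENERY"


def classify_slice(person_ratio, object_ratio, threshold=0.5):
    """Label a clip from hypothetical detector statistics.

    person_ratio: fraction of frames in which a person was detected.
    object_ratio: fraction of frames in which a prominent object was detected.
    threshold: assumed cutoff for calling a clip focused on people or objects.
    """
    has_person = person_ratio >= threshold
    has_object = object_ratio >= threshold
    if has_person and has_object:
        return SliceType.CHARACTER_OBJECT
    if has_person:
        return SliceType.CHARACTER
    if has_object:
        return SliceType.OBJECT
    return SliceType.SCENERY


# Made-up example clips: (person_ratio, object_ratio)
clips = [(0.9, 0.1), (0.2, 0.8), (0.7, 0.6), (0.05, 0.1)]
labels = [classify_slice(p, o) for p, o in clips]
print([label.name for label in labels])
```

A fixed threshold is the simplest choice; a trained classifier over richer clip attributes could replace it later without changing the four labels.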