my CV
Haotian Yu’s cv/resume
Haotian Yu’s Resume File (google doc)
If you prefer to read the CV as a file, click “Haotian Yu’s Resume File (google doc)” to open the PDF.
Haotian Yu’s LinkedIn
Haotian Yu’s LinkedIn link
Publications:
DP2-Pub: Differentially Private High-Dimensional Data Publication with Invariant Post Randomization, Aug 2022
High-dimensional data differential privacy using randomization and de-randomization
Differentially private data publication with multi-level data utility, Dec 2021
Using differential privacy for multi-layered data
Structure-Attribute-Based Social Network Deanonymization with spectral graph partitioning, May 2021
Using graph theory and the SA (Simulated Annealing) algorithm for the de-anonymization of social media information
Haotian Yu’s Education Background
M.S., Data Analysis-George Washington University. Washington, DC Sep 2018 – May 2020
SEAS (School of Engineering and Applied Science)
B.A., Statistics-University of Minnesota. Minneapolis, Minnesota Sept 2014 – May 2017
B.A., Accounting-Shandong University of Finance and Economics Jinan, China Sept 2012 – June 2016
Work & Research Experience
Data Engineer & Analyst (Data of Vehicle Research & Development) – Li Auto
|Beijing, China | August 2021 - Present|
He managed data-driven projects for the R&D department of China's leading electric vehicle company, Li Auto (Nasdaq: LI).
• Signal Processing & Data Warehousing: Built data pipelines and ETL jobs on ODS data uploaded to the cloud from vehicle-side CAN log signals, SOA signals, and software telemetry. Processed the ODS data to extract features (usage-condition characteristics, time-domain features, frequency-domain features, problem features, etc.) and stored them in data warehouses and then data marts using Scala Apache Spark and PySpark. Automated real-time and batch data-driven reports with Python Streamlit and deployed them to vehicle engineers to improve the IVDP (Intelligent Vehicle Development Process). Optimized data warehousing and SQL queries, resulting in a 25% performance improvement and a 10% decrease in the Spark job failure rate. Resolved data-skew problems by optimizing partitions.
• NLP/Labeling: Labeled data from work orders (TS, CPS, CRM, etc.) and user feedback (from social media, user apps, beta testers, etc.). Trained models to classify issues by Functional Domain, Vehicle Function, Minor Function Module/Integration, and Problem Node (sensor/transmitter/actuator).
• Modeling:
- Used XGBoost to build a predictive maintenance model that forecasts when a machine is likely to fail based on historical data, so that the team can perform maintenance before a breakdown occurs.
- Data mining project: Used signal data from existing sensors to derive additional signals, reducing sensor cost by 10%.
- Designed and developed models to identify functional failures and performance issues in the cockpit interface, key connections, Bluetooth and door-lock modules, and Wi-Fi key connections on the Li L9/L8. The model reached 95% accuracy and is stored on the vehicle side as a DTC (Diagnostic Trouble Code). Details of each issue are sent to the corresponding customer service team to guide part replacement.
- Designed and implemented metrics for driver-assistance systems (ADAS), wipers, cockpit interfaces, and Bluetooth phone keys. Participated in the metrics alert project: when a metric reaches its alert threshold, an alert message is sent to the responsible engineers and the customer service team.
- Production Line Issues: Addressed Bluetooth calibration and door welding issues on the vehicle assembly line. Utilized data from the vehicle inspection line to analyze optimal calibration values for Bluetooth connections, capacitive triggers, and command transmissions.
• GPT: Created the company’s own knowledge-base bot using the GPT-3.5 Turbo model, with data models and input table descriptions for various data sources supplied through LangChain prompts; the project improved data analysts’ work efficiency and information exchange.
• Automated Testing: Designed automated testing procedures for high- and low-frequency keys and Bluetooth. Developed the code logic for the automated key test and deployed it on the company’s platform, contributing to the R&D of Li Auto’s flagship MEGA MPV.
• Automated product:
- Automated Air Conditioning: Trained XGBoost models on data from different scenarios, external environments, and users' air-conditioning usage to predict the air-conditioning settings that give each customer the best comfort. Personalized auto air-conditioning models can be pushed to vehicles via OTA. The model is continuously optimized based on user feedback.
- Automatic Windshield Wipers: Adapted wiper speed outputs personalized to users, based on their usage and varying rainfall conditions.
BrainUp Technology – Algorithm Engineer
|Beijing, China | March 2021 - July 2021 |
EEG-based Fatigue and Attention Recognition Algorithm on Brain-Computer Interface Products: Python, Java, txt, Alibaba Cloud, MySQL, Linux
• Researched machine learning and deep learning algorithms for fatigue recognition and attention level identification based on single-channel EEG data and conducted data analysis.
• Designed and wrote computer programs for fatigue and distraction guidance while utilizing the company’s R&D hardware products to collect EEG data from 90 participants, totaling 270 hours.
• Applied band filtering and wavelet analysis, then extracted the PSD (Power Spectral Density) via FFT from all participants’ EEG data across the 5-50 Hz frequency band. Conducted feature engineering and performed PCA for the different algorithms to identify effective variables.
• Using the collected wakefulness and fatigue data along with the effective variables, applied machine learning methods (trials included Support Vector Machines (SVM), Random Forest (RF), Relevance Vector Machines (RVM), LSTM, etc.) for fatigue and attention classification. Achieved a 92% accuracy rate for fatigue identification. Applied deep learning (Cascade-NN and GAN) and achieved an 86% accuracy rate.
• Led the algorithm team to achieve Top 15 rankings in two Brain-Computer Interface competitions at the 2021 World Robot Competition.
• Contributed to an invention patent for Brain-Computer Interface products and to the research paper “Identifying Mental Fatigue States of Construction Workers through BCI using Deep Learning Methods”, which uses deep learning to identify whether construction workers are fatigued.
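For illustration only, the PSD extraction step mentioned above can be sketched as follows. This is not the company's code: it computes a one-sided periodogram with a naive standard-library DFT in place of an FFT routine, then keeps the 5-50 Hz bins. The function names, the 200 Hz sampling rate, and the synthetic signal are assumptions made for the example.

```python
import cmath
import math


def periodogram(signal, fs):
    """One-sided periodogram via a naive DFT (O(N^2), fine for short demos)."""
    n = len(signal)
    freqs, psd = [], []
    for k in range(n // 2 + 1):
        xk = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                 for i, x in enumerate(signal))
        power = abs(xk) ** 2 / (fs * n)
        if 0 < k < n / 2:
            power *= 2  # fold in the negative-frequency half (not DC or Nyquist)
        freqs.append(k * fs / n)
        psd.append(power)
    return freqs, psd


def band_psd(freqs, psd, low=5.0, high=50.0):
    """Keep only the (frequency, power) pairs inside [low, high] Hz."""
    return [(f, p) for f, p in zip(freqs, psd) if low <= f <= high]


# Hypothetical one-second signal sampled at 200 Hz: a 10 Hz component
# plus a weaker 70 Hz component that falls outside the 5-50 Hz band.
fs = 200
signal = [math.sin(2 * math.pi * 10 * i / fs)
          + 0.5 * math.sin(2 * math.pi * 70 * i / fs)
          for i in range(fs)]
freqs, psd = periodogram(signal, fs)
band = band_psd(freqs, psd)
peak_freq = max(band, key=lambda pair: pair[1])[0]  # strongest in-band bin
```

In practice a library FFT or Welch estimate would replace the naive DFT; the band selection step stays the same.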
Predicting defendants’ appearance at court (GWU):
|Washington DC, USA | Feb 2020 - May 2020 |
Used feature engineering and machine learning models to predict defendants’ appearance in court based on their background and case data: Python
• Used NLP to extract keywords from cases. Classified the cases and identified whether each case is a misdemeanor or a felony.
• Applied feature engineering and feature selection to the court records and arrest data. Analyzed the data using PCA and TPOT.
• Built models such as SVM, KNN, and Logistic Regression on the selected features to predict defendants’ appearance in court from existing information. Handled missing data in the case records, compared the different models, and reported predictions and accuracy.
• The evaluated model reached 92% accuracy.
Research: Algorithm Development for Social Network Classification (GWU):
|Washington DC, USA | Sep 2019- Feb 2020 |
High-dimensional data differential privacy using randomization and de-randomization
• Researched graph algorithms, the SA model, and topology theory.
• Applied the developed SA algorithm to classify social media accounts by their follow relationships in order to find groups in the social network. Separated the nodes into sets using K-Means and applied the union-find algorithm to classify the social media groups.
• Studied the similarity of related degrees and features across all nodes, and analyzed whether the developed algorithm still works well after noisy relationships are added, to make sure the algorithm is reasonable and reliable.
• Ran the algorithm on the dataset and evaluated its accuracy on different metrics. Found the weighting of the different similarity measures that gives the best accuracy. Measured run time to assess utility and time complexity.
• Increased the accuracy to 97.8% by modifying the algorithm and made the whole project more efficient.
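As a generic illustration of the union-find step named above (a textbook sketch, not the research code), the structure below merges accounts connected by a follow relationship and returns the resulting groups. The class and function names and the toy edge list are assumptions made for the example.

```python
class UnionFind:
    """Disjoint-set structure with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        # Walk to the root, shortening the path along the way.
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False  # already in the same group
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger
        self.size[ra] += self.size[rb]
        return True


def groups(n, edges):
    """Group node ids 0..n-1 into connected components given (a, b) edges."""
    uf = UnionFind(n)
    for a, b in edges:
        uf.union(a, b)
    by_root = {}
    for node in range(n):
        by_root.setdefault(uf.find(node), []).append(node)
    return sorted(by_root.values())


# Hypothetical follow relationships between six accounts.
result = groups(6, [(0, 1), (1, 2), (3, 4)])
# result == [[0, 1, 2], [3, 4], [5]]
```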
Movie Recommendation System (Group project, GWU, Python, JavaScript, HTML, Java):
|Washington DC, USA | Jan 2020 - May 2020 |
Built a movie recommendation system and developed business insights from it
• Collected movie data and created features based on audio analysis, image recognition and analysis, and natural language processing.
• Analyzed user and movie information. Researched and applied models and algorithms such as content-based recommendation, SVD, KNN, and CNN.
• Split the data into training and test sets and predicted ratings from the training data. Calculated RMSE and MAE to measure prediction error. Analyzed the pros and cons of the different models and combined the most useful ones, SVD and CNN.
• Created business cases including website service and business exploration. Applied the combination of the algorithms in the sample web service of the movie recommendation system.
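For reference, the two error metrics mentioned above, RMSE and MAE, can be computed as in this minimal sketch. The function names and the sample ratings are made up for the example and are not project data.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error: sqrt of the average squared difference."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Hypothetical held-out ratings and model predictions.
actual = [4.0, 3.0, 5.0, 2.0]
predicted = [3.5, 3.0, 4.0, 2.5]
mae_value = mae(actual, predicted)    # 0.5
rmse_value = rmse(actual, predicted)  # sqrt(0.375), about 0.612
```

RMSE penalizes large misses more heavily than MAE, so reporting both gives a fuller picture when comparing recommenders.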