Angela Wang's Portfolio

Passionate advocate for data and financial information solutions. I aim to use machine learning and data science to resolve public and financial information issues.

Click to access and scroll down to view my projects.

Interactive Web Application Big Data Project

US Property Tax Analysis

Background: Real estate property taxes are a large source of local government revenue. However, real-world data reveal inequities in property taxation across the country. In simpler terms, low-value properties usually face higher property tax rates.
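To make the inequity claim concrete, here is a minimal sketch of how regressivity can be checked: compute each property's effective tax rate (tax bill divided by market value) and compare the lower-value half with the higher-value half. The property records, field names, and helper functions are hypothetical illustrations, not data or code from the project.

```python
# Hypothetical records; the values are invented purely for illustration.
properties = [
    {"market_value": 100_000, "tax_bill": 2_500},
    {"market_value": 150_000, "tax_bill": 3_300},
    {"market_value": 600_000, "tax_bill": 9_000},
    {"market_value": 900_000, "tax_bill": 12_600},
]


def effective_rate(prop):
    """Effective tax rate = annual tax bill / market value."""
    return prop["tax_bill"] / prop["market_value"]


def mean_rate(group):
    """Average effective tax rate over a group of properties."""
    return sum(effective_rate(p) for p in group) / len(group)


# Rank properties by value and split them into a lower and an upper half.
ranked = sorted(properties, key=lambda p: p["market_value"])
half = len(ranked) // 2
low_value_rate = mean_rate(ranked[:half])
high_value_rate = mean_rate(ranked[half:])

# A regressive pattern shows up as a higher rate in the low-value group.
print(f"low-value homes:  {low_value_rate:.2%}")
print(f"high-value homes: {high_value_rate:.2%}")
```

A real analysis would use many more value bands and actual sales data; the two-way split here only illustrates the comparison.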

Project Description: I built an interactive property tax map web application, hosted on AWS EC2, to support graphical analysis. The application shows the geographic distribution of assessment/tax rates on an interactive US map. Users can drill down from the state level to the county level and on to individual census tracts. This gives policy makers a clear basis for adjusting their local policies to lower real estate tax burdens for lower-income residents.

Tools and Frameworks: Hadoop, Apache Spark, Hive, HBase, Kafka, AWS EMR, EC2, S3, CodeDeploy, Java, JavaScript, HTML, Scala, HQL.

Deep Learning & Parallelism Project

Analyzing the Performance of Graph Neural Networks with Pipe Parallelism

Published in MLSys - GNNSys 2021 [ArXiv]

Background: Graph Neural Networks (GNNs) are an emerging class of deep learning models designed for problems on irregular, graph-structured datasets. Much real-life data is better represented as high-dimensional graphs, for example expansive social networks or protein and drug structures. Graph learning has also been applied successfully in domains such as drug design, natural language processing, and recommender systems. Unfortunately, training GNNs is much slower than training traditional deep neural networks.

Project Description: To make GNN training and inference more efficient on multiple GPUs, we propose using pipeline parallelism. This can be applied both at the neural network layer level and at the data level. We implemented parallelized GNN models and compared their performance with that of their unparallelized counterparts when trained on multiple GPUs.
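As a rough illustration of the idea (not the code from the paper), the sketch below simulates a GPipe-style forward schedule in plain Python. A mini-batch is split into micro-batches, and at clock tick t, stage s works on micro-batch t - s, so several stages (standing in for GPUs) are busy at the same time. The function name `pipeline_schedule` and the stage and micro-batch counts are hypothetical.

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Return, for each clock tick, the (stage, microbatch) pairs running concurrently."""
    ticks = []
    # The last micro-batch leaves the last stage after this many ticks.
    total_ticks = num_stages + num_microbatches - 1
    for t in range(total_ticks):
        active = []
        for stage in range(num_stages):
            microbatch = t - stage  # stage s starts micro-batch m at tick s + m
            if 0 <= microbatch < num_microbatches:
                active.append((stage, microbatch))
        ticks.append(active)
    return ticks


# Assumed setup: the model is split into 2 stages, the batch into 4 micro-batches.
schedule = pipeline_schedule(num_stages=2, num_microbatches=4)
for tick, active in enumerate(schedule):
    print(tick, active)
```

With 2 stages and 4 micro-batches, the forward pass in this idealized model finishes in 5 ticks, compared with 8 stage-steps if every micro-batch ran through both stages one after another. The simulation ignores communication cost and the backward pass.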

Tools and Frameworks: PyTorch Geometric, Deep Graph Library, GPipe, Graph Neural Networks, Deep Learning, Google Colaboratory, Argonne National Laboratory Servers.

Quantitative Trading Bot Project

Auto Algorithmic Trading and Backtesting with Real Time Trading Bot

Background: This is a personal project in which I designed an algorithmic trading tool that lets me invest automatically without monitoring the market manually.

Project Description: I created an auto-trading bot that monitors and retrieves real-time market data, applies efficient algorithmic trading strategies, and executes trades automatically. So far, I've been able to maintain a stable daily profit of at least 1% using the tool. I plan to add more advanced algorithms and strategies to improve the profits.
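The project description does not name the strategies the bot uses, so the following is only a stand-in: a minimal backtest of a moving-average crossover rule in plain Python. The function names, window sizes, starting cash, and price series are all hypothetical, and the sketch ignores fees and slippage and trades at the same price that generates the signal.

```python
def moving_average(prices, window):
    """Trailing simple moving average; None until enough prices are available."""
    return [
        sum(prices[i - window + 1:i + 1]) / window if i >= window - 1 else None
        for i in range(len(prices))
    ]


def backtest_crossover(prices, fast=2, slow=3, cash=1000.0):
    """Go all-in when the fast average is above the slow one, exit when it is below."""
    fast_ma = moving_average(prices, fast)
    slow_ma = moving_average(prices, slow)
    shares = 0.0
    for i, price in enumerate(prices):
        if slow_ma[i] is None:
            continue  # not enough history for a signal yet
        if fast_ma[i] > slow_ma[i] and shares == 0:
            shares, cash = cash / price, 0.0  # buy signal
        elif fast_ma[i] < slow_ma[i] and shares > 0:
            cash, shares = shares * price, 0.0  # sell signal
    # Value any open position at the last observed price.
    return cash + shares * prices[-1]


# Invented price series, used only to exercise the backtest.
sample_prices = [10, 11, 12, 11, 10, 9, 10, 12]
final_value = backtest_crossover(sample_prices)
print(f"final portfolio value: {final_value:.2f}")
```

Running a rule against historical prices like this, before risking money on it, is the purpose of the backtesting step; a live bot would replace the list with data from a market data API.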

Tools and Frameworks: JavaScript, Python, HTML, real-time financial data APIs, trading libraries.

Municipal Finance Inequity Analysis Project

Cook County Scavenger Sale & Assessment Rate Regressivity Analysis

Background: This project is part of the research done by our team at the Center for Municipal Finance at the UChicago Harris School of Public Policy.

Project Description: I generated graphics and county/city analysis reports, designed novel algorithms for duplicate detection and pattern matching, and implemented Python and R scripts both for publication in a book and for future replication.
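The project's own duplicate-detection algorithms are not described here, so the sketch below shows only a generic baseline for the same task: normalize each record, then flag pairs whose string similarity (from Python's standard `difflib`) exceeds a threshold. The function names, the 0.9 threshold, and the sample addresses are hypothetical.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop periods and commas, and collapse repeated whitespace."""
    cleaned = text.lower().replace(".", "").replace(",", "")
    return " ".join(cleaned.split())


def find_duplicates(records, threshold=0.9):
    """Return index pairs (i, j) of records that look like the same entry."""
    cleaned = [normalize(r) for r in records]
    pairs = []
    for i in range(len(cleaned)):
        for j in range(i + 1, len(cleaned)):
            score = SequenceMatcher(None, cleaned[i], cleaned[j]).ratio()
            if score >= threshold:
                pairs.append((i, j))
    return pairs


# Invented sample records, for illustration only.
records = [
    "123 N. Main St, Chicago",
    "123 n main st chicago",
    "500 W Oak Ave, Evanston",
]
duplicate_pairs = find_duplicates(records)
print(duplicate_pairs)
```

The pairwise comparison is quadratic in the number of records, so a large dataset would usually be grouped first (for example by ZIP code) before comparing within each group.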

Tools and Frameworks: R Markdown, WordPress, PostgreSQL, Stata, HTML, Box

Machine Learning Project

COVID-19 Global Community Mobility Prediction

Background: COVID-19 lockdown policies are highly necessary, but their results have generally not been satisfactory. It is reasonable to ask whether technology can help policy makers craft better policies.

Project Description: I analyzed location, weather, and mobility data from Google, Twitter, NOAA, and the US Census Bureau, and built machine learning models on top of them to understand the impact of COVID-19 on community mobility around the US. I used these models to predict future mobility, glean insights, and offer solutions for optimizing lockdown policies.

Tools and Frameworks: Scikit-Learn, GeoPandas, Folium, Seaborn, Matplotlib, Google Maps, TensorFlow, Git

Databases, Auto Script, and other Projects

Philadelphia Tax Delinquency Project: I created an interactive map for searching and matching the tax agency closest to each delinquent real estate property. I also scripted automated notification emails that give taxpayers their specific delinquency details.
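One simple way to match a property to its closest agency, shown here only as a sketch under assumptions, is to compute the great-circle (haversine) distance to every agency and take the minimum. The agency names, coordinates, and function names are hypothetical; the project's actual matching logic is not described here.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_agency(prop, agencies):
    """Return the agency with the smallest distance to the property."""
    return min(
        agencies,
        key=lambda a: haversine_km(prop["lat"], prop["lon"], a["lat"], a["lon"]),
    )


# Invented coordinates, for illustration only.
agencies = [
    {"name": "Office A", "lat": 39.95, "lon": -75.16},
    {"name": "Office B", "lat": 40.04, "lon": -75.05},
]
delinquent_property = {"lat": 39.96, "lon": -75.17}
closest = nearest_agency(delinquent_property, agencies)
print(closest["name"])
```

A linear scan like this is adequate for a handful of agencies; a spatial index would be the usual choice if the number of candidate locations grew large.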

Food Inspection Web Service: I built and maintained a live database and a web application for food inspection services, which used custom algorithms to find the best matches for inspection records.

Tools and Frameworks: Python, Django, PostgreSQL, Bottle, psycopg2, Jupyter Notebooks.