PROJECT 2 FINAL ANALYSIS

I started by drafting a few questions and getting to know the dataset, which tracks protests across the country, including where and when they happened, how many people were involved, and which groups organized them.

I focused on five main questions:

  • Where protests happened the most

  • How protest activity changed over time

  • Who the main organizers were

  • Whether protests could be grouped into types

  • How patterns varied across different states

To answer these, I cleaned and prepared the data, created new variables like the protest month and log of crowd size, and built several visualizations using Python. I also used PCA and KMeans clustering to group similar types of protests based on size, location, and time.
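The preparation and clustering steps described above can be sketched roughly as follows. This is only an illustration, not the exact code used in the project: the column names (date, crowd_size, lat, lon), the choice of log1p for the log transform, the scaling step, and the two PCA components are all assumptions, and the small table at the bottom is made up purely to show the call.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def cluster_protests(df, n_clusters=3, seed=0):
    """Add derived features and a KMeans cluster label to a protest table.

    Column names (date, crowd_size, lat, lon) are hypothetical.
    """
    out = df.copy()
    out["date"] = pd.to_datetime(out["date"])
    out["month"] = out["date"].dt.month
    # log1p keeps very small counts finite and compresses the long right tail.
    out["log_crowd"] = np.log1p(out["crowd_size"])

    # Standardize so that no single feature dominates the distance metric.
    features = out[["log_crowd", "lat", "lon", "month"]]
    scaled = StandardScaler().fit_transform(features)

    # Reduce to two components, then cluster in the reduced space.
    components = PCA(n_components=2, random_state=seed).fit_transform(scaled)
    out["pc1"] = components[:, 0]
    out["pc2"] = components[:, 1]
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    out["cluster"] = model.fit_predict(components)
    return out


# Made-up rows for illustration only; these are not real protest records.
toy = pd.DataFrame({
    "date": ["2020-06-01", "2020-06-06", "2020-07-04",
             "2019-01-19", "2019-02-02", "2019-03-15",
             "2021-09-06", "2021-10-02", "2021-11-20"],
    "crowd_size": [5000, 12000, 800, 50, 120, 30, 300, 450, 200],
    "lat": [40.7, 38.9, 34.1, 41.9, 29.8, 47.6, 39.7, 33.7, 44.9],
    "lon": [-74.0, -77.0, -118.2, -87.6, -95.4, -122.3, -105.0, -84.4, -93.3],
})
result = cluster_protests(toy)
print(result[["month", "log_crowd", "cluster"]])
```

Treating the month as a plain number is a simplification, since December and January end up far apart. A sine/cosine encoding of the month would be one way to handle that if seasonality matters for the clusters.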

From the analysis, I found that protest activity was highest in places like New York City, Washington, D.C., and Los Angeles, with a sharp spike in 2020. Political and labor groups were the most active organizers, and protests often happened during the summer or around major political events. The clustering model showed that protests could be grouped into three types, each with its own pattern.
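Findings about locations and years like these come from simple counts. A minimal sketch of that kind of tally is shown below, assuming hypothetical column names (city, date) and using placeholder rows rather than the real dataset.

```python
import pandas as pd


def top_locations(df, n=3):
    """Return the n cities with the most recorded events."""
    return df["city"].value_counts().head(n)


def events_per_year(df):
    """Count events per calendar year, ordered by year."""
    years = pd.to_datetime(df["date"]).dt.year
    return years.value_counts().sort_index()


# Placeholder rows for illustration only; not real protest records.
toy_events = pd.DataFrame({
    "city": ["City A", "City A", "City A", "City B", "City B", "City C"],
    "date": ["2020-06-01", "2020-06-06", "2019-01-19",
             "2020-07-04", "2021-09-06", "2019-03-15"],
})
by_city = top_locations(toy_events)
by_year = events_per_year(toy_events)
print(by_city)
print(by_year)
```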

This project helped me understand how large social events like protests can be analyzed through data. It also taught me how to turn raw information into meaningful insights using visualizations and clustering methods. Most importantly, it showed me how powerful data can be in helping us understand real-world issues.