
Data Critique


Screenshot of Our Merged Dataset

DATASET

The Spotify dataset that served as the main source for this project has been merged with gender data from a secondary dataset. We downloaded both datasets from Kaggle, a large online data science community that offers tools and resources. The merged dataset contains metadata columns (e.g., artist, genre, decade), each song’s musical features (e.g., danceability, energy, tempo), and one attribute added from the secondary dataset (the gender of the artist). Each of these columns provides valuable insights, but each also carries limitations that should be acknowledged in order to understand the dataset’s utility and drawbacks.
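The merge described above can be sketched roughly as follows. This is a minimal illustration using pandas, not the exact code used for the project: the two small tables stand in for the Kaggle downloads, and the column names (`artist`, `track`, `decade`, `danceability`, `gender`) and the choice of `artist` as the join key are assumptions.

```python
import pandas as pd

# Toy stand-ins for the two Kaggle downloads.
# All column names and values here are assumptions for illustration.
tracks = pd.DataFrame({
    "artist": ["Artist A", "Artist B", "Artist C"],
    "track": ["Song 1", "Song 2", "Song 3"],
    "decade": ["60s", "80s", "10s"],
    "danceability": [0.45, 0.71, 0.63],
})
genders = pd.DataFrame({
    "artist": ["Artist A", "Artist B"],
    "gender": ["female", "male"],
})

# A left join keeps every track, even when the artist has no gender entry.
# validate="many_to_one" raises an error if an artist appears more than once
# in the gender table, which would silently duplicate tracks.
merged = tracks.merge(genders, on="artist", how="left", validate="many_to_one")

# Count the tracks that ended up without a gender label.
unmatched = int(merged["gender"].isna().sum())
print(merged)
print(f"tracks without a gender label: {unmatched}")
```

Counting unmatched rows right after the merge shows how much of the main dataset the gender data actually covers, which matters for the limitations discussed later.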

Dataset Capabilities

With these extensive features, the merged dataset serves several powerful analytical purposes:

  • By examining columns such as 'danceability' and 'energy', we can assess how musical trends have evolved across decades. For example, tracking how the average energy level of tracks changes from the 1960s to the 2010s can provide insight into changing listener preferences.

  • The addition of gender data allows us to explore how gender representation correlates with musical trends and popularity. For example, comparing the average popularity score of tracks by female artists with that of tracks by male artists can shed light on gender biases in listener behavior and industry practices.

  • Diving deeper, we can explore how musical features correlate with popularity and whether these correlations differ by gender. This layered analysis can help identify potential gender biases in how different musical attributes are valued.
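The three kinds of analysis listed above could look something like the sketch below. The rows are made-up placeholders rather than project data, and the column names (`decade`, `gender`, `energy`, `danceability`, `popularity`) are assumed to match the merged dataset.

```python
import pandas as pd

# Hypothetical rows; values are placeholders, not results from the project.
df = pd.DataFrame({
    "decade": [1960, 1960, 1980, 1980, 2010, 2010],
    "gender": ["female", "male", "female", "male", "female", "male"],
    "energy": [0.40, 0.50, 0.60, 0.70, 0.80, 0.70],
    "danceability": [0.50, 0.40, 0.60, 0.70, 0.70, 0.80],
    "popularity": [30, 40, 50, 60, 70, 65],
})

# 1. Average energy per decade (trend over time).
energy_by_decade = df.groupby("decade")["energy"].mean()

# 2. Average popularity by artist gender.
popularity_by_gender = df.groupby("gender")["popularity"].mean()

# 3. Pearson correlation between danceability and popularity,
#    computed separately for each gender.
corr_by_gender = {
    gender: group["danceability"].corr(group["popularity"])
    for gender, group in df.groupby("gender")
}

print(energy_by_decade)
print(popularity_by_gender)
print(corr_by_gender)
```

A difference between the per-gender correlations is only a starting point: with small groups it can easily arise by chance, so it should be read alongside group sizes before drawing conclusions about bias.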

Dataset Limitations

  • A primary limitation is the dataset’s inherent bias towards commercially successful music. Spotify’s catalog is extensive but not exhaustive, and its promotion algorithms favor tracks from larger labels. This means that indie and underground music might be underrepresented.

  • The dataset assumes a binary approach to gender (male/female), which overlooks non-binary and gender nonconforming artists. This not only makes the analysis biased but also perpetuates binary thinking within a gender spectrum that is far more complex.

  • While the dataset can indicate trends over decades, it lacks the contextual depth needed to explain why certain trends emerge. For example, an increase in danceability scores during the 1980s might coincide with the rise of electronic dance music. However, the dataset alone cannot connect this change to broader sociocultural factors.

  • Although preprocessing was done to handle missing values and transform "decade" into numerical format, these steps might have introduced biases. Dropping rows with missing values might omit pivotal cases. Moreover, transforming ‘60s’ to 1960, ‘70s’ to 1970, and so on, might oversimplify the nuanced differences within each decade.
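To make the preprocessing concern concrete, here is a hedged sketch of the two steps mentioned above (dropping rows with missing values and converting decade labels to numbers), together with a check of which decades lose rows. The sample rows are invented, and the label format beyond ‘60s’ and ‘70s’ (for example `"00s"` and `"10s"`) is an assumption.

```python
import pandas as pd

# Invented sample rows with some missing values.
raw = pd.DataFrame({
    "decade": ["60s", "70s", "80s", "10s", "00s"],
    "gender": ["female", None, "male", "female", "male"],
    "popularity": [35, 50, None, 70, 60],
})

# Assumed label-to-year mapping; only '60s' -> 1960 and '70s' -> 1970
# are stated in the text, the remaining labels are guesses.
DECADE_TO_YEAR = {
    "60s": 1960, "70s": 1970, "80s": 1980,
    "90s": 1990, "00s": 2000, "10s": 2010,
}

# Step 1: drop any row that has a missing value.
cleaned = raw.dropna().copy()

# Step 2: replace the decade label with the decade's starting year.
cleaned["decade"] = cleaned["decade"].map(DECADE_TO_YEAR)

# Check how many rows were dropped, and from which decades.
dropped = len(raw) - len(cleaned)
rows_before = raw["decade"].value_counts()
rows_after = (
    raw.dropna()["decade"].value_counts().reindex(rows_before.index, fill_value=0)
)
rows_lost = rows_before - rows_after

print(cleaned)
print(f"rows dropped: {dropped}")
print(rows_lost)
```

If the dropped rows cluster in particular decades or in one gender, the cleaned data is no longer a fair sample of the original, which is the kind of bias this limitation refers to.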

Personal Bias in Data Presentation

Addressing the critique from a technical perspective involves recognizing that our approach as users comes with intrinsic biases shaped by our intent to uncover specific patterns. For instance, our focus on gender representation might lead us to give more weight to gender-related findings, possibly overemphasizing issues even when statistical significance is borderline. This educational lens foregrounds gender equality, and while that is important, it might unintentionally obscure other factors such as race, geographical origin, or economic background, which also deeply influence music trends. Similarly, our visualizations (average danceability over decades, gender-based popularity, and so on) are influenced by personal biases and by the specific framing of our questions. Selective emphasis on certain metrics might highlight some aspects while downplaying others. For example, focusing on ‘danceability’ might miss equally relevant trends in ‘acousticness’.

Conclusion

The merged Spotify and gender dataset reveals patterns in musical evolution and gender representation, but it is also limited by commercial bias and by simplified binary gender categories. While it illuminates significant correlations and trends, the biases inherent in data collection and interpretation mean that a critical and careful approach is necessary. Understanding these limitations and biases supports more nuanced insights from the data, recognizing its capabilities while striving for broader inclusivity and equity in analysis.
