Class name: “Data Science and Visualization”
Taught by: Assistant Professor of Computer Science Sorelle Friedler
Here’s what Friedler had to say about her class:
The class is a 200-level computer science elective. The students learn basic “big data” analysis techniques, such as clustering and network analysis, and apply them to a single data set throughout the semester. Most of the data sets come from professors around campus who are interested in analyzing them for their own research projects. This year they include Zapotec/English/Spanish translation data from Brook Lillehaugen, historical points of interest along Lancaster Avenue from Andrew Friedman, information about Philadelphians’ desires for monuments in their city from Paul Farber, and data from the Dark Reaction Project that I’ve been working on with Alex Norquist and Josh Schrier. There is also data about Honor Code violations that Brian Guggenheimer ’16, who took this class two years ago and has been involved in Honor Council, was interested in understanding in more depth.
My hope is that the class gives students the chance to learn data analysis techniques while also understanding the real-world issues and implications of those techniques by applying them to data people care about. We talk a lot about the complications of handling missing data, the subtle assumptions made about the “truth” of such data, and how these issues influence the results of data analyses. We also discuss the ways visualizations can help users explore data, as well as the potential pitfalls of such visualizations. All of this happens in the context of learning a set of important algorithms (k-means clustering, PageRank, linear regression, etc.) and technical skills (Python, JavaScript, and D3), and of programming these algorithms and visualizations for the data set they’re examining.
I created the class because I think it’s important for students to have experience learning these data analysis and visualization techniques in the context of real, complex data, and to think about the technical and ethical issues this exposes. The course also has a large programming component, with many labs that build on previous assignments; I think that’s important as well, because it helps students become more comfortable coding. Data science is a new and growing area within computer science and associated interdisciplinary fields, and I’m glad that we can offer students the chance to think about the complex issues in the field while learning its fundamentals.
See what other courses the Computer Science Department is offering this semester.
Image: (cc) Camelia Boban
Cool Classes is a series that highlights interesting, unusual, and unique courses that enrich the Haverford experience.