10: Pandas for data analysis with Python: Part 1#
What is Pandas?#
Pandas is a Python library designed for data manipulation and analysis,
especially of tabular data, as in an SQL table or Excel spreadsheet. So things like:
Time series data
Arbitrary matrix data with meaningful row and column labels
Any other form of observational / statistical data sets
Example / motivating use cases#
Importing the pandas library (getting started)#
What is a library?#
You can think of a library as a collection of functions and data structures. You import a library (or subsets of it) into your program / notebook so you have access to special functions or data structures in your program.
You are already using Python’s standard library, which includes built-in functions like print(), and built-in data structures like str and dict. Every time you fire up Python, these are “imported” into your program in the background.
As you advance in your programming career, you will often find that the (sub)problem you want to solve is one that others have already tackled: they wrote a collection of functions and/or data structures that solves it really well, and saved that collection into a library that others can use. Take advantage of this!
You should learn how to read documentation for libraries#
You should have handy access to (and know how to use):
Docs for “ground truth”
Some collection of examples for reference.
The pandas website is a decent place to start: https://pandas.pydata.org/
This “cheat sheet” is also a really helpful guide to more common operations that you may run into later: https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf
There are also many blogs that are helpful, like towardsdatascience.com
The cool thing about pandas and data analysis in python is that many people share notebooks that you can inspect / learn from / adapt code for your own projects (just like mine!).
Learning how to use libraries is training for learning to code in teams, using code from others. Basically nobody writes anything all from scratch, unless they are trying to really REALLY learn something deeply.
“importing” a library: mechanics#
Here’s what it looks like to import a library and use it, conceptually with a “fake” library, and with the pandas library
We often want to import libraries with “as”.
The name after as is sort of like a variable name; usually we do that if the library name is clunky, or might conflict with variable names we want to use.
For pandas, by convention people usually import it as pd.
Let’s do that quickly to illustrate.
# import the pandas library, give it the name pd for easier access
import pandas as pd
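The same import mechanics can be sketched with modules from Python's standard library, so this runs without installing anything. The alias `stats` and the variable names here are illustrative choices, not conventions from these notes.

```python
# import a whole library under a shorter alias
import statistics as stats

# import just one function from a library
from math import sqrt

rolls = [2, 4, 4, 6]

# call a library function through the alias, using dot notation
average_roll = stats.mean(rolls)

# a function imported directly by name needs no prefix
root = sqrt(16)

print(average_roll, root)
```

Either style gives you access to the same code; the alias form keeps it obvious which library each function came from.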
# test here
courses = pd.read_csv("INST courses.csv")
courses
| | Code | Title | Description | Prereqs | Credits |
---|---|---|---|---|---|
0 | INST126 | Introduction to Programming for Information Sc... | An introduction to computer programming for st... | Minimum grade of C- in MATH115; or must have m... | 3.0 |
1 | INST201 | Introduction to Information Science | Examining the effects of new information techn... | NaN | 3.0 |
2 | INST311 | Information Organization | Examines the theories, concepts, and principle... | Must have completed or be concurrently enrolle... | 3.0 |
3 | INST314 | Statistics for Information Science | Basic concepts in statistics including measure... | Must have completed or be concurrently enrolle... | 3.0 |
4 | INST326 | Object-Oriented Programming for Information Sc... | An introduction to programming, emphasizing un... | 1 course with a minimum grade of C- from (INST... | 3.0 |
5 | INST327 | Database Design and Modeling | Introduction to databases, the relational mode... | 1 course with a minimum grade of C- from (CMSC... | 3.0 |
6 | INST335 | Teams and Organizations | Team development and the principles, methods a... | 1 course with a minimum grade of C- from (INST... | 3.0 |
7 | INST346 | Technologies Infrastructure and Architecture | Examines the basic concepts of local and wide-... | 1 course with a minimum grade of C- from (INST... | 3.0 |
8 | INST352 | Information User Needs and Assessment | Focuses on use of information by individuals, ... | 1 course with a minimum grade of C- from (INST... | 3.0 |
9 | INST354 | Decision-Making for Information Science | Examines the use of information in organizatio... | INST314. | 3.0 |
10 | INST362 | User-Centered Design | Introduction to human-computer interaction (HC... | 1 course with a minimum grade of C- from (INST... | 3.0 |
11 | INST377 | Dynamic Web Applications | An exploration of the basic methods and tools ... | INST327. | 3.0 |
12 | INST408Y | Special Topics in Information Science; Privacy... | NaN | NaN | |
13 | INST408Z | Special Topics in Information Science; The Apo... | NaN | NaN | |
14 | INST414 | Data Science Techniques | An exploration of how to extract insights from... | INST314. | 3.0 |
15 | INST447 | Data Sources and Manipulation | Examines approaches to locating, acquiring, ma... | INST326 or CMSC131; and INST327. | 3.0 |
16 | INST462 | Introduction to Data Visualization | Exploration of the theories, methods, and tech... | INST314. | 3.0 |
17 | INST466 | Technology, Culture, and Society | Individual, cultural, and societal outcomes as... | INST201. | 3.0 |
18 | INST490 | Integrated Capstone for Information Science | The capstone provides a platform for Informati... | Minimum grade of C- in INST314, INST335, INST3... | 3.0 |
19 | INST604 | Introduction to Archives and Digital Curation | Overview of the principles, practices, and app... | NaN | 3.0 |
20 | INST612 | Information Policy | Nature, structure, development and application... | NaN | 3.0 |
21 | INST614 | Literacy and Inclusion | The educational and psychological dimensions o... | NaN | 3.0 |
22 | INST616 | Open Source Intelligence | An introduction to Open Source Intelligence (O... | NaN | 3.0 |
23 | INST622 | Information and Universal Usability | Information services and technologies to provi... | NaN | 3.0 |
24 | INST627 | Data Analytics for Information Professionals | Skills and knowledge needed to craft datasets,... | NaN | 3.0 |
25 | INST630 | Introduction to Programming for the Informatio... | An introduction to computer programming intend... | NaN | 3.0 |
26 | INST652 | Design Thinking and Youth | Methods of design thinking specifically within... | NaN | 3.0 |
27 | INST702 | Advanced Usability Testing | Usability testing methods -- how to design and... | Permission of instructor; or (INFM605 or INST6... | 3.0 |
28 | INST709 | Independent Study | NaN | NaN | |
29 | INST728G | Special Topics in Information Studies; Smart C... | NaN | NaN | |
30 | INST728V | Special Topics in Information Studies; Digital... | NaN | NaN | |
31 | INST733 | Database Design | Principles of user-oriented database design. ... | LBSC690, LBSC671, or INFM603; or permission of... | 3.0 |
32 | INST737 | Introduction to Data Science | An exploration of some of the best and most ge... | INST627; and (LBSC690, LBSC671, or INFM603). O... | 3.0 |
33 | INST741 | Social Computing Technologies and Applications | Tools and techniques for developing and config... | INFM603 and INFM605; or (LBSC602 and LBSC671);... | 3.0 |
34 | INST742 | Implementing Digital Curation | Management of and technology for application o... | INST604; or permission of instructor. | 3.0 |
35 | INST746 | Digitization of Legacy Holdings | Through hands on exercises and real-world proj... | INST604. | 3.0 |
36 | INST762 | Visual Analytics | Visual analytics is the use of interactive vis... | INFM603 or INST630; or permission of instructor. | 3.0 |
37 | INST767 | Big Data Infrastructure | Principles and techniques of data science and ... | INST737; or permission of instructor. | 3.0 |
38 | INST776 | HCIM CAPSTONE PROJECT | The opportunity to apply the skills learned th... | INST775; or permission of instructor. | 3.0 |
39 | INST785 | Documentation, Collection, and Appraisal of Re... | Development of documentation strategies and pl... | INST604; or permission of instructor. | 3.0 |
40 | INST794 | Capstone in Youth Experience | Through a supervised project, to synthesize de... | INST650, INST651, and INST652; or permission o... | 3.0 |
import os
os.getcwd()
'/Users/joelchan/Projects/inst126-intro-programming-notes-copy'
import random
random.randint(1,6)
1
The core of Pandas: The dataframe data structure#
We’ve so far progressed from single-item data structures (str, int, float) to “basic” collections (list, dict).
Now we will learn about the dataframe, which has:
nice properties of both lists (orderable, indexable) and dictionaries (can retrieve things quickly by key, store associated values)
and other properties and built-in algorithms and methods that are useful for data analysis (e.g., summarizing, grouping, statistics, etc.)
Remember: data structures and algorithms go hand in hand: people made dataframes (and the associated pandas library) so we can do particular kinds of algorithms more easily.
Dataframes are basically like smart spreadsheets that Python can read/write
The data is in rows and columns. Columns in pandas are special data structures called series.
More here
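To make the "columns are series" point concrete, here is a minimal sketch. The `pets` data is made up for illustration and is not part of the course files.

```python
import pandas as pd

# a tiny, made-up dataframe for illustration
pets = pd.DataFrame([
    {"name": "Rex", "legs": 4},
    {"name": "Tweety", "legs": 2},
])

# selecting one column gives back a Series
legs = pets["legs"]
print(type(legs))

# a Series keeps the row labels (the index) and has its own methods
print(legs.index.tolist())
print(legs.sum())
```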
Dataframes combine the best characteristics of lists and dictionaries, and more!#
Can sort (from lists)
Can access data by key (from dictionaries)
Can also reindex easily!
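The three points above can be sketched on a small, made-up dataframe. The `snacks` data and variable names are assumptions for illustration; `set_index` is one common way to reindex, and there are others.

```python
import pandas as pd

snacks = pd.DataFrame([
    {"item": "chips", "price": 2.5},
    {"item": "apple", "price": 1.0},
    {"item": "cookie", "price": 1.5},
])

# like a list: rows have an order and can be sorted
by_price = snacks.sort_values(by="price")

# like a dictionary: a column is retrieved by its key
prices = snacks["price"]

# reindexing: use the "item" column as the row labels instead of 0, 1, 2
by_item = snacks.set_index("item")
apple_price = by_item.loc["apple", "price"]

print(by_price)
print(apple_price)
```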
# show me the "columns"
courses.columns
Index(['Code', 'Title', 'Description', 'Prereqs', 'Credits'], dtype='object')
# get the code column
courses['Code']
0 INST126
1 INST201
2 INST311
3 INST314
4 INST326
5 INST327
6 INST335
7 INST346
8 INST352
9 INST354
10 INST362
11 INST377
12 INST408Y
13 INST408Z
14 INST414
15 INST447
16 INST462
17 INST466
18 INST490
19 INST604
20 INST612
21 INST614
22 INST616
23 INST622
24 INST627
25 INST630
26 INST652
27 INST702
28 INST709
29 INST728G
30 INST728V
31 INST733
32 INST737
33 INST741
34 INST742
35 INST746
36 INST762
37 INST767
38 INST776
39 INST785
40 INST794
Name: Code, dtype: object
# find the courses that are 3 credits
courses[courses['Credits'] == 3.0]
| | Code | Title | Description | Prereqs | Credits |
---|---|---|---|---|---|
0 | INST126 | Introduction to Programming for Information Sc... | An introduction to computer programming for st... | Minimum grade of C- in MATH115; or must have m... | 3.0 |
1 | INST201 | Introduction to Information Science | Examining the effects of new information techn... | NaN | 3.0 |
2 | INST311 | Information Organization | Examines the theories, concepts, and principle... | Must have completed or be concurrently enrolle... | 3.0 |
3 | INST314 | Statistics for Information Science | Basic concepts in statistics including measure... | Must have completed or be concurrently enrolle... | 3.0 |
4 | INST326 | Object-Oriented Programming for Information Sc... | An introduction to programming, emphasizing un... | 1 course with a minimum grade of C- from (INST... | 3.0 |
5 | INST327 | Database Design and Modeling | Introduction to databases, the relational mode... | 1 course with a minimum grade of C- from (CMSC... | 3.0 |
6 | INST335 | Teams and Organizations | Team development and the principles, methods a... | 1 course with a minimum grade of C- from (INST... | 3.0 |
7 | INST346 | Technologies Infrastructure and Architecture | Examines the basic concepts of local and wide-... | 1 course with a minimum grade of C- from (INST... | 3.0 |
8 | INST352 | Information User Needs and Assessment | Focuses on use of information by individuals, ... | 1 course with a minimum grade of C- from (INST... | 3.0 |
9 | INST354 | Decision-Making for Information Science | Examines the use of information in organizatio... | INST314. | 3.0 |
10 | INST362 | User-Centered Design | Introduction to human-computer interaction (HC... | 1 course with a minimum grade of C- from (INST... | 3.0 |
11 | INST377 | Dynamic Web Applications | An exploration of the basic methods and tools ... | INST327. | 3.0 |
14 | INST414 | Data Science Techniques | An exploration of how to extract insights from... | INST314. | 3.0 |
15 | INST447 | Data Sources and Manipulation | Examines approaches to locating, acquiring, ma... | INST326 or CMSC131; and INST327. | 3.0 |
16 | INST462 | Introduction to Data Visualization | Exploration of the theories, methods, and tech... | INST314. | 3.0 |
17 | INST466 | Technology, Culture, and Society | Individual, cultural, and societal outcomes as... | INST201. | 3.0 |
18 | INST490 | Integrated Capstone for Information Science | The capstone provides a platform for Informati... | Minimum grade of C- in INST314, INST335, INST3... | 3.0 |
19 | INST604 | Introduction to Archives and Digital Curation | Overview of the principles, practices, and app... | NaN | 3.0 |
20 | INST612 | Information Policy | Nature, structure, development and application... | NaN | 3.0 |
21 | INST614 | Literacy and Inclusion | The educational and psychological dimensions o... | NaN | 3.0 |
22 | INST616 | Open Source Intelligence | An introduction to Open Source Intelligence (O... | NaN | 3.0 |
23 | INST622 | Information and Universal Usability | Information services and technologies to provi... | NaN | 3.0 |
24 | INST627 | Data Analytics for Information Professionals | Skills and knowledge needed to craft datasets,... | NaN | 3.0 |
25 | INST630 | Introduction to Programming for the Informatio... | An introduction to computer programming intend... | NaN | 3.0 |
26 | INST652 | Design Thinking and Youth | Methods of design thinking specifically within... | NaN | 3.0 |
27 | INST702 | Advanced Usability Testing | Usability testing methods -- how to design and... | Permission of instructor; or (INFM605 or INST6... | 3.0 |
31 | INST733 | Database Design | Principles of user-oriented database design. ... | LBSC690, LBSC671, or INFM603; or permission of... | 3.0 |
32 | INST737 | Introduction to Data Science | An exploration of some of the best and most ge... | INST627; and (LBSC690, LBSC671, or INFM603). O... | 3.0 |
33 | INST741 | Social Computing Technologies and Applications | Tools and techniques for developing and config... | INFM603 and INFM605; or (LBSC602 and LBSC671);... | 3.0 |
34 | INST742 | Implementing Digital Curation | Management of and technology for application o... | INST604; or permission of instructor. | 3.0 |
35 | INST746 | Digitization of Legacy Holdings | Through hands on exercises and real-world proj... | INST604. | 3.0 |
36 | INST762 | Visual Analytics | Visual analytics is the use of interactive vis... | INFM603 or INST630; or permission of instructor. | 3.0 |
37 | INST767 | Big Data Infrastructure | Principles and techniques of data science and ... | INST737; or permission of instructor. | 3.0 |
38 | INST776 | HCIM CAPSTONE PROJECT | The opportunity to apply the skills learned th... | INST775; or permission of instructor. | 3.0 |
39 | INST785 | Documentation, Collection, and Appraisal of Re... | Development of documentation strategies and pl... | INST604; or permission of instructor. | 3.0 |
40 | INST794 | Capstone in Youth Experience | Through a supervised project, to synthesize de... | INST650, INST651, and INST652; or permission o... | 3.0 |
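It can help to see the filter above as two steps. This is a sketch on made-up data (the `classes` dataframe is hypothetical): the comparison builds a True/False series, and indexing with it keeps the True rows. Note that a missing value compares as False, which is why the courses with no credits listed drop out of the result above.

```python
import pandas as pd

classes = pd.DataFrame([
    {"code": "A101", "credits": 3.0},
    {"code": "B202", "credits": 1.0},
    {"code": "C303", "credits": float("nan")},  # missing value
    {"code": "D404", "credits": 3.0},
])

# step 1: the comparison produces a Series of True/False, one per row
mask = classes["credits"] == 3.0
print(mask.tolist())

# step 2: indexing with that Series keeps only the rows marked True
three_credit = classes[mask]
print(three_credit["code"].tolist())
```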
# find all courses where the title contains the word introduction
courses[courses['Title'].str.contains("Introduction")]
| | Code | Title | Description | Prereqs | Credits |
---|---|---|---|---|---|
0 | INST126 | Introduction to Programming for Information Sc... | An introduction to computer programming for st... | Minimum grade of C- in MATH115; or must have m... | 3.0 |
1 | INST201 | Introduction to Information Science | Examining the effects of new information techn... | NaN | 3.0 |
16 | INST462 | Introduction to Data Visualization | Exploration of the theories, methods, and tech... | INST314. | 3.0 |
19 | INST604 | Introduction to Archives and Digital Curation | Overview of the principles, practices, and app... | NaN | 3.0 |
25 | INST630 | Introduction to Programming for the Informatio... | An introduction to computer programming intend... | NaN | 3.0 |
32 | INST737 | Introduction to Data Science | An exploration of some of the best and most ge... | INST627; and (LBSC690, LBSC671, or INFM603). O... | 3.0 |
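One thing to watch for: .str.contains() is case-sensitive by default, so "Introduction" will not match "introduction". The sketch below uses a made-up `books` dataframe to show the `case` and `na` parameters; `na=False` makes rows with a missing value count as "no match", which keeps the result usable as a filter on columns that have gaps.

```python
import pandas as pd

books = pd.DataFrame({
    "title": ["Introduction to Cats", "An introduction to dogs", "Birds", None],
})

# default behavior: the match is case-sensitive
strict = books["title"].str.contains("Introduction")

# case=False ignores capitalization; na=False treats missing titles as "no match"
relaxed = books["title"].str.contains("introduction", case=False, na=False)

print(books[relaxed])
```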
courses.head(10) # show me the top 10 rows in the dataframe
| | Code | Title | Description | Prereqs | Credits |
---|---|---|---|---|---|
0 | INST126 | Introduction to Programming for Information Sc... | An introduction to computer programming for st... | Minimum grade of C- in MATH115; or must have m... | 3.0 |
1 | INST201 | Introduction to Information Science | Examining the effects of new information techn... | NaN | 3.0 |
2 | INST311 | Information Organization | Examines the theories, concepts, and principle... | Must have completed or be concurrently enrolle... | 3.0 |
3 | INST314 | Statistics for Information Science | Basic concepts in statistics including measure... | Must have completed or be concurrently enrolle... | 3.0 |
4 | INST326 | Object-Oriented Programming for Information Sc... | An introduction to programming, emphasizing un... | 1 course with a minimum grade of C- from (INST... | 3.0 |
5 | INST327 | Database Design and Modeling | Introduction to databases, the relational mode... | 1 course with a minimum grade of C- from (CMSC... | 3.0 |
6 | INST335 | Teams and Organizations | Team development and the principles, methods a... | 1 course with a minimum grade of C- from (INST... | 3.0 |
7 | INST346 | Technologies Infrastructure and Architecture | Examines the basic concepts of local and wide-... | 1 course with a minimum grade of C- from (INST... | 3.0 |
8 | INST352 | Information User Needs and Assessment | Focuses on use of information by individuals, ... | 1 course with a minimum grade of C- from (INST... | 3.0 |
9 | INST354 | Decision-Making for Information Science | Examines the use of information in organizatio... | INST314. | 3.0 |
Common operations (basic)#
Let’s go over some common operations with dataframes. This will overlap with your PCE, mostly Q1-5 and Q8.
Constructing a dataframe#
From other data structures (e.g., lists, dictionaries)#
Seldom use this at the start (usually we import data from an external file, like a .csv file, into a dataframe).
But I do use this frequently when I’m creating new dataframes for analysis from existing data(frames). Might not be the best pattern to emulate (but it works for me!): a lot of what I do could probably be done more elegantly with proper use of .groupby() and .apply() (more on this next week).
But it’s useful to do this to get a sense of how a dataframe combines aspects of lists and dictionaries. A common input “literal” for a dataframe (just like the input literal for an int has to be numbers) is a set of “records”: a list of dictionaries, where each dictionary is a row, and within each dictionary, each key is a column name (with an associated value).
basic_data = [
{'name': 'Joel', 'role': 'instructor'},
{'name': 'Sarah', 'role': 'UTA'}
]
# construct a dataframe from the basic_data list of dictionaries
example_df = pd.DataFrame(basic_data)
example_df
| | name | role |
---|---|---|
0 | Joel | instructor |
1 | Sarah | UTA |
example_df.sort_values(by="name", ascending=False)
| | name | role |
---|---|---|
1 | Sarah | UTA |
0 | Joel | instructor |
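A list of records is not the only input pd.DataFrame() accepts. As a sketch, the same table can be built from a dictionary of columns, where each key is a column name and each value is a list of that column's entries. The names `column_data`, `from_columns`, and `from_records` are just for this illustration.

```python
import pandas as pd

# same data as the records example, organized by column instead of by row
column_data = {
    "name": ["Joel", "Sarah"],
    "role": ["instructor", "UTA"],
}
from_columns = pd.DataFrame(column_data)

# the records version, for comparison
from_records = pd.DataFrame([
    {"name": "Joel", "role": "instructor"},
    {"name": "Sarah", "role": "UTA"},
])

# check whether the two dataframes hold the same data
print(from_columns.equals(from_records))
```

Which form to use mostly depends on how your data already looks: row by row (records) or column by column.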
more_basic_data = [
    {'school': 'UMD', 'fundingModel': 'public', 'conference': 'Big Ten'},
    {'school': 'Harvard', 'fundingModel': 'private', 'conference': 'Ivy League'}
]
# let's make this into a dataframe!
schoolsDF = pd.DataFrame(more_basic_data)
schoolsDF
| | school | fundingModel | conference |
---|---|---|---|
0 | UMD | public | Big Ten |
1 | Harvard | private | Ivy League |
# let's make another sample dataset!
marvel_movies = [
    {"name": "Iron Man 1", "Phase": 1, "Year release": 2008},
    {"name": "Avengers 1", "Phase": 1, "Year release": 2012},
    {"name": "Avengers: Endgame", "Phase": 3, "Year release": 2019}
]
# and turn it into a dataframe
marvel_df = pd.DataFrame(marvel_movies)
marvel_df
| | name | Phase | Year release |
---|---|---|---|
0 | Iron Man 1 | 1 | 2008 |
1 | Avengers 1 | 1 | 2012 |
2 | Avengers: Endgame | 3 | 2019 |
marvel_df.to_csv("marvel-movies.csv")
marvel_df[marvel_df['Phase'] == 1]
| | name | Phase | Year release |
---|---|---|---|
0 | Iron Man 1 | 1 | 2008 |
1 | Avengers 1 | 1 | 2012 |
From (external) data files#
Most frequently this is done with .read_csv(), but there are many other common formats, such as json. See here for a full listing.
csv stands for comma-separated values.
It is commonly used because it’s plain text. This means any program that can read a string can read this file and have it be meaningful. Not so with Excel files!
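To see the "plain text" point in action, here is a minimal sketch that writes a small dataframe to csv as a string and reads it back, without touching any file. The `movies` data is a shortened, illustrative version of the earlier example, and `io.StringIO` is used only to let a string stand in for a file.

```python
import io
import pandas as pd

movies = pd.DataFrame([
    {"name": "Iron Man 1", "Phase": 1},
    {"name": "Avengers 1", "Phase": 1},
])

# with no file path, to_csv returns the csv content as a string;
# index=False leaves out the row labels
csv_text = movies.to_csv(index=False)
print(csv_text)

# read_csv accepts file-like objects, so we can read the string back
round_trip = pd.read_csv(io.StringIO(csv_text))
print(round_trip)
```

The printed csv text is just column names on the first line and one comma-separated row per line after that.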
courses = pd.read_csv("INST courses.csv") # needs a path to a csv file
courses
| | Code | Title | Description | Prereqs | Credits |
---|---|---|---|---|---|
0 | INST126 | Introduction to Programming for Information Sc... | An introduction to computer programming for st... | Minimum grade of C- in MATH115; or must have m... | 3.0 |
1 | INST201 | Introduction to Information Science | Examining the effects of new information techn... | NaN | 3.0 |
2 | INST311 | Information Organization | Examines the theories, concepts, and principle... | Must have completed or be concurrently enrolle... | 3.0 |
3 | INST314 | Statistics for Information Science | Basic concepts in statistics including measure... | Must have completed or be concurrently enrolle... | 3.0 |
4 | INST326 | Object-Oriented Programming for Information Sc... | An introduction to programming, emphasizing un... | 1 course with a minimum grade of C- from (INST... | 3.0 |
5 | INST327 | Database Design and Modeling | Introduction to databases, the relational mode... | 1 course with a minimum grade of C- from (CMSC... | 3.0 |
6 | INST335 | Teams and Organizations | Team development and the principles, methods a... | 1 course with a minimum grade of C- from (INST... | 3.0 |
7 | INST346 | Technologies Infrastructure and Architecture | Examines the basic concepts of local and wide-... | 1 course with a minimum grade of C- from (INST... | 3.0 |
8 | INST352 | Information User Needs and Assessment | Focuses on use of information by individuals, ... | 1 course with a minimum grade of C- from (INST... | 3.0 |
9 | INST354 | Decision-Making for Information Science | Examines the use of information in organizatio... | INST314. | 3.0 |
10 | INST362 | User-Centered Design | Introduction to human-computer interaction (HC... | 1 course with a minimum grade of C- from (INST... | 3.0 |
11 | INST377 | Dynamic Web Applications | An exploration of the basic methods and tools ... | INST327. | 3.0 |
12 | INST408Y | Special Topics in Information Science; Privacy... | NaN | NaN | |
13 | INST408Z | Special Topics in Information Science; The Apo... | NaN | NaN | |
14 | INST414 | Data Science Techniques | An exploration of how to extract insights from... | INST314. | 3.0 |
15 | INST447 | Data Sources and Manipulation | Examines approaches to locating, acquiring, ma... | INST326 or CMSC131; and INST327. | 3.0 |
16 | INST462 | Introduction to Data Visualization | Exploration of the theories, methods, and tech... | INST314. | 3.0 |
17 | INST466 | Technology, Culture, and Society | Individual, cultural, and societal outcomes as... | INST201. | 3.0 |
18 | INST490 | Integrated Capstone for Information Science | The capstone provides a platform for Informati... | Minimum grade of C- in INST314, INST335, INST3... | 3.0 |
19 | INST604 | Introduction to Archives and Digital Curation | Overview of the principles, practices, and app... | NaN | 3.0 |
20 | INST612 | Information Policy | Nature, structure, development and application... | NaN | 3.0 |
21 | INST614 | Literacy and Inclusion | The educational and psychological dimensions o... | NaN | 3.0 |
22 | INST616 | Open Source Intelligence | An introduction to Open Source Intelligence (O... | NaN | 3.0 |
23 | INST622 | Information and Universal Usability | Information services and technologies to provi... | NaN | 3.0 |
24 | INST627 | Data Analytics for Information Professionals | Skills and knowledge needed to craft datasets,... | NaN | 3.0 |
25 | INST630 | Introduction to Programming for the Informatio... | An introduction to computer programming intend... | NaN | 3.0 |
26 | INST652 | Design Thinking and Youth | Methods of design thinking specifically within... | NaN | 3.0 |
27 | INST702 | Advanced Usability Testing | Usability testing methods -- how to design and... | Permission of instructor; or (INFM605 or INST6... | 3.0 |
28 | INST709 | Independent Study | NaN | NaN | |
29 | INST728G | Special Topics in Information Studies; Smart C... | NaN | NaN | |
30 | INST728V | Special Topics in Information Studies; Digital... | NaN | NaN | |
31 | INST733 | Database Design | Principles of user-oriented database design. ... | LBSC690, LBSC671, or INFM603; or permission of... | 3.0 |
32 | INST737 | Introduction to Data Science | An exploration of some of the best and most ge... | INST627; and (LBSC690, LBSC671, or INFM603). O... | 3.0 |
33 | INST741 | Social Computing Technologies and Applications | Tools and techniques for developing and config... | INFM603 and INFM605; or (LBSC602 and LBSC671);... | 3.0 |
34 | INST742 | Implementing Digital Curation | Management of and technology for application o... | INST604; or permission of instructor. | 3.0 |
35 | INST746 | Digitization of Legacy Holdings | Through hands on exercises and real-world proj... | INST604. | 3.0 |
36 | INST762 | Visual Analytics | Visual analytics is the use of interactive vis... | INFM603 or INST630; or permission of instructor. | 3.0 |
37 | INST767 | Big Data Infrastructure | Principles and techniques of data science and ... | INST737; or permission of instructor. | 3.0 |
38 | INST776 | HCIM CAPSTONE PROJECT | The opportunity to apply the skills learned th... | INST775; or permission of instructor. | 3.0 |
39 | INST785 | Documentation, Collection, and Appraisal of Re... | Development of documentation strategies and pl... | INST604; or permission of instructor. | 3.0 |
40 | INST794 | Capstone in Youth Experience | Through a supervised project, to synthesize de... | INST650, INST651, and INST652; or permission o... | 3.0 |
Inspecting your dataframe#
Common operations:
summarizing
filtering / accessing
sorting
Summarizing#
With:
.head()
.describe()
various stats
# we have a dataframe named courses
# courses has a method called head
# can optionally pass in a parameter to tell how many rows from the top to return
courses.head(20) # show the top 20
| | Code | Title | Description | Prereqs | Credits |
---|---|---|---|---|---|
0 | INST126 | Introduction to Programming for Information Sc... | An introduction to computer programming for st... | Minimum grade of C- in MATH115; or must have m... | 3.0 |
1 | INST201 | Introduction to Information Science | Examining the effects of new information techn... | NaN | 3.0 |
2 | INST311 | Information Organization | Examines the theories, concepts, and principle... | Must have completed or be concurrently enrolle... | 3.0 |
3 | INST314 | Statistics for Information Science | Basic concepts in statistics including measure... | Must have completed or be concurrently enrolle... | 3.0 |
4 | INST326 | Object-Oriented Programming for Information Sc... | An introduction to programming, emphasizing un... | 1 course with a minimum grade of C- from (INST... | 3.0 |
5 | INST327 | Database Design and Modeling | Introduction to databases, the relational mode... | 1 course with a minimum grade of C- from (CMSC... | 3.0 |
6 | INST335 | Teams and Organizations | Team development and the principles, methods a... | 1 course with a minimum grade of C- from (INST... | 3.0 |
7 | INST346 | Technologies Infrastructure and Architecture | Examines the basic concepts of local and wide-... | 1 course with a minimum grade of C- from (INST... | 3.0 |
8 | INST352 | Information User Needs and Assessment | Focuses on use of information by individuals, ... | 1 course with a minimum grade of C- from (INST... | 3.0 |
9 | INST354 | Decision-Making for Information Science | Examines the use of information in organizatio... | INST314. | 3.0 |
10 | INST362 | User-Centered Design | Introduction to human-computer interaction (HC... | 1 course with a minimum grade of C- from (INST... | 3.0 |
11 | INST377 | Dynamic Web Applications | An exploration of the basic methods and tools ... | INST327. | 3.0 |
12 | INST408Y | Special Topics in Information Science; Privacy... | NaN | NaN | |
13 | INST408Z | Special Topics in Information Science; The Apo... | NaN | NaN | |
14 | INST414 | Data Science Techniques | An exploration of how to extract insights from... | INST314. | 3.0 |
15 | INST447 | Data Sources and Manipulation | Examines approaches to locating, acquiring, ma... | INST326 or CMSC131; and INST327. | 3.0 |
16 | INST462 | Introduction to Data Visualization | Exploration of the theories, methods, and tech... | INST314. | 3.0 |
17 | INST466 | Technology, Culture, and Society | Individual, cultural, and societal outcomes as... | INST201. | 3.0 |
18 | INST490 | Integrated Capstone for Information Science | The capstone provides a platform for Informati... | Minimum grade of C- in INST314, INST335, INST3... | 3.0 |
19 | INST604 | Introduction to Archives and Digital Curation | Overview of the principles, practices, and app... | NaN | 3.0 |
import random # importing a library! :) to generate random numbers
courses['random_number'] = [c + random.randint(0,5) for c in courses['Credits']]
courses.head(25)
| | Code | Title | Description | Prereqs | Credits | random_number |
---|---|---|---|---|---|---|
0 | INST126 | Introduction to Programming for Information Sc... | An introduction to computer programming for st... | Minimum grade of C- in MATH115; or must have m... | 3.0 | 6.0 |
1 | INST201 | Introduction to Information Science | Examining the effects of new information techn... | NaN | 3.0 | 4.0 |
2 | INST311 | Information Organization | Examines the theories, concepts, and principle... | Must have completed or be concurrently enrolle... | 3.0 | 3.0 |
3 | INST314 | Statistics for Information Science | Basic concepts in statistics including measure... | Must have completed or be concurrently enrolle... | 3.0 | 4.0 |
4 | INST326 | Object-Oriented Programming for Information Sc... | An introduction to programming, emphasizing un... | 1 course with a minimum grade of C- from (INST... | 3.0 | 6.0 |
5 | INST327 | Database Design and Modeling | Introduction to databases, the relational mode... | 1 course with a minimum grade of C- from (CMSC... | 3.0 | 5.0 |
6 | INST335 | Teams and Organizations | Team development and the principles, methods a... | 1 course with a minimum grade of C- from (INST... | 3.0 | 7.0 |
7 | INST346 | Technologies Infrastructure and Architecture | Examines the basic concepts of local and wide-... | 1 course with a minimum grade of C- from (INST... | 3.0 | 7.0 |
8 | INST352 | Information User Needs and Assessment | Focuses on use of information by individuals, ... | 1 course with a minimum grade of C- from (INST... | 3.0 | 7.0 |
9 | INST354 | Decision-Making for Information Science | Examines the use of information in organizatio... | INST314. | 3.0 | 5.0 |
10 | INST362 | User-Centered Design | Introduction to human-computer interaction (HC... | 1 course with a minimum grade of C- from (INST... | 3.0 | 7.0 |
11 | INST377 | Dynamic Web Applications | An exploration of the basic methods and tools ... | INST327. | 3.0 | 3.0 |
12 | INST408Y | Special Topics in Information Science; Privacy... | NaN | NaN | NaN | |
13 | INST408Z | Special Topics in Information Science; The Apo... | NaN | NaN | NaN | |
14 | INST414 | Data Science Techniques | An exploration of how to extract insights from... | INST314. | 3.0 | 3.0 |
15 | INST447 | Data Sources and Manipulation | Examines approaches to locating, acquiring, ma... | INST326 or CMSC131; and INST327. | 3.0 | 6.0 |
16 | INST462 | Introduction to Data Visualization | Exploration of the theories, methods, and tech... | INST314. | 3.0 | 3.0 |
17 | INST466 | Technology, Culture, and Society | Individual, cultural, and societal outcomes as... | INST201. | 3.0 | 8.0 |
18 | INST490 | Integrated Capstone for Information Science | The capstone provides a platform for Informati... | Minimum grade of C- in INST314, INST335, INST3... | 3.0 | 5.0 |
19 | INST604 | Introduction to Archives and Digital Curation | Overview of the principles, practices, and app... | NaN | 3.0 | 7.0 |
20 | INST612 | Information Policy | Nature, structure, development and application... | NaN | 3.0 | 6.0 |
21 | INST614 | Literacy and Inclusion | The educational and psychological dimensions o... | NaN | 3.0 | 4.0 |
22 | INST616 | Open Source Intelligence | An introduction to Open Source Intelligence (O... | NaN | 3.0 | 4.0 |
23 | INST622 | Information and Universal Usability | Information services and technologies to provi... | NaN | 3.0 | 3.0 |
24 | INST627 | Data Analytics for Information Professionals | Skills and knowledge needed to craft datasets,... | NaN | 3.0 | 8.0 |
courses.describe()
Credits | random_number | |
---|---|---|
count | 36.0 | 36.000000 |
mean | 3.0 | 5.583333 |
std | 0.0 | 1.762709 |
min | 3.0 | 3.000000 |
25% | 3.0 | 4.000000 |
50% | 3.0 | 6.000000 |
75% | 3.0 | 7.000000 |
max | 3.0 | 8.000000 |
ncaa = pd.read_csv("ncaa-team-data.csv")
ncaa.head()
school | conf | rk | w | l | srs | sos | pts_for | pts_vs | pts_total | ap_pre | ap_high | ap_final | pts_diff | ncaa_result | ncaa_numeric | season | coaches | year | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | air-force | MWC | 1 | 12 | 21 | -2.99 | 1.08 | 73.1 | 75.1 | 148.2 | 30 | 30 | 30 | -2.0 | NaN | 0 | 2016-17 | Dave Pilipovich (12-21) | 2016.0 |
1 | air-force | MWC | 2 | 14 | 18 | -5.51 | 0.66 | 68.4 | 72.8 | 141.2 | 30 | 30 | 30 | -4.4 | NaN | 0 | 2015-16 | Dave Pilipovich (14-18) | 2015.0 |
2 | air-force | MWC | 3 | 14 | 17 | -1.85 | -0.71 | 65.7 | 65.1 | 130.8 | 30 | 30 | 30 | 0.6 | NaN | 0 | 2014-15 | Dave Pilipovich (14-17) | 2014.0 |
3 | air-force | MWC | 4 | 12 | 18 | -4.08 | 1.71 | 66.0 | 69.1 | 135.1 | 30 | 30 | 30 | -3.1 | NaN | 0 | 2013-14 | Dave Pilipovich (12-18) | 2013.0 |
4 | air-force | MWC | 5 | 18 | 14 | 4.18 | 4.28 | 70.0 | 67.8 | 137.8 | 30 | 30 | 30 | 2.2 | NaN | 0 | 2012-13 | Dave Pilipovich (18-14) | 2012.0 |
ncaa['school'].value_counts()
school
yale 122
minnesota 122
bucknell 122
penn-state 121
temple 121
...
hampton 22
southwestern-ks 21
allegheny 21
wisconsin-stevens-point 21
loyola-la 21
Name: count, Length: 338, dtype: int64
ncaa.describe()
rk | w | l | srs | sos | pts_for | pts_vs | pts_total | ap_pre | ap_high | ap_final | pts_diff | ncaa_numeric | year | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 24029.000000 | 24029.000000 | 24029.000000 | 16945.000000 | 16945.000000 | 7815.000000 | 6916.000000 | 6916.000000 | 24029.000000 | 24029.000000 | 24029.000000 | 6916.000000 | 24029.000000 | 24029.000000 |
mean | 43.961380 | 13.641974 | 11.607807 | -0.554195 | -0.184372 | 70.232028 | 68.843855 | 138.557244 | 29.084897 | 27.811478 | 28.904865 | 0.869534 | 0.972283 | 1969.397312 |
std | 30.694366 | 6.366854 | 5.509024 | 9.975686 | 5.447946 | 6.537464 | 5.977788 | 10.468501 | 4.266874 | 6.542723 | 4.666435 | 6.317951 | 4.754037 | 32.310531 |
min | 1.000000 | 0.000000 | 0.000000 | -44.010000 | -22.460000 | 0.000000 | 35.200000 | 81.300000 | 1.000000 | 1.000000 | 1.000000 | -26.700000 | 0.000000 | 1892.000000 |
25% | 18.000000 | 9.000000 | 8.000000 | -7.480000 | -4.210000 | 66.100000 | 65.200000 | 132.500000 | 30.000000 | 30.000000 | 30.000000 | -3.300000 | 0.000000 | 1943.000000 |
50% | 37.000000 | 13.000000 | 11.000000 | -0.570000 | -0.300000 | 70.200000 | 68.850000 | 138.600000 | 30.000000 | 30.000000 | 30.000000 | 1.000000 | 0.000000 | 1976.000000 |
75% | 67.000000 | 18.000000 | 15.000000 | 6.350000 | 3.990000 | 74.400000 | 72.500000 | 145.100000 | 30.000000 | 30.000000 | 30.000000 | 5.200000 | 0.000000 | 1997.000000 |
max | 122.000000 | 38.000000 | 31.000000 | 34.800000 | 16.000000 | 101.000000 | 99.900000 | 199.100000 | 30.000000 | 30.000000 | 30.000000 | 24.600000 | 48.000000 | 2016.000000 |
ncaa['w'].min()
0
ncaa.hist(column="w")
[Figure: histogram of the w (wins) column of ncaa]

Getting/accessing parts of our dataframe#
The most basic operation is getting a specific column. The syntax looks like the way we index into lists or dictionaries.
courses.columns
Index(['Code', 'Title', 'Description', 'Prereqs', 'Credits', 'random_number'], dtype='object')
courses['Code']
0 INST126
1 INST201
2 INST311
3 INST314
4 INST326
5 INST327
6 INST335
7 INST346
8 INST352
9 INST354
10 INST362
11 INST377
12 INST408Y
13 INST408Z
14 INST414
15 INST447
16 INST462
17 INST466
18 INST490
19 INST604
20 INST612
21 INST614
22 INST616
23 INST622
24 INST627
25 INST630
26 INST652
27 INST702
28 INST709
29 INST728G
30 INST728V
31 INST733
32 INST737
33 INST741
34 INST742
35 INST746
36 INST762
37 INST767
38 INST776
39 INST785
40 INST794
Name: Code, dtype: object
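To make the column-access idea concrete, here is a minimal sketch on a tiny stand-in table. The name `toy` and its values are made up for illustration and are not the real `courses` data. It shows that one column name in brackets gives back a Series, while a list of names gives back a smaller DataFrame.

```python
import pandas as pd

# Toy stand-in for the courses table (values are invented for illustration)
toy = pd.DataFrame({
    "Code": ["INST126", "INST201", "INST311"],
    "Credits": [3.0, 3.0, 3.0],
    "random_number": [6.0, 4.0, 3.0],
})

one_col = toy["Code"]                # one column name -> Series
two_cols = toy[["Code", "Credits"]]  # list of column names -> DataFrame

print(type(one_col).__name__)   # Series
print(type(two_cols).__name__)  # DataFrame
print(one_col.tolist())         # ['INST126', 'INST201', 'INST311']
```

Note the double brackets in the second case: the inner pair is just a Python list of column names.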
Let’s say you want a particular statistic for only one column. You can do this by selecting the column (which gives you a Series) and calling the statistic you want as a method on it.
courses['random_number'].median()
6.0
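The same pattern works for other summary statistics. This is a small sketch with made-up numbers (the `toy` table is hypothetical), showing a few of the methods a numeric Series offers.

```python
import pandas as pd

# Invented values, just to have something to summarize
toy = pd.DataFrame({"random_number": [6.0, 4.0, 3.0, 7.0, 5.0]})

col = toy["random_number"]    # select the column first
median_value = col.median()   # middle value once sorted
mean_value = col.mean()       # arithmetic average
max_value = col.max()         # largest value

print(median_value, mean_value, max_value)  # 5.0 5.0 7.0
```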
Filtering the data based on one or more columns#
But we sometimes also want to get subsets of the data, depending on one or more column values.
We can do this with indexing notation (I use this because I’m used to it).
The expression you put in the brackets is a Boolean expression. Any row where the answer is True comes back; any row where the answer is False is filtered out.
# get me all the rows where the value of the column Code is equal to INST126
courses[courses['Code']=="INST126"]
Code | Title | Description | Prereqs | Credits | random_number | |
---|---|---|---|---|---|---|
0 | INST126 | Introduction to Programming for Information Sc... | An introduction to computer programming for st... | Minimum grade of C- in MATH115; or must have m... | 3.0 | 6.0 |
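It can help to look at the Boolean expression on its own before using it as a filter. The sketch below uses a hypothetical three-row table (`toy`, with invented values) and stores the expression in a variable called `mask`. That name is my own choice, not something pandas requires.

```python
import pandas as pd

# Toy stand-in for the courses table (values are invented for illustration)
toy = pd.DataFrame({
    "Code": ["INST126", "INST201", "INST311"],
    "random_number": [6.0, 4.0, 3.0],
})

# The comparison is evaluated row by row and gives one True/False per row
mask = toy["random_number"] > 5
print(mask.tolist())              # [True, False, False]

# Putting the mask in brackets keeps only the rows marked True
filtered = toy[mask]
print(filtered["Code"].tolist())  # ['INST126']
```

Writing `toy[toy["random_number"] > 5]` in one line does the same thing; the mask is just built inside the brackets.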
# get me all the rows where the value of the column random_number is greater than 5
courses[courses['random_number'] > 5]
Code | Title | Description | Prereqs | Credits | random_number | |
---|---|---|---|---|---|---|
0 | INST126 | Introduction to Programming for Information Sc... | An introduction to computer programming for st... | Minimum grade of C- in MATH115; or must have m... | 3.0 | 6.0 |
4 | INST326 | Object-Oriented Programming for Information Sc... | An introduction to programming, emphasizing un... | 1 course with a minimum grade of C- from (INST... | 3.0 | 6.0 |
6 | INST335 | Teams and Organizations | Team development and the principles, methods a... | 1 course with a minimum grade of C- from (INST... | 3.0 | 7.0 |
7 | INST346 | Technologies Infrastructure and Architecture | Examines the basic concepts of local and wide-... | 1 course with a minimum grade of C- from (INST... | 3.0 | 7.0 |
8 | INST352 | Information User Needs and Assessment | Focuses on use of information by individuals, ... | 1 course with a minimum grade of C- from (INST... | 3.0 | 7.0 |
10 | INST362 | User-Centered Design | Introduction to human-computer interaction (HC... | 1 course with a minimum grade of C- from (INST... | 3.0 | 7.0 |
15 | INST447 | Data Sources and Manipulation | Examines approaches to locating, acquiring, ma... | INST326 or CMSC131; and INST327. | 3.0 | 6.0 |
17 | INST466 | Technology, Culture, and Society | Individual, cultural, and societal outcomes as... | INST201. | 3.0 | 8.0 |
19 | INST604 | Introduction to Archives and Digital Curation | Overview of the principles, practices, and app... | NaN | 3.0 | 7.0 |
20 | INST612 | Information Policy | Nature, structure, development and application... | NaN | 3.0 | 6.0 |
24 | INST627 | Data Analytics for Information Professionals | Skills and knowledge needed to craft datasets,... | NaN | 3.0 | 8.0 |
25 | INST630 | Introduction to Programming for the Informatio... | An introduction to computer programming intend... | NaN | 3.0 | 6.0 |
26 | INST652 | Design Thinking and Youth | Methods of design thinking specifically within... | NaN | 3.0 | 6.0 |
27 | INST702 | Advanced Usability Testing | Usability testing methods -- how to design and... | Permission of instructor; or (INFM605 or INST6... | 3.0 | 8.0 |
33 | INST741 | Social Computing Technologies and Applications | Tools and techniques for developing and config... | INFM603 and INFM605; or (LBSC602 and LBSC671);... | 3.0 | 7.0 |
34 | INST742 | Implementing Digital Curation | Management of and technology for application o... | INST604; or permission of instructor. | 3.0 | 7.0 |
35 | INST746 | Digitization of Legacy Holdings | Through hands on exercises and real-world proj... | INST604. | 3.0 | 7.0 |
36 | INST762 | Visual Analytics | Visual analytics is the use of interactive vis... | INFM603 or INST630; or permission of instructor. | 3.0 | 6.0 |
38 | INST776 | HCIM CAPSTONE PROJECT | The opportunity to apply the skills learned th... | INST775; or permission of instructor. | 3.0 | 7.0 |
39 | INST785 | Documentation, Collection, and Appraisal of Re... | Development of documentation strategies and pl... | INST604; or permission of instructor. | 3.0 | 8.0 |
40 | INST794 | Capstone in Youth Experience | Through a supervised project, to synthesize de... | INST650, INST651, and INST652; or permission o... | 3.0 | 8.0 |
# find all of the seasons where yale had at least 20 wins
ncaa[(ncaa['school'] == "yale") & (ncaa['w'] >= 20)]
school | conf | rk | w | l | srs | sos | pts_for | pts_vs | pts_total | ap_pre | ap_high | ap_final | pts_diff | ncaa_result | ncaa_numeric | season | coaches | year | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
23871 | yale | Ivy | 2 | 23 | 7 | 9.08 | -1.03 | 74.9 | 63.8 | 138.7 | 30 | 30 | 30 | 11.1 | Lost Second Round | 2 | 2015-16 | James Jones (23-7) | 2015.0 |
23872 | yale | Ivy | 3 | 22 | 10 | 3.53 | -0.87 | 65.6 | 58.5 | 124.1 | 30 | 30 | 30 | 7.1 | NaN | 0 | 2014-15 | James Jones (22-10) | 2014.0 |
23885 | yale | Ivy | 16 | 21 | 11 | -0.08 | -5.32 | 74.8 | 70.5 | 145.3 | 30 | 30 | 30 | 4.3 | NaN | 0 | 2001-02 | James Jones (21-11) | 2001.0 |
23938 | yale | Ivy | 69 | 22 | 8 | NaN | NaN | 69.6 | 56.6 | 126.2 | 30 | 11 | 11 | 13.0 | Lost Regional Semifinal | 8 | 1948-49 | Howard Hobson (22-8) | 1948.0 |
23979 | yale | Ivy | 110 | 20 | 9 | NaN | NaN | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1907-08 | William Lush (20-9) | 1907.0 |
23980 | yale | Ivy | 111 | 30 | 7 | NaN | NaN | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1906-07 | William Lush (30-7) | 1906.0 |
23982 | yale | Ivy | 113 | 22 | 13 | NaN | NaN | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1904-05 | Unknown | 1904.0 |
# find all of the seasons where the school was in the Big Ten OR the team had at least 20 wins (| means "or", so non-Big Ten teams show up too)
ncaa[(ncaa['conf'] == "Big Ten") | (ncaa['w'] >= 20)]
school | conf | rk | w | l | srs | sos | pts_for | pts_vs | pts_total | ap_pre | ap_high | ap_final | pts_diff | ncaa_result | ncaa_numeric | season | coaches | year | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
10 | air-force | MWC | 11 | 26 | 9 | 15.19 | 3.34 | 69.0 | 56.0 | 125.0 | 30 | 13 | 30 | 13.0 | NaN | 0 | 2006-07 | Jeff Bzdelik (26-9) | 2006.0 |
11 | air-force | MWC | 12 | 24 | 7 | 10.20 | 2.49 | 64.2 | 54.7 | 118.9 | 30 | 30 | 30 | 9.5 | Lost First Round | 1 | 2005-06 | Jeff Bzdelik (24-7) | 2005.0 |
13 | air-force | MWC | 14 | 22 | 7 | 9.12 | 0.08 | 59.9 | 50.9 | 110.8 | 30 | 25 | 30 | 9.0 | Lost First Round | 1 | 2003-04 | Joe Scott (22-7) | 2003.0 |
60 | akron | MAC | 1 | 26 | 8 | 3.97 | -1.52 | 77.2 | 70.3 | 147.5 | 30 | 30 | 30 | 6.9 | NaN | 0 | 2016-17 | Keith Dambrot (26-8) | 2016.0 |
61 | akron | MAC | 2 | 26 | 9 | 5.55 | -1.24 | 76.6 | 68.0 | 144.6 | 30 | 30 | 30 | 8.6 | NaN | 0 | 2015-16 | Keith Dambrot (26-9) | 2015.0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
23938 | yale | Ivy | 69 | 22 | 8 | NaN | NaN | 69.6 | 56.6 | 126.2 | 30 | 11 | 11 | 13.0 | Lost Regional Semifinal | 8 | 1948-49 | Howard Hobson (22-8) | 1948.0 |
23979 | yale | Ivy | 110 | 20 | 9 | NaN | NaN | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1907-08 | William Lush (20-9) | 1907.0 |
23980 | yale | Ivy | 111 | 30 | 7 | NaN | NaN | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1906-07 | William Lush (30-7) | 1906.0 |
23982 | yale | Ivy | 113 | 22 | 13 | NaN | NaN | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1904-05 | Unknown | 1904.0 |
24011 | youngstown-state | Mid-Cont | 20 | 20 | 9 | -1.92 | -7.66 | 72.3 | 64.6 | 136.9 | 30 | 30 | 30 | 7.7 | NaN | 0 | 1997-98 | Dan Peters (20-9) | 1997.0 |
5293 rows × 19 columns
# find all of the seasons where harvard had at least 15 losses
ncaa[(ncaa['school'] == "harvard") & (ncaa['l'] >= 15)]
school | conf | rk | w | l | srs | sos | pts_for | pts_vs | pts_total | ap_pre | ap_high | ap_final | pts_diff | ncaa_result | ncaa_numeric | season | coaches | year | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
7329 | harvard | Ivy | 2 | 14 | 16 | -0.96 | -0.25 | 66.7 | 66.2 | 132.9 | 30 | 30 | 30 | 0.5 | NaN | 0 | 2015-16 | Tommy Amaker (14-16) | 2015.0 |
7337 | harvard | Ivy | 10 | 8 | 22 | -10.76 | -4.86 | 68.5 | 74.4 | 142.9 | 30 | 30 | 30 | -5.9 | NaN | 0 | 2007-08 | Tommy Amaker (8-22) | 2007.0 |
7338 | harvard | Ivy | 11 | 12 | 16 | -9.67 | -4.56 | 72.3 | 77.4 | 149.7 | 30 | 30 | 30 | -5.1 | NaN | 0 | 2006-07 | Frank Sullivan (12-16) | 2006.0 |
7340 | harvard | Ivy | 13 | 12 | 15 | -6.73 | -4.47 | 67.1 | 69.4 | 136.5 | 30 | 30 | 30 | -2.3 | NaN | 0 | 2004-05 | Frank Sullivan (12-15) | 2004.0 |
7341 | harvard | Ivy | 14 | 4 | 23 | -16.19 | -4.37 | 64.7 | 76.6 | 141.3 | 30 | 30 | 30 | -11.9 | NaN | 0 | 2003-04 | Frank Sullivan (4-23) | 2003.0 |
7342 | harvard | Ivy | 15 | 12 | 15 | -5.80 | -3.65 | 71.4 | 73.3 | 144.7 | 30 | 30 | 30 | -1.9 | NaN | 0 | 2002-03 | Frank Sullivan (12-15) | 2002.0 |
7345 | harvard | Ivy | 18 | 12 | 15 | -12.75 | -9.75 | 65.9 | 67.7 | 133.6 | 30 | 30 | 30 | -1.8 | NaN | 0 | 1999-00 | Frank Sullivan (12-15) | 1999.0 |
7350 | harvard | Ivy | 23 | 6 | 20 | -14.50 | -9.10 | 66.8 | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1994-95 | Frank Sullivan (6-20) | 1994.0 |
7351 | harvard | Ivy | 24 | 9 | 17 | -12.43 | -8.75 | 68.0 | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1993-94 | Frank Sullivan (9-17) | 1993.0 |
7352 | harvard | Ivy | 25 | 6 | 20 | -16.14 | -4.70 | 69.5 | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1992-93 | Frank Sullivan (6-20) | 1992.0 |
7353 | harvard | Ivy | 26 | 6 | 20 | -17.52 | -4.40 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1991-92 | Frank Sullivan (6-20) | 1991.0 |
7354 | harvard | Ivy | 27 | 9 | 17 | -12.59 | -4.59 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1990-91 | Peter Roby (9-17) | 1990.0 |
7356 | harvard | Ivy | 29 | 11 | 15 | -14.74 | -8.66 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1988-89 | Peter Roby (11-15) | 1988.0 |
7357 | harvard | Ivy | 30 | 11 | 15 | -13.28 | -5.69 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1987-88 | Peter Roby (11-15) | 1987.0 |
7358 | harvard | Ivy | 31 | 9 | 17 | -7.78 | -5.36 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1986-87 | Peter Roby (9-17) | 1986.0 |
7359 | harvard | Ivy | 32 | 6 | 20 | -19.73 | -7.69 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1985-86 | Peter Roby (6-20) | 1985.0 |
7363 | harvard | Ivy | 36 | 11 | 15 | -10.07 | -6.25 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1981-82 | Frank McLaughlin (11-15) | 1981.0 |
7365 | harvard | Ivy | 38 | 11 | 15 | -13.37 | -7.37 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1979-80 | Frank McLaughlin (11-15) | 1979.0 |
7366 | harvard | Ivy | 39 | 8 | 21 | -10.58 | -1.79 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1978-79 | Frank McLaughlin (8-21) | 1978.0 |
7367 | harvard | Ivy | 40 | 11 | 15 | -9.60 | -4.90 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1977-78 | Frank McLaughlin (11-15) | 1977.0 |
7368 | harvard | Ivy | 41 | 9 | 16 | -12.73 | -3.33 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1976-77 | Tom Sanders (9-16) | 1976.0 |
7369 | harvard | Ivy | 42 | 8 | 18 | -9.20 | -5.12 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1975-76 | Tom Sanders (8-18) | 1975.0 |
7375 | harvard | Ivy | 48 | 7 | 19 | -10.19 | 0.25 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1969-70 | Robert Harrison (7-19) | 1969.0 |
7376 | harvard | Ivy | 49 | 7 | 18 | -9.80 | -3.12 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1968-69 | Robert Harrison (7-18) | 1968.0 |
7382 | harvard | Ivy | 55 | 6 | 15 | -11.34 | -6.28 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1962-63 | Floyd S. Wilson (6-15) | 1962.0 |
7386 | harvard | Ivy | 59 | 10 | 15 | -14.71 | -7.38 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1958-59 | Floyd S. Wilson (10-15) | 1958.0 |
7389 | harvard | Ivy | 62 | 8 | 16 | -17.54 | -7.10 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1955-56 | Floyd S. Wilson (8-16) | 1955.0 |
7390 | harvard | Ivy | 63 | 6 | 17 | -11.97 | -4.33 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1954-55 | Floyd S. Wilson (6-17) | 1954.0 |
7391 | harvard | Ivy | 64 | 9 | 16 | -14.61 | -3.02 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1953-54 | Bo Shepard (9-16) | 1953.0 |
7392 | harvard | Ivy | 65 | 7 | 16 | -15.52 | -3.69 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1952-53 | Bo Shepard (7-16) | 1952.0 |
7393 | harvard | Ivy | 66 | 5 | 17 | -16.94 | -2.75 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1951-52 | Bo Shepard (5-17) | 1951.0 |
7394 | harvard | Ivy | 67 | 8 | 18 | -8.66 | 1.56 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1950-51 | Bo Shepard (8-18) | 1950.0 |
7395 | harvard | Ivy | 68 | 9 | 15 | -7.61 | -1.75 | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1949-50 | Bo Shepard (9-15) | 1949.0 |
7396 | harvard | Ivy | 69 | 3 | 20 | NaN | NaN | 51.3 | 61.4 | 112.7 | 30 | 30 | 30 | -10.1 | NaN | 0 | 1948-49 | William Barclay (3-20) | 1948.0 |
7397 | harvard | Ivy | 70 | 5 | 20 | NaN | NaN | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1947-48 | William Barclay (5-20) | 1947.0 |
7403 | harvard | Ivy | 76 | 8 | 16 | NaN | NaN | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1941-42 | Earl Brown (8-16) | 1941.0 |
7409 | harvard | Ivy | 82 | 7 | 15 | NaN | NaN | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1935-36 | Wes Fesler (7-15) | 1935.0 |
7411 | harvard | Ivy | 84 | 3 | 19 | NaN | NaN | NaN | NaN | NaN | 30 | 30 | 30 | NaN | NaN | 0 | 1933-34 | Wes Fesler (3-19) | 1933.0 |
ncaa.columns
Index(['school', 'conf', 'rk', 'w', 'l', 'srs', 'sos', 'pts_for', 'pts_vs',
'pts_total', 'ap_pre', 'ap_high', 'ap_final', 'pts_diff', 'ncaa_result',
'ncaa_numeric', 'season', 'coaches', 'year'],
dtype='object')
Combine multiple Boolean expressions using logical operators, as with conditionals, but unfortunately with different syntax. Wrap each expression in its own parentheses before combining them.
and: &
or: |
# find all of the seasons where an ACC school had a winning record
# all losing seasons for coach K
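As a sketch of how `&` and `|` behave differently, here is a toy table shaped like `ncaa`. The rows and team names are invented, and treating a "winning record" as more wins than losses is my assumption. Each condition sits in its own parentheses because `&` and `|` bind more tightly than comparisons like `==` and `>=`.

```python
import pandas as pd

# Invented rows in the same shape as the ncaa table
toy = pd.DataFrame({
    "school": ["team-a", "team-a", "team-b", "team-c"],
    "conf": ["ACC", "ACC", "Big Ten", "Ivy"],
    "w": [28, 13, 24, 22],
    "l": [7, 18, 9, 8],
})

# AND (&): both conditions must be True for a row to come back
acc_winning = toy[(toy["conf"] == "ACC") & (toy["w"] > toy["l"])]
print(acc_winning.index.tolist())    # [0]

# OR (|): a row comes back if either condition is True
big_ten_or_20 = toy[(toy["conf"] == "Big Ten") | (toy["w"] >= 20)]
print(big_ten_or_20.index.tolist())  # [0, 2, 3]
```

Notice that the OR version returns the ACC and Ivy rows with 20 or more wins as well, which is usually not what "Big Ten teams with 20 wins" means. For that you would want `&`.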
Many of the basic Boolean operators apply here, like > and == (see here for review of Boolean expressions).
But in Pandas we also have access to Boolean “methods” for strings, like .contains() or .startswith(), which you reach through the column’s .str accessor. It works like this:
courses[courses['Title'].str.contains("Design")] # get all the rows where the value of the Title column contains the word Design
Code | Title | Description | Prereqs | Credits | random_number | |
---|---|---|---|---|---|---|
5 | INST327 | Database Design and Modeling | Introduction to databases, the relational mode... | 1 course with a minimum grade of C- from (CMSC... | 3.0 | 7.0 |
10 | INST362 | User-Centered Design | Introduction to human-computer interaction (HC... | 1 course with a minimum grade of C- from (INST... | 3.0 | 5.0 |
26 | INST652 | Design Thinking and Youth | Methods of design thinking specifically within... | None | 3.0 | 7.0 |
31 | INST733 | Database Design | Principles of user-oriented database design. ... | LBSC690, LBSC671, or INFM603; or permission of... | 3.0 | 3.0 |
# get all the courses that are INST 300-level courses
courses[courses['Code'].str.startswith("INST3")]
Code | Title | Description | Prereqs | Credits | random_number | |
---|---|---|---|---|---|---|
2 | INST311 | Information Organization | Examines the theories, concepts, and principle... | Must have completed or be concurrently enrolle... | 3.0 | 4.0 |
3 | INST314 | Statistics for Information Science | Basic concepts in statistics including measure... | Must have completed or be concurrently enrolle... | 3.0 | 5.0 |
4 | INST326 | Object-Oriented Programming for Information Sc... | An introduction to programming, emphasizing un... | 1 course with a minimum grade of C- from (INST... | 3.0 | 7.0 |
5 | INST327 | Database Design and Modeling | Introduction to databases, the relational mode... | 1 course with a minimum grade of C- from (CMSC... | 3.0 | 7.0 |
6 | INST335 | Teams and Organizations | Team development and the principles, methods a... | 1 course with a minimum grade of C- from (INST... | 3.0 | 3.0 |
7 | INST346 | Technologies Infrastructure and Architecture | Examines the basic concepts of local and wide-... | 1 course with a minimum grade of C- from (INST... | 3.0 | 6.0 |
8 | INST352 | Information User Needs and Assessment | Focuses on use of information by individuals, ... | 1 course with a minimum grade of C- from (INST... | 3.0 | 5.0 |
9 | INST354 | Decision-Making for Information Science | Examines the use of information in organizatio... | INST314. | 3.0 | 6.0 |
10 | INST362 | User-Centered Design | Introduction to human-computer interaction (HC... | 1 course with a minimum grade of C- from (INST... | 3.0 | 5.0 |
11 | INST377 | Dynamic Web Applications | An exploration of the basic methods and tools ... | INST327. | 3.0 | 5.0 |
# get all the courses that have programming in their course description?
courses[courses['Description'].str.contains("programming")]
Code | Title | Description | Prereqs | Credits | random_number | |
---|---|---|---|---|---|---|
0 | INST126 | Introduction to Programming for Information Sc... | An introduction to computer programming for st... | Minimum grade of C- in MATH115; or must have m... | 3.0 | 3.0 |
4 | INST326 | Object-Oriented Programming for Information Sc... | An introduction to programming, emphasizing un... | 1 course with a minimum grade of C- from (INST... | 3.0 | 7.0 |
25 | INST630 | Introduction to Programming for the Informatio... | An introduction to computer programming intend... | None | 3.0 | 3.0 |
40 | INST794 | Capstone in Youth Experience | Through a supervised project, to synthesize de... | INST650, INST651, and INST652; or permission o... | 3.0 | 3.0 |
print(courses[courses['Code'] == "INST794"]['Description'])
40 Through a supervised project, to synthesize de...
Name: Description, dtype: object
# get all the courses that have a "minimum grade" prereq
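One possible way to approach this prompt, sketched on a made-up three-row table rather than the real `courses` data: `.str.contains()` takes a `case` argument to ignore capitalization and an `na` argument that says what to do with missing values. Depending on your pandas version, leaving `na` unset can produce a mask with missing values in it, which pandas may refuse to use as a filter, so setting `na=False` is a safe habit when a column has gaps.

```python
import pandas as pd

# Invented rows; the second course has no prerequisites listed
toy = pd.DataFrame({
    "Code": ["INST126", "INST201", "INST354"],
    "Prereqs": ["Minimum grade of C- in MATH115", None, "INST314."],
})

# case=False matches "Minimum grade" and "minimum grade" alike;
# na=False treats a missing Prereqs value as "does not match"
has_min_grade = toy["Prereqs"].str.contains("minimum grade", case=False, na=False)
result = toy[has_min_grade]

print(has_min_grade.tolist())   # [True, False, False]
print(result["Code"].tolist())  # ['INST126']
```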
Reshaping#
The most basic reshaping operation is sorting.
More advanced operations, like transposing, we will discuss next week.
courses.sort_values(by="Code", ascending=False)
Code | Title | Description | Prereqs | Credits | random_number | |
---|---|---|---|---|---|---|
40 | INST794 | Capstone in Youth Experience | Through a supervised project, to synthesize de... | INST650, INST651, and INST652; or permission o... | 3.0 | 3.0 |
39 | INST785 | Documentation, Collection, and Appraisal of Re... | Development of documentation strategies and pl... | INST604; or permission of instructor. | 3.0 | 5.0 |
38 | INST776 | HCIM CAPSTONE PROJECT | The opportunity to apply the skills learned th... | INST775; or permission of instructor. | 3.0 | 7.0 |
37 | INST767 | Big Data Infrastructure | Principles and techniques of data science and ... | INST737; or permission of instructor. | 3.0 | 5.0 |
36 | INST762 | Visual Analytics | Visual analytics is the use of interactive vis... | INFM603 or INST630; or permission of instructor. | 3.0 | 3.0 |
35 | INST746 | Digitization of Legacy Holdings | Through hands on exercises and real-world proj... | INST604. | 3.0 | 7.0 |
34 | INST742 | Implementing Digital Curation | Management of and technology for application o... | INST604; or permission of instructor. | 3.0 | 3.0 |
33 | INST741 | Social Computing Technologies and Applications | Tools and techniques for developing and config... | INFM603 and INFM605; or (LBSC602 and LBSC671);... | 3.0 | 3.0 |
32 | INST737 | Introduction to Data Science | An exploration of some of the best and most ge... | INST627; and (LBSC690, LBSC671, or INFM603). O... | 3.0 | 3.0 |
31 | INST733 | Database Design | Principles of user-oriented database design. ... | LBSC690, LBSC671, or INFM603; or permission of... | 3.0 | 3.0 |
30 | INST728V | Special Topics in Information Studies; Digital... | NaN | NaN | NaN | |
29 | INST728G | Special Topics in Information Studies; Smart C... | NaN | NaN | NaN | |
28 | INST709 | Independent Study | NaN | NaN | NaN | |
27 | INST702 | Advanced Usability Testing | Usability testing methods -- how to design and... | Permission of instructor; or (INFM605 or INST6... | 3.0 | 3.0 |
26 | INST652 | Design Thinking and Youth | Methods of design thinking specifically within... | None | 3.0 | 7.0 |
25 | INST630 | Introduction to Programming for the Informatio... | An introduction to computer programming intend... | None | 3.0 | 3.0 |
24 | INST627 | Data Analytics for Information Professionals | Skills and knowledge needed to craft datasets,... | None | 3.0 | 8.0 |
23 | INST622 | Information and Universal Usability | Information services and technologies to provi... | None | 3.0 | 3.0 |
22 | INST616 | Open Source Intelligence | An introduction to Open Source Intelligence (O... | None | 3.0 | 5.0 |
21 | INST614 | Literacy and Inclusion | The educational and psychological dimensions o... | None | 3.0 | 5.0 |
20 | INST612 | Information Policy | Nature, structure, development and application... | None | 3.0 | 6.0 |
19 | INST604 | Introduction to Archives and Digital Curation | Overview of the principles, practices, and app... | None | 3.0 | 8.0 |
18 | INST490 | Integrated Capstone for Information Science | The capstone provides a platform for Informati... | Minimum grade of C- in INST314, INST335, INST3... | 3.0 | 5.0 |
17 | INST466 | Technology, Culture, and Society | Individual, cultural, and societal outcomes as... | INST201. | 3.0 | 5.0 |
16 | INST462 | Introduction to Data Visualization | Exploration of the theories, methods, and tech... | INST314. | 3.0 | 5.0 |
15 | INST447 | Data Sources and Manipulation | Examines approaches to locating, acquiring, ma... | INST326 or CMSC131; and INST327. | 3.0 | 7.0 |
14 | INST414 | Data Science Techniques | An exploration of how to extract insights from... | INST314. | 3.0 | 6.0 |
13 | INST408Z | Special Topics in Information Science; The Apo... | NaN | NaN | NaN | |
12 | INST408Y | Special Topics in Information Science; Privacy... | NaN | NaN | NaN | |
11 | INST377 | Dynamic Web Applications | An exploration of the basic methods and tools ... | INST327. | 3.0 | 5.0 |
10 | INST362 | User-Centered Design | Introduction to human-computer interaction (HC... | 1 course with a minimum grade of C- from (INST... | 3.0 | 5.0 |
9 | INST354 | Decision-Making for Information Science | Examines the use of information in organizatio... | INST314. | 3.0 | 6.0 |
8 | INST352 | Information User Needs and Assessment | Focuses on use of information by individuals, ... | 1 course with a minimum grade of C- from (INST... | 3.0 | 5.0 |
7 | INST346 | Technologies Infrastructure and Architecture | Examines the basic concepts of local and wide-... | 1 course with a minimum grade of C- from (INST... | 3.0 | 6.0 |
6 | INST335 | Teams and Organizations | Team development and the principles, methods a... | 1 course with a minimum grade of C- from (INST... | 3.0 | 3.0 |
5 | INST327 | Database Design and Modeling | Introduction to databases, the relational mode... | 1 course with a minimum grade of C- from (CMSC... | 3.0 | 7.0 |
4 | INST326 | Object-Oriented Programming for Information Sc... | An introduction to programming, emphasizing un... | 1 course with a minimum grade of C- from (INST... | 3.0 | 7.0 |
3 | INST314 | Statistics for Information Science | Basic concepts in statistics including measure... | Must have completed or be concurrently enrolle... | 3.0 | 5.0 |
2 | INST311 | Information Organization | Examines the theories, concepts, and principle... | Must have completed or be concurrently enrolle... | 3.0 | 4.0 |
1 | INST201 | Introduction to Information Science | Examining the effects of new information techn... | None | 3.0 | 4.0 |
0 | INST126 | Introduction to Programming for Information Sc... | An introduction to computer programming for st... | Minimum grade of C- in MATH115; or must have m... | 3.0 | 3.0 |
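One thing worth knowing about `sort_values()`: by default it hands back a new, sorted DataFrame and leaves the original alone. Here is a minimal sketch on a hypothetical three-row table (invented values) that shows both, along with the fact that the row labels travel with their rows.

```python
import pandas as pd

# Toy stand-in for the courses table (values are invented for illustration)
toy = pd.DataFrame({
    "Code": ["INST126", "INST201", "INST311"],
    "random_number": [6.0, 4.0, 3.0],
})

sorted_desc = toy.sort_values(by="Code", ascending=False)

print(sorted_desc["Code"].tolist())  # ['INST311', 'INST201', 'INST126']
print(sorted_desc.index.tolist())    # [2, 1, 0]
print(toy["Code"].tolist())          # ['INST126', 'INST201', 'INST311']
```

If you want to keep the sorted version, assign it to a variable (as above) or back to the original name.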
courses
Code | Title | Description | Prereqs | Credits | random_number | |
---|---|---|---|---|---|---|
0 | INST126 | Introduction to Programming for Information Sc... | An introduction to computer programming for st... | Minimum grade of C- in MATH115; or must have m... | 3.0 | 3.0 |
1 | INST201 | Introduction to Information Science | Examining the effects of new information techn... | None | 3.0 | 4.0 |
2 | INST311 | Information Organization | Examines the theories, concepts, and principle... | Must have completed or be concurrently enrolle... | 3.0 | 4.0 |
3 | INST314 | Statistics for Information Science | Basic concepts in statistics including measure... | Must have completed or be concurrently enrolle... | 3.0 | 5.0 |
4 | INST326 | Object-Oriented Programming for Information Sc... | An introduction to programming, emphasizing un... | 1 course with a minimum grade of C- from (INST... | 3.0 | 7.0 |
5 | INST327 | Database Design and Modeling | Introduction to databases, the relational mode... | 1 course with a minimum grade of C- from (CMSC... | 3.0 | 7.0 |
6 | INST335 | Teams and Organizations | Team development and the principles, methods a... | 1 course with a minimum grade of C- from (INST... | 3.0 | 3.0 |
7 | INST346 | Technologies Infrastructure and Architecture | Examines the basic concepts of local and wide-... | 1 course with a minimum grade of C- from (INST... | 3.0 | 6.0 |
8 | INST352 | Information User Needs and Assessment | Focuses on use of information by individuals, ... | 1 course with a minimum grade of C- from (INST... | 3.0 | 5.0 |
9 | INST354 | Decision-Making for Information Science | Examines the use of information in organizatio... | INST314. | 3.0 | 6.0 |
10 | INST362 | User-Centered Design | Introduction to human-computer interaction (HC... | 1 course with a minimum grade of C- from (INST... | 3.0 | 5.0 |
11 | INST377 | Dynamic Web Applications | An exploration of the basic methods and tools ... | INST327. | 3.0 | 5.0 |
12 | INST408Y | Special Topics in Information Science; Privacy... | NaN | NaN | NaN | |
13 | INST408Z | Special Topics in Information Science; The Apo... | NaN | NaN | NaN | |
14 | INST414 | Data Science Techniques | An exploration of how to extract insights from... | INST314. | 3.0 | 6.0 |
15 | INST447 | Data Sources and Manipulation | Examines approaches to locating, acquiring, ma... | INST326 or CMSC131; and INST327. | 3.0 | 7.0 |
16 | INST462 | Introduction to Data Visualization | Exploration of the theories, methods, and tech... | INST314. | 3.0 | 5.0 |
17 | INST466 | Technology, Culture, and Society | Individual, cultural, and societal outcomes as... | INST201. | 3.0 | 5.0 |
18 | INST490 | Integrated Capstone for Information Science | The capstone provides a platform for Informati... | Minimum grade of C- in INST314, INST335, INST3... | 3.0 | 5.0 |
19 | INST604 | Introduction to Archives and Digital Curation | Overview of the principles, practices, and app... | None | 3.0 | 8.0 |
20 | INST612 | Information Policy | Nature, structure, development and application... | None | 3.0 | 6.0 |
21 | INST614 | Literacy and Inclusion | The educational and psychological dimensions o... | None | 3.0 | 5.0 |
22 | INST616 | Open Source Intelligence | An introduction to Open Source Intelligence (O... | None | 3.0 | 5.0 |
23 | INST622 | Information and Universal Usability | Information services and technologies to provi... | None | 3.0 | 3.0 |
24 | INST627 | Data Analytics for Information Professionals | Skills and knowledge needed to craft datasets,... | None | 3.0 | 8.0 |
25 | INST630 | Introduction to Programming for the Informatio... | An introduction to computer programming intend... | None | 3.0 | 3.0 |
26 | INST652 | Design Thinking and Youth | Methods of design thinking specifically within... | None | 3.0 | 7.0 |
27 | INST702 | Advanced Usability Testing | Usability testing methods -- how to design and... | Permission of instructor; or (INFM605 or INST6... | 3.0 | 3.0 |
28 | INST709 | Independent Study | NaN | NaN | NaN | |
29 | INST728G | Special Topics in Information Studies; Smart C... | NaN | NaN | NaN | |
30 | INST728V | Special Topics in Information Studies; Digital... | NaN | NaN | NaN | |
31 | INST733 | Database Design | Principles of user-oriented database design. ... | LBSC690, LBSC671, or INFM603; or permission of... | 3.0 | 3.0 |
32 | INST737 | Introduction to Data Science | An exploration of some of the best and most ge... | INST627; and (LBSC690, LBSC671, or INFM603). O... | 3.0 | 3.0 |
33 | INST741 | Social Computing Technologies and Applications | Tools and techniques for developing and config... | INFM603 and INFM605; or (LBSC602 and LBSC671);... | 3.0 | 3.0 |
34 | INST742 | Implementing Digital Curation | Management of and technology for application o... | INST604; or permission of instructor. | 3.0 | 3.0 |
35 | INST746 | Digitization of Legacy Holdings | Through hands on exercises and real-world proj... | INST604. | 3.0 | 7.0 |
36 | INST762 | Visual Analytics | Visual analytics is the use of interactive vis... | INFM603 or INST630; or permission of instructor. | 3.0 | 3.0 |
37 | INST767 | Big Data Infrastructure | Principles and techniques of data science and ... | INST737; or permission of instructor. | 3.0 | 5.0 |
38 | INST776 | HCIM CAPSTONE PROJECT | The opportunity to apply the skills learned th... | INST775; or permission of instructor. | 3.0 | 7.0 |
39 | INST785 | Documentation, Collection, and Appraisal of Re... | Development of documentation strategies and pl... | INST604; or permission of instructor. | 3.0 | 5.0 |
40 | INST794 | Capstone in Youth Experience | Through a supervised project, to synthesize de... | INST650, INST651, and INST652; or permission o... | 3.0 | 3.0 |
# sort the dataframe in place, so the change applies to the dataframe itself
courses.sort_values(by="Code", ascending=False, inplace=True) # sort in descending order by the Code column
# sort the dataframe, and save the resulting copy back into the courses variable
courses = courses.sort_values(by="Code", ascending=False)
courses
 | Code | Title | Description | Prereqs | Credits | random_number |
---|---|---|---|---|---|---|
40 | INST794 | Capstone in Youth Experience | Through a supervised project, to synthesize de... | INST650, INST651, and INST652; or permission o... | 3.0 | 3.0 |
39 | INST785 | Documentation, Collection, and Appraisal of Re... | Development of documentation strategies and pl... | INST604; or permission of instructor. | 3.0 | 5.0 |
38 | INST776 | HCIM CAPSTONE PROJECT | The opportunity to apply the skills learned th... | INST775; or permission of instructor. | 3.0 | 7.0 |
37 | INST767 | Big Data Infrastructure | Principles and techniques of data science and ... | INST737; or permission of instructor. | 3.0 | 5.0 |
36 | INST762 | Visual Analytics | Visual analytics is the use of interactive vis... | INFM603 or INST630; or permission of instructor. | 3.0 | 3.0 |
35 | INST746 | Digitization of Legacy Holdings | Through hands on exercises and real-world proj... | INST604. | 3.0 | 7.0 |
34 | INST742 | Implementing Digital Curation | Management of and technology for application o... | INST604; or permission of instructor. | 3.0 | 3.0 |
33 | INST741 | Social Computing Technologies and Applications | Tools and techniques for developing and config... | INFM603 and INFM605; or (LBSC602 and LBSC671);... | 3.0 | 3.0 |
32 | INST737 | Introduction to Data Science | An exploration of some of the best and most ge... | INST627; and (LBSC690, LBSC671, or INFM603). O... | 3.0 | 3.0 |
31 | INST733 | Database Design | Principles of user-oriented database design. ... | LBSC690, LBSC671, or INFM603; or permission of... | 3.0 | 3.0 |
30 | INST728V | Special Topics in Information Studies; Digital... | NaN | NaN | NaN | |
29 | INST728G | Special Topics in Information Studies; Smart C... | NaN | NaN | NaN | |
28 | INST709 | Independent Study | NaN | NaN | NaN | |
27 | INST702 | Advanced Usability Testing | Usability testing methods -- how to design and... | Permission of instructor; or (INFM605 or INST6... | 3.0 | 3.0 |
26 | INST652 | Design Thinking and Youth | Methods of design thinking specifically within... | None | 3.0 | 7.0 |
25 | INST630 | Introduction to Programming for the Informatio... | An introduction to computer programming intend... | None | 3.0 | 3.0 |
24 | INST627 | Data Analytics for Information Professionals | Skills and knowledge needed to craft datasets,... | None | 3.0 | 8.0 |
23 | INST622 | Information and Universal Usability | Information services and technologies to provi... | None | 3.0 | 3.0 |
22 | INST616 | Open Source Intelligence | An introduction to Open Source Intelligence (O... | None | 3.0 | 5.0 |
21 | INST614 | Literacy and Inclusion | The educational and psychological dimensions o... | None | 3.0 | 5.0 |
20 | INST612 | Information Policy | Nature, structure, development and application... | None | 3.0 | 6.0 |
19 | INST604 | Introduction to Archives and Digital Curation | Overview of the principles, practices, and app... | None | 3.0 | 8.0 |
18 | INST490 | Integrated Capstone for Information Science | The capstone provides a platform for Informati... | Minimum grade of C- in INST314, INST335, INST3... | 3.0 | 5.0 |
17 | INST466 | Technology, Culture, and Society | Individual, cultural, and societal outcomes as... | INST201. | 3.0 | 5.0 |
16 | INST462 | Introduction to Data Visualization | Exploration of the theories, methods, and tech... | INST314. | 3.0 | 5.0 |
15 | INST447 | Data Sources and Manipulation | Examines approaches to locating, acquiring, ma... | INST326 or CMSC131; and INST327. | 3.0 | 7.0 |
14 | INST414 | Data Science Techniques | An exploration of how to extract insights from... | INST314. | 3.0 | 6.0 |
13 | INST408Z | Special Topics in Information Science; The Apo... | NaN | NaN | NaN | |
12 | INST408Y | Special Topics in Information Science; Privacy... | NaN | NaN | NaN | |
11 | INST377 | Dynamic Web Applications | An exploration of the basic methods and tools ... | INST327. | 3.0 | 5.0 |
10 | INST362 | User-Centered Design | Introduction to human-computer interaction (HC... | 1 course with a minimum grade of C- from (INST... | 3.0 | 5.0 |
9 | INST354 | Decision-Making for Information Science | Examines the use of information in organizatio... | INST314. | 3.0 | 6.0 |
8 | INST352 | Information User Needs and Assessment | Focuses on use of information by individuals, ... | 1 course with a minimum grade of C- from (INST... | 3.0 | 5.0 |
7 | INST346 | Technologies Infrastructure and Architecture | Examines the basic concepts of local and wide-... | 1 course with a minimum grade of C- from (INST... | 3.0 | 6.0 |
6 | INST335 | Teams and Organizations | Team development and the principles, methods a... | 1 course with a minimum grade of C- from (INST... | 3.0 | 3.0 |
5 | INST327 | Database Design and Modeling | Introduction to databases, the relational mode... | 1 course with a minimum grade of C- from (CMSC... | 3.0 | 7.0 |
4 | INST326 | Object-Oriented Programming for Information Sc... | An introduction to programming, emphasizing un... | 1 course with a minimum grade of C- from (INST... | 3.0 | 7.0 |
3 | INST314 | Statistics for Information Science | Basic concepts in statistics including measure... | Must have completed or be concurrently enrolle... | 3.0 | 5.0 |
2 | INST311 | Information Organization | Examines the theories, concepts, and principle... | Must have completed or be concurrently enrolle... | 3.0 | 4.0 |
1 | INST201 | Introduction to Information Science | Examining the effects of new information techn... | None | 3.0 | 4.0 |
0 | INST126 | Introduction to Programming for Information Sc... | An introduction to computer programming for st... | Minimum grade of C- in MATH115; or must have m... | 3.0 | 3.0 |
# if you sort in place AND store the return value, the stored value will be None
sorted_df = courses.sort_values(by="Code", ascending=False, inplace=True)
print(type(sorted_df)) # NoneType, because inplace=True makes sort_values return None
# sort by the code column, in ascending order
# sort by prereqs
# sort by prereqs, then random
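One possible way to approach the last prompt above (sorting by more than one column) is to pass a list of column names to sort_values. This is only a sketch: the small dataframe called toy and its values are made up for illustration, and are not the real course data.

```python
import pandas as pd

# hypothetical stand-in for the courses dataframe (made-up values)
toy = pd.DataFrame({
    "Code": ["INST126", "INST201", "INST354", "INST414"],
    "Prereqs": ["None", "None", "INST314.", "INST314."],
    "random_number": [4.0, 3.0, 6.0, 5.0],
})

# sort by Prereqs first; rows that share the same Prereqs value
# are then ordered by random_number
by_two = toy.sort_values(by=["Prereqs", "random_number"])
print(by_two)

# toy itself keeps its original order, because we saved the
# sorted copy under a different name
print(toy["Code"].tolist())
```

The order of the names in the list matters: the first column is the main sort key, and later columns only break ties.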
Aside: dataframes are (mostly) immutable#
Pandas wants you to treat dataframes as immutable: by default, most methods that modify a dataframe (like sort_values) create a modified copy (just like string methods do), rather than modifying the dataframe itself.
This means you can run into the same bug as with strings: your modifications won’t stick around if you don’t save the resulting copy in a variable.
Like this:
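A minimal sketch of that mistake, assuming a small made-up dataframe (the name toy and its values are hypothetical, chosen only for illustration):

```python
import pandas as pd

# hypothetical three-row dataframe, not the real course data
toy = pd.DataFrame({"Code": ["INST126", "INST201", "INST311"]})

# a sorted copy is created here, then thrown away because we never saved it
toy.sort_values(by="Code", ascending=False)
print(toy["Code"].tolist())  # toy is still in its original order

# saving the copy in a variable is what makes the change stick
toy_sorted = toy.sort_values(by="Code", ascending=False)
print(toy_sorted["Code"].tolist())
```

This mirrors what happens with strings, where calling a method like upper() without saving the result leaves the original string unchanged.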
You can get around this if you want, by passing an inplace=True argument to many of these method calls. In that case the dataframe itself is modified, and the call returns None.
But most of the time you will treat them like strings and make sure you save the result of a modification into a variable.
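The two approaches can be compared side by side in a short sketch. The dataframes a and b below are made-up examples (hypothetical names and values), not the course data.

```python
import pandas as pd

# two identical, hypothetical dataframes so each approach starts fresh
a = pd.DataFrame({"Code": ["INST126", "INST201", "INST311"]})
b = pd.DataFrame({"Code": ["INST126", "INST201", "INST311"]})

# approach 1: inplace=True changes a itself, and the call returns None
result = a.sort_values(by="Code", ascending=False, inplace=True)
print(result)
print(a["Code"].tolist())

# approach 2 (the usual habit): leave b alone and save the sorted copy
b_sorted = b.sort_values(by="Code", ascending=False)
print(b["Code"].tolist())
print(b_sorted["Code"].tolist())
```

Mixing the two (using inplace=True and also assigning the return value) is the case to avoid, since the variable ends up holding None instead of a dataframe.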