9: Files

What are files?

From Python’s perspective, files are data that is outside of the program’s main memory. The PY4E textbook calls it “secondary memory”.

Secondary memory is essential, because main memory, which holds all the data you create while your Python program is running, goes away once the program stops.

Secondary memory is a place to have data that is persistent. Sort of like long-term memory in humans.

I like this picture from the PY4E textbook to illustrate the point:

The middle box is where the Python program “lives”.

Sometimes Python needs a way to connect to the outside world: “input and output devices” on the left, the “network” (e.g., the internet!) on the right, and “secondary memory” (e.g., the hard drive! files!).

So far our programs have been self-contained (except for talking to the outside world via input() and print()).

Now we will talk about how to read from and write to files in secondary memory, so our data can persist beyond a single Python session/program, and so we can work with more data than we could reasonably type into a single Python file or Jupyter cell.

The open() function, and the file handle object

Python interacts with files through a file handle object. The open() function, as you might suspect, opens a file handle for a file.

Here is an example. What do you think is in fhand?

fhand = open('assets/mbox-email-receipts.txt', 'r')
print(fhand)
<_io.TextIOWrapper name='assets/mbox-email-receipts.txt' mode='r' encoding='UTF-8'>

The main parameters of open() are:

  1. Path: The path to the file you want to connect to

  2. Mode: A specification of how you want to connect (to read, to write, etc.).

But its return value is not the contents of the file! Instead, it returns a file handle object (here an io.TextIOWrapper, as the printed output shows).
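A quick way to convince yourself of this is to ask Python what type of object open() gives back. The sketch below first creates a small practice file so it runs anywhere; the file name demo-handle.txt is a hypothetical example, not one of the course files.

```python
import io

# create a small practice file (hypothetical name) so this example is self-contained
practice = open("demo-handle.txt", "w")
practice.write("first line\nsecond line\n")
practice.close()

fhand = open("demo-handle.txt", "r")
print(type(fhand))                          # the file handle, not the contents
print(isinstance(fhand, io.TextIOWrapper))  # True for files opened in text mode

contents = fhand.read()                     # the contents only arrive when you call a method
print(type(contents))                       # <class 'str'>
fhand.close()
```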

I like this picture from the PY4E textbook:

What kind of thing is it? What does it allow us to do?

Just like lists have methods like .append(), strings have methods like .upper() and .split(), and dictionaries have methods like .update() and .get(), file handle objects have key methods that enable us to work with the actual file:

  • read the contents of the file (with .read() or .readlines())

  • write to the file (with .write() or .writelines())

So, in the example above, I’ve opened a file handle for the mbox-email-receipts.txt file, in the reading mode (r), which enables me to use .read() to read the contents of the file.

We’ll return to the concepts of mode and operations in a bit. First, we need to understand how to direct Python to a file so we can actually open a file handle to it!

File paths

File paths are a way of giving Python directions to the file’s location.

In the example above, the file path was 'assets/mbox-email-receipts.txt'

There are two parts to a file path:

  1. The filename itself.

  2. The path/directions to the folder it’s in “from where you are”
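Python’s os.path module can pull these two parts apart for you, which is a handy way to check your understanding of a path. A minimal sketch using the path from the example above (no file needs to exist, since os.path.split() only looks at the text of the path):

```python
import os

fpath = "assets/mbox-email-receipts.txt"

# split the path into (directions to the folder, filename)
folder, filename = os.path.split(fpath)
print(folder)    # assets
print(filename)  # mbox-email-receipts.txt
```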

The filename is obvious, but the path/directions part is not. So let’s take a closer look.

Path, aka directions to a folder

Think of a file path like giving directions to a room in a building. To write a path, you need to know:

  1. Where you are (your program’s current working directory)

  2. Where the file is (the target folder)

  3. How to get there (the path connecting them)

The building blocks of a path are:

  • Folder names — the names of directories you need to go through

  • / — a separator between folder names (and between the last folder and the filename). Think of it as “go into…”

  • .. — means “go up one level” (to the parent folder). You only need this when the file is outside of your current directory — for example, in a sibling folder or a parent folder.

For example, in our earlier path 'assets/mbox-email-receipts.txt':

  • assets is the folder name

  • / separates the folder from the filename

  • mbox-email-receipts.txt is the file

This path works because the assets folder is inside our current working directory. We can verify this using the os library:

import os # get all the data and functions from the os library

cwd = os.getcwd() # show me where I am on the hard drive
current_view = os.listdir() # list all the names of things I can immediately see in my current location

print(f"You are currently in {cwd}\n")
print("Here are all the things you can see:")
for thingname in current_view:
    print(thingname)
You are currently in /home/runner/work/inst126-intro-programming-notes/inst126-intro-programming-notes

Here are all the things you can see:
.DS_Store
Practice - Defining Functions.ipynb
6_Iteration.ipynb.bak
README.md
Practice_Module-2_Lists.ipynb
Practice - Module 2 Review.ipynb
myfile.txt
.ipynb_checkpoints
.github
2b_Variables.md
Practice_Module-1-Projects.ipynb
Problem-Formulation.md
assets
5_lists.md
9_Files.ipynb.bak
2a_Expressions.md
4_Conditionals.md
requirements.txt
10_Pandas-1.ipynb
Practice_Module-2-Projects.md
Help-Seeking-Template.md
intro.md
_config.yml
laptop-weights-by-company.csv
Practice_Dictionaries_Scenarios.md
what-is-programming.md
Practice_Module-2-Projects.ipynb.bak
.jupyter
LICENSE
complex_dictionary.json
Practice_Strings_Integrative.md
Practice_Conditionals.ipynb
marvel-movies.csv
7_Strings.md
Practice_Debugging_examples-Solutions.ipynb
Practice_Warmup_Tracing.md
laptop-report.txt
.git
9_Files.md
11_Pandas-2.ipynb
_toc.yml
4_Conditionals.qmd
Practice - Defining Functions (Errors).ipynb
.gitignore
dictionary.json
Practice_Debugging_examples.ipynb
module-4-review-scratchpad.ipynb
3_Functions.md
_static
ncaa-team-data.csv
6_Iteration.md
8_Dictionaries.ipynb.bak
exam_draft_module2.md
slides_7_Strings.html
8_Dictionaries.md
Practice_Indexing_FITB.md
notes.md
Debugging-Helpseeking.md
7_Strings.ipynb.bak
INST courses.csv
_build
dictionary.txt
data
example-course-copier.ipynb

When would you need ..?

Imagine your files are organized like this:

my_project/
├── code/
│   └── analysis.py      ← your program is here
├── data/
│   └── results.csv
└── README.md

If your program analysis.py is running inside the code/ folder, the file results.csv is not in a subfolder — it’s in a sibling folder called data/. To reach it, you need to go up one level first (from code/ to my_project/), then down into data/:

# from inside code/, go up one level (..), then into data/
fpath = "../data/results.csv"

You can read .. as “go to the parent folder.” Here’s how the path breaks down:

  • .. — go up from code/ to my_project/

  • /data — go into the data/ folder

  • /results.csv — that’s the file

If you wanted to reach README.md (which is one level up, not in any subfolder):

# just go up one level
fpath = "../README.md"

You can even chain .. to go up multiple levels: ../../some_file.txt goes up two levels. But if you find yourself doing that, it’s usually a sign to reorganize your files!
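If you want to double-check how a chain of .. steps resolves, os.path.normpath() collapses each “go up one level” against the folder before it. This is only a sketch: normpath works on the text of the path and does not check that any folder actually exists, and my_project/code/ is the hypothetical layout from the diagram above.

```python
import os

# hypothetical layout from the diagram: analysis.py lives in my_project/code/
start = os.path.join("my_project", "code")

# follow the relative paths from the examples above, starting at code/
results_path = os.path.normpath(os.path.join(start, "../data/results.csv"))
readme_path = os.path.normpath(os.path.join(start, "../README.md"))

print(results_path)   # my_project/data/results.csv (with backslashes on Windows)
print(readme_path)    # my_project/README.md
```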

Practice: file paths

Use the following folder structure for all questions:

school/
├── projects/
│   ├── project1/
│   │   ├── code/
│   │   │   └── analysis.py
│   │   └── data/
│   │       └── survey.csv
│   └── project2/
│       └── main.py
├── notes/
│   └── lecture1.txt
└── grades.csv

P1. Your program is analysis.py (inside school/projects/project1/code/). Write the relative path to open survey.csv.

# your code here
# fpath = ???

Answer:

fpath = "../data/survey.csv"

From code/, you go up one level (..) to project1/, then into data/, then the file.

P2. Your program is analysis.py. Write the relative path to open grades.csv.

# your code here
# fpath = ???

Answer:

fpath = "../../../grades.csv"

From code/, go up to project1/ (..), up to projects/ (../..), up to school/ (../../..), and there’s grades.csv.

P3. Your program is main.py (inside school/projects/project2/). Write the relative path to open lecture1.txt.

# your code here
# fpath = ???

Answer:

fpath = "../../notes/lecture1.txt"

From project2/, go up to projects/ (..), up to school/ (../..), then into notes/, then the file.

P4. Your program is main.py. Write the relative path to open survey.csv (in project1’s data folder).

# your code here
# fpath = ???

Answer:

fpath = "../project1/data/survey.csv"

From project2/, go up to projects/ (..), then into project1/, then data/, then the file.

P5. Your program is running from the school/ folder itself. Write the relative paths to open: (a) survey.csv, (b) lecture1.txt, and (c) grades.csv.

# your code here
# fpath_survey = ???
# fpath_lecture = ???
# fpath_grades = ???

Answer:

fpath_survey = "projects/project1/data/survey.csv"
fpath_lecture = "notes/lecture1.txt"
fpath_grades = "grades.csv"

When you’re already at school/, everything is below you — no .. needed! Just go down into the right subfolders. And grades.csv is right here, so it’s just the filename.

Relative vs. absolute file paths

So far we have been discussing relative file paths: paths that describe how to locate a file relative to your program’s current working directory.

It is also possible to specify absolute file paths: the full location of a file from the root of your filesystem, like /Users/joel/Documents/data.csv on Mac or C:\Users\joel\Documents\data.csv on Windows.

Absolute paths are almost never a good idea for code you plan to share or submit. If you use an absolute path, your program will break on anyone else’s computer, because their filesystem will have different usernames, folder structures, and locations.

For this reason, in this class, we want you to practice writing relative file paths for all of your programs that deal with files.
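If you are ever unsure which kind of path you have, os.path.isabs() will tell you, and os.path.abspath() shows the absolute path a relative path turns into on your machine. A short sketch (the printed absolute path will differ from computer to computer, which is exactly why you shouldn’t hard-code it):

```python
import os

rel_path = "assets/mbox-email-receipts.txt"
print(os.path.isabs(rel_path))   # False: it is relative to the current working directory

# abspath combines the current working directory with the relative path
abs_path = os.path.abspath(rel_path)
print(abs_path)                  # differs on every computer
print(os.path.isabs(abs_path))   # True
```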

Working with files

The second parameter of open() specifies the mode — what you intend to do with the file. Think of it as a permission system: you can only do operations that match the mode you opened with. Let’s look at each mode together with the operations it enables.

Reading files (mode 'r')

To read a file, open it with 'r' (or leave the mode out entirely — 'r' is the default).

path = "assets/"
fname = "mbox-email-receipts.txt"
fpath = f"{path}{fname}"

# open in read mode
fhand = open(fpath, 'r')
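Because 'r' is the default, leaving the mode out gives you exactly the same kind of handle. A minimal check, using a hypothetical practice file so the example runs on its own:

```python
# create a small practice file (hypothetical name)
practice = open("demo-default-mode.txt", "w")
practice.write("some text\n")
practice.close()

explicit = open("demo-default-mode.txt", "r")
implicit = open("demo-default-mode.txt")    # no mode given, so Python uses 'r'

print(explicit.mode, implicit.mode)         # r r
print(explicit.read() == implicit.read())   # True

explicit.close()
implicit.close()
```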

Once you have a file handle open for reading, there are two main ways to get the contents:

.read() reads the whole file as a single string:

fhand = open(fpath, 'r')
content_s = fhand.read()
content_s
'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\nFrom louis@media.berkeley.edu Fri Jan  4 18:10:48 2008\nFrom zqian@umich.edu Fri Jan  4 16:10:39 2008\nFrom rjlowe@iupui.edu Fri Jan  4 15:46:24 2008\nFrom zqian@umich.edu Fri Jan  4 15:03:18 2008\nFrom rjlowe@iupui.edu Fri Jan  4 14:50:18 2008\nFrom cwen@iupui.edu Fri Jan  4 11:37:30 2008\nFrom cwen@iupui.edu Fri Jan  4 11:35:08 2008\nFrom gsilver@umich.edu Fri Jan  4 11:12:37 2008\nFrom gsilver@umich.edu Fri Jan  4 11:11:52 2008\nFrom zqian@umich.edu Fri Jan  4 11:11:03 2008\nFrom gsilver@umich.edu Fri Jan  4 11:10:22 2008\nFrom wagnermr@iupui.edu Fri Jan  4 10:38:42 2008\nFrom zqian@umich.edu Fri Jan  4 10:17:43 2008\nFrom antranig@caret.cam.ac.uk Fri Jan  4 10:04:14 2008\nFrom gopal.ramasammycook@gmail.com Fri Jan  4 09:05:31 2008\nFrom david.horwitz@uct.ac.za Fri Jan  4 07:02:32 2008\nFrom david.horwitz@uct.ac.za Fri Jan  4 06:08:27 2008\nFrom david.horwitz@uct.ac.za Fri Jan  4 04:49:08 2008\nFrom david.horwitz@uct.ac.za Fri Jan  4 04:33:44 2008\nFrom stephen.marquard@uct.ac.za Fri Jan  4 04:07:34 2008\nFrom louis@media.berkeley.edu Thu Jan  3 19:51:21 2008\nFrom louis@media.berkeley.edu Thu Jan  3 17:18:23 2008\nFrom ray@media.berkeley.edu Thu Jan  3 17:07:00 2008\nFrom cwen@iupui.edu Thu Jan  3 16:34:40 2008\nFrom cwen@iupui.edu Thu Jan  3 16:29:07 2008\nFrom cwen@iupui.edu Thu Jan  3 16:23:48 2008\n'

.readlines() reads the whole file as a list of strings (one per line):

fhand = open(fpath, 'r')
content_list = fhand.readlines()
content_list
['From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n',
 'From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008\n',
 'From zqian@umich.edu Fri Jan  4 16:10:39 2008\n',
 'From rjlowe@iupui.edu Fri Jan  4 15:46:24 2008\n',
 'From zqian@umich.edu Fri Jan  4 15:03:18 2008\n',
 'From rjlowe@iupui.edu Fri Jan  4 14:50:18 2008\n',
 'From cwen@iupui.edu Fri Jan  4 11:37:30 2008\n',
 'From cwen@iupui.edu Fri Jan  4 11:35:08 2008\n',
 'From gsilver@umich.edu Fri Jan  4 11:12:37 2008\n',
 'From gsilver@umich.edu Fri Jan  4 11:11:52 2008\n',
 'From zqian@umich.edu Fri Jan  4 11:11:03 2008\n',
 'From gsilver@umich.edu Fri Jan  4 11:10:22 2008\n',
 'From wagnermr@iupui.edu Fri Jan  4 10:38:42 2008\n',
 'From zqian@umich.edu Fri Jan  4 10:17:43 2008\n',
 'From antranig@caret.cam.ac.uk Fri Jan  4 10:04:14 2008\n',
 'From gopal.ramasammycook@gmail.com Fri Jan  4 09:05:31 2008\n',
 'From david.horwitz@uct.ac.za Fri Jan  4 07:02:32 2008\n',
 'From david.horwitz@uct.ac.za Fri Jan  4 06:08:27 2008\n',
 'From david.horwitz@uct.ac.za Fri Jan  4 04:49:08 2008\n',
 'From david.horwitz@uct.ac.za Fri Jan  4 04:33:44 2008\n',
 'From stephen.marquard@uct.ac.za Fri Jan  4 04:07:34 2008\n',
 'From louis@media.berkeley.edu Thu Jan  3 19:51:21 2008\n',
 'From louis@media.berkeley.edu Thu Jan  3 17:18:23 2008\n',
 'From ray@media.berkeley.edu Thu Jan  3 17:07:00 2008\n',
 'From cwen@iupui.edu Thu Jan  3 16:34:40 2008\n',
 'From cwen@iupui.edu Thu Jan  3 16:29:07 2008\n',
 'From cwen@iupui.edu Thu Jan  3 16:23:48 2008\n']

In both cases, you end up with strings. You can then parse them to do what you want — for example, splitting the single string on "\n" to get individual lines:

content_s.split("\n")
['From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008',
 'From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008',
 'From zqian@umich.edu Fri Jan  4 16:10:39 2008',
 'From rjlowe@iupui.edu Fri Jan  4 15:46:24 2008',
 'From zqian@umich.edu Fri Jan  4 15:03:18 2008',
 'From rjlowe@iupui.edu Fri Jan  4 14:50:18 2008',
 'From cwen@iupui.edu Fri Jan  4 11:37:30 2008',
 'From cwen@iupui.edu Fri Jan  4 11:35:08 2008',
 'From gsilver@umich.edu Fri Jan  4 11:12:37 2008',
 'From gsilver@umich.edu Fri Jan  4 11:11:52 2008',
 'From zqian@umich.edu Fri Jan  4 11:11:03 2008',
 'From gsilver@umich.edu Fri Jan  4 11:10:22 2008',
 'From wagnermr@iupui.edu Fri Jan  4 10:38:42 2008',
 'From zqian@umich.edu Fri Jan  4 10:17:43 2008',
 'From antranig@caret.cam.ac.uk Fri Jan  4 10:04:14 2008',
 'From gopal.ramasammycook@gmail.com Fri Jan  4 09:05:31 2008',
 'From david.horwitz@uct.ac.za Fri Jan  4 07:02:32 2008',
 'From david.horwitz@uct.ac.za Fri Jan  4 06:08:27 2008',
 'From david.horwitz@uct.ac.za Fri Jan  4 04:49:08 2008',
 'From david.horwitz@uct.ac.za Fri Jan  4 04:33:44 2008',
 'From stephen.marquard@uct.ac.za Fri Jan  4 04:07:34 2008',
 'From louis@media.berkeley.edu Thu Jan  3 19:51:21 2008',
 'From louis@media.berkeley.edu Thu Jan  3 17:18:23 2008',
 'From ray@media.berkeley.edu Thu Jan  3 17:07:00 2008',
 'From cwen@iupui.edu Thu Jan  3 16:34:40 2008',
 'From cwen@iupui.edu Thu Jan  3 16:29:07 2008',
 'From cwen@iupui.edu Thu Jan  3 16:23:48 2008',
 '']
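Notice the empty string '' at the end of that list: the file ends with a newline, so there is “nothing” after the last \n. If you don’t want that extra item (or the \n characters that .readlines() keeps on each line), two string methods help. A small sketch on a stand-in string with made-up addresses rather than the real file:

```python
# a stand-in for file contents that end with a newline (made-up addresses)
content_s = "From a@example.edu Fri Jan  4\nFrom b@example.edu Thu Jan  3\n"

print(content_s.split("\n"))    # the last item is an empty string ''
print(content_s.splitlines())   # splits at line breaks, with no trailing ''

# lines from .readlines() keep their '\n'; .rstrip("\n") removes it from each one
lines = ["From a@example.edu Fri Jan  4\n", "From b@example.edu Thu Jan  3\n"]
clean = []
for line in lines:
    clean.append(line.rstrip("\n"))
print(clean)
```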

Iterating through a file line by line

You can also loop directly over a file handle, which gives you one line at a time — just like iterating over a list. This is handy when you want to process each line without loading the entire file into memory.

thursday_records = []
fhand = open(fpath, 'r')
for line in fhand:
    if 'Thu' in line:
        thursday_records.append(line)

thursday_records
['From louis@media.berkeley.edu Thu Jan  3 19:51:21 2008\n',
 'From louis@media.berkeley.edu Thu Jan  3 17:18:23 2008\n',
 'From ray@media.berkeley.edu Thu Jan  3 17:07:00 2008\n',
 'From cwen@iupui.edu Thu Jan  3 16:34:40 2008\n',
 'From cwen@iupui.edu Thu Jan  3 16:29:07 2008\n',
 'From cwen@iupui.edu Thu Jan  3 16:23:48 2008\n']
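You may have noticed that each example in this section calls open() again before reading. That is on purpose: a file handle remembers how far it has read, so once you have read (or looped) to the end, reading again gives you nothing. A minimal sketch with a hypothetical practice file:

```python
practice = open("demo-reread.txt", "w")
practice.write("line 1\nline 2\n")
practice.close()

fhand = open("demo-reread.txt", "r")
first = fhand.read()    # reads everything
second = fhand.read()   # already at the end of the file, so this is an empty string
print(first)
print(second == "")     # True
fhand.close()

# to go through the file again, open a fresh handle
fhand = open("demo-reread.txt", "r")
count = 0
for line in fhand:
    count = count + 1
fhand.close()
print(count)            # 2
```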

Chaining open + read

Since open() returns a file object, you can chain .read() or .readlines() directly onto it — no need for an intermediate variable:

fstring = open("assets/newfile2.txt", "r").read()
fstring
'hello world! \n\nThis is a file with a lot of stuff in it.\n\n'

This is the same chaining concept from strings: each expression produces a result, and you can immediately call a method on that result. open(...) produces a file object → .read() is called on it → returns a string.

In the next module we will learn how the pandas library connects to files to cover common parsing situations (e.g., I have a spreadsheet, I want to go straight into a dataframe for analysis). More on that later! The concepts of accessing files will still apply.

Writing files (mode 'w')

To write to a file, open it with 'w'. This gives you access to the .write() method — think of it as similar to print(), except it writes to a file instead of the screen.

path = "assets/"
fname = "newfile.txt"
fpath = f"{path}{fname}"

fhand = open(fpath, 'w')
fhand.write("Hello world, my programming friends!")
fhand.close()

A few important things to know about 'w' mode:

It creates the file if it doesn’t exist. You don’t need to create the file beforehand — Python will make it for you.

path = "assets/"
fname = "newfile-from-class.txt"
fpath = f"{path}{fname}"

fhand = open(fpath, 'w')
fhand.write("This is a new file from class!")
fhand.close()

It overwrites the file if it already exists! Be careful — opening in 'w' mode erases the previous contents.

path = "assets/"
fname = "newfile.txt"
fpath = f"{path}{fname}"

fhand = open(fpath, 'w')
fhand.write("Hello world from INST126!")
fhand.close()
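To see the overwriting for yourself, write to the same file twice in 'w' mode and read it back. A sketch using a hypothetical file name so you don’t clobber anything important:

```python
fpath = "demo-overwrite.txt"   # hypothetical practice file

fhand = open(fpath, "w")
fhand.write("first version")
fhand.close()

fhand = open(fpath, "w")       # opening in 'w' again erases the old contents
fhand.write("second version")
fhand.close()

fhand = open(fpath, "r")
result = fhand.read()
fhand.close()
print(result)                  # second version
```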

You can write multiple times before closing:

path = "assets/"
fname = "newfile5.txt"
fpath = f"{path}{fname}"

fhand = open(fpath, 'w')
fhand.write("Hello INST126!")
fhand.write("\n\nAnother line")
fhand.close()
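One difference from print() is worth seeing directly: .write() does not add a line break for you, so separate calls run together unless you include "\n" yourself. .write() also returns the number of characters it wrote. A sketch with a hypothetical file name:

```python
fhand = open("demo-newlines.txt", "w")
n = fhand.write("Hello")          # .write() returns how many characters it wrote
fhand.write("World")              # no line break is added between the two calls
fhand.write("\nOn its own line")  # the "\n" we include starts a new line
fhand.close()

fhand = open("demo-newlines.txt", "r")
result = fhand.read()
fhand.close()

print(n)        # 5
print(result)   # HelloWorld, then "On its own line" on the next line
```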

If you opened the file in read mode, Python won’t let you write to it. This is a useful safeguard against accidentally changing a file you only meant to read!

fhand = open("assets/newfile.txt", 'r')
fhand.write("This will fail!")
fhand.close()
---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
Cell In[18], line 2
      1 fhand = open("assets/newfile.txt", 'r')
----> 2 fhand.write("This will fail!")
      3 fhand.close()

UnsupportedOperation: not writable
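If you are not sure what a handle allows, you can ask it: file handles have .readable() and .writable() methods that report whether the mode you chose permits reading or writing. A small sketch, again with a hypothetical practice file:

```python
fhand = open("demo-permissions.txt", "w")
write_mode = (fhand.readable(), fhand.writable())
fhand.close()

fhand = open("demo-permissions.txt", "r")
read_mode = (fhand.readable(), fhand.writable())
fhand.close()

print(write_mode)   # (False, True): 'w' allows writing only
print(read_mode)    # (True, False): 'r' allows reading only
```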

Closing files and the with pattern

You may be told that you need to .close() a file to safely exit the connection. As best we can tell, this used to be very important, but in Python 3 we haven’t been able to find concrete, repeatable consequences of forgetting it in this course. However, it can matter in professional settings: https://realpython.com/why-close-file-python/

To avoid worrying about it, you can use the with pattern, which automatically closes the file when the block finishes:

path = "assets/"
fname = "newfile.txt"
fpath = f"{path}{fname}"

with open(fpath, 'w') as fhand:
    fhand.write("Hello world! Something new")
# file is automatically closed here

The with pattern works for reading too:

with open("assets/newfile2.txt", 'r') as fhand:
    content = fhand.read()
print(content)
hello world! 

This is a file with a lot of stuff in it.
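You can check that with really does close the file by looking at the handle’s .closed attribute afterward. The sketch also hints at one reason closing can matter: Python usually buffers what you write, so the text is only guaranteed to be in the file once the handle is closed (or flushed). The file name is hypothetical.

```python
fhand = open("demo-closing.txt", "w")
fhand.write("buffered text")
# until the handle is closed, the text may still be sitting in a buffer, not in the file
fhand.close()

with open("demo-closing.txt", "r") as reader:
    content = reader.read()

print(content)         # buffered text
print(reader.closed)   # True: the with block closed the handle for us
```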

Appending to files (mode 'a')

Append mode is a variant of write mode: it lets you add content to the end of a file without erasing what’s already there.

path = "assets/"
fname = "newfile.txt"
fpath = f"{path}{fname}"

fhand = open(fpath, 'a')
fhand.write("\nMore stuff from INST126!")
fhand.close()

This is useful when you want to build up a file over time — like a log file.
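Here is a minimal sketch of that log-file idea: each call appends one line, and nothing written earlier is lost. The file name and the add_log_entry() helper are hypothetical, made up for this example.

```python
fpath = "demo-log.txt"   # hypothetical log file

# start from an empty file so the example gives the same result every time it runs
with open(fpath, "w") as fhand:
    fhand.write("")

def add_log_entry(message):
    """Append one line to the log file (hypothetical helper)."""
    with open(fpath, "a") as fhand:
        fhand.write(message + "\n")

add_log_entry("program started")
add_log_entry("processing data")
add_log_entry("program finished")

with open(fpath, "r") as fhand:
    log_lines = fhand.readlines()

print(log_lines)
```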

Summary of modes

Mode   What it does           Creates file?            Overwrites?
'r'    Read only (default)    No (error if missing)    No
'w'    Write (from scratch)   Yes                      Yes!
'a'    Append (add to end)    Yes                      No

There are more advanced modes (e.g., 'rb' for reading binary files), but 'r', 'w', and 'a' cover most of what you’ll need.

Common errors with files

Can’t find the file: FileNotFoundError

fhand = open("mbox-email-receipts.txt", 'r')
print(fhand.read())
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
Cell In[22], line 1
----> 1 fhand = open("mbox-email-receipts.txt", 'r')
      2 print(fhand.read())

File /opt/hostedtoolcache/Python/3.11.15/x64/lib/python3.11/site-packages/IPython/core/interactiveshell.py:344, in _modified_open(file, *args, **kwargs)
    337 if file in {0, 1, 2}:
    338     raise ValueError(
    339         f"IPython won't let you open fd={file} by default "
    340         "as it is likely to crash IPython. If you know what you are doing, "
    341         "you can use builtins' open."
    342     )
--> 344 return io_open(file, *args, **kwargs)

FileNotFoundError: [Errno 2] No such file or directory: 'mbox-email-receipts.txt'

In basically all cases, the issue/mismatch is between your understanding of where the thing is (path) or what its name is (fname) and what you actually told Python.

This could be a:

  • Misspelling (remember how literal Python is?)

  • Wrong/missing directions (e.g., a missing folder, like the assets/ folder left out of the example above, or a missing .. step)
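When you hit this error, a quick diagnostic is to ask Python whether it can see the path from where your program is running, and to print what it can see instead. A sketch using a deliberately wrong, hypothetical path (the folder below does not exist):

```python
import os

# a deliberately wrong, hypothetical path (this folder does not exist)
fpath = "folder-that-does-not-exist/mbox-email-receipts.txt"

found = os.path.exists(fpath)   # can Python see this path from where it is running?
if found:
    fhand = open(fpath, "r")
    print(fhand.read())
    fhand.close()
else:
    # print where Python is looking, to help spot the mismatch
    print(f"Can't find {fpath} starting from {os.getcwd()}")
    print("Things you can see from here:", os.listdir())
```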

Wrong connection type/permission: UnsupportedOperation

# opened in read mode, and reading works fine
fhand = open("assets/mbox-email-receipts.txt", 'r')
print(fhand.read())
fhand.close()
From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008
From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008
From zqian@umich.edu Fri Jan  4 16:10:39 2008
From rjlowe@iupui.edu Fri Jan  4 15:46:24 2008
From zqian@umich.edu Fri Jan  4 15:03:18 2008
From rjlowe@iupui.edu Fri Jan  4 14:50:18 2008
From cwen@iupui.edu Fri Jan  4 11:37:30 2008
From cwen@iupui.edu Fri Jan  4 11:35:08 2008
From gsilver@umich.edu Fri Jan  4 11:12:37 2008
From gsilver@umich.edu Fri Jan  4 11:11:52 2008
From zqian@umich.edu Fri Jan  4 11:11:03 2008
From gsilver@umich.edu Fri Jan  4 11:10:22 2008
From wagnermr@iupui.edu Fri Jan  4 10:38:42 2008
From zqian@umich.edu Fri Jan  4 10:17:43 2008
From antranig@caret.cam.ac.uk Fri Jan  4 10:04:14 2008
From gopal.ramasammycook@gmail.com Fri Jan  4 09:05:31 2008
From david.horwitz@uct.ac.za Fri Jan  4 07:02:32 2008
From david.horwitz@uct.ac.za Fri Jan  4 06:08:27 2008
From david.horwitz@uct.ac.za Fri Jan  4 04:49:08 2008
From david.horwitz@uct.ac.za Fri Jan  4 04:33:44 2008
From stephen.marquard@uct.ac.za Fri Jan  4 04:07:34 2008
From louis@media.berkeley.edu Thu Jan  3 19:51:21 2008
From louis@media.berkeley.edu Thu Jan  3 17:18:23 2008
From ray@media.berkeley.edu Thu Jan  3 17:07:00 2008
From cwen@iupui.edu Thu Jan  3 16:34:40 2008
From cwen@iupui.edu Thu Jan  3 16:29:07 2008
From cwen@iupui.edu Thu Jan  3 16:23:48 2008

# opened in read mode
# but tried to write to it
fhand = open("assets/mbox-email-receipts.txt", 'r')
print(fhand.write("Hello world"))
---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
Cell In[24], line 4
      1 # opened in read mode
      2 # but tried to write to it
      3 fhand = open("assets/mbox-email-receipts.txt", 'r')
----> 4 print(fhand.write("Hello world"))

UnsupportedOperation: not writable

import os
os.listdir()
['.DS_Store',
 'Practice - Defining Functions.ipynb',
 '6_Iteration.ipynb.bak',
 'README.md',
 'Practice_Module-2_Lists.ipynb',
 'Practice - Module 2 Review.ipynb',
 'myfile.txt',
 '.ipynb_checkpoints',
 '.github',
 '2b_Variables.md',
 'Practice_Module-1-Projects.ipynb',
 'Problem-Formulation.md',
 'assets',
 '5_lists.md',
 '9_Files.ipynb.bak',
 '2a_Expressions.md',
 '4_Conditionals.md',
 'requirements.txt',
 '10_Pandas-1.ipynb',
 'Practice_Module-2-Projects.md',
 'Help-Seeking-Template.md',
 'intro.md',
 '_config.yml',
 'laptop-weights-by-company.csv',
 'Practice_Dictionaries_Scenarios.md',
 'what-is-programming.md',
 'Practice_Module-2-Projects.ipynb.bak',
 '.jupyter',
 'LICENSE',
 'complex_dictionary.json',
 'Practice_Strings_Integrative.md',
 'Practice_Conditionals.ipynb',
 'marvel-movies.csv',
 '7_Strings.md',
 'Practice_Debugging_examples-Solutions.ipynb',
 'Practice_Warmup_Tracing.md',
 'laptop-report.txt',
 '.git',
 '9_Files.md',
 '11_Pandas-2.ipynb',
 '_toc.yml',
 '4_Conditionals.qmd',
 'Practice - Defining Functions (Errors).ipynb',
 '.gitignore',
 'dictionary.json',
 'Practice_Debugging_examples.ipynb',
 'module-4-review-scratchpad.ipynb',
 '3_Functions.md',
 '_static',
 'ncaa-team-data.csv',
 '6_Iteration.md',
 '8_Dictionaries.ipynb.bak',
 'exam_draft_module2.md',
 'slides_7_Strings.html',
 '8_Dictionaries.md',
 'Practice_Indexing_FITB.md',
 'notes.md',
 'Debugging-Helpseeking.md',
 '7_Strings.ipynb.bak',
 'INST courses.csv',
 '_build',
 'dictionary.txt',
 'data',
 'example-course-copier.ipynb']

Writing different kinds of things to files

Often we just pass strings directly into .write() to write data to a file. But sometimes we want to write a specific kind of data structure to a file and preserve its structure in some way. One example is saving a dictionary to a file.

We could write the dictionary to the file like this:

d = {"a": 1, "b": 2, "c": 3}
with open("dictionary.txt", "w") as f:
    f.write(str(d)) # write the string representation of the dictionary to the file
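The catch with str(d) is that reading the file back gives you a string that merely looks like a dictionary. A quick sketch (it writes to the same dictionary.txt file name used above):

```python
d = {"a": 1, "b": 2, "c": 3}
with open("dictionary.txt", "w") as f:
    f.write(str(d))

with open("dictionary.txt", "r") as f:
    loaded = f.read()

print(loaded)         # {'a': 1, 'b': 2, 'c': 3}
print(type(loaded))   # <class 'str'>, not a dict
# so loaded["a"] would raise a TypeError instead of giving you 1
```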

But sometimes you want to write the dictionary in a more structured format, like JSON. JSON (short for JavaScript Object Notation) is a standard way to represent data structures as strings so they can be passed between programs (often over the Internet) with a consistent structure that is easy to export, import, and parse.

To do this, we can use the json library to neatly package up a dictionary to be able to write it to a file and have confidence that it retains its essential structure and can easily be read back into a dictionary from a file.

The code for doing so might look like this:

import json
# write the dictionary to a file in JSON format
with open("dictionary.json", "w") as f:
    str_d = json.dumps(d, indent=4) # convert the dictionary to a JSON string, with an indentation of 4 spaces
    f.write(str_d) # write the JSON string to the file

Then the contents of the file will look like this:

{
    "a": 1,
    "b": 2,
    "c": 3
}

This is nice and easy to read for humans, and especially pays off for longer and more complex dictionaries. For instance:

complex_d = {
    "a": 1,
    "b": 2,
    "c": {
        "d": 3,
        "e": 4
    },
    "f": [5, 6, 7],
    "g": {
        "h": 8,
        "i": {
            "j": 9,
            "k": 10
        }
    }
}

with open("complex_dictionary.json", "w") as f:
    str_d = json.dumps(complex_d, indent=4) # convert the dictionary to a JSON string, with an indentation of 4 spaces
    f.write(str_d) # write the JSON string to the file

The .json file contents will look like this:

{
    "a": 1,
    "b": 2,
    "c": {
        "d": 3,
        "e": 4
    },
    "f": [
        5,
        6,
        7
    ],
    "g": {
        "h": 8,
        "i": {
            "j": 9,
            "k": 10
        }
    }
}

Instead of

{'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}, 'f': [5, 6, 7], 'g': {'h': 8, 'i': {'j': 9, 'k': 10}}}

As a bonus, you can read the contents of a .json file into a string and then use the json.loads() function to turn that string back into a dictionary in your Python program, like this:

with open("dictionary.json", "r") as f:
    str_d = f.read() # read the contents of the file into a string
    d = json.loads(str_d) # convert the JSON string back to a dictionary
    print(d)
{'a': 1, 'b': 2, 'c': 3}

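The json.dump() and json.load() functions (without the final “s”) make this even simpler: they write to and read from the file handle directly, so you can skip the intermediate string.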

# write the dictionary to a file in JSON format
with open("dictionary.json", "w") as f:
    json.dump(d, f, indent=4) # convert the dictionary to a JSON string, with an indentation of 4 spaces, and write to f

with open("dictionary.json", "r") as f:
    d = json.load(f) # read the contents of the JSON file into a dictionary
    print(d)
{'a': 1, 'b': 2, 'c': 3}
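JSON keeps the overall structure, but it only has a handful of types, so a few Python details change on the way through. A small sketch showing three of them: keys that aren’t strings become strings, tuples become lists, and True/None are written as true/null.

```python
import json

original = {1: "one", "flags": (True, None)}

as_json = json.dumps(original)
print(as_json)                  # {"1": "one", "flags": [true, null]}

round_trip = json.loads(as_json)
print(round_trip)               # {'1': 'one', 'flags': [True, None]}
print(round_trip == original)   # False: the key 1 became "1", and the tuple became a list
```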

Aside: What is a library?

You can think of a library (also called a module) as a collection of functions and data structures. You import a library (or subsets of it) into your program so you have access to special functions or data structures.

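You are already using parts of Python’s standard library. Built-in functions like print() and built-in data structures like str and dict are available every time you fire up Python, as if they were “imported” into your program in the background. Other parts of the standard library, like os and json, come with Python but need an explicit import before you can use them.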

As you advance in your programming career, you will often find that a (sub)problem you want to solve is one that others have already tackled: they wrote a collection of functions and/or data structures that solve it really well and saved that collection into a library that others can use. Take advantage of this!

For instance, os is a library that has convenient functions for handling files and folders:

# import the os library
import os
# see the current working directory
print(os.getcwd())
# list the contents of the current working directory
print(os.listdir())
# check if a file exists
print(os.path.exists("assets/mbox-email-receipts.txt"))
# make a file path using os.path.join
path = os.path.join("assets", "mbox-email-receipts.txt")
print(path)
/home/runner/work/inst126-intro-programming-notes/inst126-intro-programming-notes
['.DS_Store', 'Practice - Defining Functions.ipynb', '6_Iteration.ipynb.bak', 'README.md', 'Practice_Module-2_Lists.ipynb', 'Practice - Module 2 Review.ipynb', 'myfile.txt', '.ipynb_checkpoints', '.github', '2b_Variables.md', 'Practice_Module-1-Projects.ipynb', 'Problem-Formulation.md', 'assets', '5_lists.md', '9_Files.ipynb.bak', '2a_Expressions.md', '4_Conditionals.md', 'requirements.txt', '10_Pandas-1.ipynb', 'Practice_Module-2-Projects.md', 'Help-Seeking-Template.md', 'intro.md', '_config.yml', 'laptop-weights-by-company.csv', 'Practice_Dictionaries_Scenarios.md', 'what-is-programming.md', 'Practice_Module-2-Projects.ipynb.bak', '.jupyter', 'LICENSE', 'complex_dictionary.json', 'Practice_Strings_Integrative.md', 'Practice_Conditionals.ipynb', 'marvel-movies.csv', '7_Strings.md', 'Practice_Debugging_examples-Solutions.ipynb', 'Practice_Warmup_Tracing.md', 'laptop-report.txt', '.git', '9_Files.md', '11_Pandas-2.ipynb', '_toc.yml', '4_Conditionals.qmd', 'Practice - Defining Functions (Errors).ipynb', '.gitignore', 'dictionary.json', 'Practice_Debugging_examples.ipynb', 'module-4-review-scratchpad.ipynb', '3_Functions.md', '_static', 'ncaa-team-data.csv', '6_Iteration.md', '8_Dictionaries.ipynb.bak', 'exam_draft_module2.md', 'slides_7_Strings.html', '8_Dictionaries.md', 'Practice_Indexing_FITB.md', 'notes.md', 'Debugging-Helpseeking.md', '7_Strings.ipynb.bak', 'INST courses.csv', '_build', 'dictionary.txt', 'data', 'example-course-copier.ipynb']
True
assets/mbox-email-receipts.txt

The import keyword followed by the name of the library makes the functions and data structures in the library available to your running Python interpreter. This is analogous to how def makes a function available for your program to call/use.

Once you import a library, you can access its functions by writing the name of the library (e.g., json), then a ., then the name of the function (e.g., dumps()). Notice that the syntax is similar to calling methods on data structures (e.g., some_list.append()).
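As mentioned above, you can also import just a subset of a library with from ... import ..., in which case you call the function without the library-name prefix. A short sketch comparing the two styles with json.dumps():

```python
import json              # import the whole library: call functions as json.dumps()
from json import dumps   # import one function: call it as plain dumps()

d = {"a": 1}
print(json.dumps(d))         # {"a": 1}
print(dumps(d))              # {"a": 1}: same function, shorter name
print(dumps is json.dumps)   # True
```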