9: Files


What are files?

From Python’s perspective, files are data that live outside of the program’s main memory. The PY4E textbook calls this “secondary memory”.

Secondary memory is essential, because main memory, which holds all the data you create while your Python program is running, goes away once the program stops.

Secondary memory is a place to have data that is persistent. Sort of like long-term memory in humans.

I like this picture from the PY4E textbook to illustrate the point:

The middle box is where the Python program “lives”.

Sometimes Python needs a way to connect to the outside world: “input and output devices” on the left, “network” (e.g., the internet!) on the right, and “secondary memory” (e.g., the hard drive! files!) on the right.

So far our programs have been self-contained (except for talking to the outside world via input() and print()).

Now we will talk about how to read from and write to files in secondary memory, so our data can persist beyond a single Python session/program, and so we can work with more data than we could reasonably type into a single Python file or Jupyter cell.

The `open()` function, and the file handle object

Python interacts with files using the file handle object. The open() function, as you might suspect, opens a file handle for a file.

Here is an example. What do you think is in fhand?

fhand = open('assets/mbox-email-receipts.txt', 'r')
print(fhand)

<_io.TextIOWrapper name='assets/mbox-email-receipts.txt' mode='r' encoding='UTF-8'>

The main parameters of open() are:

Path: The path to the file you want to connect to
Mode: A specification of how you want to connect (to read, to write, etc.).

But its return value is not the contents of the file! Instead, its output is a file handle object: io.TextIOWrapper.

I like this picture from the PY4E textbook:

What kind of thing is it? What does it allow us to do?

Just like lists have methods like .append(), strings have methods like .upper() and .split(), and dictionaries have methods like .update() and .get(), file handle objects have key methods that enable us to work with the actual file:

read the contents of the file (with .read() or .readlines())
write to the file (with .write() or .writelines())

So, in the example above, I’ve opened a file handle for the mbox-email-receipts.txt file, in the reading mode (r), which enables me to use .read() to read the contents of the file.
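To see the difference between the file handle and the file’s contents, here is a small self-contained sketch. It creates a throwaway text file in a temporary folder (using the standard tempfile and os modules rather than the course’s assets folder), and the file name demo.txt is hypothetical.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.txt")  # hypothetical file name

    # create a small file to experiment with
    with open(demo_path, "w") as out:
        out.write("first line\nsecond line\n")

    fhand = open(demo_path, "r")
    handle_is_wrapper = isinstance(fhand, io.TextIOWrapper)  # the handle, not the text
    print(type(fhand))

    contents = fhand.read()  # the actual text only shows up once we call .read()
    print(type(contents))
    print(contents)
    fhand.close()
```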

We’ll return to the concepts of mode and operations in a bit. First, we need to understand how to direct Python to a file so we can actually open a file handle to it!

File paths

File paths are a way of giving Python directions to the file’s location.

In the example above, the file path was 'assets/mbox-email-receipts.txt'

There are two parts to a file path:

The filename itself.
The path/directions to the folder it’s in “from where you are”
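As a preview of the os library (introduced properly a little further down), the two parts can be pulled apart with os.path.split(), which returns a (folder path, filename) pair. This sketch uses posixpath, the forward-slash flavor of os.path, so the result looks the same on every operating system; on your own machine you would normally just use os.path.

```python
import posixpath  # the "/"-style version of os.path

fpath = "assets/mbox-email-receipts.txt"

folder, filename = posixpath.split(fpath)
print(folder)    # assets  (the directions to the folder)
print(filename)  # mbox-email-receipts.txt  (the filename itself)
```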

The filename is obvious, but the path/directions part is not. So let’s take a closer look.

Path, aka directions to a folder

Like in everyday directions, to write the path (directions to the folder that has the file), you need to know:

Where you are (your current location)
The target location
The “directions” to the target location

The parts of a path are:

Operations: .. for “go up a level”, and / for “open the door / go down to…”
Names of folders (“rooms”)

So you can think of a path as a route/directions from where you are, to the target folder/room that has the file you want Python to act on.

In the example above, we have the target location of the assets folder, and we give directions to Python to “open the door” (/) to the assets folder, and that is where it can find the mbox-email-receipts.txt file.

These directions work because we are currently in a room/folder that has the assets folder in it.

We can use the os library to check/change where we are. This helps understand the concept of paths better, I find. It’s also a bit of a preview of using libraries, which we’ll do more of in Module 4. If you’re confused, don’t worry! You don’t need to use os in this module. This is just for us to see where we are.

import os # get all the data and functions from the os library

cwd = os.getcwd() # show me where i am on the hard drive
current_view = os.listdir() # list all the names of things i can immediately see in my current location

print(f"You are currently in {cwd}\n")
print(f"Here are all the things you can see:")
for thingname in current_view:
    print(thingname)

You are currently in /Users/joelchan/Projects/inst126-intro-programming-notes

Here are all the things you can see:
Problem-Formulation.md
what-is-programming.md
.DS_Store
example-course-copier.ipynb
LICENSE
requirements.txt
Practice_Module-2-Projects.ipynb
ncaa-team-data.csv
Practice_Debugging_examples.ipynb
dictionary.json
intro.md
Practice_Debugging_examples-Solutions.ipynb
2b_Variables.md
5_lists.md
dictionary.txt
_static
6_Iteration.ipynb
Practice_Conditionals.ipynb
laptop-report.txt
Practice_Module-1-Projects.ipynb
marvel-movies.csv
INST courses.csv
4_Conditionals.md
README.md
module-4-review-scratchpad.ipynb
_toc.yml
3_Functions.md
.gitignore
7_Strings.ipynb
_build
_config.yml
Debugging-Helpseeking.md
laptop-weights-by-company.csv
Practice_Module-2_Lists.ipynb
11_Pandas-2.ipynb
myfile.txt
.ipynb_checkpoints
10_Pandas-1.ipynb
.jupyter
Help-Seeking-Template.md
.git
9_Files.ipynb
data
assets
2a_Expressions.md
8_Dictionaries.ipynb
complex_dictionary.json

Practice writing directions (paths) to files!

Let’s open the newfile2.txt file in the other_stuff folder

fpath = 'other_stuff/newfile2.txt'
fhand = open(fpath, 'r')
fhand

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
/var/folders/xz/_hjc5hsx743dclmg8n5678nc0000gn/T/ipykernel_18753/1209499128.py in <module>
      1 fpath = 'other_stuff/newfile2.txt'
----> 2 fhand = open(fpath, 'r')
      3 fhand

FileNotFoundError: [Errno 2] No such file or directory: 'other_stuff/newfile2.txt'

Let’s open the newfile3.txt file in the a_drawer folder

path = 'other_stuff/a_drawer/'
fname = 'newfile3.txt'
fpath = f'{path}{fname}'
fhand = open(fpath, 'r')
fhand

<_io.TextIOWrapper name='other_stuff/a_drawer/newfile3.txt' mode='r' encoding='UTF-8'>

Let’s open the newfile.txt file in the more_stuff folder

fpath = '../more_stuff/newfile.txt'
fhand = open(fpath, 'r')
fhand

<_io.TextIOWrapper name='../more_stuff/newfile.txt' mode='r' encoding='UTF-8'>
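To double-check a set of directions, os.path.normpath() can simplify them without touching the disk: each .. cancels out the folder name just before it. This sketch uses posixpath (the forward-slash flavor of os.path) so the output looks the same on every operating system, and the first path is a made-up detour built from the folder names above.

```python
import posixpath  # the "/"-style version of os.path

# into other_stuff, into a_drawer, then back up one level (..) before the filename
simplified = posixpath.normpath("other_stuff/a_drawer/../newfile2.txt")
print(simplified)  # other_stuff/newfile2.txt

# a leading .. can't be cancelled out: it means "start one level above where you are"
still_up = posixpath.normpath("../more_stuff/newfile.txt")
print(still_up)    # ../more_stuff/newfile.txt
```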

Relative vs. absolute file paths

So far we have been discussing relative file paths: that is, a path that describes how your Python program can locate the target file relative to its current location.

It is also possible to specify absolute file paths: that is, a path that describes the fixed, or “absolute”, location of your target file on your filesystem (like the /Users/joelchan/Projects/inst126-intro-programming-notes location printed above). This is almost never a good idea. A big reason is that you will often need to run your program on a different computer (such as your team members’ computers!). If you use an absolute file path, your Python program will raise a FileNotFoundError there, because the other computer almost certainly won’t have the exact same file system structure and contents (e.g., the name of the user, where different folders are, etc.).

For this reason, in this class, we want you to practice writing relative file paths for all of your programs that deal with files.
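To see how the two kinds of path relate, os.path.abspath() glues the current working directory onto the front of a relative path, and os.path.isabs() tells you which kind of path you have. A minimal sketch; the printed absolute path will differ on every computer, which is exactly why absolute paths don’t travel well.

```python
import os

relative = "assets/mbox-email-receipts.txt"
absolute = os.path.abspath(relative)  # current working directory + relative path

print(os.path.isabs(relative))  # False: directions from "where you are"
print(os.path.isabs(absolute))  # True: a fixed location on this particular computer
print(absolute)                 # e.g., starts with /Users/<your name>/... on a Mac
```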

File handle “mode” (aka access controls/permissions)

The second parameter of an open() function call specifies what you intend to do with the file.

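This is also how files get some basic safety: a file handle only allows the operations its mode permits (for example, a handle opened with 'r' can’t write), and the operating system will still refuse to let you open a file for writing if you don’t have “write access” to it.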
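You can ask a file handle what its mode allows with its .readable() and .writable() methods, and see the mode it was opened with in its .mode attribute. A sketch in a temporary folder, with a hypothetical file name:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "modes-demo.txt")  # hypothetical file name

    with open(demo_path, "w") as fhand:  # 'w' creates the file
        w_mode = fhand.mode
        w_readable, w_writable = fhand.readable(), fhand.writable()

    with open(demo_path, "r") as fhand:
        r_mode = fhand.mode
        r_readable, r_writable = fhand.readable(), fhand.writable()

print(w_mode, w_readable, w_writable)  # w False True
print(r_mode, r_readable, r_writable)  # r True False
```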

Read mode (`r`)

path = "assets/"
fname = "mbox-email-receipts.txt"
fpath = f"{path}{fname}"

# basic read
# put 'r' as the second argument
fhand = open(fpath, 'r')
fhand.read()

'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\nFrom louis@media.berkeley.edu Fri Jan  4 18:10:48 2008\nFrom zqian@umich.edu Fri Jan  4 16:10:39 2008\nFrom rjlowe@iupui.edu Fri Jan  4 15:46:24 2008\nFrom zqian@umich.edu Fri Jan  4 15:03:18 2008\nFrom rjlowe@iupui.edu Fri Jan  4 14:50:18 2008\nFrom cwen@iupui.edu Fri Jan  4 11:37:30 2008\nFrom cwen@iupui.edu Fri Jan  4 11:35:08 2008\nFrom gsilver@umich.edu Fri Jan  4 11:12:37 2008\nFrom gsilver@umich.edu Fri Jan  4 11:11:52 2008\nFrom zqian@umich.edu Fri Jan  4 11:11:03 2008\nFrom gsilver@umich.edu Fri Jan  4 11:10:22 2008\nFrom wagnermr@iupui.edu Fri Jan  4 10:38:42 2008\nFrom zqian@umich.edu Fri Jan  4 10:17:43 2008\nFrom antranig@caret.cam.ac.uk Fri Jan  4 10:04:14 2008\nFrom gopal.ramasammycook@gmail.com Fri Jan  4 09:05:31 2008\nFrom david.horwitz@uct.ac.za Fri Jan  4 07:02:32 2008\nFrom david.horwitz@uct.ac.za Fri Jan  4 06:08:27 2008\nFrom david.horwitz@uct.ac.za Fri Jan  4 04:49:08 2008\nFrom david.horwitz@uct.ac.za Fri Jan  4 04:33:44 2008\nFrom stephen.marquard@uct.ac.za Fri Jan  4 04:07:34 2008\nFrom louis@media.berkeley.edu Thu Jan  3 19:51:21 2008\nFrom louis@media.berkeley.edu Thu Jan  3 17:18:23 2008\nFrom ray@media.berkeley.edu Thu Jan  3 17:07:00 2008\nFrom cwen@iupui.edu Thu Jan  3 16:34:40 2008\nFrom cwen@iupui.edu Thu Jan  3 16:29:07 2008\nFrom cwen@iupui.edu Thu Jan  3 16:23:48 2008\n'

Read mode is the default: if you don’t specify a mode, open() uses 'r'.

path = "assets/"
fname = "mbox-email-receipts.txt"
fpath = f"{path}{fname}"

# basic read, with no mode given
# open() defaults to 'r'
fhand = open(fpath)
fhand.read()

'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\nFrom louis@media.berkeley.edu Fri Jan  4 18:10:48 2008\nFrom zqian@umich.edu Fri Jan  4 16:10:39 2008\nFrom rjlowe@iupui.edu Fri Jan  4 15:46:24 2008\nFrom zqian@umich.edu Fri Jan  4 15:03:18 2008\nFrom rjlowe@iupui.edu Fri Jan  4 14:50:18 2008\nFrom cwen@iupui.edu Fri Jan  4 11:37:30 2008\nFrom cwen@iupui.edu Fri Jan  4 11:35:08 2008\nFrom gsilver@umich.edu Fri Jan  4 11:12:37 2008\nFrom gsilver@umich.edu Fri Jan  4 11:11:52 2008\nFrom zqian@umich.edu Fri Jan  4 11:11:03 2008\nFrom gsilver@umich.edu Fri Jan  4 11:10:22 2008\nFrom wagnermr@iupui.edu Fri Jan  4 10:38:42 2008\nFrom zqian@umich.edu Fri Jan  4 10:17:43 2008\nFrom antranig@caret.cam.ac.uk Fri Jan  4 10:04:14 2008\nFrom gopal.ramasammycook@gmail.com Fri Jan  4 09:05:31 2008\nFrom david.horwitz@uct.ac.za Fri Jan  4 07:02:32 2008\nFrom david.horwitz@uct.ac.za Fri Jan  4 06:08:27 2008\nFrom david.horwitz@uct.ac.za Fri Jan  4 04:49:08 2008\nFrom david.horwitz@uct.ac.za Fri Jan  4 04:33:44 2008\nFrom stephen.marquard@uct.ac.za Fri Jan  4 04:07:34 2008\nFrom louis@media.berkeley.edu Thu Jan  3 19:51:21 2008\nFrom louis@media.berkeley.edu Thu Jan  3 17:18:23 2008\nFrom ray@media.berkeley.edu Thu Jan  3 17:07:00 2008\nFrom cwen@iupui.edu Thu Jan  3 16:34:40 2008\nFrom cwen@iupui.edu Thu Jan  3 16:29:07 2008\nFrom cwen@iupui.edu Thu Jan  3 16:23:48 2008\n'

Write mode (`w`)

path = "assets/"
fname = "newfile.txt"
fpath = f"{path}{fname}"

# basic write
# put 'w' as the second argument
fhand = open(fpath, 'w')
fhand.write("Hello world, my programming friends!")
fhand.close()

If you don’t put 'w' as the second argument, Python doesn’t know that you want to write, and will block you from doing so. This is good for security!

path = "assets/"
fname = "newfile.txt"
fpath = f"{path}{fname}"

fhand = open(fpath, 'r')
fhand.write("Hello world from INST126 SP23 Week 12!")
fhand.close()

---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
/var/folders/xz/_hjc5hsx743dclmg8n5678nc0000gn/T/ipykernel_11301/1000295658.py in <module>
      4 
      5 fhand = open(fpath, 'r')
----> 6 fhand.write("Hello world from INST126 SP23 Week 12!")
      7 fhand.close()

UnsupportedOperation: not writable

w mode also creates a new file if it doesn’t already exist at that path.

Be careful, though! If the file already exists, opening it in w mode erases its contents right away, so whatever you .write() replaces what was there before.

path = "assets/"
fname = "newfile-from-class.txt"
fpath = f"{path}{fname}"

fhand = open(fpath, 'w')
fhand.write("This is a new file from class!")
fhand.close()

# basic write
# running this again overwrites whatever is already in newfile.txt
path = "assets/"
fname = "newfile.txt"
fpath = f"{path}{fname}"

fhand = open(fpath, 'w')
# fhand is now a connection to the file that allows you to write to it
fhand.write("Hello world from INST126 SP23 Week 12!")
fhand.close()
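Here is a self-contained sketch of the overwriting behavior: two separate open(..., 'w') calls on the same (hypothetical) file in a throwaway folder, followed by a read. Only the second message survives, because the second open() erased the file before writing.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "newfile.txt")  # hypothetical file in a throwaway folder

    with open(demo_path, "w") as fhand:  # creates the file
        fhand.write("first message")

    with open(demo_path, "w") as fhand:  # erases the old contents, then writes
        fhand.write("second message")

    with open(demo_path, "r") as fhand:
        final_contents = fhand.read()

print(final_contents)  # second message
```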

Append mode (`a`)

This is a variant of write mode: it allows you to write, but only by adding to the end of the file, so the existing contents are kept. Like w mode, it also creates the file if it doesn’t exist yet.

path = "assets/"
fname = "newfile.txt"
fpath = f"{path}{fname}"

# append to a file
fhand = open(fpath, 'a')
# fhand is now a connection to the file that allows you to add to the end of it
fhand.write("More stuff from INST126 SP23 Week 12!")
fhand.close()
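A sketch of what append mode does to an existing file. Notice that appending doesn’t add a newline for you: in the example above, the new text is glued directly onto the end of the existing text unless you start it with "\n". The file name is hypothetical.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "append-demo.txt")  # hypothetical file name

    with open(demo_path, "w") as fhand:
        fhand.write("Hello world!")

    with open(demo_path, "a") as fhand:  # keeps existing contents, writes at the end
        fhand.write(" More stuff!")      # no newline added automatically
        fhand.write("\nA new line.")     # include "\n" yourself to start a new line

    with open(demo_path, "r") as fhand:
        appended = fhand.read()

print(appended)
```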

There are more advanced ways to specify how you want to connect (e.g., 'rb' for “read binary”, for non-text file formats like images). But basic r, w, and a should cover most of your needs for now.
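For a quick look at what 'rb' changes: reading in binary mode gives you a bytes object (shown with a b'' prefix) instead of a str, because the contents are not decoded into text. A sketch with a hypothetical file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "binary-demo.txt")  # hypothetical file name
    with open(demo_path, "w") as fhand:
        fhand.write("hi")

    with open(demo_path, "r") as fhand:
        as_text = fhand.read()   # str: decoded text
    with open(demo_path, "rb") as fhand:
        as_bytes = fhand.read()  # bytes: the raw contents

print(repr(as_text))   # 'hi'
print(repr(as_bytes))  # b'hi'
```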

Operations on files

Ok, assuming we have the right mode, let’s look more closely at how we do things with files.

Reading the contents of a file

Very often you want to connect to a file because you want to read it. There are two ways to do this:

.read() reads in the whole contents of the file as a string
.readlines() reads in the whole contents of the file as a list of strings

In both cases, you end up with strings. You can then parse them to do what you want with them.

# the path
path = "assets/"
fname = "mbox-email-receipts.txt"
fpath = f"{path}{fname}"

# open the file connection for reading
fhand = open(fpath, 'r') # open the file connection and store in the variable fhand
content_s = fhand.read() # read the contents of the file, and dump into a string called content_s
content_s

'From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\nFrom louis@media.berkeley.edu Fri Jan  4 18:10:48 2008\nFrom zqian@umich.edu Fri Jan  4 16:10:39 2008\nFrom rjlowe@iupui.edu Fri Jan  4 15:46:24 2008\nFrom zqian@umich.edu Fri Jan  4 15:03:18 2008\nFrom rjlowe@iupui.edu Fri Jan  4 14:50:18 2008\nFrom cwen@iupui.edu Fri Jan  4 11:37:30 2008\nFrom cwen@iupui.edu Fri Jan  4 11:35:08 2008\nFrom gsilver@umich.edu Fri Jan  4 11:12:37 2008\nFrom gsilver@umich.edu Fri Jan  4 11:11:52 2008\nFrom zqian@umich.edu Fri Jan  4 11:11:03 2008\nFrom gsilver@umich.edu Fri Jan  4 11:10:22 2008\nFrom wagnermr@iupui.edu Fri Jan  4 10:38:42 2008\nFrom zqian@umich.edu Fri Jan  4 10:17:43 2008\nFrom antranig@caret.cam.ac.uk Fri Jan  4 10:04:14 2008\nFrom gopal.ramasammycook@gmail.com Fri Jan  4 09:05:31 2008\nFrom david.horwitz@uct.ac.za Fri Jan  4 07:02:32 2008\nFrom david.horwitz@uct.ac.za Fri Jan  4 06:08:27 2008\nFrom david.horwitz@uct.ac.za Fri Jan  4 04:49:08 2008\nFrom david.horwitz@uct.ac.za Fri Jan  4 04:33:44 2008\nFrom stephen.marquard@uct.ac.za Fri Jan  4 04:07:34 2008\nFrom louis@media.berkeley.edu Thu Jan  3 19:51:21 2008\nFrom louis@media.berkeley.edu Thu Jan  3 17:18:23 2008\nFrom ray@media.berkeley.edu Thu Jan  3 17:07:00 2008\nFrom cwen@iupui.edu Thu Jan  3 16:34:40 2008\nFrom cwen@iupui.edu Thu Jan  3 16:29:07 2008\nFrom cwen@iupui.edu Thu Jan  3 16:23:48 2008\n'

content_s.split("\n")

['From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008',
 'From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008',
 'From zqian@umich.edu Fri Jan  4 16:10:39 2008',
 'From rjlowe@iupui.edu Fri Jan  4 15:46:24 2008',
 'From zqian@umich.edu Fri Jan  4 15:03:18 2008',
 'From rjlowe@iupui.edu Fri Jan  4 14:50:18 2008',
 'From cwen@iupui.edu Fri Jan  4 11:37:30 2008',
 'From cwen@iupui.edu Fri Jan  4 11:35:08 2008',
 'From gsilver@umich.edu Fri Jan  4 11:12:37 2008',
 'From gsilver@umich.edu Fri Jan  4 11:11:52 2008',
 'From zqian@umich.edu Fri Jan  4 11:11:03 2008',
 'From gsilver@umich.edu Fri Jan  4 11:10:22 2008',
 'From wagnermr@iupui.edu Fri Jan  4 10:38:42 2008',
 'From zqian@umich.edu Fri Jan  4 10:17:43 2008',
 'From antranig@caret.cam.ac.uk Fri Jan  4 10:04:14 2008',
 'From gopal.ramasammycook@gmail.com Fri Jan  4 09:05:31 2008',
 'From david.horwitz@uct.ac.za Fri Jan  4 07:02:32 2008',
 'From david.horwitz@uct.ac.za Fri Jan  4 06:08:27 2008',
 'From david.horwitz@uct.ac.za Fri Jan  4 04:49:08 2008',
 'From david.horwitz@uct.ac.za Fri Jan  4 04:33:44 2008',
 'From stephen.marquard@uct.ac.za Fri Jan  4 04:07:34 2008',
 'From louis@media.berkeley.edu Thu Jan  3 19:51:21 2008',
 'From louis@media.berkeley.edu Thu Jan  3 17:18:23 2008',
 'From ray@media.berkeley.edu Thu Jan  3 17:07:00 2008',
 'From cwen@iupui.edu Thu Jan  3 16:34:40 2008',
 'From cwen@iupui.edu Thu Jan  3 16:29:07 2008',
 'From cwen@iupui.edu Thu Jan  3 16:23:48 2008',
 '']

# open a fresh file connection (the .read() above already moved to the end of the file)
fhand = open(fpath, 'r')
# do this if you know that the structure of the file is basically a list of lines
content_list = fhand.readlines() # read the contents of the file into a list of strings, one per line
content_list

['From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008\n',
 'From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008\n',
 'From zqian@umich.edu Fri Jan  4 16:10:39 2008\n',
 'From rjlowe@iupui.edu Fri Jan  4 15:46:24 2008\n',
 'From zqian@umich.edu Fri Jan  4 15:03:18 2008\n',
 'From rjlowe@iupui.edu Fri Jan  4 14:50:18 2008\n',
 'From cwen@iupui.edu Fri Jan  4 11:37:30 2008\n',
 'From cwen@iupui.edu Fri Jan  4 11:35:08 2008\n',
 'From gsilver@umich.edu Fri Jan  4 11:12:37 2008\n',
 'From gsilver@umich.edu Fri Jan  4 11:11:52 2008\n',
 'From zqian@umich.edu Fri Jan  4 11:11:03 2008\n',
 'From gsilver@umich.edu Fri Jan  4 11:10:22 2008\n',
 'From wagnermr@iupui.edu Fri Jan  4 10:38:42 2008\n',
 'From zqian@umich.edu Fri Jan  4 10:17:43 2008\n',
 'From antranig@caret.cam.ac.uk Fri Jan  4 10:04:14 2008\n',
 'From gopal.ramasammycook@gmail.com Fri Jan  4 09:05:31 2008\n',
 'From david.horwitz@uct.ac.za Fri Jan  4 07:02:32 2008\n',
 'From david.horwitz@uct.ac.za Fri Jan  4 06:08:27 2008\n',
 'From david.horwitz@uct.ac.za Fri Jan  4 04:49:08 2008\n',
 'From david.horwitz@uct.ac.za Fri Jan  4 04:33:44 2008\n',
 'From stephen.marquard@uct.ac.za Fri Jan  4 04:07:34 2008\n',
 'From louis@media.berkeley.edu Thu Jan  3 19:51:21 2008\n',
 'From louis@media.berkeley.edu Thu Jan  3 17:18:23 2008\n',
 'From ray@media.berkeley.edu Thu Jan  3 17:07:00 2008\n',
 'From cwen@iupui.edu Thu Jan  3 16:34:40 2008\n',
 'From cwen@iupui.edu Thu Jan  3 16:29:07 2008\n',
 'From cwen@iupui.edu Thu Jan  3 16:23:48 2008\n']
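Notice that every string from .readlines() still ends in "\n", while .split("\n") drops the newlines but leaves an empty string at the end. If you want clean lines, one common option is to call .rstrip("\n") on each line. A sketch using a small made-up file:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "lines-demo.txt")  # hypothetical file name
    with open(demo_path, "w") as fhand:
        fhand.write("From a@example.com\nFrom b@example.com\n")

    with open(demo_path, "r") as fhand:
        raw_lines = fhand.readlines()  # each line keeps its "\n"

# strip just the trailing newline from each line
clean_lines = []
for line in raw_lines:
    clean_lines.append(line.rstrip("\n"))

split_lines = "".join(raw_lines).split("\n")  # same text, split on "\n"

print(raw_lines)    # ['From a@example.com\n', 'From b@example.com\n']
print(clean_lines)  # ['From a@example.com', 'From b@example.com']
print(split_lines)  # ['From a@example.com', 'From b@example.com', '']
```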

In the next module we will learn how the pandas library connects to files to cover common parsing situations (e.g., I have a spreadsheet, I want to go straight into a dataframe for analysis). More on that later! The concepts of accessing files will still apply.

Writing to a file

Another common use case for connecting to files is to write to secondary memory.

The main thing to know here is the .write() method.

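Think of it as similar to the print() function, except that it writes to the file instead of the screen, and it does not add a newline at the end: if you want line breaks, include "\n" in the string yourself.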

path = "assets/"
fname = "newfile5.txt"
fpath = f"{path}{fname}"

# basic write
# put 'w' as the second argument
fhand = open(fpath, 'w')
fhand.write("Hello INST126!") # .write() takes a string as input, writes it to the file, and returns the number of characters written
fhand.write("\n\nAnother line") # no newline is added automatically, so we include "\n" ourselves
fhand.close()
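To see the newline difference between .write() and print(), here is a sketch that writes to a hypothetical file both ways. print() can also send its output to a file through its file= parameter, and it adds the newline for you.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "write-vs-print.txt")  # hypothetical file name

    with open(demo_path, "w") as fhand:
        count = fhand.write("Hello INST126!")  # no newline added; returns 14
        fhand.write("Another line")            # ends up on the same line
        fhand.write("\n")
        print("Printed line", file=fhand)      # print() adds "\n" at the end
        print("Printed again", file=fhand)

    with open(demo_path, "r") as fhand:
        written = fhand.read()

print(count)
print(written)
```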

You may be told in various places that you need to .close() a file to safely end the connection. As best we can tell, this used to be very true: data could sometimes be lost if the file wasn’t closed. In Python 3, we (the instructional team) have been unable to find concrete, repeatable consequences of forgetting this in the context of this course. However, it can have consequences in very specific circumstances in professional settings: https://realpython.com/why-close-file-python/

This is why you’ll often see code using the with pattern, like this.

path = "assets/"
fname = "newfile5.txt" # the file needs a name; an empty fname would point open() at the assets/ folder itself
fpath = f"{path}{fname}"

# once the code inside the with block finishes, Python automatically closes the file
with open(fpath, 'w') as fhand:
    fhand.write("Hello world! Something new") # .write() writes the string to the file and returns the number of characters written
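You can check that the with block really closes the file by looking at the handle’s .closed attribute inside and after the block. A sketch with a hypothetical file in a temporary folder:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "with-demo.txt")  # hypothetical file name

    with open(demo_path, "w") as fhand:
        fhand.write("Hello world! Something new")
        closed_inside = fhand.closed  # still open inside the block

    closed_after = fhand.closed       # closed automatically once the block ends

print(closed_inside)  # False
print(closed_after)   # True
```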

Iterating through a file, similar to readlines

path = "assets/"
fname = "mbox-email-receipts.txt"
fpath = f"{path}{fname}"

thursday_records = []
fhand = open(fpath, 'r')
for line in fhand:
    if 'Thu' in line:
        thursday_records.append(line)
        
thursday_records

['From louis@media.berkeley.edu Thu Jan  3 19:51:21 2008\n',
 'From louis@media.berkeley.edu Thu Jan  3 17:18:23 2008\n',
 'From ray@media.berkeley.edu Thu Jan  3 17:07:00 2008\n',
 'From cwen@iupui.edu Thu Jan  3 16:34:40 2008\n',
 'From cwen@iupui.edu Thu Jan  3 16:29:07 2008\n',
 'From cwen@iupui.edu Thu Jan  3 16:23:48 2008\n']

path = "assets/"
fname = "newfile2.txt"
fpath = f"{path}{fname}"

with open(fpath, 'r') as fhand:
    # use .read()
    fstring = fhand.read()

with open(fpath, 'r') as fhand:
    # use readlines
    flines = fhand.readlines()

# iterate through lines in the file
with open(fpath, 'r') as fhand:
    records = []
    for line in fhand:
        records.append(line)
        
print(fstring)

print(flines)

print(records)

hello world!

This is a new line
['hello world!\n', '\n', 'This is a new line']
['hello world!\n', '\n', 'This is a new line']

Aside: a reminder of chaining operations

Consider these common variants of the open + read steps that do everything in one line (earlier we separated the steps partly to make them clearer). For example, here is how to open and read assets/newfile2.txt in a single line:

fstring = open("assets/newfile2.txt", "r").read() # read all the contents of the file in as a string
fstring
# flines = open("assets/newfile2.txt", "r").readlines() # read the contents of the file in as a list of strings, one for each "line" in the file

'hello world!\n\nThis is a new line'

This works because calling open() is an expression that produces a file object, which is the kind of thing that has the .read() and .readlines() methods. So we can chain .read() or .readlines() directly onto the file object created by the open() expression, without first saving the file object to a variable. This is the concept of “chaining methods/functions/operations directly on the result of an expression”.

Often we do this because we don’t want to waste “mental/visible” space on variables we actually don’t care about (e.g., often we never want to do anything with the file object later).

Remember how we saw this with “cleaning” strings?

For example, you might want to normalize a string by converting it to uppercase AND removing all leading and trailing whitespace. You can do it in one line like this:

s_raw = "hello WorlD "

# you can do it like this
s_upper = s_raw.upper()
s_clean = s_upper.strip()

# or like this (skip the intermediate variable)
s_clean = s_raw.upper().strip() # convert to uppercase, which yields an uppercase string, and then chain the .strip() method directly on that string
# instead of first saving in another variable
print(s_clean)

# only get the first four characters, but make sure it's clean and uppercase
first_four = s_raw.upper().strip()[:4]
print(first_four)

HELLO WORLD
HELL

Common errors with files

Can’t find the file: FileNotFoundError

fhand = open("mbox-email-receipts.txt", 'r')
print(fhand.read())

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
/var/folders/xz/_hjc5hsx743dclmg8n5678nc0000gn/T/ipykernel_11301/2125143104.py in <module>
----> 1 fhand = open("mbox-email-receipts.txt", 'r')
      2 print(fhand.read())

FileNotFoundError: [Errno 2] No such file or directory: 'mbox-email-receipts.txt'

In basically all cases, the issue is a mismatch between your understanding of where the file is (path) or what its name is (fname), and what you actually told Python.

This could be a:

Misspelling (remember how literal Python is?)
Wrong/missing directions (e.g., missing a folder, or an operation)

Wrong connection type/permission: UnsupportedOperation

# for contrast: i said i would read it ('r' mode)
# and i did read it, so this works
fhand = open("assets/mbox-email-receipts.txt", 'r')
print(fhand.read())
fhand.close()

From stephen.marquard@uct.ac.za Sat Jan  5 09:14:16 2008
From louis@media.berkeley.edu Fri Jan  4 18:10:48 2008
From zqian@umich.edu Fri Jan  4 16:10:39 2008
From rjlowe@iupui.edu Fri Jan  4 15:46:24 2008
From zqian@umich.edu Fri Jan  4 15:03:18 2008
From rjlowe@iupui.edu Fri Jan  4 14:50:18 2008
From cwen@iupui.edu Fri Jan  4 11:37:30 2008
From cwen@iupui.edu Fri Jan  4 11:35:08 2008
From gsilver@umich.edu Fri Jan  4 11:12:37 2008
From gsilver@umich.edu Fri Jan  4 11:11:52 2008
From zqian@umich.edu Fri Jan  4 11:11:03 2008
From gsilver@umich.edu Fri Jan  4 11:10:22 2008
From wagnermr@iupui.edu Fri Jan  4 10:38:42 2008
From zqian@umich.edu Fri Jan  4 10:17:43 2008
From antranig@caret.cam.ac.uk Fri Jan  4 10:04:14 2008
From gopal.ramasammycook@gmail.com Fri Jan  4 09:05:31 2008
From david.horwitz@uct.ac.za Fri Jan  4 07:02:32 2008
From david.horwitz@uct.ac.za Fri Jan  4 06:08:27 2008
From david.horwitz@uct.ac.za Fri Jan  4 04:49:08 2008
From david.horwitz@uct.ac.za Fri Jan  4 04:33:44 2008
From stephen.marquard@uct.ac.za Fri Jan  4 04:07:34 2008
From louis@media.berkeley.edu Thu Jan  3 19:51:21 2008
From louis@media.berkeley.edu Thu Jan  3 17:18:23 2008
From ray@media.berkeley.edu Thu Jan  3 17:07:00 2008
From cwen@iupui.edu Thu Jan  3 16:34:40 2008
From cwen@iupui.edu Thu Jan  3 16:29:07 2008
From cwen@iupui.edu Thu Jan  3 16:23:48 2008

# i said i would read it
# but i tried to write to it
fhand = open("assets/mbox-email-receipts.txt", 'r')
print(fhand.write("Hello world"))

---------------------------------------------------------------------------
UnsupportedOperation                      Traceback (most recent call last)
/var/folders/xz/_hjc5hsx743dclmg8n5678nc0000gn/T/ipykernel_11301/4138939401.py in <module>
      2 # but i tried to write to it
      3 fhand = open("assets/mbox-email-receipts.txt", 'r')
----> 4 print(fhand.write("Hello world"))

UnsupportedOperation: not writable
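The UnsupportedOperation exception lives in the io module, so if you want to catch it (for example, to print a friendlier message), you can refer to it as io.UnsupportedOperation. A sketch using a throwaway file rather than the course data, so nothing real is at risk; the file name is hypothetical.

```python
import io
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "read-only-demo.txt")  # hypothetical file name
    with open(demo_path, "w") as fhand:
        fhand.write("original text")

    with open(demo_path, "r") as fhand:  # opened for reading only
        try:
            fhand.write("Hello world")
            message = "write succeeded"
        except io.UnsupportedOperation as err:
            message = f"Blocked: {err}"  # err describes the problem ("not writable")

    with open(demo_path, "r") as fhand:
        still_there = fhand.read()

print(message)
print(still_there)  # original text
```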

import os
os.listdir()

['.DS_Store',
 'other_stuff',
 '.ipynb_checkpoints',
 '.jupyter',
 '9_Files.ipynb',
 'assets']

Writing different kinds of things to files

Often we just pass strings directly into .write() to write data to a file. But sometimes we want to write a specific kind of data structure to a file and preserve its structure in some way. One example is saving a dictionary to a file.

We could write the dictionary to the file like this:

d = {"a": 1, "b": 2, "c": 3}
with open("dictionary.txt", "w") as f:
    f.write(str(d)) # write the string representation of the dictionary to the file
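One catch with str(d): when you read the file back, you get a string that looks like a dictionary, not a dictionary you can look things up in. A sketch that writes to a throwaway folder instead of dictionary.txt:

```python
import os
import tempfile

d = {"a": 1, "b": 2, "c": 3}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "dictionary.txt")
    with open(demo_path, "w") as f:
        f.write(str(d))  # write the string representation of the dictionary

    with open(demo_path, "r") as f:
        loaded = f.read()

print(loaded)        # {'a': 1, 'b': 2, 'c': 3}
print(type(loaded))  # <class 'str'>: just text, so loaded["a"] would fail
```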

But sometimes you want to write the dictionary in a more structured way, like JSON. JSON stands for JavaScript Object Notation: a standard way to represent data structures as strings so programs can pass data to each other (often over the Internet) with a consistent structure that is easy to export, import, and parse.

To do this, we can use the json library to neatly package up a dictionary to be able to write it to a file and have confidence that it retains its essential structure and can easily be read back into a dictionary from a file.

The code for doing so might look like this:

import json
# write the dictionary to a file in JSON format
with open("dictionary.json", "w") as f:
    str_d = json.dumps(d, indent=4) # convert the dictionary to a JSON string, with an indentation of 4 spaces
    f.write(str_d) # write the JSON string to the file

Then the contents of the file will look like this:

{
    "a": 1,
    "b": 2,
    "c": 3
}

This is nice and easy to read for humans, and especially pays off for longer and more complex dictionaries. For instance:

complex_d = {
    "a": 1,
    "b": 2,
    "c": {
        "d": 3,
        "e": 4
    },
    "f": [5, 6, 7],
    "g": {
        "h": 8,
        "i": {
            "j": 9,
            "k": 10
        }
    }
}

with open("complex_dictionary.json", "w") as f:
    str_d = json.dumps(complex_d, indent=4) # convert the dictionary to a JSON string, with an indentation of 4 spaces
    f.write(str_d) # write the JSON string to the file

The .json file contents will look like this:

{
    "a": 1,
    "b": 2,
    "c": {
        "d": 3,
        "e": 4
    },
    "f": [
        5,
        6,
        7
    ],
    "g": {
        "h": 8,
        "i": {
            "j": 9,
            "k": 10
        }
    }
}

Instead of

{'a': 1, 'b': 2, 'c': {'d': 3, 'e': 4}, 'f': [5, 6, 7], 'g': {'h': 8, 'i': {'j': 9, 'k': 10}}}

As a bonus, you can use the json.loads() function to turn the contents of a .json file (read in as a string) back into a dictionary in a Python program, like this:

with open("dictionary.json", "r") as f:
    str_d = f.read() # read the contents of the file into a string
    d = json.loads(str_d) # convert the JSON string back to a dictionary
    print(d)

The dump() and load() functions from the json library make this even simpler, because they write to and read from the file handle directly:

# write the dictionary to a file in JSON format
with open("dictionary.json", "w") as f:
    json.dump(d, f, indent=4) # convert the dictionary to JSON (indented by 4 spaces) and write it directly to f

with open("dictionary.json", "r") as f:
    d = json.load(f) # read the contents of the JSON file into a dictionary
    print(d)

{'a': 1, 'b': 2, 'c': 3}
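A quick way to convince yourself that JSON preserves the structure: write a nested dictionary with json.dump(), read it back with json.load(), and compare it to the original with ==. A sketch using a temporary file in place of complex_dictionary.json:

```python
import json
import os
import tempfile

complex_d = {
    "a": 1,
    "b": 2,
    "c": {"d": 3, "e": 4},
    "f": [5, 6, 7],
    "g": {"h": 8, "i": {"j": 9, "k": 10}},
}

with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "complex_dictionary.json")
    with open(demo_path, "w") as f:
        json.dump(complex_d, f, indent=4)

    with open(demo_path, "r") as f:
        loaded = json.load(f)

print(loaded == complex_d)    # True: same keys, values, and nesting
print(loaded["g"]["i"]["k"])  # 10: nested lookups work again
```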

Aside: What is a library?

You can think of a library (also called a module) as a collection of functions and data structures. You import a library (or parts of it) into your program / notebook so you have access to its special functions or data structures.

You are already using Python’s standard library, which includes built-in functions like print(), and built-in data structures like str and dict. Every time you fire up Python, these are “imported” into your program in the background.

As you advance in your programming career, you will often find that you want to solve (sub)problems that others have already tackled. Often those people wrote a collection of functions and/or data structures that solve those problems really well, and saved that collection as a library that others can use. Take advantage of this!

For instance, os is a library that has convenient functions for working with files and folders:

# import the os library
import os
# see the current working directory
print(os.getcwd())
# list the contents of the current working directory
print(os.listdir())
# check if a file exists
print(os.path.exists("assets/mbox-email-receipts.txt"))
# make a file path using os.path.join
path = os.path.join("assets", "mbox-email-receipts.txt")
print(path)

/Users/joelchan/Projects/inst126-intro-programming-notes
['Problem-Formulation.md', 'what-is-programming.md', '.DS_Store', 'example-course-copier.ipynb', 'LICENSE', 'requirements.txt', 'Practice_Module-2-Projects.ipynb', 'ncaa-team-data.csv', 'Practice_Debugging_examples.ipynb', 'dictionary.json', 'intro.md', 'Practice_Debugging_examples-Solutions.ipynb', '2b_Variables.md', '5_lists.md', 'dictionary.txt', '_static', '6_Iteration.ipynb', 'Practice_Conditionals.ipynb', 'laptop-report.txt', 'Practice_Module-1-Projects.ipynb', 'marvel-movies.csv', 'INST courses.csv', '4_Conditionals.md', 'README.md', 'module-4-review-scratchpad.ipynb', '_toc.yml', '3_Functions.md', '.gitignore', '7_Strings.ipynb', '_build', '_config.yml', 'Debugging-Helpseeking.md', 'laptop-weights-by-company.csv', 'Practice_Module-2_Lists.ipynb', '11_Pandas-2.ipynb', 'myfile.txt', '.ipynb_checkpoints', '10_Pandas-1.ipynb', '.jupyter', 'Help-Seeking-Template.md', '.git', '9_Files.ipynb', 'data', 'assets', '2a_Expressions.md', '8_Dictionaries.ipynb']
True
assets/mbox-email-receipts.txt

The import keyword followed by the name of the library makes the functions and data structures in the library available to your running Python interpreter. This is analogous to how def makes a function available for your program to call/use.

Once you import a library, you can access the functions in that library by writing the name of the library (e.g., json), then a ., then the name of the function (e.g., dumps()). Notice that the syntax is similar to calling methods on data structures (e.g., some_list.append()).
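Two common ways to write the import line, sketched with the json library: import the whole library and use the library.function() syntax, or use from ... import ... to bring in a single function by name. Both end up calling the same function.

```python
import json             # access functions as json.<name>
from json import dumps  # bring in just dumps, usable without the json. prefix

data = {"a": 1}
with_prefix = json.dumps(data)
without_prefix = dumps(data)

print(with_prefix)                    # {"a": 1}
print(with_prefix == without_prefix)  # True
```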