CSci 150: Foundations of computer science

Files & regular expressions

Earlier we saw how we could load a file line-by-line using a for loop over the file object returned by open.

filename = input()
infile = open(filename)
for line in infile:
    print(line.capitalize())

An alternative way to load a file is by applying the read method to the file object. This method loads the entire file into a single string, which it returns.

As a simple example, suppose we want a program to count the number of lower-case r's in a file. By using read, we get the file's contents as a single string, to which we can then apply the string count method.

filename = input()
infile = open(filename)
text = infile.read()
num_rs = text.count('r')
print("There are {0} r's".format(num_rs))
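The same idea can be packaged as a function, which makes it easy to try on a string before pointing it at a real file. This is only a sketch: the names count_char, count_char_in_file, and sample are made up for illustration, and the with statement is used here as one common way to make sure the file gets closed once it has been read.

```python
def count_char(text, ch):
    """Return how many times the string ch occurs in text (case-sensitive)."""
    return text.count(ch)


def count_char_in_file(filename, ch):
    """Read the whole file into one string, then count occurrences of ch."""
    # The with statement closes the file automatically when the block ends.
    with open(filename) as infile:
        text = infile.read()
    return count_char(text, ch)


# A small in-memory sample, so the counting step can be tried without a file.
# Note that the string returned by read would contain newline characters
# just as this sample does.
sample = "Roger reads rarely.\nRiver runs.\n"
print("There are {0} r's".format(count_char(sample, 'r')))
```

Because count is case-sensitive, the capital R's in "Roger" and "River" are not included in the total.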

A particularly powerful technique is to use the re module's findall function to pull a list of all matches to a regular expression from the file. For instance, the following program displays all the capitalized words in the file.

import re

filename = input()
infile = open(filename)
text = infile.read()
cap_words = re.findall(r'[A-Z][A-Za-z]*', text)
for w in cap_words:
    print(w)
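To see what findall returns, it can help to run the pattern on a short string rather than a file. The sketch below uses made-up names (capitalized_words, whole_capitalized_words, sample) and also shows one caveat: the pattern above can match in the middle of a word. Adding \b, which matches only at a word boundary, is one possible way to avoid that; it is offered here as a variation, not as part of the original program.

```python
import re


def capitalized_words(text):
    """Return every run of letters that starts with a capital letter."""
    return re.findall(r'[A-Z][A-Za-z]*', text)


def whole_capitalized_words(text):
    """Like capitalized_words, but a match must begin at a word boundary."""
    return re.findall(r'\b[A-Z][A-Za-z]*', text)


sample = "Alice met Bob in New York. iPhone sales rose."

# The first pattern also picks up "Phone" from inside "iPhone".
print(capitalized_words(sample))

# With \b, a match may not start in the middle of a word.
print(whole_capitalized_words(sample))
```

In both cases findall returns an ordinary list of strings, in the order the matches appear in the text, so it can be looped over or passed to len like any other list.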