Jupyter Snippet P4M WorkingWithFiles

Working with Files in Python

This notebook provides the basics for how to work with files. Most of the functionality needed is provided by the os module.

To get started we are going to create a directory with a set of files by extracting all of the files from the cplexLP.zip zip file that you would have downloaded in the “IntroToDataScience” notebook.

import os
# the system() function executes a command in the underlying (Linux/Windows/...) shell
os.system("unzip cplexLP.zip")
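
Calling unzip through os.system() only works if an unzip program is installed, which is often not the case under Windows. A minimal portable sketch using the standard-library zipfile module is shown here. The archive name demo.zip and its contents are made up for illustration so that the example runs on its own.

```python
import os
import tempfile
import zipfile

# Work in a temporary directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    archive = os.path.join(tmp, "demo.zip")  # hypothetical archive name

    # Create a small archive to extract (it stands in for cplexLP.zip).
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr("demo/a.log", "first file\n")
        zf.writestr("demo/b.log", "second file\n")

    # Extract everything into tmp, much like "unzip demo.zip" would.
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(tmp)

    extracted = sorted(os.listdir(os.path.join(tmp, "demo")))

print(extracted)
```

For the archive used in this notebook the corresponding call would be extractall(".") on zipfile.ZipFile("cplexLP.zip").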

How do we know if the files have been extracted? What files are there? os.listdir() lists the names of the entries in a directory.

os.listdir(".")

['cplexLP.zip',
 'TheNeedForSpeed-Answers.ipynb',
 '.ipynb_checkpoints',
 'MIPLIBtest.ipynb',
 'gauss.cpp',
 '__pycache__',
 'cplex.log',
 'TheNeedForSpeed.ipynb',
 'gausstest.py',
 'TripletEnumerationExercise.ipynb',
 'cplexLP',
 'cplexLP.json',
 'IntroToDataScience-Answers.ipynb',
 'WorkingWithFiles.ipynb',
 'gauss.so',
 'IntroToDataScience.ipynb',
 'features.json']

How do we know if any of these are subdirectories? Use the os.path submodule, which has several functions of the form isXXX, where XXX is a type of file, to check what kind of file we have.

print("Methods for testing filetype in os.path:",
      [ name for name in dir(os.path) if name.startswith("is")])

Methods for testing filetype in os.path: ['isabs', 'isdir', 'isfile', 'islink', 'ismount']

Mostly we just want to distinguish between directories and files. Let's find all of the subdirectories of the current directory.

print("Subdirectories are:",
      ", ".join([name for name in os.listdir(".") # names in the current directory
                 if os.path.isdir(name)])
     )# end of print after joining all the names with `, `

Subdirectories are: .ipynb_checkpoints, __pycache__, cplexLP
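
Note that os.path.isdir(name) works here only because the names come from the current directory; for any other directory the names need the directory prepended first. As an alternative sketch, os.scandir() yields entries that know their own type, so files and directories can be separated in one pass. The directory and file names below are invented for the example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Build a tiny directory tree to inspect (hypothetical names).
    os.mkdir(os.path.join(tmp, "subdir"))
    with open(os.path.join(tmp, "notes.txt"), "w") as f:
        f.write("hello\n")

    dirs, files = [], []
    with os.scandir(tmp) as entries:
        for entry in entries:
            # entry.is_dir() tests the entry itself, whatever the current directory is
            (dirs if entry.is_dir() else files).append(entry.name)

print("Directories:", dirs)
print("Files:", files)
```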

How do we find all of the filenames within the cplexLP directory? Basically by using listdir again. However, note that we need to create complete path names of the form "cplexLP/file" (or "cplexLP\file" if you are under Windows). We use os.path.sep to give us the right separator between directories.

allFiles = [ "cplexLP" + os.path.sep + name for name in os.listdir("cplexLP")]
print(allFiles[:10])

['cplexLP/core2536-691.dual.log', 'cplexLP/pg5_34.dual.log', 'cplexLP/ns1208400.barrier.log', 'cplexLP/reblock67.barrier.log', 'cplexLP/bnatt350.dual.log', 'cplexLP/rococoC10-001000.primal.log', 'cplexLP/eilB101.dual.log', 'cplexLP/mine-90-10.barrier.log', 'cplexLP/n3seq24.primal.log', 'cplexLP/mine-166-5.barrier.log']
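
Instead of concatenating with os.path.sep by hand, os.path.join() inserts the right separator itself. A short sketch, using two of the names from the listing above as plain strings so that it runs without the directory being present:

```python
import os

directory = "cplexLP"
names = ["reblock67.barrier.log", "pg5_34.dual.log"]  # sample names, no files needed

# os.path.join picks the separator that suits the operating system
paths = [os.path.join(directory, name) for name in names]
print(paths)
```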

Here all filenames follow a pattern: directory/instance.method.log. How can we take one of these filenames and break it into parts? Perhaps use string split():

filename = allFiles[3] # pick some filename
name = filename.split(os.path.sep)[-1] # split off the directory names and only keep the last part
instance,method = name.split(".")[:2] # split by . and keep the first two parts as we don't care about .log
print(filename,"solves",instance,"using",method)

cplexLP/reblock67.barrier.log solves reblock67 using barrier
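
The os.path module also has functions for taking paths apart. The sketch below uses os.path.basename() and os.path.splitext() and then splits at the last remaining dot, so it would still work if an instance name itself contained a dot. Whether any instance name in the archive does is an assumption, not something shown above.

```python
import os

filename = os.path.join("cplexLP", "reblock67.barrier.log")

name = os.path.basename(filename)       # "reblock67.barrier.log"
stem, ext = os.path.splitext(name)      # ("reblock67.barrier", ".log")
instance, method = stem.rsplit(".", 1)  # split once, at the last dot
print(filename, "solves", instance, "using", method)
```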

Reading files. Use the open() function to open a file, and the close() method to close it:

f = open(filename,"r")
f.close()

Note that a file is closed automatically when its object (f in this case) is garbage collected. However, exactly when that happens depends on the Python implementation, so rather than rely on it we might want to do the following:

def readHead(filename):
    with open(filename,"r") as f:
        print(f.readline()) # print the first line of the file
        print(f.readline()) # print the second line of the file 
        return # leaving the with block here closes the file
        print(f.readline()) # never reached: would print the third line of the file
readHead("cplexLP/reblock67.barrier.log")

Welcome to IBM(R) ILOG(R) CPLEX(R) Interactive Optimizer 12.8.0.0

In this form the file f is automatically closed by the with statement as soon as we hit the return statement, or if we were to throw an exception.
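
The exception case can be checked directly through the closed attribute of the file object. This is a small sketch with a throwaway file and a simulated error; the file content and the error message are made up.

```python
import os
import tempfile

# Create a small throwaway file to read (hypothetical content).
with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
    tmp.write("line one\nline two\n")
    path = tmp.name

try:
    with open(path, "r") as f:
        first = f.readline()
        raise ValueError("simulated failure while reading")
except ValueError:
    pass

print(f.closed)  # the with statement closed the file despite the exception
os.remove(path)  # clean up the throwaway file
```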

To read the lines of a file we can use f.readlines() for a file f or simply iterate over the lines in the file. Use .strip() to remove leading and trailing whitespace (or other characters, given as an optional argument). The following prints only the lines from the file that contain exactly 4 words:

with open(filename,"r") as f:
    for line in f:
        words = line.strip().split()
        if len(words) == 4: print(line)

Selected objective  name:  obj

Selected RHS        name:  rhs

Selected bound      name:  bnd

Objective sense      : Minimize

Objective nonzeros   :     670

  RHS nonzeros       :      20

Tried aggregator 1 time.

Using Nested Dissection ordering

  Primal:  Fixed no variables.

  Dual:  Fixing 1305 variables.

Each line still contains its trailing \n and lots of spaces. We might want to just re-assemble the list of words into a string where words are separated by a single space:

with open(filename,"r") as f:
    for line in f:
        words = line.strip().split()
        if len(words) == 4: print(" ".join(words))

Selected objective name: obj
Selected RHS name: rhs
Selected bound name: bnd
Objective sense : Minimize
Objective nonzeros : 670
RHS nonzeros : 20
Tried aggregator 1 time.
Using Nested Dissection ordering
Primal: Fixed no variables.
Dual: Fixing 1305 variables.

Note: It might be tempting to read CSV files as

with open(csvfile,"r") as f:
    for line in f:
        cols = line.split(",")
        ## and so on

But this is a bad idea because one of your columns might contain the quoted text "hello, world", which would get split into two. Use the dedicated csv module (csv.reader), which handles quoting and does a much better job of this.
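
A minimal sketch of the difference, using csv.reader from the standard-library csv module. The two-line CSV text is invented, and io.StringIO stands in for an open file so that no file on disk is needed.

```python
import csv
import io

# In-memory stand-in for an open CSV file (made-up data).
data = io.StringIO('id,greeting\n1,"hello, world"\n')

# Naive approach: split every line at every comma.
naive = [line.rstrip("\n").split(",") for line in data]

# csv.reader understands the quotes around the second field.
data.seek(0)
proper = list(csv.reader(data))

print(naive[1])   # the quoted field is broken into two pieces
print(proper[1])  # the quoted field stays in one piece
```

When reading a real file, the csv documentation recommends opening it with newline="" so that line endings inside quoted fields are handled correctly.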

Of course you can also read the whole file in as a single string:

with open(filename,"r") as f: # the with statement closes the file once it has been read
    wholeText = f.read()
print(wholeText[:100]) # first 100 characters of our textfile

Welcome to IBM(R) ILOG(R) CPLEX(R) Interactive Optimizer 12.8.0.0
  with Simplex, Mixed Integer & B

Writing files can be done very similarly. You can open a file for writing using "w" instead of "r" in the f=open(filename,"w") command or use "a" to append to the end of the file.

Now f.write(wholeText) will write all of the text. Or use print("hello",file=f) to print just a small amount of text.

Note that written text is buffered, so unless you use print("hello",file=f,flush=True) the text may not appear in the file until after you close it (and may get lost entirely if your program crashes before you manage to close the file, but that is rare).
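
Putting the write modes together, here is a small sketch that writes, appends and reads back. The file name out.txt and the text are made up, and a temporary directory is used so that nothing is left behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "out.txt")  # hypothetical output file

    with open(path, "w") as f:        # "w" creates the file or empties an existing one
        f.write("first line\n")       # write() adds no newline by itself
        print("second line", file=f)  # print() adds the newline for us

    with open(path, "a") as f:        # "a" appends to what is already there
        print("third line", file=f, flush=True)

    with open(path, "r") as f:
        lines = f.read().splitlines()

print(lines)
```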

Shell utilities

To do more complicated things with sets of files, such as copying, renaming or the like, use shutil. Here we will rename every file from .log to .txt.

import shutil as sh
for filename in os.listdir("cplexLP"):
    if filename.endswith(".log"): # only move log files
        newname = filename[:-len(".log")] + ".txt" # swap only the trailing extension
        sh.move(os.path.join("cplexLP",filename), os.path.join("cplexLP",newname)) # could move to another directory too
os.listdir("cplexLP")[:10]

['bley_xl1.dual.txt',
 'csched010.dual.txt',
 'rail507.dual.txt',
 'macrophage.dual.txt',
 'n3div36.barrier.txt',
 'net12.barrier.txt',
 'triptim1.primal.txt',
 'mzzv11.barrier.txt',
 'beasleyC3.primal.txt',
 'opm2-z7-s2.primal.txt']
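
Copying works much the same way. The sketch below copies every .log file into a second directory under a .txt name and leaves the originals alone. All directory and file names are invented, and everything happens inside a temporary directory.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "logs")       # hypothetical source directory
    backup = os.path.join(tmp, "backup")  # hypothetical target directory
    os.mkdir(src)
    os.mkdir(backup)
    for name in ["a.dual.log", "b.barrier.log", "readme.md"]:
        with open(os.path.join(src, name), "w") as f:
            f.write(name + "\n")

    for name in os.listdir(src):
        stem, ext = os.path.splitext(name)  # ext is only the final extension
        if ext == ".log":                   # only copy log files
            shutil.copy(os.path.join(src, name),
                        os.path.join(backup, stem + ".txt"))

    copied = sorted(os.listdir(backup))
    remaining = sorted(os.listdir(src))

print(copied)
print(remaining)
```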

Time to clean up. Let’s delete the whole directory:

if "cplexLP" in os.listdir():
    sh.rmtree("cplexLP")
"cplexLP" in os.listdir() # check if it still exists

False
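
The existence check can also be written with os.path.isdir(), which works for a directory anywhere on disk, not just one inside the current directory. A self-contained sketch with a made-up directory name:

```python
import os
import shutil
import tempfile

base = tempfile.mkdtemp()                    # scratch area for the example
target = os.path.join(base, "scratch")       # hypothetical directory to delete
os.makedirs(os.path.join(target, "nested"))  # give it some content

if os.path.isdir(target):
    shutil.rmtree(target)  # removes the directory and everything inside it

still_there = os.path.exists(target)
print(still_there)
os.rmdir(base)  # base is empty again, so a plain rmdir is enough
```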