Jupyter Snippet P4M WorkingWithFiles
Working with Files in Python
This notebook provides the basics for how to work with files. Most of the functionality needed is provided by the os module.
To get started we are going to create a directory with a set of files by extracting all of the files from the cplexLP.zip file that you would have downloaded in the “IntroToDataScience” notebook.
import os
# os.system() runs a command in the underlying shell (Linux, Windows, ...) and returns its exit status
os.system("unzip cplexLP.zip")
0
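The 0 printed above is the exit status returned by os.system, where 0 means the command succeeded. Calling unzip this way assumes the program is installed, which is often not the case on Windows. As an alternative, the standard-library zipfile module can extract an archive from pure Python. The sketch below builds its own tiny archive in a temporary directory, so the names demo.zip, a.log and b.log are hypothetical stand-ins for cplexLP.zip and its contents:

```python
import os
import tempfile
import zipfile

# Work in a throwaway directory so nothing in the current directory is touched
workdir = tempfile.mkdtemp()
archive = os.path.join(workdir, "demo.zip")  # hypothetical stand-in for cplexLP.zip

# Build a small archive containing two files inside a "demo/" folder
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("demo/a.log", "first file\n")
    zf.writestr("demo/b.log", "second file\n")

# Extract everything: the pure-Python counterpart of os.system("unzip demo.zip")
with zipfile.ZipFile(archive) as zf:
    zf.extractall(workdir)

extracted = sorted(os.listdir(os.path.join(workdir, "demo")))
print(extracted)  # ['a.log', 'b.log']
```

For the real archive this would be something like zipfile.ZipFile("cplexLP.zip").extractall(), which extracts into the current directory.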
How do we know if the files have been extracted? What files are there?
os.listdir() shows the files that are available:
os.listdir(".")
['cplexLP.zip',
'TheNeedForSpeed-Answers.ipynb',
'.ipynb_checkpoints',
'MIPLIBtest.ipynb',
'gauss.cpp',
'__pycache__',
'cplex.log',
'TheNeedForSpeed.ipynb',
'gausstest.py',
'TripletEnumerationExercise.ipynb',
'cplexLP',
'cplexLP.json',
'IntroToDataScience-Answers.ipynb',
'WorkingWithFiles.ipynb',
'gauss.so',
'IntroToDataScience.ipynb',
'features.json']
How do we know if any of these are subdirectories? Use the os.path submodule. It has many functions of the form isXXX that check a property of a path, mostly what kind of file it is:
print("Methods for testing filetype in os.path:",
[ name for name in dir(os.path) if name.startswith("is")])
Methods for testing filetype in os.path: ['isabs', 'isdir', 'isfile', 'islink', 'ismount']
Mostly we just want to distinguish between directories and files. Let's find all of the subdirectories of the current directory:
print("Subdirectories are:",
", ".join([name for name in os.listdir(".") # names in the current directory
if os.path.isdir(name)])
)# end of print after joining all the names with `, `
Subdirectories are: .ipynb_checkpoints, __pycache__, cplexLP
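The os.path.isdir(name) test above works because we listed the current directory, so each bare name is also a valid relative path. A possible alternative uses os.scandir, whose entries carry their own full path and type information. The directory layout here (subdir, notes.txt) is hypothetical:

```python
import os
import tempfile

# Create a hypothetical directory containing one subdirectory and one file
base = tempfile.mkdtemp()
os.mkdir(os.path.join(base, "subdir"))
with open(os.path.join(base, "notes.txt"), "w") as f:
    f.write("hello\n")

# os.scandir yields DirEntry objects that know their own type (and full path via entry.path)
with os.scandir(base) as entries:
    subdirs = sorted(entry.name for entry in entries if entry.is_dir())

print("Subdirectories are:", ", ".join(subdirs))  # Subdirectories are: subdir
```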
How do we find all of the filenames within the cplexLP directory? Basically by using listdir again. However, note that we need to create complete path names of the form "cplexLP/file" (or "cplexLP\file" if you are under Windows). We use os.path.sep to give us the right separator between directories:
allFiles = [ "cplexLP" + os.path.sep + name for name in os.listdir("cplexLP")]
print(allFiles[:10])
['cplexLP/core2536-691.dual.log', 'cplexLP/pg5_34.dual.log', 'cplexLP/ns1208400.barrier.log', 'cplexLP/reblock67.barrier.log', 'cplexLP/bnatt350.dual.log', 'cplexLP/rococoC10-001000.primal.log', 'cplexLP/eilB101.dual.log', 'cplexLP/mine-90-10.barrier.log', 'cplexLP/n3seq24.primal.log', 'cplexLP/mine-166-5.barrier.log']
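Instead of gluing strings together with os.path.sep, you can let os.path.join insert the platform's separator. This is a minimal sketch, and the list of names in it is hypothetical:

```python
import os

# os.path.join inserts the correct separator for the current platform
path = os.path.join("cplexLP", "reblock67.barrier.log")
print(path)  # cplexLP/reblock67.barrier.log on Linux/macOS, cplexLP\reblock67.barrier.log on Windows

# Same idea as the list comprehension above, applied to a hypothetical list of names
names = ["a.dual.log", "b.primal.log"]
allPaths = [os.path.join("cplexLP", name) for name in names]
print(allPaths)
```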
Here all filenames follow a pattern: directory/instance.method.log. How can we take one of these filenames and break it into parts? Perhaps use the string split() method:
filename = allFiles[3] # pick some filename
name = filename.split(os.path.sep)[-1] # split off the directory names and keep only the last part
instance,method = name.split(".")[:2] # split by . and keep the first two parts as we don't care about .log
print(filename,"solves",instance,"using",method)
cplexLP/reblock67.barrier.log solves reblock67 using barrier
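The standard library also has helpers for this kind of filename surgery. The sketch below assumes the same directory/instance.method.log pattern. os.path.basename removes the directory part, os.path.splitext strips the final extension, and rsplit with a limit of 1 splits only at the last remaining dot:

```python
import os

filename = os.path.join("cplexLP", "reblock67.barrier.log")  # example path from above

# basename drops the directory part using the platform's separator
name = os.path.basename(filename)          # 'reblock67.barrier.log'
stem, extension = os.path.splitext(name)   # ('reblock67.barrier', '.log')
instance, method = stem.rsplit(".", 1)     # split at the last dot only
print(filename, "solves", instance, "using", method)
```

Splitting at the last dot would keep working even if an instance name itself contained a dot. None of the names listed above do, so this is only a defensive choice.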
Reading files: use the open() function to open a file, and its close() method to close it:
f = open(filename,"r")
f.close()
Note that files are automatically closed when the file object (f in this case) is garbage collected. However, since we cannot rely on exactly when that happens, it is better to use a with statement, which closes the file at a well-defined point:
def readHead(filename):
    with open(filename,"r") as f:
        print(f.readline()) # print the first line of the file
        print(f.readline()) # print the second line of the file
        return
        print(f.readline()) # never executed: the return above has already left the function (closing f)
readHead("cplexLP/reblock67.barrier.log")
Welcome to IBM(R) ILOG(R) CPLEX(R) Interactive Optimizer 12.8.0.0
In this form the with statement closes f automatically as soon as execution leaves its block, whether because we hit the return statement or because an exception is raised.
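The following sketch checks that the with statement really does close the file when an exception escapes the block. It uses a hypothetical temporary file and raises the ValueError deliberately:

```python
import os
import tempfile

# Hypothetical small file to read
path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(path, "w") as f:
    f.write("line 1\nline 2\n")

try:
    with open(path, "r") as f:
        first = f.readline()
        raise ValueError("something went wrong")  # simulate an error mid-read
except ValueError:
    pass

print(repr(first), f.closed)  # 'line 1\n' True
```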
To read the lines of a file f we can use f.readlines(), or simply iterate over the lines in the file.
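Here is the difference between the two approaches, shown on a hypothetical three-line file. readlines() builds a list of all lines in memory. Iterating over the file object reads one line at a time, which is the usual choice for large log files:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.txt")  # hypothetical file
with open(path, "w") as f:
    f.write("alpha\nbeta\ngamma\n")

# readlines() loads every line into a list at once (each line keeps its '\n')
with open(path) as f:
    lines = f.readlines()

# Iterating over the file object reads one line at a time
with open(path) as f:
    count = sum(1 for line in f)

print(lines, count)  # ['alpha\n', 'beta\n', 'gamma\n'] 3
```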
Use .strip() to remove leading/trailing whitespace (or other characters, given as an optional argument).
The following prints only the lines from the file that contain exactly 4 words:
with open(filename,"r") as f:
    for line in f:
        words = line.strip().split()
        if len(words) == 4: print(line)
Selected objective name: obj
Selected RHS name: rhs
Selected bound name: bnd
Objective sense : Minimize
Objective nonzeros : 670
RHS nonzeros : 20
Tried aggregator 1 time.
Using Nested Dissection ordering
Primal: Fixed no variables.
Dual: Fixing 1305 variables.
Each printed line still contains its trailing \n and all of its original spacing. We might want to just re-assemble the list of words into a string where words are separated by a single space:
with open(filename,"r") as f:
    for line in f:
        words = line.strip().split()
        if len(words) == 4: print(" ".join(words))
Selected objective name: obj
Selected RHS name: rhs
Selected bound name: bnd
Objective sense : Minimize
Objective nonzeros : 670
RHS nonzeros : 20
Tried aggregator 1 time.
Using Nested Dissection ordering
Primal: Fixed no variables.
Dual: Fixing 1305 variables.
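This short sketch shows what strip() and split() do to a typical log line. The spacing in the sample line is illustrative, not copied from the real log. Note that split() with no argument already ignores leading and trailing whitespace and treats runs of spaces as a single separator, whereas split(" ") produces empty strings between repeated spaces:

```python
line = "  Objective sense      : Minimize\n"  # illustrative spacing

print(repr(line.strip()))        # ends trimmed, inner spaces kept
print(line.split())              # ['Objective', 'sense', ':', 'Minimize']
print(line.strip().split(" "))   # explicit separator: empty strings between repeated spaces
normalised = " ".join(line.split())
print(normalised)                # Objective sense : Minimize
```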
Note: It might be tempting to read CSV files as
with open(csvfile,"r") as f:
    for line in f:
        cols = line.split(",")
        ## and so on
But this is a bad idea, because one of your columns might contain text such as "hello, world", which would get split into two. Use the dedicated csv module (for example csv.reader), which does a much better job of this.
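Here is a minimal sketch of the csv module handling a quoted comma. To keep it self-contained, io.StringIO stands in for a real file. With a file on disk you would pass open(csvfile, newline="") to csv.reader instead, as the csv documentation recommends:

```python
import csv
import io

# Small CSV text; io.StringIO stands in for a file opened with open(csvfile, newline="")
data = 'id,message\n1,"hello, world"\n2,bye\n'

rows = list(csv.reader(io.StringIO(data)))
print(rows)  # [['id', 'message'], ['1', 'hello, world'], ['2', 'bye']]

# The naive approach splits inside the quoted field
naive = data.splitlines()[1].split(",")
print(naive)  # ['1', '"hello', ' world"']
```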
Of course, you can also read the whole file in as a single string:
with open(filename,"r") as f:
    wholeText = f.read()
print(wholeText[:100]) # first 100 characters of our text file
Welcome to IBM(R) ILOG(R) CPLEX(R) Interactive Optimizer 12.8.0.0
with Simplex, Mixed Integer & B
Writing files can be done very similarly. You can open a file for writing by using "w" instead of "r" in the f = open(filename,"w") command (this overwrites any existing contents), or use "a" to append to the end of the file.
Now f.write(wholeText) will write all of the text. Or use print("hello",file=f) to print just a small amount of text.
Note that written text is buffered, so unless you use print("hello",file=f,flush=True) the text may not appear in the file until after you close it (and it may be lost entirely if your program crashes before the file is closed, although that is rare).
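Putting these pieces together, here is a sketch that writes, appends to and then reads back a hypothetical file in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "out.txt")  # hypothetical output file

with open(path, "w") as f:        # "w" creates the file, or truncates an existing one
    f.write("first line\n")       # write() does not add a newline for you
    print("hello", file=f)        # print() adds the trailing newline

with open(path, "a") as f:        # "a" appends to the end instead of truncating
    print("appended", file=f, flush=True)

with open(path) as f:
    contents = f.read()
print(contents)
```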
Shell utilities
To do more complicated things with sets of files, such as copying, renaming or the like, use shutil. Here we will rename every file from .log to .txt:
import shutil as sh
for filename in os.listdir("cplexLP"):
    if filename.endswith(".log"): # only move log files
        newname = filename.replace(".log",".txt")
        sh.move("cplexLP/"+filename,"cplexLP/"+newname) # could move to another directory too
os.listdir("cplexLP")[:10]
['bley_xl1.dual.txt',
'csched010.dual.txt',
'rail507.dual.txt',
'macrophage.dual.txt',
'n3div36.barrier.txt',
'net12.barrier.txt',
'triptim1.primal.txt',
'mzzv11.barrier.txt',
'beasleyC3.primal.txt',
'opm2-z7-s2.primal.txt']
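One caveat with filename.replace(".log",".txt") is that it replaces every occurrence of ".log" in the name, not just the extension. A possibly safer approach uses os.path.splitext, which separates only the final extension, together with os.path.join for the paths. The directory and its files in this sketch are hypothetical:

```python
import os
import shutil
import tempfile

# Hypothetical directory with two .log files and one other file
folder = tempfile.mkdtemp()
for name in ["a.dual.log", "b.primal.log", "readme.md"]:
    open(os.path.join(folder, name), "w").close()

for name in os.listdir(folder):
    stem, ext = os.path.splitext(name)  # splits off only the final extension
    if ext == ".log":
        shutil.move(os.path.join(folder, name), os.path.join(folder, stem + ".txt"))

renamed = sorted(os.listdir(folder))
print(renamed)  # ['a.dual.txt', 'b.primal.txt', 'readme.md']
```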
Time to clean up. Let’s delete the whole directory:
if "cplexLP" in os.listdir():
    sh.rmtree("cplexLP")
"cplexLP" in os.listdir() # check if it still exists
False
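For scratch files that should disappear automatically, tempfile.TemporaryDirectory removes the directory and everything in it when the with block ends. Note also that os.path.exists(path) is a more direct check than searching the output of os.listdir(). Here is a small sketch:

```python
import os
import tempfile

# TemporaryDirectory deletes the directory and its contents when the with block ends
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "scratch.txt"), "w") as f:
        f.write("temporary data\n")
    existed_inside = os.path.isdir(tmp)

print(existed_inside, os.path.exists(tmp))  # True False
```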