Python instead of shell scripts

About me

  • Thomas Aglassinger
  • Software developer (e-commerce, finance, health)
  • Master's degree in information processing science
  • Co-organizer for PyGRAZ Python user group
  • Homepage: roskakori.at

Agenda

  • About shell scripts and Python
  • Short examples
  • Common shell operations in Python
  • Help you pick the right tool for the right job

Shell scripts: advantages

  • Quick to write
  • Run multiple commands
  • Pipe output

Shell scripts: disadvantages

  • Become messy quickly
  • "Hopeful" error handling
  • Hard to read syntax
  • Easily confused by unusual input (e.g. filenames containing spaces)
  • Difficult parsing of command line options
  • Cumbersome low level operations (math, strings, dates, etc.)
  • Difficult to get working on non-Unix platforms

Python

  • Easy to read
  • "real" programming language
  • Reliable error handling with exceptions
  • Simple parsing of command line options
  • 1 line in shell = few lines in Python = several lines in many other languages

About the examples

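  • Tested with Python 3.6; some examples rely on features added in Python 3.5 (recursive glob) and 3.6 (the encoding parameter of subprocess.check_output), so older versions need small adjustments
  • Only modules included with the standard Python distribution are used
  • When installing Python, still consider the Anaconda distribution because it already includes many useful external modules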

Print a message

Shell

# Print a message
name='Alice'
echo Hello $name

Output:

Hello Alice

Python

# Print a message
name = 'Alice'
print('Hello ' + name)

Output:

Hello Alice
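Python 3.6 added f-strings, which come closer to the shell's $name interpolation than string concatenation. A minimal sketch; greet() is a hypothetical helper used only for illustration:

```python
def greet(name):
    # Expressions inside {...} are substituted, similar to $name in the shell.
    return f'Hello {name}'


print(greet('Alice'))  # prints: Hello Alice
```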

Find lines containing the word "er"

Shell

grep "\ber\b" glocke.txt

Output:

der nie bedacht, was er vollbringt.
daß er im innern Herzen spüret,
was er erschaffen mit seiner Hand.

Python

Read a file line by line:

# Traverse file line by line.
with open('glocke.txt', encoding='utf-8') as poem_file:
    for line in poem_file:
        line = line.rstrip('\n')  # Remove possible trailing linefeed.
        print(line)

Output:

Das Lied von der Glocke

Festgemauert in der Erden
Steht die Form aus Lehm gebrannt.
Heute muß die Glocke werden,
frisch, Gesellen, seid zur Hand!
Von der Stirne heiß
rinnen muß der Schweiß,
soll das Werk den Meister loben;
doch der Segen kommt von oben.

Zum Werke, das wir ernst bereiten,
geziemt sich wohl ein ernstes Wort;
wenn gute Reden sie begleiten,
dann fließt die Arbeit munter fort.
So laßt uns jetzt mit Fleiß betrachten,
was durch schwache Kraft entspringt;
den schlechten Mann muß man verachten,
der nie bedacht, was er vollbringt.
Das ists ja, was den Menschen zieret,
und dazu ward ihm der Verstand,
daß er im innern Herzen spüret,
was er erschaffen mit seiner Hand.

Regular expressions

import re

er_regex = re.compile(r'\ber\b')
if er_regex.search('der nie bedacht, was er vollbringt.'):
    print('found it!')

Output:

found it!
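The \b in the pattern is what turns this into a search for the word "er" rather than for the two letters anywhere, just like grep "\ber\b". A quick check with a few sample strings, assuming the same pattern as above:

```python
import re

# \b matches at a boundary between a word character and a non-word character
# (or the start/end of the text), so only "er" as a whole word matches.
er_regex = re.compile(r'\ber\b')

for text in ['was er vollbringt', 'der Meister', 'ernst']:
    print(repr(text), er_regex.search(text) is not None)
# prints:
# 'was er vollbringt' True
# 'der Meister' False
# 'ernst' False
```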

Put it together

import re

er_regex = re.compile(r'\ber\b')

with open('glocke.txt', encoding='utf-8') as poem_file:
    for line in poem_file:
        if er_regex.search(line):
            print(line.rstrip('\n'))

Output:

der nie bedacht, was er vollbringt.
daß er im innern Herzen spüret,
was er erschaffen mit seiner Hand.
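grep -n additionally prints the line number of each match. A sketch of the same idea using enumerate(); grep_numbered() is a hypothetical helper, not part of the standard library:

```python
import re


def grep_numbered(pattern, path):
    """Yield (line_number, line) pairs for matching lines, similar to grep -n."""
    regex = re.compile(pattern)
    with open(path, encoding='utf-8') as text_file:
        # enumerate() counts starting at 1, as grep does.
        for line_number, line in enumerate(text_file, start=1):
            if regex.search(line):
                yield line_number, line.rstrip('\n')
```

To print the matches in grep's "number:line" style, loop over grep_numbered(r'\ber\b', 'glocke.txt') and print f'{line_number}:{line}'.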

Make a function

import re


def grep(pattern, path):
    regex = re.compile(pattern)

    with open(path, encoding='utf-8') as text_file:
        for line in text_file:
            if regex.search(line):
                yield line.rstrip('\n')  # Yield matching lines one by one (generator).


for line in grep(r'\ber\b', 'glocke.txt'):
    print(line)

Output:

der nie bedacht, was er vollbringt.
daß er im innern Herzen spüret,
was er erschaffen mit seiner Hand.
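A function also makes it easy to add options. As a sketch, assuming an ignore_case parameter (a hypothetical name) that mimics grep -i:

```python
import re


def grep(pattern, path, ignore_case=False):
    """Yield lines of the file at path that match pattern, optionally like grep -i."""
    # re.IGNORECASE corresponds roughly to grep -i.
    flags = re.IGNORECASE if ignore_case else 0
    regex = re.compile(pattern, flags)
    with open(path, encoding='utf-8') as text_file:
        for line in text_file:
            if regex.search(line):
                yield line.rstrip('\n')
```

Because ignore_case defaults to False, existing calls such as grep(r'\ber\b', 'glocke.txt') behave as before.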

Collect lines in a list

A list comprehension picks or converts items from an iterable and optionally filters them:

with open('glocke.txt', encoding='utf-8') as poem_file:
    lines_with_er = [
        line.rstrip('\n') for line in poem_file
        if er_regex.search(line)
    ]
print(lines_with_er)

Output:

['der nie bedacht, was er vollbringt.', 'daß er im innern Herzen spüret,', 'was er erschaffen mit seiner Hand.']

Thanks to our function this is even simpler:

print(list(grep(r'\ber\b', 'glocke.txt')))

Output:

['der nie bedacht, was er vollbringt.', 'daß er im innern Herzen spüret,', 'was er erschaffen mit seiner Hand.']
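grep -c counts matching lines instead of printing them. Since grep() is a generator, the matches can be counted without building a list first. A sketch; grep_count() is a hypothetical helper, and grep() is repeated so the block runs on its own:

```python
import re


def grep(pattern, path):
    regex = re.compile(pattern)
    with open(path, encoding='utf-8') as text_file:
        for line in text_file:
            if regex.search(line):
                yield line.rstrip('\n')


def grep_count(pattern, path):
    """Count matching lines, similar to grep -c."""
    # sum() consumes the generator one line at a time; no list is built.
    return sum(1 for _ in grep(pattern, path))
```

For the poem above, grep_count(r'\ber\b', 'glocke.txt') should return 3, matching the three lines found earlier.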

Find files matching a pattern

Shell

find /Users/roskakori/workspace/talks -name "*.py"

Output:

/Users/roskakori/workspace/talks/ebcdic_survival_kit/html_for_encoding.py
/Users/roskakori/workspace/talks/ebcdic_survival_kit/test_ebcdic.py
/Users/roskakori/workspace/talks/europython_2014/errorhandling/application.py
/Users/roskakori/workspace/talks/europython_2014/errorhandling/chaining.py
/Users/roskakori/workspace/talks/europython_2014/errorhandling/contextmanager.py
...

Python

import glob

file_pattern = '/Users/roskakori/workspace/talks/**/*.py'
for path in glob.iglob(file_pattern, recursive=True):
    print(path)

Output:

/Users/roskakori/workspace/talks/ebcdic_survival_kit/html_for_encoding.py
/Users/roskakori/workspace/talks/ebcdic_survival_kit/test_ebcdic.py
/Users/roskakori/workspace/talks/europython_2014/errorhandling/application.py
/Users/roskakori/workspace/talks/europython_2014/errorhandling/chaining.py
/Users/roskakori/workspace/talks/europython_2014/errorhandling/contextmanager.py
/Users/roskakori/workspace/talks/europython_2014/errorhandling/dataerror.py
/Users/roskakori/workspace/talks/europython_2014/errorhandling/fragments.py
/Users/roskakori/workspace/talks/europython_2014/errorhandling/performance.py
/Users/roskakori/workspace/talks/pygraz/errorhandling/application.py
/Users/roskakori/workspace/talks/pygraz/errorhandling/chaining.py
/Users/roskakori/workspace/talks/pygraz/errorhandling/contextmanager.py
/Users/roskakori/workspace/talks/pygraz/errorhandling/dataerror.py
/Users/roskakori/workspace/talks/pygraz/errorhandling/fragments.py
/Users/roskakori/workspace/talks/pygraz/errorhandling/performance.py
/Users/roskakori/workspace/talks/pygraz/errorhandling/somelog.py
/Users/roskakori/workspace/talks/pygraz/opensourceprojects/dividemo_1_base.py
/Users/roskakori/workspace/talks/pygraz/opensourceprojects/dividemo_2_fail_on_zero.py
/Users/roskakori/workspace/talks/pygraz/opensourceprojects/dividemo_3_console_script.py
/Users/roskakori/workspace/talks/pygraz/opensourceprojects/test_dividemo.py
/Users/roskakori/workspace/talks/pygraz/pygments/nanosqllexer.py
/Users/roskakori/workspace/talks/pygraz/pygments/nanosqllexer_finished.py
/Users/roskakori/workspace/talks/pygraz/pygments/pygmentstoy.py
/Users/roskakori/workspace/talks/pygraz/python_instead_of_shell_scripts/snipplets.py
/Users/roskakori/workspace/talks/pygraz/readwrite/test_binary.py
/Users/roskakori/workspace/talks/pygraz/readwrite/test_csv.py
/Users/roskakori/workspace/talks/pygraz/readwrite/test_json.py
/Users/roskakori/workspace/talks/pygraz/readwrite/test_text.py
/Users/roskakori/workspace/talks/pygraz/readwrite/test_xml.py
/Users/roskakori/workspace/talks/pygraz/traderbot/query_balance.py
/Users/roskakori/workspace/talks/pygraz/traderbot/query_trades.py
/Users/roskakori/workspace/talks/pygraz/unicode/examples.py
/Users/roskakori/workspace/talks/python_for_testers/examples/copytool.py
/Users/roskakori/workspace/talks/python_for_testers/examples/csvdict.py
/Users/roskakori/workspace/talks/python_for_testers/examples/csvlist.py
/Users/roskakori/workspace/talks/python_for_testers/examples/logconsole.py
/Users/roskakori/workspace/talks/python_for_testers/examples/myapp.py
/Users/roskakori/workspace/talks/python_for_testers/examples/shell_threads.py
/Users/roskakori/workspace/talks/python_for_testers/examples/test_divided_using_pytest.py
/Users/roskakori/workspace/talks/python_for_testers/examples/test_divided_using_unittest.py
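Since Python 3.4 the pathlib module offers an alternative: Path.rglob() searches recursively, similar to find -name. A minimal sketch; find_files() is a hypothetical helper, and sorting is an added choice because neither find nor glob guarantees an order:

```python
from pathlib import Path


def find_files(folder, pattern):
    """Return a sorted list of files below folder whose name matches pattern."""
    # rglob('*.py') behaves like glob('**/*.py') relative to folder.
    return sorted(path for path in Path(folder).rglob(pattern) if path.is_file())
```

Usage with the example folder: for path in find_files('/Users/roskakori/workspace/talks', '*.py'): print(path)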

Platform independent path operations

  • Unix and Windows use different path separators: slash (/) and backslash (\)
  • Building paths using string operations and either slash or backslash only works on one platform
  • os.path to the rescue

import os

print(os.sep)  # The separator between path components, e.g. '/' or '\'

Output:

/

Build paths with os.path

Combine path parts:

glocke_path = os.path.join(os.getcwd(), 'glocke.txt')
print(glocke_path)

Output:

/Users/roskakori/workspace/talks/pygraz/python_instead_of_shell_scripts/glocke.txt

Extract parts of a path

Extract folder and name:

folder, name = os.path.split(glocke_path)
print(folder)
print(name)

Output:

/Users/roskakori/workspace/talks/pygraz/python_instead_of_shell_scripts
glocke.txt

Extract name and suffix:

base_name, suffix = os.path.splitext(name)
print(base_name)
print(suffix)

Output:

glocke
.txt
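Since Python 3.4 the pathlib module provides the same operations as attributes of a path object. A sketch using the example path from above; PurePosixPath is chosen only so the result does not depend on the platform the code runs on:

```python
from pathlib import PurePosixPath

# PurePosixPath only manipulates path strings and never accesses the disk.
# A real script would use pathlib.Path, which follows the current platform's rules.
folder = PurePosixPath(
    '/Users/roskakori/workspace/talks/pygraz/python_instead_of_shell_scripts')
glocke_path = folder / 'glocke.txt'  # The / operator joins path parts.

print(glocke_path.parent)  # same as folder
print(glocke_path.name)    # glocke.txt
print(glocke_path.stem)    # glocke
print(glocke_path.suffix)  # .txt
```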

Call external program

Shell

Count the number of lines in a file:

wc -l glocke.txt

Output:

      23 glocke.txt

Python

import subprocess

wc_result = subprocess.check_output(
    ['wc', '-l', glocke_path],
    encoding='utf-8')
print(wc_result)

Output:

      23 /Users/roskakori/workspace/talks/pygraz/python_instead_of_shell_scripts/glocke.txt

wc_result_words = wc_result.split()
print(wc_result_words)
first_wc_result_word = wc_result_words[0]
print(repr(first_wc_result_word))  # Still a string, hence the quotes.
glocke_line_count = int(first_wc_result_word)
print(glocke_line_count)  # Now an int that can be used for calculations.

Output:

['23', '/Users/roskakori/workspace/talks/pygraz/python_instead_of_shell_scripts/glocke.txt']
'23'
23
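Calling wc just to count lines is not strictly necessary: Python can count them itself, which also works on platforms without wc. A sketch; count_lines() is a hypothetical name. Note that wc -l counts linefeed characters, so for a file whose last line has no trailing linefeed this function returns one more than wc:

```python
def count_lines(path):
    """Count the lines in a text file, similar to wc -l."""
    with open(path, encoding='utf-8') as text_file:
        # Iterating a file yields one line at a time, so large files are fine.
        return sum(1 for _ in text_file)
```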

...and now with error handling

Shell

The exit code has to be checked explicitly:

wc -l no_such_file.txt
if [[ $? -ne 0 ]]; then
    echo cannot count lines in no_such_file.txt
fi

Python

  • check_output() raises an exception automatically if the command fails, so the check cannot be forgotten
  • Without an error handler, the program exits with a stack trace
  • Handle errors using try ... except ...

try:
    wc_result = subprocess.check_output(
        ['wc', '-l', 'no_such_file.txt'],
        encoding='utf-8')
except subprocess.CalledProcessError as error:
    print('cannot count lines:', error)

Output:

cannot count lines: Command '['wc', '-l', 'no_such_file.txt']' returned non-zero exit status 1.
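If a non-zero exit code is expected and should not count as an error, subprocess.run() (Python 3.5+) returns it instead of raising, much like checking $? in the shell. A sketch; exit_code_of() is a hypothetical helper, and the example runs the current Python interpreter (sys.executable) instead of wc so it works on any platform:

```python
import subprocess
import sys


def exit_code_of(command):
    """Run command and return its exit code, like $? in the shell."""
    # Without check=True, run() does not raise on a non-zero exit code.
    result = subprocess.run(
        command, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    return result.returncode


print(exit_code_of([sys.executable, '-c', 'import sys; sys.exit(3)']))  # prints 3
```

It still raises FileNotFoundError if the program itself cannot be found.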

Parsing command line options

Shell scripts

  • A mess of if, echo and exit.
  • Often more code than the parts that do the actual work.
  • Example omitted.

Python

  • Use the argparse module
  • Declarative syntax
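  • Stores option values as attributes of the parsed result (e.g. args.buffer_size), converted to the requested type
  • Automatically adds --help
  • Reports invalid options with an error message and exits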

Example parser

import argparse

parser = argparse.ArgumentParser(description='copy files')
parser.add_argument(
    '--verbose', '-V', action='store_true', help='log each copied file')
parser.add_argument(
    '--log', metavar='LEVEL', choices=['debug', 'info', 'error'], dest='log_level',
    default='info', help='set log level to LEVEL (default: %(default)s)')
parser.add_argument(
    '--buffer', '-b', metavar='SIZE', dest='buffer_size', type=int, default=16,
    help='buffer size in kilo byte (default: %(default)s)')
parser.add_argument('--version', action='version', version='1.0')
parser.add_argument('sources', metavar='SOURCE', nargs='+', help='files to copy')
parser.add_argument('target', metavar='TARGET', help='target file or folder')

Generated --help

usage: snipplets.py [-h] [--verbose] [--log LEVEL] [--buffer SIZE]                   
                    [--version]
                    SOURCE [SOURCE ...] TARGET

copy files

positional arguments:
  SOURCE                files to copy
  TARGET                target file or folder

optional arguments:
  -h, --help            show this help message and exit
  --verbose, -V         log each copied file
  --log LEVEL           set log level to LEVEL (default: info)
  --buffer SIZE, -b SIZE
                        buffer size in kilo byte (default: 16)
  --version             show program's version number and exit

Parsing arguments

args = parser.parse_args(['--buffer=64', 'glocke.txt', 'bell.txt'])
print(args.sources)
print(args.target)
print(args.buffer_size)

Output:

['glocke.txt']
bell.txt
64
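argparse also rejects invalid input: it prints the usage and an error message to stderr and exits with status 2, which shows up as SystemExit in Python. A self-contained sketch with a reduced version of the parser above:

```python
import argparse

parser = argparse.ArgumentParser(description='copy files')
parser.add_argument(
    '--log', metavar='LEVEL', choices=['debug', 'info', 'error'], dest='log_level',
    default='info')
parser.add_argument('sources', metavar='SOURCE', nargs='+')
parser.add_argument('target', metavar='TARGET')

exit_code = 0
try:
    # 'verbose' is not one of the allowed choices for --log.
    parser.parse_args(['--log', 'verbose', 'glocke.txt', 'bell.txt'])
except SystemExit as error:
    exit_code = error.code
print('exit code:', exit_code)  # prints: exit code: 2
```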

To parse the actual command line arguments (sys.argv[1:]), call parse_args() without a list:

args = parser.parse_args()
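Many scripts wrap parsing and work in a main() function that takes the arguments as a list and returns an exit code, which keeps the script testable and mirrors shell exit codes. A sketch of that common convention (not something argparse requires); the copying itself is only simulated:

```python
import argparse


def main(arguments=None):
    """Parse arguments (sys.argv[1:] if None) and return an exit code."""
    parser = argparse.ArgumentParser(description='copy files')
    parser.add_argument('sources', metavar='SOURCE', nargs='+', help='files to copy')
    parser.add_argument('target', metavar='TARGET', help='target file or folder')
    args = parser.parse_args(arguments)
    # The actual work would go here; this sketch only reports what it would do.
    print('copy', ', '.join(args.sources), 'to', args.target)
    return 0
```

In a script, call it as sys.exit(main()) inside an if __name__ == '__main__': block so the return value becomes the process exit code.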

Other useful functions and modules

  • os.mkdir - create folder (os.makedirs also creates missing parent folders)
  • os.remove - remove file
  • os.environ['HOME'] - access environment variable $HOME
  • shutil - "shell utilities" to (recursively) copy/move/remove files and folders
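A sketch combining these functions to back up a file, roughly mkdir -p backup && cp glocke.txt backup/ in the shell. The function backup() and the BACKUP_FOLDER environment variable are hypothetical names chosen for illustration:

```python
import os
import shutil


def backup(path, backup_folder=None):
    """Copy path into backup_folder (default: $BACKUP_FOLDER or 'backup')."""
    if backup_folder is None:
        # os.environ.get() returns the default if the variable is not set.
        backup_folder = os.environ.get('BACKUP_FOLDER', 'backup')
    # exist_ok=True behaves like mkdir -p: no error if the folder already exists.
    os.makedirs(backup_folder, exist_ok=True)
    # shutil.copy() returns the path of the new copy.
    return shutil.copy(path, backup_folder)
```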

Summary

  • Python has several advantages over complex shell scripts:
      • platform independent
      • "real" programming language
      • robust error handling
      • simple command line option parsing