Python Notes...
To Read
- https://docs.python.org/3/library/enum.html
- http://stackoverflow.com/questions/24481852/serialising-an-enum-member-to-json
- https://docs.djangoproject.com/en/1.10/topics/serialization/
- http://stackoverflow.com/questions/2872512/python-truncate-a-long-string
- https://github.com/django/django/blob/master/django/utils/text.py#L66
- http://stackoverflow.com/questions/6567831/how-to-perform-or-condition-in-django-queryset
- https://docs.djangoproject.com/en/1.10/topics/db/sql/
- http://stackoverflow.com/questions/20138428/how-to-create-a-temporary-table-and-not-lose-the-orm-in-django
- http://blog.roseman.org.uk/2010/04/13/temporary-models-django/
- http://stackoverflow.com/questions/34768732/temporary-models-in-django-1-9
- https://code.djangoproject.com/wiki/DynamicModels
- http://stackoverflow.com/questions/1074212/how-to-show-the-sql-django-is-running
- https://mattrobenolt.com/the-django-orm-and-subqueries/
- https://www.caktusgroup.com/blog/2009/09/28/custom-joins-with-djangos-queryjoin/
- http://stackoverflow.com/questions/10598940/django-rename-items-in-values - the second answer not the accepted one
- http://stackoverflow.com/questions/5466571/using-a-settings-file-other-than-settings-py-in-django
- djcelery is installed by running 'pip install django-celery'
- https://pypi.python.org/pypi/django-solo
- https://www.python.org/dev/peps/pep-3333/
- https://docs.python.org/2/library/shlex.html
- https://pymotw.com/2/shlex/
- REST tutorial - http://slides.com/fiznool/no-rest-for-the-whippet#/
- https://docs.python.org/2/library/atexit.html
- https://julien.danjou.info/blog/2015/python-and-timezones
- http://agiliq.com/blog/2009/02/understanding-datetime-tzinfo-timedelta-amp-timezo/
- http://www.saltycrane.com/blog/2009/05/converting-time-zones-datetime-objects-python/
- http://stackoverflow.com/questions/3862310/how-can-i-find-all-subclasses-of-a-class-given-its-name
- https://nedbatchelder.com/text/names.html
- http://louistiao.me/posts/notebooks/embedding-matplotlib-animations-in-jupyter-notebooks/
- https://bit.ly/wtfpython - Python Gotchas
- https://swtch.com/~rsc/regexp/regexp1.html - why backtracking regex engines can be very slow
Useful Python Related Sites
- Planet Python - A blog aggregating list that lets you keep up with what's new and fresh in the Python world.
- Python Tutor - An awesome site that lets you visually understand what happens as the computer runs each line of source code! Very cool!
Python Debugger: Winpdb
A really quite cute Python debugger, easy to use and GUI driven, is Winpdb:
Winpdb is a platform independent GPL Python debugger with support for multiple threads, namespace modification, embedded debugging, encrypted communication and is up to 20 times faster than pdb. Winpdb is being developed by Nir Aides since 2005.
PyLint: Linting Python Code
Run PyLint
Generally you can run pylint on a directory. But note that the directory, and any subdirectories you want to check, must contain an __init__.py file, even if it is just empty.
To just run pylint individually on all your python files do this...
find . -name '*.py' | xargs pylint --rcfile=pylint_config_filename
Use the -rn option to suppress the summary tables at the end of the pylint output.
You can also use PyLint with PyEnchant to add spell checking to your comments, which can be pretty useful. To configure the dictionary to use, just look up the [SPELLING] section in the PyLint RC file!
Message Format
The messages have the following format:
MESSAGE_TYPE: LINE_NUM:[OBJECT:] MESSAGE
The message type can be one of the following:
- [R]: refactoring is recommended.
- [C]: there was a code style violation.
- [W]: a warning about a minor issue.
- [E]: an error or potential bug.
- [F]: a fatal error occurred, blocking further analysis.
Configuring Pylint
If you want to apply blanket settings across many files, use the --rcfile=<filename> switch. In the rcfile you can specify things like messages to suppress at a global level, for example. This is much easier than trying to list everything you want to suppress on the command line each time you run pylint.
To generate a template pylint rcfile use:
pylint --generate-rcfile
Inside the generated rcfile there are a few things that can be interesting. The most interesting is the init-hook which you can set, for example, to update the PYTHONPATH so that pylint can find all the imported modules:
[MASTER]
init-hook='import sys; sys.path.append(...);'
Note that the string is a one-liner python script.
Explain An Error Message
At the end of each message you will get a string in parentheses. For example, you might see something like this:
C:289, 4: Missing method docstring (missing-docstring)
C:293, 4: Invalid method name "im_an_error" (invalid-name)
To get help on either of these errors, type:
pylint --help-msg=missing-docstring
Or...
pylint --help-msg=invalid-name
Suppressing Error Messages
To disable a message for every file checked in a run, use --disable=msg-name. So, if you want to ignore all missing docstrings, use --disable=missing-docstring.
Find all PyLint codes in the PyLint documentation. Or, you can use the command line "pylint --list-msgs" to list error messages and their corresponding codes.
To suppress a message for a specific line of code, or for a block of code (put the comment on the first line of the block), use # pylint: disable=...
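As a sketch of how those comments look in practice (the function and variable names are made up for illustration, and exactly which messages fire depends on your pylint version and naming settings):

```python
"""Illustration of pylint inline message control (hypothetical names)."""


def computeTotal(values):  # pylint: disable=invalid-name
    # The trailing comment sits on the line where pylint reports
    # invalid-name for the camelCase function name, so it is silenced there.
    return sum(values)


def compute_average(values):
    # pylint: disable=invalid-name
    # A comment on its own line applies from here to the end of the
    # enclosing block (this function), covering the camelCase local below.
    resultValue = sum(values) / len(values)
    return resultValue
```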
Longer/Different Allowed Function/Variable/etc Names
Sometimes I just want names longer than 30 characters. You could say that these names are too long, but, especially for functions, I find that shortening the name makes it less meaningful or introduces abbreviations, which can make the code harder to read, especially if the abbreviation isn't a standard, well-known one.
In your rcfile navigate to the section [BASIC]. Here you can edit the regular expressions that are used to validate things like function names. E.g., I sometimes change:
function-rgx=[a-z_][a-z0-9_]{2,30}
To:
function-rgx=[a-z_][a-z0-9_]{2,40}
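A quick way to see what a candidate pattern accepts before putting it in the rcfile is to try it with Python's re module. This sketch uses re.fullmatch so that the whole name must match; that is an assumption about how you want the length limit to behave, so check it against your pylint version. Note that [a-z_][a-z0-9_]{2,30} actually allows names of 3 to 31 characters, because the first character class is counted separately from the {2,30}.

```python
import re

OLD_FUNCTION_RGX = r"[a-z_][a-z0-9_]{2,30}"
NEW_FUNCTION_RGX = r"[a-z_][a-z0-9_]{2,40}"


def name_ok(pattern, name):
    """Return True if the whole of `name` matches `pattern`."""
    return re.fullmatch(pattern, name) is not None


long_name = "calculate_the_total_of_all_invoices"  # 35 characters
print(len(long_name), name_ok(OLD_FUNCTION_RGX, long_name),
      name_ok(NEW_FUNCTION_RGX, long_name))  # 35 False True
```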
Suppress Warnings For Unused Function Arguments
Often you'll be defining something like a callback or implementing an interface etc but won't need to use all the function arguments. By default PyLint will bitch about this, but to get it to ignore the variable just prefix the variable name with either "dummy" or "_".
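A minimal sketch, assuming pylint's default ignored-argument-names setting (which accepts names starting with an underscore); on_button_click is a made-up callback, not part of any real GUI framework:

```python
def on_button_click(_event, widget_name):
    """Hypothetical GUI callback: the framework passes an event we don't need.

    The leading underscore on _event tells pylint the argument is
    intentionally unused, so no unused-argument warning is raised for it.
    """
    return "{} clicked".format(widget_name)
```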
Use PyLint To Count LOC
Thanks to the author of, and the commenters on, the following StackOverflow post.
LOC is not a good metric in the sense that many lines of bad code are still bad, but to get a reasonable count of the lines of code (LOC) for all Python files contained in the current folder and all subfolders, use the following command.
find . -name '*.py' | xargs pylint 2>&1 | grep 'Raw metrics' -A 14
xargs takes the output of find and uses it to construct a parameter list that is passed to pylint, i.e. we get pylint to parse all files under our source tree. This output is passed to grep, which searches for the "Raw metrics" table heading and then outputs it along with the next 14 lines (due to the -A 14 option). Newer pylint versions no longer print these report tables by default, so you may need to add --reports=y to the pylint command.
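If you just want a rough number without pylint, a few lines of standard-library Python can walk the tree the same way find does. This is a different, cruder technique than pylint's raw metrics: count_python_loc() is a hypothetical helper that counts non-blank lines that are not pure # comments, and it still counts docstrings, so expect it to differ from pylint's figures.

```python
import os


def count_python_loc(root):
    """Rough LOC count: non-blank lines in .py files that aren't pure comments."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            if not fname.endswith(".py"):
                continue
            with open(os.path.join(dirpath, fname), encoding="utf-8") as fh:
                for line in fh:
                    stripped = line.strip()
                    # Skip blank lines and lines that are only a comment.
                    if stripped and not stripped.startswith("#"):
                        total += 1
    return total


if __name__ == "__main__":
    print(count_python_loc("."))
```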
Spell Check Your Comments
On Linux systems you can use the enchant library paired with pyenchant to add spell checking to your comments.
Install dependencies:
sudo apt install enchant
sudo pip3 install pyenchant
Then you can lint using these options:
pylint3 \
    --enable spelling \
    --spelling-dict en_GB \
    --spelling-private-dict-file mt-dict.txt \
    FILENAME.py
Find Similar/Duplicate Code
pylint --disable=all --enable=similarities src/
Flake8
Flake8 is another static analyser / PEP8 conformance checker for Python. I have found that sometimes it finds things that pylint doesn't and vice versa, so hey, why not use both?!
To configure it with the equivalent of a pylint rcfile, just create the file tox.ini or setup.cfg (I prefer the former as the latter is a little too generic) in the directory that you run flake8 from. This avoids having to use a global config file: you can have one per project this way. All the command line options that you would configure flake8 with become INI-file style settings. For example, if you ran:
flake8 --ignore=E221 --max-line-length=100
This would become the following in the config file (note the file must have the header [flake8]):
[flake8]
ignore = E221
max-line-length = 100
Installing Python Libraries From Wheel Files
Python wheels are the new standard of Python distribution. First make sure you have the wheel package installed (strictly, modern pip can install .whl files without it; it is mainly needed for building wheels):
pip install wheel
You can then download wheel files (*.whl) to anywhere on your computer and run the following:
pip install /path/to/your/wheel/file.whl
So, for example, when I wanted to install lxml on my Windows box, I went to Christoph Gohlke's Unofficial Windows Binaries for Python Extension Packages and downloaded the file lxml-3.6.4-cp27-cp27m-win_amd64.whl and typed the following:
pip install C:\Users\my_user_name\Downloads\lxml-3.6.4-cp27-cp27m-win_amd64.whl
Windows Python Module Installers: When Python Can't Be Found
It seems that when I install Windows Python from scratch, some installers give the following error message:
python version 2.7 required, which was not found in the registry
The answer on how to overcome this is found in this SO thread, credits to the answer's author!
To summarise, the Windows Python installer created [HKEY_LOCAL_MACHINE\SOFTWARE\Python] and all the subkeys therein, but not [HKEY_CURRENT_USER\SOFTWARE\Python]. Oops! The easiest way to overcome this is to load regedit.exe and navigate to [HKEY_LOCAL_MACHINE\SOFTWARE\Python]. Right-click on this entry and export it to a file of your choosing. Then edit the file to replace all occurrences of HKEY_LOCAL_MACHINE with HKEY_CURRENT_USER. Save it and double-click it to install the Python info into the current user registry keys. Now the installers will run :)
For example, my registry file, after editing, looked like this:
Windows Registry Editor Version 5.00

[HKEY_CURRENT_USER\SOFTWARE\Python]

[HKEY_CURRENT_USER\SOFTWARE\Python\PythonCore]

[HKEY_CURRENT_USER\SOFTWARE\Python\PythonCore\2.7]

[HKEY_CURRENT_USER\SOFTWARE\Python\PythonCore\2.7\Help]

[HKEY_CURRENT_USER\SOFTWARE\Python\PythonCore\2.7\Help\Main Python Documentation]
@="C:\\Python27\\Doc\\python2712.chm"

[HKEY_CURRENT_USER\SOFTWARE\Python\PythonCore\2.7\InstallPath]
@="C:\\Python27\\"

[HKEY_CURRENT_USER\SOFTWARE\Python\PythonCore\2.7\InstallPath\InstallGroup]
@="Python 2.7"

[HKEY_CURRENT_USER\SOFTWARE\Python\PythonCore\2.7\Modules]

[HKEY_CURRENT_USER\SOFTWARE\Python\PythonCore\2.7\PythonPath]
@="C:\\Python27\\Lib;C:\\Python27\\DLLs;C:\\Python27\\Lib\\lib-tk"
Running Python 2 and 3 on Windows
See this SO thread. To summarise:
## Run scripts:
py -3 my_script.py   # execute using python 3
py -2 my_script.py   # execute using python 2
## Run pip:
pip3 (alias)
py -3 -m pip install ...   # Python 3 pip install
py -2 -m pip install ...   # Python 2 pip install
Python functions gotcha: default argument values - default value evaluated only once!
Ooh this one is interesting and is not at all how I intuitively imagined default values. I assumed that when a function parameter has a default value, that on every call to the function, the parameter is initialised with the default value. This is not the case, as I found [Ref]! The default value is evaluated only once and acts like a static variable in a C function after that! The following example is taken from the Python docs on functions:
def f(a, L=[]):
    # Caution! You might not expect it but this function accumulates
    # the arguments passed to it on subsequent calls
    L.append(a)
    return L

print(f(1))  # Prints [1]
print(f(2))  # Prints [1, 2], not [2] as you might expect!
This is summarised in the docs...
The default value is evaluated only once. This makes a difference when the default is a mutable object such as a list, dictionary, or instances of most classes ... [because] the default ... [will] be shared between subsequent calls ...
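The usual way around this, and the one the Python tutorial itself suggests, is to use None as the default and create the new list inside the function on each call:

```python
def f(a, L=None):
    # None is immutable, so sharing it between calls is harmless; a fresh
    # list is created on every call that doesn't pass one in.
    if L is None:
        L = []
    L.append(a)
    return L


print(f(1))  # Prints [1]
print(f(2))  # Prints [2]
```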
Python Binding (vs C++11 Binding)
Python lambdas bind late (are lazily bound) [Ref]. This means the following code will have the output shown:
x = 1
f = lambda y: y * x
print(f(2))
x = 10
print(f(2))
# Outputs:
# 2
# 20
I.e., the value of x is looked up in the surrounding scope when the function is called and not when the expression creating the lambda is evaluated. This means that in the statement f = lambda y: y * x, the variable x is not evaluated straight away. It is delayed until x is actually needed. Hence the two different values are output when we call f(2) with the same parameter value.
We can go to the Python docs for further information:
A block is a piece of Python program text that is executed as a unit. The following are blocks:
- A module,
- A function body,
- A class,
- A script file,
- ...
... When a name is used in a code block, it is resolved using the nearest enclosing scope. The set of all such scopes visible to a code block is called the block’s environment ...
If a name is bound in a block, it is a local variable of that block. ... If a variable is used in a code block but not defined there, it is a free variable.
So, we can see that in the lambda expression above, the variable x is a free variable. So, it is resolved using the nearest enclosing scope. The nearest enclosing scope in the above example happens to be the global scope.
This is the same as the classic example you find in many places [Ref], replicated here:
def create_multipliers():
    return [lambda x : i * x for i in range(5)]

for multiplier in create_multipliers():
    print(multiplier(2))

# Outputs
# 8 8 8 8 8 (newlines removed for brevity)
Why does it output 8s? Because the list comprehension creates a list of lambda functions in which the variable i has not yet been evaluated. By the time we come to evaluate the lambdas, i is set to 4. How is i evaluated? As it is a free variable in the lambda, it is "resolved using the nearest enclosing scope". In Python 2 the nearest enclosing scope is the function body of create_multipliers(), because a list comprehension is not a block there, so i is bound in create_multipliers(). (In Python 3 the comprehension gets its own scope, but all the lambdas still share that single i, so the result is the same.) By the time create_multipliers() exits, i is 4, and because the lambda closes over this scope, every time i is looked up it is 4: the lookup of i does not occur until later, when the lambdas are actually called.
I.e., create_multipliers() is called. This creates a list of 5 lambda functions:
[lambda0, lambda1, lambda2, lambda3, lambda4]
None of the lambdaX functions has been called yet, so by the time this list has been created, the variable i has the value 4. Later, when any of the lambda functions is called, i is evaluated, so Python searches outward through the enclosing scopes until it finds the first i, which it does, and in this case it has the value 4!
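A common way to get the behaviour most people expect is to make i a default argument of each lambda. Default values are evaluated once, when the lambda is created (the very gotcha described in the previous section), so each lambda keeps its own copy of the current i. functools.partial is another option; both are sketched below.

```python
import functools
import operator


def create_multipliers():
    # i=i evaluates i when each lambda is *created*, so each one keeps
    # its own value instead of looking up the shared i later.
    return [lambda x, i=i: i * x for i in range(5)]


def create_multipliers_partial():
    # functools.partial also binds the current value of i immediately.
    return [functools.partial(operator.mul, i) for i in range(5)]


print([m(2) for m in create_multipliers()])          # [0, 2, 4, 6, 8]
print([m(2) for m in create_multipliers_partial()])  # [0, 2, 4, 6, 8]
```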
Note that this is a little different in C++. In C++ (C++11 or greater) you would have to capture x by reference to get the same result. If we transliterate the first example to C++ we get:
#include <iostream>

int main(int argc, char* argv[])
{
    int x = 1;
    auto myLambda = [x](int y) { return x * y; };
    std::cout << myLambda(2) << "\n";
    x = 10;
    std::cout << myLambda(2) << "\n";
    return 0;
}
// Prints:
// 2
// 2 (notice here Python would print 20!)
To get the behaviour of Python we have to do the following:
auto myLambda = [&x](int y) { return x * y; };
Note the ampersand added before x so that x is captured from the outer scope by reference, not by value!
Infinite recursion in __setattr__() & __getattr__() in Python
The recursion problem
Most, if not all, of the little tutorials I used to learn about __setattr__() and __getattr__() seemed either to treat them independently (in other words, the example classes had one or the other defined but not both) or used both but had very simple use cases. Then, as I started to play with them in my newbie-to-Python state, I did the following (abstracted out into a test case). This also serves as a little Python __setattr__ example and a Python __getattr__ example...
class Test(object):
    def __init__(self):
        self._somePrivate = 1

    def __getattr__(self, name):
        print("# GETTING %s" % (name))
        if self._somePrivate == 2:
            pass
        return "Test attribute"

    def __setattr__(self, name, val):
        print("# SETTING %s" % (name))
        if self._somePrivate == 2:
            pass
        super(Test, self).__setattr__(name, val)

t = Test()
print(t.someAttr)
Running this causes the maximum recursion depth to be reached:
$ python test1.py
# SETTING _somePrivate
# GETTING _somePrivate
...<snip>...
# GETTING _somePrivate
Traceback (most recent call last):
  File "test1.py", line 17, in <module>
    t = Test()
  File "test1.py", line 3, in __init__
    self._somePrivate = 1
  File "test1.py", line 13, in __setattr__
    if self._somePrivate == 2:
  File "test1.py", line 7, in __getattr__
    if self._somePrivate == 2:
...<snip>...
  File "test1.py", line 7, in __getattr__
    if self._somePrivate == 2:
RuntimeError: maximum recursion depth exceeded
As I had read up on the subject it was clear that one can't set an attribute in __setattr__() because that would just cause __setattr__() to be called again resulting in infinite recursion (until the stack blows up!). The solution (in "new" style classes which derive from object) is to call the parent's __setattr__() method. As for __getattr__(), from the documentation it was also clear that "...if the attribute is found through the normal mechanism, __getattr__() is not called...".
So, I thought that was all my recursion problems sorted out. Also, if you delete either the __getattr__() or __setattr__() from the above example, it works correctly. So for example...
class Test2(object):
    def __init__(self):
        self._somePrivate = 1

    def __getattr__(self, name):
        print("# GETTING %s" % (name))
        if self._somePrivate == 2:
            pass
        return "Test attribute"

t = Test2()
print(t.someAttr)
... the above test program works as expected and outputs the following.
# GETTING someAttr
Test attribute
So, what is it about the first example that causes the infinite recursion? The first problem is this little line in the constructor...
self._somePrivate = 1
At this point in the constructor, the variable self._somePrivate does not yet exist. When __setattr__() is called, the first thing it does is to query self._somePrivate...
def __setattr__(self, name, val):
    if self._somePrivate == 2:  # -- Oops --
This means that __getattr__() must be called to resolve self._somePrivate because the variable does not yet exist and therefore cannot be "...found through the normal mechanism...". And here is the flaw... my initial assumption was that this would work because __getattr__() is only called if the attribute can't otherwise be found, and I thought it would be found.
But of course, it cannot be found, so __getattr__() also has to be called. Then, __getattr__() tries to access the variable self._somePrivate and because it still does not exist, __getattr__() is called again, and again, and so on... resulting in the infinite recursion seen.
And from this we can understand why the second example worked. Because there is no __setattr__() defined in the second test class, nothing tries to read the variable before it has been set (as my little example did), so __getattr__() need never be called. Therefore the variable is created successfully upon class initialisation, and any subsequent queries on the variable will be found using the normal mechanism. Even if the second example had defined __setattr__(), as long as it did not try to read self._somePrivate, it would have been okay.
So the moral of this little story was, if implementing either of these magic methods, be careful which variables you access as part of the get/set logic!
I needed to do this, however, so what can be done to resolve it? The solution is to define the constructor as follows, using exactly the same type of set we used in __setattr__() to avoid the recursion problem:
class Test(object):
    def __init__(self):
        super(Test, self).__setattr__('_somePrivate', 1)
Now the example works again... yay!
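Putting it together, here is a sketch of the fixed class in Python 3 syntax (the zero-argument super() is the only Python 3 specific part). The extra guard in __getattr__ is my own addition, not part of the fix above: it raises AttributeError when _somePrivate itself is missing, so even an instance that skipped __init__ (for example one created with Test.__new__(Test)) fails with an AttributeError instead of recursing.

```python
class Test:
    def __init__(self):
        # Bypass our own __setattr__ so it never has to read
        # self._somePrivate before that attribute exists.
        super().__setattr__('_somePrivate', 1)

    def __getattr__(self, name):
        # Only called when normal lookup fails, e.g. for t.someAttr.
        if name == '_somePrivate':
            # Extra guard (my addition): never try to compute _somePrivate
            # here, otherwise a missing _somePrivate would recurse forever.
            raise AttributeError(name)
        if self._somePrivate == 2:
            pass
        return "Test attribute"

    def __setattr__(self, name, val):
        # Safe: _somePrivate was set in __init__ before any normal assignment.
        if self._somePrivate == 2:
            pass
        super().__setattr__(name, val)


t = Test()
print(t.someAttr)  # Test attribute
t.other = 5
print(t.other)     # 5
```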
Setting the value of a class instance array
Another thing I had been doing was to set an element of an array in the __setattr__() function, and a kind chappy on StackOverflow answered my question, which I'll duplicate below. In the example below, the line self._someAttr = 1 behaves as I'd have expected, getting __setattr__() to recurse, only the once, back into itself. What I didn't understand was why the line self._rowData[Test.tableInfo[self._tableName][name]] = val didn't do the same. I was thinking that to set the array element we'd call __setattr__() again, but it doesn't. The test example is shown below.
class Test(object):
    tableInfo = { 'table1' : {'col1' : 0, 'col2':1} }

    def __init__(self, tableName):
        # Must be set this way to stop infinite recursion, as the attribute
        # is accessed in both __setattr__ and __getattr__
        super(Test, self).__setattr__('_tableName', tableName)
        self._rowData = [123, 456]

    def __getattr__(self, name):
        print("# GETTING %s" % (name))
        assert self._tableName in Test.tableInfo
        if name in Test.tableInfo[self._tableName]:
            return self._rowData[Test.tableInfo[self._tableName][name]]
        else:
            raise AttributeError(name)

    def __setattr__(self, name, val):
        print("# SETTING %s" % (name))
        if name in Test.tableInfo[self._tableName]:
            print("Table column name found")
            self._rowData[Test.tableInfo[self._tableName][name]] = val
            self._someAttr = 1
        else:
            super(Test, self).__setattr__(name, val)

class Table1(Test):
    def __init__(self, *args, **kwargs):
        super(Table1, self).__init__("table1", *args, **kwargs)

t = Table1()
print(t.col1)
print(t.col2)
t.col1 = 999
print(t.col1)
It produces the following output...
$ python test.py
# SETTING _rowData
# GETTING col1
123
# GETTING col2
456
# SETTING col1
Table column name found
# SETTING _someAttr
# GETTING col1
999
So, why didn't the recursion occur for self._rowData[Test.tableInfo[self._tableName][name]] = val? I had thought we'd have to call __setattr__() again to set this. As the SO user "filmor" explained, the following happens:
self._rowData[bla] = val gets resolved to self.__getattribute__("_rowData")[bla] = val. So we get the array (it already exists, so it is found by the normal mechanism and not via a call to __getattr__()). But to set an element of the array, __setitem__() is used and not __setattr__(). So the expression resolves to self.__getattribute__("_rowData").__setitem__(bla, val), and therefore no further __setattr__() is called. Simples!
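filmor's explanation is easy to check with a small class that records every call to __setattr__ (Recorder and its attribute names are made up for this sketch):

```python
class Recorder:
    def __init__(self):
        # Create the log without going through our own __setattr__.
        object.__setattr__(self, "setattr_calls", [])
        self.row_data = [123, 456]  # a normal assignment: __setattr__ runs

    def __setattr__(self, name, val):
        self.setattr_calls.append(name)
        super().__setattr__(name, val)


r = Recorder()
r.row_data[0] = 999  # attribute *lookup*, then list.__setitem__(0, 999)
r.label = "x"        # attribute *assignment*: __setattr__ runs again
print(r.setattr_calls)  # ['row_data', 'label']
print(r.row_data)       # [999, 456]
```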
Concatenating immutable sequences more quickly in Python
PyDoc for immutable sequences says:
Concatenating immutable sequences always results in a new object. This means that building up a sequence by repeated concatenation will have a quadratic runtime cost in the total sequence length. To get a linear runtime cost ... build a list and use .join()
Interesting... I've been building up SQL strings using concatenation. Is using a join really better? Let's have a look... In my simple test below I create a large list of strings and concatenate them using string concatenation in test1 and str.join() in test2.
def test1(stringList):
    s = ""
    for i in stringList:
        s += "{}, ".format(i)

def test2(stringList):
    s = ", ".join(stringList)

if __name__ == '__main__':
    import timeit
    print(timeit.timeit("test1(map(lambda x: str(x), range(0,1000)))",
                        setup="from __main__ import test1", number=10000))
    print(timeit.timeit("test2(map(lambda x: str(x), range(0,1000)))",
                        setup="from __main__ import test2", number=10000))
All the map(lambda x: str(x), range(0,1000)) expression does is create a sequence of 1000 strings to concatenate (a list in Python 2, a lazy iterator in Python 3), so that each test function is concatenating the same strings.
On my system (it will be different on yours) I get the following output from the test program.
5.61275982857
2.88877487183
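So joining a list of strings took roughly half the time of repeated concatenation in this test (about 2.9 s versus 5.6 s for 10,000 runs). Note that test1 also calls format() for every element, so not all of the difference is due to the concatenation itself.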
Reading Excel Files in Python
Worth having a look at python-excel...
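Read Excel Files Using XLRD
Apparently only good for reading data and formatting information from older Excel files (i.e. .xls), but I seem to be using it fine on .xlsx files... Note that this only holds for xlrd versions before 2.0: from xlrd 2.0 onwards only .xls files are supported, so use pandas or openpyxl for .xlsx files.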
To load the module use the following.
import xlrd
Open workbooks and worksheets as follows.
workbook = xlrd.open_workbook(xlsFile)
worksheet = workbook.sheet_by_name('my-worksheet-name')
Iterate through rows and access columns:
for rowIdx in range(worksheet.nrows):
    row = worksheet.row(rowIdx)
    col1_value = row[1].value
    ...
Deal with dates using xldate_as_tuple. It will convert an Excel date into a tuple (year, month, day, hour, minute, nearest_second). When using this function, remember to pass workbook.datemode as the datemode so that the date system the spreadsheet uses (1900- or 1904-based) is applied correctly.
import time

dateColIdx = 1
rawdate = xlrd.xldate_as_tuple(row[dateColIdx].value, workbook.datemode)
print(time.strftime('%Y-%m-%d', rawdate + (0,0,0)))
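If you ever need the conversion without xlrd, the arithmetic is simple enough to sketch with the standard library. This excel_serial_to_datetime() helper is hypothetical, not part of xlrd. It assumes the usual Excel conventions: in the 1900 date system (datemode 0), serials from 61 upwards count days from 1899-12-30 (Excel wrongly treats 1900 as a leap year, so earlier serials don't map cleanly), and in the 1904 system (datemode 1) serial 0 is 1904-01-01. The fractional part of a serial is the time of day.

```python
from datetime import datetime, timedelta


def excel_serial_to_datetime(serial, datemode=0):
    """Hypothetical helper: convert an Excel serial date to a datetime.

    datemode has the same meaning as xlrd's workbook.datemode:
    0 = 1900 date system, 1 = 1904 date system.
    """
    if datemode == 0:
        if serial < 61:
            # Excel counts a non-existent 1900-02-29 (serial 60), so this
            # epoch only gives the right date from serial 61 (1900-03-01) on.
            raise ValueError("serials below 61 are ambiguous in the 1900 system")
        epoch = datetime(1899, 12, 30)
    elif datemode == 1:
        epoch = datetime(1904, 1, 1)
    else:
        raise ValueError("datemode must be 0 or 1")
    # Whole days give the date; the fractional part is the time of day.
    return epoch + timedelta(days=serial)


print(excel_serial_to_datetime(43831))    # 2020-01-01 00:00:00
print(excel_serial_to_datetime(44197.5))  # 2021-01-01 12:00:00
```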
Read Excel Files Using Pandas
Note that Pandas is zero indexed, whereas Excel is one indexed, so subtract one from the Excel row number of the header row:
import pandas
pandas.read_excel(xlsxFileName, sheet_name=worksheetName, header=excel_header_row_number - 1)
Finding Index Of Closest Value To X In A List In Python
If a list is unsorted, to find the closest value one would iterate through the list. At each index the distance from the value at that index to the target value is measured and if it is less than the least distance seen so far that index is recorded.
That's basically one for loop with a few tracking variables... an O(n) operation. But for loops aren't really very Pythonic in many ways, and half the point of having a vectorised library like numpy is that we avoid that tedium.
This is why, when I saw this solution to the problem, I thought "ooh, that's clever"...
findClosestIndex = lambda vec, val: numpy.arange(0,len(vec))[abs(vec-val)==min(abs(vec-val))][0]
closestIndex = findClosestIndex(array_to_search, value_to_find_closest_to)
It's also a very terse read! So, let's break it down. The lambda expression is equivalent to the following.
import numpy as np

def findClosestIndex(vec, val):
    # Pre: vec is a numpy.array, val is a numeric datatype
    vecIndicies = np.arange(len(vec))
    # produces the distance of each array value from "val".
    distanceFromVal = abs(vec-val)
    # the smallest distance found.
    minDistance = min(distanceFromVal)
    # Produce a boolean index to the distance array selecting only those distances
    # that are equal to the minimum.
    vecIndiciesFilter = distanceFromVal == minDistance
    # vecIndicies[vecIndiciesFilter] is an array where each element is the index
    # of an element in vec whose distance from val is the minimum, i.e. one of
    # the closest values; [0] picks the first of them.
    return vecIndicies[vecIndiciesFilter][0]
The line vecIndicies = np.arange(len(vec)) produces an array that is exactly the same size as the array vec where vecIndicies[i] == i.
The line distanceFromVal = abs(vec-val) produces an array where distanceFromVal[i] == |vec[i] - val|. In other words, each element in distanceFromVal corresponds to the distance of the same element in vec from the value we are searching for, val.
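The line minDistance = min(distanceFromVal) finds the smallest of those distances.
The next line produces an array vecIndiciesFilter where each element, vecIndiciesFilter[i], is True if distanceFromVal[i] == minDistance. Finally, vecIndicies[vecIndiciesFilter] uses that boolean array to keep only the indices of the closest elements, and [0] returns the first of them (there can be more than one if several values are equally close).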
TODO... incomplete, needs finishing with SO better method and speed comparisons.
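For comparison, here is a sketch without the boolean-mask trick; it is my own, not the SO answer the TODO refers to. The unsorted case uses min() with a key function (the NumPy one-liner equivalent would be numpy.abs(vec - val).argmin(), which also returns the first index on ties). If the list is already sorted, bisect from the standard library finds the answer in O(log n) by looking only at the two neighbours of the insertion point.

```python
import bisect


def closest_index(seq, target):
    """Index of the value in seq closest to target (first one wins on ties). O(n)."""
    if not seq:
        raise ValueError("seq must not be empty")
    return min(range(len(seq)), key=lambda i: abs(seq[i] - target))


def closest_index_sorted(seq, target):
    """Same result for an ascending sorted seq, in O(log n) using bisect."""
    if not seq:
        raise ValueError("seq must not be empty")
    pos = bisect.bisect_left(seq, target)  # first index with seq[pos] >= target
    if pos == 0:
        return 0
    if pos == len(seq):
        # Everything is below target; return the first copy of the largest value.
        return bisect.bisect_left(seq, seq[-1])
    before, after = seq[pos - 1], seq[pos]
    if target - before <= after - target:
        # Ties go to the lower value; bisect_left finds its first occurrence
        # in case it is repeated, so the result matches closest_index().
        return bisect.bisect_left(seq, before, 0, pos)
    return pos


print(closest_index([5, 1, 9, 4], 3.6))        # 3
print(closest_index_sorted([1, 3, 8, 10], 7))  # 2
```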
Drop Into The Python Interpreter
import code
code.interact(local=locals())
Working With Files In Python
Check If a File Or Directory Exists
import os.path

if os.path.isfile(fname):
    print("Found the file")
if os.path.isdir(dirname):
    print("Found the directory")
Traversing Through Directories For Files
To find all files matching a certain pattern in a selected directory and all of its subdirectories, using something like the following can work quite well...
import fnmatch
import os

def YieldFiles(dirToScan, mask):
    for rootDir, subDirs, files in os.walk(dirToScan):
        for fname in files:
            if fnmatch.fnmatch(fname, mask):
                yield (rootDir, fname)

# Find all .txt files under /some/dir
for dirName, fileName in YieldFiles("/some/dir", "*.txt"):
    print(fileName)
The above searches from the parent directory to its children in a recursive-descent, i.e. top-down, fashion (this is the default, topdown=True). If you want to search bottom-up, pass the flag topdown=False to the os.walk() function.
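Bottom-up order is what you want when, for example, tidying away empty directories: a parent is only visited after its children, so it can be removed once they have gone. remove_empty_dirs() is a hypothetical helper sketched for illustration:

```python
import os


def remove_empty_dirs(root):
    """Delete every empty directory below root (root itself is kept).

    With topdown=False, os.walk() yields a directory only after all of its
    subdirectories, so a parent that only contained empty directories is
    itself empty by the time we reach it.
    """
    removed = []
    for dirpath, _dirnames, _filenames in os.walk(root, topdown=False):
        # Check the disk rather than the lists os.walk() built earlier,
        # because child directories may have just been deleted.
        if dirpath != root and not os.listdir(dirpath):
            os.rmdir(dirpath)
            removed.append(dirpath)
    return removed
```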
Deleting Files and Directories (Recursively)
The Python library shutil has plenty of functions for doing this. For example, if you want to remove a temporary directory and all files and subdirectories within...
import os
import shutil

if os.path.exists(cacheDir):
    shutil.rmtree(cacheDir)  # Recursively delete dir and contents
os.makedirs(cacheDir)  # Recreate dir (recursively, if
                       # intermediate dirs don't exist)
However, you may sometimes run into problems on Windows when deleting files or directories. This is normally a permissions issue. Also, although this seems silly, you won't be able to delete a directory if your current working directory is set to that directory or one of its children.
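One common cause on Windows is read-only files, which rmtree() cannot delete until the read-only flag is cleared. Here is a sketch of the usual workaround, assuming that is the problem you are hitting: give rmtree() an error handler that makes the path writable and retries. Python 3.12 renamed the hook from onerror to onexc (onerror is deprecated but still works), so the sketch picks whichever applies. force_rmtree() is a hypothetical name.

```python
import os
import shutil
import stat
import sys


def _clear_readonly_and_retry(func, path, _exc):
    # func is the os function that failed (e.g. os.unlink or os.rmdir).
    # Add the owner-write bit, keeping the other permission bits, then retry.
    os.chmod(path, os.stat(path).st_mode | stat.S_IWRITE)
    func(path)


def force_rmtree(path):
    """Hypothetical wrapper around shutil.rmtree() that copes with read-only files."""
    if sys.version_info >= (3, 12):
        shutil.rmtree(path, onexc=_clear_readonly_and_retry)
    else:
        shutil.rmtree(path, onerror=_clear_readonly_and_retry)
```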
Handling Signals
Handling signals in Python can be done like so:
import signal
import sys

def signal_handler(signum, frame):
    # Clean up etc
    sys.exit(1)

# Somewhere in your initialisation (after the handler has been defined)
signal.signal(signal.SIGINT, signal_handler)
Handle Non-Blocking Key Presses
## Keyboard stuff mostly from https://stackoverflow.com/a/55692274/1517244 with some modifications
## from https://stackoverflow.com/a/2409034/1517244
import os

if os.name == 'nt':
    import msvcrt

    def setup_terminal():
        pass

    def kbhit():
        return msvcrt.kbhit()

    def kbchar():
        return msvcrt.getch().decode("utf-8")
else:
    import sys, select, tty, termios, atexit

    def setup_terminal():
        fd = sys.stdin.fileno()
        old_settings = termios.tcgetattr(fd)
        tty.setcbreak(sys.stdin.fileno())

        def restore_terminal():
            fd = sys.stdin.fileno()
            termios.tcsetattr(fd, termios.TCSADRAIN, old_settings)

        atexit.register(restore_terminal)

    def kbhit():
        dr,dw,de = select.select([sys.stdin], [], [], 0)
        return dr != []

    def kbchar():
        return sys.stdin.read(1)
TODO
Stat and os.walk in opposite direction
https://docs.python.org/2/library/os.html
https://docs.python.org/2/library/stat.html
http://stackoverflow.com/questions/2656322/python-shutil-rmtree-fails-on-windows-with-access-is-denied
---
import fnmatch
import os

for root, dirs, files in os.walk("/some/dir"):
    for fname in files:
        if fnmatch.fnmatch(fname, '*.txt'):
            pass
---
Script dir
http://stackoverflow.com/questions/4934806/how-can-i-find-scripts-directory-with-python

Print literal {}
https://docs.python.org/2/library/stat.html
also formatting from my little debug output class

Get Hostname
https://wiki.python.org/moin/Powerful%20Python%20One-Liners/Hostname

python get environment variable
import os
os.environ['A_VARIABLE'] = '1'
print(os.environ['A_VARIABLE'])  ## Key must exist!
# To not care if key exists use
print(os.environ.get('A_VAR'))  # Returns None if key doesn't exist

platform.system()
https://docs.python.org/2/library/platform.html

flush stdout
import sys
sys.stdout.flush()

current time as string:
def GetTimeNowAsString():
    return time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())

logging:
logging.basicConfig(
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    level=logging.DEBUG)  # optionally also: filename='debug.txt', filemode='w'
log = logging.getLogger(__name__)

printing on windows. can't remember where I got this... some SO thread, needs references!
import tempfile
import win32api
import win32print

filename = tempfile.mktemp (".txt")
open (filename, "w").write ("This is a test")
win32api.ShellExecute (
    0,
    "print",
    filename,
    #
    # If this is None, the default printer will
    # be used anyway.
    #
    '/d:"%s"' % win32print.GetDefaultPrinter (),
    ".",
    0
)