As more and more code is written in Python, both PyRosetta protocols and other accessory scripts, our code should be consistent in style and quality. This page presents a list of guidelines and conventions for writing Python code in the Rosetta community.
This page is modeled after the C++ coding conventions; for the list of those conventions, see: Coding Conventions.
Unless otherwise stated, the Python style guidelines presented in PEP 8 should be followed.
All Python code should have the following general file layout:
If the Python code is a script, include the following line, including the path to Python:
#!/usr/bin/env python
Note that, if present, this line must be the very first line in the file, before the copyright header, docstring, or additional comments.
The Rosetta Commons copyright header is required for every source code file in Rosetta. Do not make modifications. The header you should use for all .py files is:
# (c) Copyright Rosetta Commons Member Institutions.
# (c) This file is part of the Rosetta software suite and is made available under license.
# (c) The Rosetta software is developed by the contributing members of the Rosetta Commons.
# (c) For more information, see http://www.rosettacommons.org. Questions about this can be
# (c) addressed to University of Washington CoMotion, email: license@uw.edu.
Immediately below the copyright notice block comments should go the "docstring" for the .py file. (See more on documentation below, including how Doxygen reads Python comments.) This text should be opened and closed by a triplet of double quotes (""").
Include headers such as "Brief:", "Params:", "Output:", "Example:", "Remarks:", "Author:", etc.
Example:
"""Brief: This PyRosetta script does blah.
Params: ./blah.py .pdb
Example: ./blah.py foo.pdb 1000
Remarks: Blah blah blah, blah, blah.
Author: Jason W. Labonte
"""
import statements come after the header and before any constants.
Import only one module per line:
import rosetta
import rosetta.protocols.rigid
not
import rosetta, rosetta.protocols.rigid
It is fine, however, to import several names from the same module on one line:
from rosetta import Pose, ScoreFunction
For really long Rosetta namespaces, use namespace aliases, e.g.:
import rosetta.protocols.loops.loop_closure.kinematic_closure as KIC_protocols
Avoid importing * from any module. With large libraries, such as Rosetta, this is a waste of time and memory.
Group import statements in the following order with comment headings (e.g., # Python standard library) and a blank line between each group:
rosetta.init() belongs in the main body of your script, not in the imports section. (See Main Body below.)
Module-wide constants and variables should be defined next, after the imports.
Constants should be named in ALL_CAPS_WITH_UNDERSCORES. (See Naming Conventions below.)
Avoid using the global statement anywhere in your code. All constants should be treated as read-only.
Do not define your own mathematical constants. Use the ones found in the Python math module.
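As a minimal sketch of these rules, the hypothetical module fragment below defines one ALL_CAPS constant after its imports, takes pi from the math module rather than redefining it, and passes values as arguments instead of relying on global. The names DEFAULT_RADIUS and circle_area are illustrative assumptions, not part of Rosetta.

```python
# Python standard library
import math

# Module-wide constant: ALL_CAPS, defined after the imports, never reassigned.
DEFAULT_RADIUS = 2.0  # hypothetical value, for illustration only


def circle_area(radius=DEFAULT_RADIUS):
    """Return the area of a circle with the given radius."""
    # math.pi is used instead of a home-made 3.14159.
    return math.pi * radius ** 2
```

Because the constant is only read, never rebound, no global statement is needed anywhere.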
Classes and exposed methods should come next in the code.
Add two blank lines between each class and exposed method.
Add one blank line between each class method.
Docstrings for classes and methods are indented below the class/method declaration as part of the definition.
Group non-public methods together, followed by shared methods. Non-public methods should also be prefixed with (at least) a single underscore. (See Naming Conventions below.)
For methods, if there are too many arguments in the declaration to fit on a single line, align the wrapped arguments with the first character of the first argument:
def take_pose_and_apply_foo_to_its_bar_residue_n_times(pose, foo,
                                                       bar, n):
    """Apply foo to the bar residue of pose n times."""
    pass
If your Python code is intended to be used as a script as well as a module, put
if __name__ == "__main__":
    rosetta.init()
    my_main_subroutine(my_arg1, my_arg2)
at the end of the file.
If your Python code is intended to be a module only, do not include rosetta.init(). It should be in the main body of the calling script.
If your Python code is intended to be a script only, do not include the if __name__ == "__main__": check.
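The file layout described so far can be sketched as follows for a file meant to work both as a script and as a module. This is only an illustration: count_residues, main, and DEFAULT_SEQUENCE are hypothetical names, the copyright header is abbreviated to a placeholder comment, and rosetta.init() appears only in a comment so that the sketch runs without PyRosetta installed.

```python
#!/usr/bin/env python
# (The unmodified Rosetta Commons copyright header goes here.)
"""Brief: Count the residues in a one-letter sequence.

Example: ./count_residues.py
"""
# Python standard library
import sys

DEFAULT_SEQUENCE = "ACDEFGHIKL"  # hypothetical module-wide constant


def count_residues(sequence):
    """Return the number of residues in a one-letter sequence string."""
    return len(sequence)


def main(sequence=DEFAULT_SEQUENCE):
    """Write the residue count to standard output and return 0."""
    sys.stdout.write("%d residues\n" % count_residues(sequence))
    return 0


if __name__ == "__main__":
    # In a real PyRosetta script, rosetta.init() would be called here,
    # before any Rosetta objects are created.
    main()
```

Importing this file from another module runs none of the code under the final check, so the caller remains responsible for initialization.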
As in Rosetta 3 C++ code, use the following naming conventions:
CamelCase for class names (and therefore, exception names).
Exception names should end in Exception, Error, or Warning, as appropriate.
Mover names should end in Mover or Protocol.
Energy method names should end in EnergyMethod.
Use box_car with underscores separating words for variable and method names.
Accessor methods should begin with get_ (or is_ for functions returning a boolean).
(... PyJobDistributor.native_pose = pose.)
Use box_car with underscores for namespaces & directories, i.e., modules & packages.
Use box_car with underscores even for modules containing only a single class. (This differs from the C++ convention (e.g., Pose.cc) and is because the filenames themselves become the namespace in Python.)
It is OK to use capital letters within variable and method names for words that are acronyms or abbreviations, e.g., HTML_parser().
... url = "http://www.rosettacommons.org", not URL = "http://www.rosettacommons.org".
Likewise, it is OK to use an underscore to separate an acronym or abbreviation from other words in class names, e.g., PyMOL_Mover.
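The conventions above might look like this in practice. The sketch is illustrative only: LigandDockingError and PDB_Parser are hypothetical classes, not Rosetta ones, chosen to show a CamelCase exception name ending in Error, an acronym set off by an underscore in a class name, and box_car method names with get_ and is_ prefixes.

```python
class LigandDockingError(Exception):
    """Raised when a (hypothetical) docking step fails."""


class PDB_Parser(object):
    """Parse a list of PDB-format lines."""

    def __init__(self, lines):
        """Store the lines to be parsed."""
        self.lines = list(lines)

    def get_atom_count(self):
        """Return the number of ATOM records."""
        return sum(1 for line in self.lines if line.startswith("ATOM"))

    def is_empty(self):
        """Return True if no lines were supplied."""
        return not self.lines
```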
In addition, the following conventions are unique to Python code:
Use ALL_CAPS with underscores for constants.
Use _box_car with a leading underscore for non-public methods and _CamelCase with a leading underscore for non-public classes.
... from module import *.
Use box_car_ with a trailing underscore only to avoid conflicts with Python keywords, e.g., class_.
Never use __box_car__ with leading and trailing double underscores; that is reserved for special Python names, e.g., __init__.
Avoid one-letter variable names. A descriptive name is almost always better.
... x, y, z, i, j, k.
Use self as the name of the first argument for class methods...
... cls.
Python automatically passes objects into methods by reference and not by value. Thus, there usually is not a need to return an object that was passed and then modified by the method.
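A small sketch of what passing by reference means in practice, using a plain list as a stand-in for a Rosetta object (the function names are hypothetical). Modifying the object in place is visible to the caller, so nothing needs to be returned; rebinding the argument name inside the function, by contrast, has no effect outside it.

```python
def append_terminal_residue(sequence, residue):
    """Append residue to the sequence list in place."""
    sequence.append(residue)  # Mutates the caller's object; no return needed.


def rebind_sequence(sequence):
    """Rebind the local name only; the caller's list is unaffected."""
    sequence = ["G"]


residues = ["A", "C"]
append_terminal_residue(residues, "D")  # residues now ends with "D".
rebind_sequence(residues)               # residues is unchanged by this call.
```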
As in C++, conditional checks should happen inside the called method rather than in the calling method when possible. This helps keep things a bit more modular and also ensures that your method has no bad side effects if someone calls it but forgets to check for the essential condition. For example:
Instead of
if condition_exists:
    my_method()
use
my_method()
where my_method() begins with
if not condition_exists:
    return
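A concrete, runnable version of the same pattern is sketched below. The function normalize_weights is hypothetical; the point is that the essential condition (a non-zero total) is checked inside the method, so callers cannot forget it.

```python
def normalize_weights(weights):
    """Scale weights in place to sum to 1; do nothing if they sum to 0."""
    total = sum(weights)
    if not total:
        return  # Essential condition not met: leave the list untouched.
    for index, weight in enumerate(weights):
        weights[index] = weight / float(total)
```

Every caller can now simply write normalize_weights(my_weights) without guarding against an empty or all-zero list first.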
Avoid multiple inheritance.
All custom exceptions should inherit from Exception. Do not use string exceptions. (They were removed in Python 2.6. See Exception Handling below.)
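For illustration, a custom exception and its use might be sketched as follows. ResidueIndexError and get_residue are hypothetical names; the error message reports the values of the relevant variables, and only the call that can fail sits inside the try block.

```python
class ResidueIndexError(Exception):
    """Raised when a residue index lies outside a sequence."""


def get_residue(sequence, index):
    """Return the residue at a 1-based index of a sequence string."""
    if not 1 <= index <= len(sequence):
        raise ResidueIndexError(
            "Residue index %d is outside 1..%d." % (index, len(sequence)))
    return sequence[index - 1]


try:
    residue = get_residue("ACDE", 7)  # Only the risky call is in the try.
except ResidueIndexError as error:
    message = str(error)
else:
    message = "Found residue %s." % residue
```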
Check the type of arguments passed to a Python class method if that method could be called from both Python and C++.
Objects passed in from C++ may arrive as access pointers (APs), which must be dereferenced (with the get() method) for Python to use them.
To avoid this issue, check the type of any object arguments passed; if they are APs, call get(). For example:
class MyMover(rosetta.protocols.moves.Mover):
    def apply(self, pose):
        if isinstance(pose, rosetta.core.pose.PoseAP):
            pose = pose.get()
        pass
(For convenience, a "dummy" get() method has been added to Pose that returns the instance of the Pose, so that one can use pose = pose.get() without checking its type first. However, it would be impractical to add a get() method to every class in Rosetta that one might wish to use in PyRosetta, so check the type!)
Be careful not to say pose = native_pose when you really mean pose.assign(native_pose). The former copies nothing; it merely makes pose a second name for the same object. The latter copies the contents of native_pose into pose (a deep copy).
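The difference can be demonstrated without PyRosetta by using a stand-in class. FakePose below is purely hypothetical; it only mimics the idea of an assign() method that copies data from another object.

```python
class FakePose(object):
    """Hypothetical stand-in for a Pose; it holds only a sequence list."""

    def __init__(self, sequence=""):
        """Store the sequence as a list of one-letter codes."""
        self.sequence = list(sequence)

    def assign(self, other):
        """Copy the data of other into this object."""
        self.sequence = list(other.sequence)


native_pose = FakePose("ACD")

alias_pose = native_pose          # Same object under a second name.
working_pose = FakePose()
working_pose.assign(native_pose)  # Independent object with the same data.

alias_pose.sequence.append("E")   # This also changes native_pose.
```

After the last line, native_pose has four residues while working_pose still has three.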
For comparisons to singletons (True, False, None, a static class) use is, not ==.
Do not write if is_option_on is True:; write if is_option_on: instead.
Likewise, write if not is_option_on:.
Only use if variable: for booleans; if variable is not None: is safer. (Some container types, e.g., empty lists, are false in a boolean sense.)
Use if not list: in place of if len(list) == 0:.
Don't write if x < 10 and x > 5:; simplify this to if 5 < x < 10:.
Use != instead of <> for consistency.
Use if my_string.startswith("foo"): and if my_string.endswith("bar"): instead of if my_string[:3] == "foo": and if my_string[-3:] == "bar":. Besides the fact that the former is way easier to read, it's safer.
Use if isinstance(object, rosetta.core.pose.Pose):; do not use if type(object) is rosetta.core.pose.Pose:.
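Several of the comparison rules above can be seen side by side in the sketch below. All names are hypothetical. It distinguishes None from an empty container, uses a chained comparison and endswith(), and shows why isinstance() is preferred: it accepts instances of subclasses, whereas an exact type check rejects them.

```python
def describe(residues=None):
    """Return a short description of an optional list of residues."""
    if residues is None:   # Argument omitted: compare to None with "is".
        return "not given"
    if not residues:       # Empty container: false in a boolean sense.
        return "empty"
    return "%d residues" % len(residues)


def is_mid_range(x):
    """Return True if x lies strictly between 5 and 10."""
    return 5 < x < 10


def is_fasta_file(filename):
    """Return True if filename ends with the .fasta extension."""
    return filename.endswith(".fasta")


class Base(object):
    """Hypothetical base class."""


class Derived(Base):
    """Hypothetical subclass of Base."""


derived = Derived()
accepted_by_isinstance = isinstance(derived, Base)  # True
accepted_by_type_check = type(derived) is Base      # False
```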
optparse was deprecated and replaced in Python 2.7, and argparse does all the same things.
Use if and try/except blocks to test for errors that could happen in normal program operation. Errors should generate useful reports that include the values of relevant variables.
Custom exceptions should inherit from Exception. (See Classes above.)
Use raise HugeF_ingError("Oh, crap!"), instead of raise HugeF_ingError, "Oh, crap!". (This makes it easier to wrap long error messages. Plus, the latter form was removed in Python 3.)
Do not use raise "Oh, crap!"; that already went away in Python 2.6.
Keep the number of lines tested in a try block to the bare minimum. This makes it easier to isolate the actual problem.
(... else or a finally afterwards.)
Avoid naked except clauses. They make it more difficult to isolate the actual problem.
If you must catch everything, use except Exception:, which is better than except: because it will only catch exceptions from actual program errors; naked excepts will also catch keyboard and system errors.
Multiple exceptions can be caught by a single except clause, e.g., except (HugeF_ingError, SneakyError):. (The parentheses matter; without them, Python 2 treats the second name as the variable that receives the caught exception.)
If you need to instantiate a particular Rosetta vector1_ container for use in a Rosetta function, if possible, use PyRosetta's Vector1() constructor function, which takes a list as input and determines which vector1_ is needed.
For example, instead of:
list_of_filenames = utility.vector1_string()
list_of_filenames.extend(["file1.fasta", "file2.fasta", "file3.fasta"])
use:
list_of_filenames = Vector1(["file1.fasta", "file2.fasta", "file3.fasta"])
Use pose_from_sequence() instead of make_pose_from_sequence(). (The former has ω angles set to 180° automatically.)
Write a += n, not a = a + n.
The same goes for a -= n, a *= n, and a /= n.
Use list comprehensions. They are beautiful.
For example, instead of:
cubes = []
for x in range(10):
    cubes.append(x**3)
or:
cubes = map(lambda x: x**3, range(10))
use:
cubes = [x**3 for x in range(10)]
(Lambda functions are super cool, but the second example is far less readable than the third; if you have a lambda inside a map, you should be using a list comprehension instead.)
You can also use if to limit the elements in your list:
even_cubes = [x**3 for x in range(10) if x**3 % 2 == 0]
Use with when opening files. Context management is safer, because the file is automatically closed, even if an error occurs.
For example, instead of:
file = open("NOEs.cst")
constraints = file.readlines()
file.close()
use:
with open("NOEs.cst") as file:
    constraints = file.readlines()
Doxygen can autodocument Python code in addition to C++ code, but in the case of Python, it simply reads the __doc__ attribute of every module, class, and method. This is why it is crucial to put all class and method docstrings indented below the declaration line. Otherwise, they will not be stored in the proper __doc__ variable. In Python,...
class MyClass():
    """This is my class.
    It is great.

    """
...is equivalent to...
class MyClass():
    __doc__ = "This is my class.\nIt is great.\n\n"
Always use double quotes for docstrings. While Python recognizes both """ and ''', some text editors only recognize the former.
All classes and methods must contain a docstring with at least a brief statement of what the class or method is or should do, respectively.
... """.
For one-line docstrings, put the closing """ on the same line.
"""This class contains terpene-specific ligand data."""
"""Return True if this ligand is a sesquiterpene."""
... """.
Docstrings for scripts should be written to serve also as the script's help/usage message.
Docstrings for modules should list all public classes and methods that are exported by the module.
Document a class's constructor in the __init__() method's docstring, not in the class's docstring.
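Putting the docstring rules together, a documented class might be sketched like this. TerpeneLigand is a hypothetical class built around the two one-line docstrings quoted above; the carbon_count attribute and the 15-carbon test are assumptions made for the example.

```python
class TerpeneLigand(object):
    """This class contains terpene-specific ligand data."""

    def __init__(self, carbon_count):
        """Create a ligand with the given number of carbon atoms."""
        self.carbon_count = carbon_count

    def is_sesquiterpene(self):
        """Return True if this ligand is a sesquiterpene."""
        # Assumes the usual definition: a sesquiterpene has 15 carbons.
        return self.carbon_count == 15
```

Because each docstring sits directly below its declaration, it is stored in the corresponding __doc__ attribute, e.g., TerpeneLigand.__doc__ and TerpeneLigand.is_sesquiterpene.__doc__.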
Comments should usually be complete sentences.
Use block comments to talk about the code immediately following the comments.
Use inline comments (sparingly) to clarify confusing lines.
As mentioned at the top of this page, when in doubt, follow PEP 8.
Use 4 spaces to indent, not tabs.
Indent 4 spaces per nested level.
Indent 4 additional spaces when method arguments must be wrapped beyond the first line.
Example 1:
def take_pose_and_apply_foo_to_its_bar_residue_n_times(
        pose, foo, bar, n):
    """Applies foo to the bar residue of pose n times."""
    pass
Example 2:
def take_pose_and_apply_foo_to_its_bar_residue_n_times(pose, foo,
                                                       bar, n):
    """Applies foo to the bar residue of pose n times."""
    pass
While there are numerous different styles for using spaces in argument lists and expressions in C++, PEP 8 recommends the following (and provides many examples):
Do not put spaces around parentheses, brackets, or curly braces.
Do not put spaces before commas, colons, or semicolons.
Put one space around operators.
Do not put spaces around = if used as part of a keyword argument or default value assignment in a method call or declaration, respectively.
... y = a*x**2 + b*x + c.
Limit lines to a maximum of 79 characters.
If you use a backslash (\) to wrap lines, do so after any operators.
Do not put semicolons at the end of lines.
Do not use semicolons to combine multiple lines.
[to be added later… ~ labonte 20:56, 31 Aug 2012 (PDT)]