How to Easily Understand Your Python Objects
Have you ever had a new Python object that you wanted to quickly familiarize yourself with? Or maybe you have a familiar object and you’re looking for that one particular method, but you don’t know how to describe it to Google. I frequently run into this issue in my data science workflow with complex objects in libraries like TensorFlow. I also find myself wishing there were a faster way to get to know simple objects in new libraries, as documentation can be unavailable, incorrect, or time-consuming to look up.
In this blog post, I’ll show you how to deeply inspect objects yourself, and introduce a pip-installable CLI tool I built called peep dis, which will do the work for you. If you want to jump straight to the tool, skip to the CLI Object Inspector: Peep Dis section.
Object Inspection
As a toy example, we’ll define a Rectangle class with a few simple methods and attributes.
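The class definition itself is not reproduced in this text. A minimal sketch that is consistent with the method snippets and outputs shown later in this post might look like the following. The bodies of bisect and scale are taken from those snippets; the body of area (a * b) and the type hints on __init__ are assumptions, inferred from the 12.0 result for Rectangle(3., 4.).

```python
class Rectangle:
    """Toy rectangle with side lengths a and b."""

    def __init__(self, a: float, b: float):
        self.a = a
        self.b = b

    def area(self) -> float:
        """ compute the area (assumed here to be a * b) """
        return self.a * self.b

    def bisect(self):
        """ reduce a by a factor of 2 to "cut in half" """
        self.a /= 2

    def scale(self, factor: float):
        """ scale the side lengths by factor """
        self.a = factor * self.a
        self.b = factor * self.b


rect = Rectangle(3., 4.)
print(rect.area())
```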
The dir function is a simple built-in that lists all attributes and methods of an object, unless __dir__ has been overridden. This is what many interactive consoles and some editors use for autocomplete.
>>> rect = Rectangle(3., 4.)
>>> dir(rect)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'a', 'area', 'b', 'bisect', 'scale']
The output is a list of strings representing the attributes and methods of the object, mostly consisting of built-ins. Usually, built-ins aren’t particularly useful and just add clutter.
Filtering Out Built-ins
Depending on our definition of built-ins, we can use either string filtering or type filtering to remove these.
String Filtering:
def dir_string_filter(obj):
    is_magic = lambda x: (x.startswith('__') and x.endswith('__'))
    return [x for x in dir(obj) if not is_magic(x)]

>>> dir_string_filter(rect)
['a', 'area', 'b', 'bisect', 'scale']
Type Filtering:
from types import BuiltinMethodType

def dir_type_filter(obj):
    is_builtin = lambda x: isinstance(getattr(obj, x), BuiltinMethodType)
    return [x for x in dir(obj) if not is_builtin(x)]

>>> dir_type_filter(rect)
['__class__', '__delattr__', '__dict__', '__doc__', '__eq__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__ne__', '__repr__', '__setattr__', '__str__', '__weakref__', 'a', 'area', 'b', 'bisect', 'scale']
String filtering removes all “magic” methods and attributes. Filtering by BuiltinMethodType removes only built-in methods written in C, which leaves many magic attributes in place and, on built-in types such as str, would remove non-magic methods like the string manipulation methods. In most cases, the magic attributes and methods are what we’d like to exclude, so we’ll use the string filtering method.
>>> dir_filtered = dir_string_filter(rect)
Separating Methods from Attributes
Of the items returned after filtering, we still don’t know which are attributes and which are methods. We can use the built-in callable function to filter them.
Attributes:
>>> attrs = [x for x in dir_filtered if not callable(getattr(rect, x))]
>>> attrs
['a', 'b']
Methods:
>>> methods = [x for x in dir_filtered if callable(getattr(rect, x))]
>>> methods
['area', 'bisect', 'scale']
To see the values of the attributes:
>>> attr_outputs = {x: getattr(rect, x) for x in attrs}
>>> attr_outputs
{'a': 3.0, 'b': 4.0}
Calling Methods
For the methods, it’s not quite as simple to see the output values. One risk associated with indiscriminately calling a random method is that it could modify the original object state. For example, Rectangle.bisect will return None, but it reduces the size of the rectangle by a factor of 2 (copied below).
...
    def bisect(self):
        """ reduce a by a factor of 2 to "cut in half" """
        self.a /= 2
We can avoid modifications to the original object by making a copy.deepcopy of it before each method call, although this can be computationally intensive for large objects. Note that methods which modify class variables, global variables, or interact with their external environment may still have lasting effects.
The get_callable function defined below copies the original object and returns the method attached to that copy, which can be called independently of its parent object.
from copy import deepcopy

def get_callable(obj, name: str):
    return getattr(deepcopy(obj), name)
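To make the effect of the copy concrete, here is a small sketch using a hypothetical Counter class (not part of the original example). Calling the method returned by get_callable changes the copy only, which we can confirm through the bound method’s __self__ attribute.

```python
from copy import deepcopy


class Counter:
    """Hypothetical class whose only method mutates its state."""

    def __init__(self):
        self.n = 0

    def bump(self):
        self.n += 1


def get_callable(obj, name: str):
    # the returned bound method belongs to a deep copy, not to obj
    return getattr(deepcopy(obj), name)


counter = Counter()
bump = get_callable(counter, 'bump')
bump()

print(counter.n)        # 0: the original object is untouched
print(bump.__self__.n)  # 1: only the copy was modified
```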
Methods that require positional arguments pose an additional challenge, like Rectangle.scale (copied below).
...
    def scale(self, factor: float):
        """ scale the side lengths by factor """
        self.a = factor * self.a
        self.b = factor * self.b
We can get the outputs of the methods that don’t require positionals by using the “leap before you look” policy, or by using getfullargspec from the inspect standard library module to determine which methods don’t require positional arguments and evaluating only those.
Calling Methods Technique 1: Leap Before You Look
def attempt_method_call(func):
    try:
        return str(func())
    except Exception:
        return '(failed to evaluate method)'

>>> outputs = {x: attempt_method_call(get_callable(rect, x)) for x in methods}
>>> outputs
{'area': '12.0', 'bisect': 'None', 'scale': '(failed to evaluate method)'}
As expected, area and bisect executed successfully, whereas scale, which requires positional arguments, did not.
Calling Methods Technique 2: Check for Positionals
First, let’s introduce getfullargspec:
from inspect import getfullargspec

>>> getfullargspec(rect.scale)
FullArgSpec(args=['self', 'factor'], varargs=None, varkw=None, defaults=None, kwonlyargs=[], kwonlydefaults=None, annotations={'factor': <class 'float'>})
It returns a FullArgSpec object. args contains the names of the positional arguments. varargs and varkw contain the names of the variable-length positional and keyword arguments, as specified by the * and ** prefixes, respectively (usually *args and **kwargs). defaults is a tuple of default values for the last positional arguments, or None if there are none. kwonlyargs lists the names of keyword-only arguments. kwonlydefaults is a dictionary of keyword-only argument default values. annotations is a dictionary of any type annotations.
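To see every field populated at once, here is a short sketch that runs getfullargspec on a hypothetical function named example, which exists only for illustration and uses each kind of argument.

```python
from inspect import getfullargspec


def example(a, b=2, *args, key, flag=False, **kwargs):
    """Hypothetical function that uses every kind of argument."""


spec = getfullargspec(example)
for field in spec._fields:
    print(f'{field}: {getattr(spec, field)!r}')

# args: ['a', 'b']
# varargs: 'args'
# varkw: 'kwargs'
# defaults: (2,)
# kwonlyargs: ['key', 'flag']
# kwonlydefaults: {'flag': False}
# annotations: {}
```

Note that defaults holds a single value here, which belongs to b, the last positional argument.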
We can use this information to check if a method has positional arguments and evaluate it only if it doesn’t. To start, we will attempt to get the FullArgSpec of the method, although not all callables are supported. Then, we’ll extract the args and define a utility function _remove_self to remove the self argument which is implicit to standard methods. Although it’s not done here, we could additionally avoid calling class methods by checking for the cls argument. Finally, if all args have defaults, then there are no positionals and the method can be called.
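The steps above can be sketched roughly as follows. This is an illustrative reconstruction rather than the original implementation: the '(unsupported callable)' and '(requires keyword-only args)' messages and the keyword-only check are assumptions added here, and the Rectangle at the bottom is a trimmed stand-in.

```python
from inspect import getfullargspec


def _remove_self(args):
    """Drop the implicit 'self' that getfullargspec reports for bound methods."""
    return [arg for arg in args if arg != 'self']


def call_if_no_positionals(func):
    """Call func only when every argument has a default value."""
    try:
        spec = getfullargspec(func)
    except TypeError:
        # getfullargspec raises TypeError for callables it cannot introspect
        return '(unsupported callable)'
    args = _remove_self(spec.args)
    n_defaults = len(spec.defaults or ())
    if len(args) > n_defaults:
        return '(requires positional args)'
    # assumption: keyword-only arguments without defaults also block the call
    kwonly_defaults = spec.kwonlydefaults or {}
    if any(name not in kwonly_defaults for name in spec.kwonlyargs):
        return '(requires keyword-only args)'
    return func()


class Rectangle:
    """Trimmed stand-in for the toy class used in this post."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def area(self):
        return self.a * self.b

    def scale(self, factor: float):
        self.a, self.b = factor * self.a, factor * self.b


rect = Rectangle(3., 4.)
print(call_if_no_positionals(rect.area))
print(call_if_no_positionals(rect.scale))
```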
Using this method, we get the same results as the leap before you look method.
>>> method_outputs = {x: call_if_no_positionals(get_callable(rect, x)) for x in methods}
>>> method_outputs
{'area': 12.0, 'bisect': None, 'scale': '(requires positional args)'}
Inferring Argument Types
Next, we can attempt to infer the type of each argument from any type annotations or default values. We define infer_arg_types, which starts out similarly to call_if_no_positionals, but rather than calling the method, it populates an OrderedDict with the inferred types.
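A rough sketch of such a function is shown below. It is a reconstruction under assumptions, not the original code: it returns None when a callable takes no arguments or cannot be introspected, stores type names as strings, and uses a hypothetical helper _type_name to turn annotations such as List[int] into readable names. The resize function exists only to exercise it.

```python
from collections import OrderedDict
from inspect import getfullargspec
from typing import List


def _type_name(hint):
    """Return 'float' for a class, or 'List[int]' for typing.List[int]."""
    if isinstance(hint, type):
        return hint.__name__
    return str(hint).replace('typing.', '')


def infer_arg_types(func):
    """Map each argument name to an inferred type name (None if unknown)."""
    try:
        spec = getfullargspec(func)
    except TypeError:
        return None
    args = [arg for arg in spec.args if arg != 'self']
    if not args:
        return None
    # defaults line up with the last len(defaults) positional arguments
    defaults = spec.defaults or ()
    default_by_name = dict(zip(args[len(args) - len(defaults):], defaults))
    arg_types = OrderedDict()
    for name in args:
        if name in spec.annotations:
            arg_types[name] = _type_name(spec.annotations[name])
        elif name in default_by_name:
            arg_types[name] = type(default_by_name[name]).__name__
        else:
            arg_types[name] = None
    return arg_types


def resize(label, width: float, count=2, ids: List[int] = None):
    """Hypothetical function mixing annotated, defaulted, and bare arguments."""


print(infer_arg_types(resize))
```

Annotations take priority over defaults in this sketch, so ids is reported as List[int] even though its default is None.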
Calling this on our Rectangle instance, we get the argument types of all methods which require arguments, since they were all type hinted. Note, if they weren’t type hinted, this would only work for arguments with default values.
>>> method_arg_types = {x: infer_arg_types(getattr(rect, x)) for x in methods}
>>> method_arg_types
{'area': None, 'bisect': None, 'scale': OrderedDict([('factor', 'float')])}
Forging Arguments
If we want to see example outputs for methods that require positional arguments, we can attempt to use the argument types we inferred above to forge them by looking up sample values for each type. We can even attempt to forge collections if the content type is in the annotation (e.g. List[int]).
from typing import List

_sample_args = {
    'float': 1.5,
    'int': 2,
    'str': 'abc',
    'List[int]': [1, 2, 3],
}
We will define a ForgeError so that any errors caused by attempting to forge arguments can be handled specifically. This will allow us to attempt to forge arguments for a collection of methods, even if some don’t work.
class ForgeError(ValueError):
    pass
The forging function will take a method and look up sample arguments from _sample_args by type from the infer_arg_types output, raising errors if any arguments lacked defaults and types couldn’t be inferred, or if any types are presented that aren’t in _sample_args.
Since this is a fairly complex function, this would be a good place for some unit testing.
Next, we can define a function that takes an object, iterates over all of its methods, and uses our forge_args function to attempt to forge the arguments for each using the “leap before you look” approach, noting the reason for any failures.
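Neither function is reproduced in this text, so here is a condensed sketch of both under stated assumptions. The infer_arg_types here is a simplified stand-in that returns a plain dict, the error messages are invented for illustration, every output is passed through str, and the Rectangle at the bottom is a trimmed stand-in with an extra hypothetical method, move, to show a forging failure.

```python
from copy import deepcopy
from inspect import getfullargspec

# sample values keyed by type name, mirroring _sample_args above
_sample_args = {'float': 1.5, 'int': 2, 'str': 'abc', 'List[int]': [1, 2, 3]}


class ForgeError(ValueError):
    pass


def infer_arg_types(func):
    """Simplified stand-in: argument name -> type name (None if unknown)."""
    spec = getfullargspec(func)
    args = [arg for arg in spec.args if arg != 'self']
    defaults = spec.defaults or ()
    by_default = dict(zip(args[len(args) - len(defaults):], defaults))
    types = {}
    for name in args:
        hint = spec.annotations.get(name)
        if hint is not None:
            is_class = isinstance(hint, type)
            types[name] = hint.__name__ if is_class else str(hint).replace('typing.', '')
        elif name in by_default:
            types[name] = type(by_default[name]).__name__
        else:
            types[name] = None
    return types


def forge_args(func):
    """Return sample keyword arguments for func, or raise ForgeError."""
    forged = {}
    for name, type_name in infer_arg_types(func).items():
        if type_name is None:
            raise ForgeError(f'cannot infer a type for {name!r}')
        if type_name not in _sample_args:
            raise ForgeError(f'no sample value for type {type_name!r}')
        forged[name] = _sample_args[type_name]
    return forged


def forge_and_eval_methods(obj):
    """Call each non-magic method on a fresh copy of obj with forged arguments."""
    outputs = {}
    for name in dir(obj):
        if name.startswith('__') or not callable(getattr(obj, name)):
            continue
        method = getattr(deepcopy(obj), name)
        try:
            outputs[name] = str(method(**forge_args(method)))
        except ForgeError as err:
            outputs[name] = f'(could not forge args: {err})'
        except Exception as err:
            outputs[name] = f'(failed to evaluate method: {err})'
    return outputs


class Rectangle:
    """Trimmed stand-in; move is a hypothetical untyped method."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def area(self):
        return self.a * self.b

    def scale(self, factor: float):
        self.a, self.b = factor * self.a, factor * self.b

    def move(self, dx):
        pass


print(forge_and_eval_methods(Rectangle(3., 4.)))
```

Catching ForgeError separately from other exceptions keeps “we could not build arguments” distinct from “the method itself failed”, which makes the failure reasons easier to read.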
Let’s give this a try on our Rectangle instance:
>>> forged_outputs = forge_and_eval_methods(rect)
>>> forged_outputs
{'area': '12.0', 'bisect': None, 'scale': 'None'}
The difference between this result and our earlier result is subtle, but notice that scale now outputs ‘None’ rather than ‘requires positional args’. That’s because the method was called successfully with the forged arguments, but rather than returning anything, it modifies the state of rect by changing attributes a and b. It would be nice to track these modifications so that we can understand what methods do, even when they don’t return anything.
Tracking State Modification: Comparison Technique
In this toy example, Rectangle.scale modifies the dimensions a and b of the Rectangle, but it’s hard for us to tell what happened since the method doesn’t return anything. We can track these modifications by saving a copy of all of the object’s attributes before the method call, then comparing them to the attributes after. We can define a StateComparator object to allow us to save the current attributes using the __dict__ attribute, then check for new additions, deletions, and modifications of attributes after the method call.
The true implementation is a bit more complex because most built-ins don’t have a __dict__ attribute.
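A simplified sketch of the comparison idea follows. It is not the peep dis implementation: the compare method name and the 'new' / 'deleted' / 'modified' keys are assumptions, it only handles objects that have a __dict__, and it relies on != giving a plain boolean, which is not the case for values such as NumPy arrays.

```python
from copy import deepcopy


class StateComparator:
    """Snapshot an object's __dict__ and report what changed afterwards."""

    def __init__(self, obj):
        self.obj = obj
        # deep copy so that in-place changes to attribute values are detected
        self.before = deepcopy(vars(obj))

    def compare(self):
        """Return the non-empty groups of changes since the snapshot."""
        after = vars(self.obj)
        changes = {
            'new': {k: v for k, v in after.items() if k not in self.before},
            'deleted': {k: v for k, v in self.before.items() if k not in after},
            'modified': {k: (self.before[k], after[k]) for k in after
                         if k in self.before and self.before[k] != after[k]},
        }
        return {kind: found for kind, found in changes.items() if found}


class Rectangle:
    """Trimmed stand-in for the toy class used in this post."""

    def __init__(self, a, b):
        self.a, self.b = a, b

    def scale(self, factor: float):
        self.a, self.b = factor * self.a, factor * self.b


rect = Rectangle(3., 4.)
comparator = StateComparator(rect)
rect.scale(1.5)
print(comparator.compare())
# {'modified': {'a': (3.0, 4.5), 'b': (4.0, 6.0)}}
```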
Using the forge_and_eval_methods function as a template, we can define a new function which includes state tracking and an option to turn argument forging on or off.
Testing this on our rect:
>>> call_all_tracked(rect)
{'area': '12.0', 'bisect': {'state changes': {'modified': {'a': (3.0, 1.5)}}}, 'scale': {'output': {'state changes': {'modified': {'a': (3.0, 4.5), 'b': (4.0, 6.0)}}}}}
Now any state changes are reported in dictionaries, where each modification is a tuple whose first element is the initial value and whose second element is the final value.
Unfortunately, forging arguments from default values and annotations is difficult because most Python code is not type-hinted, and much of it is unsupported by getfullargspec. In these cases, argument forgery could also be attempted by brute force or extraction from docstrings, which are planned features for peep dis.
Simply printing out docstrings might be an easier way to understand methods that require arguments in most cases. They can be systematically printed out from the __doc__ attribute.
>>> for x in dir_filtered:
...     doc = getattr(getattr(rect, x), '__doc__', None) or 'No docstring'
...     print(f'{x}: {doc}')
The output is too long to include here, and it’s difficult to decipher since it isn’t color-coded. The output can easily be colorized with termcolor, which is what was used for peep dis.
CLI Object Inspector: Peep Dis
We’ll take a quick look at how peep dis can be useful in two canonical cases.
I. The Mystery Object
We have a simple mystery_obj which contains an array of San Francisco temperatures somewhere within it, but we don’t know where. We could call dir, then iteratively check each method or attribute, or we could just peep the object. We can quickly identify stdtemp as the attribute we need.
Built-ins are filtered out, and outputs for the rest of the attributes and methods without positional arguments are printed. Methods are colored purple, and attributes are cyan. The outputs from methods requiring positional arguments are grayed out to allow us to skim others more quickly.
There are additional keyword arguments to include built-ins, include private methods, print docstrings, and truncate output lengths. Peep dis can also be used in a debugger, Jupyter Notebook, or IDE console.
II. What’s the name of that method?
We have a DataFrame with the columns temp and humidity for San Francisco, which we want to convert to a narrow data model for an API we are building. There’s a one-liner for this, but nothing stands out in dir, and nothing turns up on Stack Overflow. If we peep the DataFrame, we’ll quickly identify melt as the method we need.
To see what this process would have looked like the old-fashioned way, see the Appendix below.
Conclusion
Thanks for reading, and please feel free to send me feedback on peep dis. If you like the library, please star it as well so that I know people are interested in its continued development. If you want to contribute, I would love to facilitate that.
Appendix: Old Fashioned Way
I. The mystery object
>>> dir(mystery_obj)
['__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'itemp', 'mtemp', 'stdtemp', 'temp']
>>> mystery_obj.mtemp
<bound method WeatherSeries.mtemp of <__main__.WeatherSeries object at 0x7fdb6ed32748>>
>>> mystery_obj.mtemp()
{'min': 67, 'max': 71, 'index min': 0, 'index max': 4, 'len': 6}
>>> mystery_obj.itemp
<bound method WeatherSeries.itemp of <__main__.WeatherSeries object at 0x7fdb6ed32780>>
>>> mystery_obj.itemp()
TypeError: itemp() missing 1 required positional argument: 'i'
>>> mystery_obj.itemp(0)
67
>>> mystery_obj.temp()
array([[ 0, 67],
       [ 1, 69],
       [ 2, 70],
       [ 3, 70],
       [ 4, 71],
       [ 5, 70]])
>>> mystery_obj.stdtemp()
TypeError: 'numpy.ndarray' object is not callable
>>> mystery_obj.stdtemp
array([67, 69, 70, 70, 71, 70])
II. What’s the name of that method?
>>> df
   humitidy  temp
0        65    67
1        65    68
2        60    68
3        60    69
4        55    70
>>> dir(df)
['T', '_AXIS_ALIASES', '_AXIS_IALIASES', '_AXIS_LEN', '_AXIS_NAMES', '_AXIS_NUMBERS', '_AXIS_ORDERS', '_AXIS_REVERSED', '_AXIS_SLICEMAP', '__abs__', '__add__', '__and__', '__array__', '__array_wrap__', '__bool__', '__bytes__', '__class__', '__contains__', '__copy__', '__deepcopy__', '__delattr__', '__delitem__', '__dict__', '__dir__', '__div__', '__doc__', '__eq__', '__finalize__', '__floordiv__', '__format__', '__ge__', '__getattr__', '__getattribute__', '__getitem__', '__getstate__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__init_subclass__', '__invert__', '__ipow__', '__isub__', '__iter__', '__itruediv__', '__le__', '__len__', '__lt__', '__mod__', '__module__', '__mul__', '__ne__', '__neg__', '__new__', '__nonzero__', '__or__', '__pow__', '__radd__', '__rand__', '__rdiv__', '__reduce__', '__reduce_ex__', '__repr__', '__rfloordiv__', '__rmod__', '__rmul__', '__ror__', '__round__', '__rpow__', '__rsub__', '__rtruediv__', '__rxor__', '__setattr__', '__setitem__', '__setstate__', '__sizeof__', '__str__', '__sub__', '__subclasshook__', '__truediv__', '__unicode__', '__weakref__', '__xor__', '_accessors', '_add_numeric_operations', '_add_series_only_operations', '_add_series_or_dataframe_operations', '_agg_by_level', '_agg_doc', '_aggregate', '_aggregate_multiple_funcs', '_align_frame', '_align_series', '_apply_broadcast', '_apply_empty_result', '_apply_raw', '_apply_standard', '_at', '_box_col_values', '_box_item_values', '_builtin_table', '_check_inplace_setting', '_check_is_chained_assignment_possible', '_check_percentile', '_check_setitem_copy', '_clear_item_cache', '_combine_const', '_combine_frame', '_combine_match_columns', '_combine_match_index', '_combine_series', '_combine_series_infer', '_compare_frame', '_compare_frame_evaluate', '_consolidate', '_consolidate_inplace', '_construct_axes_dict', '_construct_axes_dict_for_slice', '_construct_axes_dict_from', '_construct_axes_from_arguments', '_constructor', '_constructor_expanddim', 
'_constructor_sliced', '_convert', '_count_level', '_create_indexer', '_cython_table', '_dir_additions', '_dir_deletions', '_ensure_valid_index', '_expand_axes', '_flex_compare_frame', '_from_arrays', '_from_axes', '_get_agg_axis', '_get_axis', '_get_axis_name', '_get_axis_number', '_get_axis_resolvers', '_get_block_manager_axis', '_get_bool_data', '_get_cacher', '_get_index_resolvers', '_get_item_cache', '_get_numeric_data', '_get_values', '_getitem_array', '_getitem_column', '_getitem_frame', '_getitem_multilevel', '_getitem_slice', '_gotitem', '_iat', '_iget_item_cache', '_iloc', '_indexed_same', '_info_axis', '_info_axis_name', '_info_axis_number', '_info_repr', '_init_dict', '_init_mgr', '_init_ndarray', '_internal_names', '_internal_names_set', '_is_builtin_func', '_is_cached', '_is_cython_func', '_is_datelike_mixed_type', '_is_mixed_type', '_is_numeric_mixed_type', '_is_view', '_ix', '_ixs', '_join_compat', '_loc', '_maybe_cache_changed', '_maybe_update_cacher', '_metadata', '_needs_reindex_multi', '_obj_with_exclusions', '_protect_consolidate', '_reduce', '_reindex_axes', '_reindex_axis', '_reindex_columns', '_reindex_index', '_reindex_multi', '_reindex_with_indexers', '_repr_data_resource_', '_repr_fits_horizontal_', '_repr_fits_vertical_', '_repr_html_', '_repr_latex_', '_reset_cache', '_reset_cacher', '_sanitize_column', '_selected_obj', '_selection', '_selection_list', '_selection_name', '_series', '_set_as_cached', '_set_axis', '_set_axis_name', '_set_is_copy', '_set_item', '_setitem_array', '_setitem_frame', '_setitem_slice', '_setup_axes', '_shallow_copy', '_slice', '_stat_axis', '_stat_axis_name', '_stat_axis_number', '_try_aggregate_string_function', '_typ', '_unpickle_frame_compat', '_unpickle_matrix_compat', '_update_inplace', '_validate_dtype', '_values', '_where', '_xs', 'a', 'abs', 'add', 'add_prefix', 'add_suffix', 'agg', 'aggregate', 'align', 'all', 'any', 'append', 'apply', 'applymap', 'as_blocks', 'as_matrix', 'asfreq', 'asof', 'assign', 
'astype', 'at', 'at_time', 'axes', 'b', 'between_time', 'bfill', 'blocks', 'bool', 'boxplot', 'clip', 'clip_lower', 'clip_upper', 'columns', 'combine', 'combine_first', 'compound', 'consolidate', 'convert_objects', 'copy', 'corr', 'corrwith', 'count', 'cov', 'cummax', 'cummin', 'cumprod', 'cumsum', 'describe', 'diff', 'div', 'divide', 'dot', 'drop', 'drop_duplicates', 'dropna', 'dtypes', 'duplicated', 'empty', 'eq', 'equals', 'eval', 'ewm', 'expanding', 'ffill', 'fillna', 'filter', 'first', 'first_valid_index', 'floordiv', 'from_csv', 'from_dict', 'from_items', 'from_records', 'ftypes', 'ge', 'get', 'get_dtype_counts', 'get_ftype_counts', 'get_value', 'get_values', 'groupby', 'gt', 'head', 'hist', 'iat', 'idxmax', 'idxmin', 'iloc', 'index', 'info', 'insert', 'interpolate', 'is_copy', 'isin', 'isnull', 'items', 'iteritems', 'iterrows', 'itertuples', 'ix', 'join', 'keys', 'kurt', 'kurtosis', 'last', 'last_valid_index', 'le', 'loc', 'lookup', 'lt', 'mad', 'mask', 'max', 'mean', 'median', 'melt', 'memory_usage', 'merge', 'min', 'mod', 'mode', 'mul', 'multiply', 'ndim', 'ne', 'nlargest', 'notnull', 'nsmallest', 'nunique', 'pct_change', 'pipe', 'pivot', 'pivot_table', 'plot', 'pop', 'pow', 'prod', 'product', 'quantile', 'query', 'radd', 'rank', 'rdiv', 'reindex', 'reindex_axis', 'reindex_like', 'rename', 'rename_axis', 'reorder_levels', 'replace', 'resample', 'reset_index', 'rfloordiv', 'rmod', 'rmul', 'rolling', 'round', 'rpow', 'rsub', 'rtruediv', 'sample', 'select', 'select_dtypes', 'sem', 'set_axis', 'set_index', 'set_value', 'shape', 'shift', 'size', 'skew', 'slice_shift', 'sort_index', 'sort_values', 'sortlevel', 'squeeze', 'stack', 'std', 'style', 'sub', 'subtract', 'sum', 'swapaxes', 'swaplevel', 'tail', 'take', 'to_clipboard', 'to_csv', 'to_dense', 'to_dict', 'to_excel', 'to_feather', 'to_gbq', 'to_hdf', 'to_html', 'to_json', 'to_latex', 'to_msgpack', 'to_panel', 'to_period', 'to_pickle', 'to_records', 'to_sparse', 'to_sql', 'to_stata', 'to_string', 
'to_timestamp', 'to_xarray', 'transform', 'transpose', 'truediv', 'truncate', 'tshift', 'tz_convert', 'tz_localize', 'unstack', 'update', 'values', 'var', 'where', 'xs']
It could take you about ten minutes to figure out that all you needed was df.melt.
>>> df.melt()
    variable  value
0   humitidy     65
1   humitidy     65
2   humitidy     60
3   humitidy     60
4   humitidy     55
5   humitidy     55
6       temp     67
7       temp     69
8       temp     70
9       temp     70
10      temp     71
11      temp     70