Posts tagged python

[Unfinished] Python 3 Introspection

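Preface

In computer programming, introspection is the ability to examine something to determine what it is, what it knows, and what it can do. Introspection gives programmers a great deal of flexibility and control.

Put more plainly: introspection means that a program written in an object-oriented language can find out the type of an object at run time.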

help()

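Python's built-in help system. Calling help() with no arguments starts the interactive help utility, where the following commands are available:

modules: list the available modules
keywords: list the Python keywords
symbols: list the operators and other punctuation symbols
topics: list the common help topics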

Accessing object attributes

dir()

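dir() collects most of an object's attributes into a list. For objects other than modules, this also includes class attributes and attributes inherited from base classes.

With no argument, dir() returns the names in the current scope.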

>>> dir(print)
['__call__', '__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__le__', '__lt__', '__module__', '__name__', '__ne__', '__new__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__self__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__text_signature__']
>>> dir()
['__builtins__', '__doc__', '__loader__', '__name__', '__package__', '__spec__']
# __builtins__ refers to the builtins module, which holds Python's built-in names
>>> dir(__builtins__)
['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BlockingIOError', 'BrokenPipeError', 'BufferError', 'BytesWarning', 'ChildProcessError', 'ConnectionAbortedError', 'ConnectionError', 'ConnectionRefusedError', 'ConnectionResetError', 'DeprecationWarning', 'EOFError', 'Ellipsis', 'EnvironmentError', 'Exception', 'False', 'FileExistsError', 'FileNotFoundError', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'InterruptedError', 'IsADirectoryError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'NameError', 'None', 'NotADirectoryError', 'NotImplemented', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'PermissionError', 'ProcessLookupError', 'ReferenceError', 'ResourceWarning', 'RuntimeError', 'RuntimeWarning', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'TimeoutError', 'True', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'WindowsError', 'ZeroDivisionError', '_', '__build_class__', '__debug__', '__doc__', '__import__', '__loader__', '__name__', '__package__', '__spec__', 'abs', 'all', 'any', 'ascii', 'bin', 'bool', 'bytearray', 'bytes', 'callable', 'chr', 'classmethod', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod', 'enumerate', 'eval', 'exec', 'exit', 'filter', 'float', 'format', 'frozenset', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'int', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list', 'locals', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 'property', 'quit', 'range', 'repr', 'reversed', 'round', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'vars', 'zip']

# help, print, type and the other built-in functions and classes are all listed here, for example:
>>> type(type)
<class 'type'>
>>> type(print)
<class 'builtin_function_or_method'>

hasattr()

hasattr(obj, attr):

Checks whether obj has an attribute whose name is given by the string attr, and returns a Boolean.

setattr()

setattr(obj, attr, val):

Sets the attribute of obj named by the string attr to val. For example, if attr is 'bar', the call is equivalent to obj.bar = val.

getattr()
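
getattr(obj, attr[, default]) returns the value of the attribute named by the string attr. The following is a minimal sketch of how hasattr(), setattr() and getattr() fit together; the Config class and its attribute names are invented for illustration and do not come from the original post.

```python
class Config:
    """Hypothetical class used only for this illustration."""

    def __init__(self):
        self.host = "localhost"


cfg = Config()

# hasattr() reports whether the named attribute exists.
has_host = hasattr(cfg, "host")
has_port = hasattr(cfg, "port")

# setattr(cfg, "port", 8080) is equivalent to cfg.port = 8080.
setattr(cfg, "port", 8080)

# getattr() reads an attribute by name. The optional third argument is
# returned instead of raising AttributeError when the attribute is missing.
port = getattr(cfg, "port")
timeout = getattr(cfg, "timeout", 30)

print(has_host, has_port, port, timeout)
```

Passing a default to getattr() is usually tidier than calling hasattr() first and then reading the attribute.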

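Retrieving object metadata

When you call dir() on an object you built yourself, you may find many attributes in the list that you did not define. These attributes generally hold the object's metadata; for example, a class's __name__ attribute holds the class name. Most of them can be modified, although doing so is rarely useful, and changing some of them, such as a function's __code__ (func_code in Python 2), can cause problems that are very hard to track down. Changing something like __name__ is harmless enough; leave the others alone unless you understand the consequences.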
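
As a small illustration of the point above, the sketch below lists a few attributes that dir() reports even though the class never defined them, and then rewrites __name__. The Cat class and the greet function are hypothetical names chosen for this example.

```python
class Cat:
    """A hypothetical class for this illustration."""


def greet():
    pass


# Attributes that dir() reports even though the class body never defined them.
inherited = [name for name in ("__class__", "__eq__", "__module__") if name in dir(Cat)]

original_name = Cat.__name__
Cat.__name__ = "Kitty"      # writable, but rarely worth doing
renamed = Cat.__name__

greet.__name__ = "hello"    # the same works for functions

print(inherited, original_name, renamed, greet.__name__)
```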

isinstance()

isinstance(obj, class_or_tuple) checks whether obj is an instance of class_or_tuple, or, when a tuple is given, an instance of any one of the classes in it.

>>> isinstance("archerx",str)
True
>>> type(str)
<class 'type'>
>>> isinstance(str,type)     # str is an instance of type
True
>>> isinstance(int,type)    # likewise
True
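
The session above only passes a single class. As a brief sketch of the tuple form, the snippet below filters a list by several types at once; the sample values are made up for illustration.

```python
values = [1, 2.5, "3", True, None]

# A tuple of classes matches if the object is an instance of any of them.
numeric = [v for v in values if isinstance(v, (int, float))]

# bool is a subclass of int, so True passes the check above.
print(numeric)
```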

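Modules

  • __doc__: the docstring. If the module has no documentation, the value is None.
  • __name__: always the name the module was defined with, even if you gave it an alias with import .. as or assigned it to another variable.
  • __dict__: a dictionary mapping the attribute names available in the module to their values, that is, the objects you can reach as module_name.attribute_name.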
>>> import sys
>>> sys.__doc__
"This module provides access to some objects used or maintained by the\ninterpreter and to functions that interact strongly with the interpreter.\n\nDynamic objects:\n\nargv -- command line arguments; argv[0] is the script pathname if known\npath -- module search path; path[0] is the script directory, else ''\nmodules -- dictionary of loaded modules\n\ndisplayhook -- called to show results in an interactive session\nexcepthook -- called to handle any uncaught exception other than SystemExit\n  To customize printing in an interactive session or to install a custom\n  top-level exception handler, assign other functions to replace these.\n\nstdin -- standard input file object; used by input()\nstdout -- standard output file object; used by print()\nstderr -- standard error object; used for error messages\n  By assigning other file objects (or objects that behave like files)\n  to these, it is possible to redirect all of the interpreter's I/O.\n\nlast_type -- type of last uncaught exception\nlast_value -- value of last uncaught exception\nlast_traceback -- traceback of last uncaught exception\n  These three are only available in an interactive session after a\n  traceback has been printed.\n\nStatic objects:\n\nbuiltin_module_names -- tuple of module names built into this interpreter\ncopyright -- copyright notice pertaining to this interpreter\nexec_prefix -- prefix used to find the machine-specific Python library\nexecutable -- absolute path of the executable binary of the Python interpreter\nfloat_info -- a struct sequence with information about the float implementation.\nfloat_repr_style -- string indicating the style of repr() output for floats\nhash_info -- a struct sequence with information about the hash algorithm.\nhexversion -- version information encoded as a single integer\nimplementation -- Python implementation information.\nint_info -- a struct sequence with information about the int implementation.\nmaxsize -- the largest supported length of 
containers.\nmaxunicode -- the value of the largest Unicode code point\nplatform -- platform identifier\nprefix -- prefix used to find the Python library\nthread_info -- a struct sequence with information about the thread implementation.\nversion -- the version of this interpreter as a string\nversion_info -- version information as a named tuple\ndllhandle -- [Windows only] integer handle of the Python DLL\nwinver -- [Windows only] version number of the Python DLL\n__stdin__ -- the original stdin; don't touch!\n__stdout__ -- the original stdout; don't touch!\n__stderr__ -- the original stderr; don't touch!\n__displayhook__ -- the original displayhook; don't touch!\n__excepthook__ -- the original excepthook; don't touch!\n\nFunctions:\n\ndisplayhook() -- print an object to the screen, and save it in builtins._\nexcepthook() -- print an exception and its traceback to sys.stderr\nexc_info() -- return thread-safe information about the current exception\nexit() -- exit the interpreter by raising SystemExit\ngetdlopenflags() -- returns flags to be used for dlopen() calls\ngetprofile() -- get the global profiling function\ngetrefcount() -- return the reference count for an object (plus one :-)\ngetrecursionlimit() -- return the max recursion depth for the interpreter\ngetsizeof() -- return the size of an object in bytes\ngettrace() -- get the global debug tracing function\nsetcheckinterval() -- control how often the interpreter checks for events\nsetdlopenflags() -- set the flags to be used for dlopen() calls\nsetprofile() -- set the global profiling function\nsetrecursionlimit() -- set the max recursion depth for the interpreter\nsettrace() -- set the global debug tracing function\n"
>>> sys.__name__
'sys'
>>> sys.__dict__
{'prefix': 'C:\\Python34', 'executable': 'C:\\Python34\\python34.exe', 'builtin_module_names': ('_ast', '_bisect', '_codecs', '_codecs_cn', '_codecs_hk', '_codecs_iso2022', '_codecs_jp', '_codecs_kr', '_codecs_tw', '_collections', '_csv', '_datetime', '_functools', '_heapq', '_imp', '_io', '_json', '_locale', '_lsprof', '_md5', '_multibytecodec', '_opcode', '_operator', '_pickle', '_random', '_sha1', '_sha256', '_sha512', '_sre', '_stat', '_string', '_struct', '_symtable', '_thread', '_tracemalloc', '_warnings', '_weakref', '_winapi', 'array', 'atexit', 'audioop', 'binascii', 'builtins', 'cmath', 'errno', 'faulthandler', 'gc', 'itertools', 'marshal', 'math', 'mmap', 'msvcrt', 'nt', 'parser', 'signal', 'sys', 'time', 'winreg', 'xxsubtype', 'zipimport', 'zlib'), 'gettrace': <built-in function gettrace>, '__loader__': <class '_frozen_importlib.BuiltinImporter'>, 'setswitchinterval': <built-in function setswitchinterval>, 'hexversion': 50595056, '__interactivehook__': <function enablerlcompleter.<locals>.register_readline at 0x0270A540>, 'getrefcount': <built-in function getrefcount>, 'exit': <built-in function exit>, 'last_value': AttributeError("'str' object has no attribute '__file__'",), 'implementation': namespace(cache_tag='cpython-34', hexversion=50595056, name='cpython', version=sys.version_info(major=3, minor=4, micro=4, releaselevel='final', serial=0)), '_getframe': <built-in function _getframe>, 'getwindowsversion': <built-in function getwindowsversion>, '__excepthook__': <built-in function excepthook>, 'maxsize': 2147483647, 'getsizeof': <built-in function getsizeof>, 'call_tracing': <built-in function call_tracing>, 'copyright': 'Copyright (c) 2001-2015 Python Software Foundation.\nAll Rights Reserved.\n\nCopyright (c) 2000 BeOpen.com.\nAll Rights Reserved.\n\nCopyright (c) 1995-2001 Corporation for National Research Initiatives.\nAll Rights Reserved.\n\nCopyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.\nAll Rights Reserved.', 
'setprofile': <built-in function setprofile>, 'displayhook': <built-in function displayhook>, '__stdin__': <_io.TextIOWrapper name='<stdin>' mode='r' encoding='cp936'>, '__spec__': ModuleSpec(name='sys', loader=<class '_frozen_importlib.BuiltinImporter'>), 'callstats': <built-in function callstats>, 'argv': [''], 'maxunicode': 1114111, 'float_info': sys.float_info(max=1.7976931348623157e+308, max_exp=1024, max_10_exp=308, min=2.2250738585072014e-308, min_exp=-1021, min_10_exp=-307, dig=15, mant_dig=53, epsilon=2.220446049250313e-16, radix=2, rounds=1), '_mercurial': ('CPython', 'v3.4.4', '737efcadf5a6'), 'flags': sys.flags(debug=0, inspect=0, interactive=0, optimize=0, dont_write_bytecode=0, no_user_site=0, no_site=0, ignore_environment=0, verbose=0, bytes_warning=0, quiet=0, hash_randomization=1, isolated=0), 'getcheckinterval': <built-in function getcheckinterval>, 'getswitchinterval': <built-in function getswitchinterval>, 'meta_path': [<class '_frozen_importlib.BuiltinImporter'>, <class '_frozen_importlib.FrozenImporter'>, <class '_frozen_importlib.WindowsRegistryFinder'>, <class '_frozen_importlib.PathFinder'>], 'base_prefix': 'C:\\Python34', 'version_info': sys.version_info(major=3, minor=4, micro=4, releaselevel='final', serial=0), 'exc_info': <built-in function exc_info>, 'byteorder': 'little', 'getallocatedblocks': <built-in function getallocatedblocks>, 'stdout': <_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp936'>, 'path_importer_cache': {'C:\\Python34': FileFinder('C:\\Python34'), 'C:\\Python34\\lib\\site-packages': FileFinder('C:\\Python34\\lib\\site-packages'), 'C:\\Python34\\DLLs': FileFinder('C:\\Python34\\DLLs'), 'C:\\Users\\徐超': FileFinder('C:\\Users\\徐超'), 'C:\\Python34\\lib\\encodings': FileFinder('C:\\Python34\\lib\\encodings'), 'C:\\Python34\\lib': FileFinder('C:\\Python34\\lib'), 'C:\\Users\\徐超\\AppData\\Roaming\\Python\\Python34\\site-packages': FileFinder('C:\\Users\\徐超\\AppData\\Roaming\\Python\\Python34\\site-packages'), 
'C:\\WINDOWS\\SYSTEM32\\python34.zip': None}, '__displayhook__': <built-in function displayhook>, 'setcheckinterval': <built-in function setcheckinterval>, 'ps1': '>>> ', 'float_repr_style': 'short', '__name__': 'sys', '_current_frames': <built-in function _current_frames>, '_xoptions': {}, 'last_type': <class 'AttributeError'>, 'thread_info': sys.thread_info(name='nt', lock=None, version=None), 'ps2': '... ', '_clear_type_cache': <built-in function _clear_type_cache>, 'stderr': <_io.TextIOWrapper name='<stderr>' mode='w' encoding='cp936'>, 'base_exec_prefix': 'C:\\Python34', 'warnoptions': [], 'getrecursionlimit': <built-in function getrecursionlimit>, 'intern': <built-in function intern>, 'getfilesystemencoding': <built-in function getfilesystemencoding>, 'dllhandle': 502202368, 'winver': '3.4', 'path_hooks': [<class 'zipimport.zipimporter'>, <function FileFinder.path_hook.<locals>.path_hook_for_FileFinder at 0x0269A150>], '_home': None, '__doc__': "This module provides access to some objects used or maintained by the\ninterpreter and to functions that interact strongly with the interpreter.\n\nDynamic objects:\n\nargv -- command line arguments; argv[0] is the script pathname if known\npath -- module search path; path[0] is the script directory, else ''\nmodules -- dictionary of loaded modules\n\ndisplayhook -- called to show results in an interactive session\nexcepthook -- called to handle any uncaught exception other than SystemExit\n  To customize printing in an interactive session or to install a custom\n  top-level exception handler, assign other functions to replace these.\n\nstdin -- standard input file object; used by input()\nstdout -- standard output file object; used by print()\nstderr -- standard error object; used for error messages\n  By assigning other file objects (or objects that behave like files)\n  to these, it is possible to redirect all of the interpreter's I/O.\n\nlast_type -- type of last uncaught exception\nlast_value -- value of last 
uncaught exception\nlast_traceback -- traceback of last uncaught exception\n  These three are only available in an interactive session after a\n  traceback has been printed.\n\nStatic objects:\n\nbuiltin_module_names -- tuple of module names built into this interpreter\ncopyright -- copyright notice pertaining to this interpreter\nexec_prefix -- prefix used to find the machine-specific Python library\nexecutable -- absolute path of the executable binary of the Python interpreter\nfloat_info -- a struct sequence with information about the float implementation.\nfloat_repr_style -- string indicating the style of repr() output for floats\nhash_info -- a struct sequence with information about the hash algorithm.\nhexversion -- version information encoded as a single integer\nimplementation -- Python implementation information.\nint_info -- a struct sequence with information about the int implementation.\nmaxsize -- the largest supported length of containers.\nmaxunicode -- the value of the largest Unicode code point\nplatform -- platform identifier\nprefix -- prefix used to find the Python library\nthread_info -- a struct sequence with information about the thread implementation.\nversion -- the version of this interpreter as a string\nversion_info -- version information as a named tuple\ndllhandle -- [Windows only] integer handle of the Python DLL\nwinver -- [Windows only] version number of the Python DLL\n__stdin__ -- the original stdin; don't touch!\n__stdout__ -- the original stdout; don't touch!\n__stderr__ -- the original stderr; don't touch!\n__displayhook__ -- the original displayhook; don't touch!\n__excepthook__ -- the original excepthook; don't touch!\n\nFunctions:\n\ndisplayhook() -- print an object to the screen, and save it in builtins._\nexcepthook() -- print an exception and its traceback to sys.stderr\nexc_info() -- return thread-safe information about the current exception\nexit() -- exit the interpreter by raising SystemExit\ngetdlopenflags() -- 
returns flags to be used for dlopen() calls\ngetprofile() -- get the global profiling function\ngetrefcount() -- return the reference count for an object (plus one :-)\ngetrecursionlimit() -- return the max recursion depth for the interpreter\ngetsizeof() -- return the size of an object in bytes\ngettrace() -- get the global debug tracing function\nsetcheckinterval() -- control how often the interpreter checks for events\nsetdlopenflags() -- set the flags to be used for dlopen() calls\nsetprofile() -- set the global profiling function\nsetrecursionlimit() -- set the max recursion depth for the interpreter\nsettrace() -- set the global debug tracing function\n", 'last_traceback': <traceback object at 0x0271DBC0>, 'dont_write_bytecode': False, 'hash_info': sys.hash_info(width=32, modulus=2147483647, inf=314159, nan=0, imag=1000003, algorithm='siphash24', hash_bits=64, seed_bits=128, cutoff=0), 'getdefaultencoding': <built-in function getdefaultencoding>, 'exec_prefix': 'C:\\Python34', 'path': ['', 'C:\\WINDOWS\\SYSTEM32\\python34.zip', 'C:\\Python34\\DLLs', 'C:\\Python34\\lib', 'C:\\Python34', 'C:\\Users\\徐超\\AppData\\Roaming\\Python\\Python34\\site-packages', 'C:\\Python34\\lib\\site-packages'], 'excepthook': <built-in function excepthook>, 'platform': 'win32', 'stdin': <_io.TextIOWrapper name='<stdin>' mode='r' encoding='cp936'>, 'version': '3.4.4 (v3.4.4:737efcadf5a6, Dec 20 2015, 19:28:18) [MSC v.1600 32 bit (Intel)]', 'settrace': <built-in function settrace>, 'modules': {'os.path': <module 'ntpath' from 'C:\\Python34\\lib\\ntpath.py'>, '_locale': <module '_locale' (built-in)>, '_codecs_cn': <module '_codecs_cn' (built-in)>, 'errno': <module 'errno' (built-in)>, 'encodings.utf_8': <module 'encodings.utf_8' from 'C:\\Python34\\lib\\encodings\\utf_8.py'>, 'signal': <module 'signal' (built-in)>, '__main__': <module '__main__' (built-in)>, 'abc': <module 'abc' from 'C:\\Python34\\lib\\abc.py'>, 'io': <module 'io' from 'C:\\Python34\\lib\\io.py'>, 'encodings.mbcs': 
<module 'encodings.mbcs' from 'C:\\Python34\\lib\\encodings\\mbcs.py'>, 'atexit': <module 'atexit' (built-in)>, '_stat': <module '_stat' (built-in)>, 'os': <module 'os' from 'C:\\Python34\\lib\\os.py'>, '_warnings': <module '_warnings' (built-in)>, 'encodings.latin_1': <module 'encodings.latin_1' from 'C:\\Python34\\lib\\encodings\\latin_1.py'>, '_thread': <module '_thread' (built-in)>, 'encodings': <module 'encodings' from 'C:\\Python34\\lib\\encodings\\__init__.py'>, '_codecs': <module '_codecs' (built-in)>, '_multibytecodec': <module '_multibytecodec' (built-in)>, 'marshal': <module 'marshal' (built-in)>, 'codecs': <module 'codecs' from 'C:\\Python34\\lib\\codecs.py'>, 'ntpath': <module 'ntpath' from 'C:\\Python34\\lib\\ntpath.py'>, '_weakrefset': <module '_weakrefset' from 'C:\\Python34\\lib\\_weakrefset.py'>, 'sys': <module 'sys' (built-in)>, 'keyword': <module 'keyword' from 'C:\\Python34\\lib\\keyword.py'>, 'builtins': <module 'builtins' (built-in)>, '_frozen_importlib': <module '_frozen_importlib' (frozen)>, 'sysconfig': <module 'sysconfig' from 'C:\\Python34\\lib\\sysconfig.py'>, 'zipimport': <module 'zipimport' (built-in)>, '_weakref': <module '_weakref' (built-in)>, 'nt': <module 'nt' (built-in)>, 'winreg': <module 'winreg' (built-in)>, '_io': <module 'io' (built-in)>, 'genericpath': <module 'genericpath' from 'C:\\Python34\\lib\\genericpath.py'>, 'site': <module 'site' from 'C:\\Python34\\lib\\site.py'>, 'encodings.gbk': <module 'encodings.gbk' from 'C:\\Python34\\lib\\encodings\\gbk.py'>, 'encodings.aliases': <module 'encodings.aliases' from 'C:\\Python34\\lib\\encodings\\aliases.py'>, '_imp': <module '_imp' (built-in)>, 'stat': <module 'stat' from 'C:\\Python34\\lib\\stat.py'>, '_sitebuiltins': <module '_sitebuiltins' from 'C:\\Python34\\lib\\_sitebuiltins.py'>, '_collections_abc': <module '_collections_abc' from 'C:\\Python34\\lib\\_collections_abc.py'>, '_bootlocale': <module '_bootlocale' from 'C:\\Python34\\lib\\_bootlocale.py'>}, 
'setrecursionlimit': <built-in function setrecursionlimit>, '__stdout__': <_io.TextIOWrapper name='<stdout>' mode='w' encoding='cp936'>, 'getprofile': <built-in function getprofile>, '__stderr__': <_io.TextIOWrapper name='<stderr>' mode='w' encoding='cp936'>, '_debugmallocstats': <built-in function _debugmallocstats>, 'api_version': 1013, 'int_info': sys.int_info(bits_per_digit=15, sizeof_digit=2), '__package__': ''}

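Classes

__doc__: the docstring. If the class has no documentation, the value is None.

__dict__: a dictionary mapping the attribute names available in the class to their values, that is, the objects you can reach as ClassName.attribute_name.

__module__: the name of the module in which the class is defined. Note that this is the module name as a string, not the module object.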

#! /usr/bin/env python
# -*- coding: utf-8 -*-
# Author: Archerx
# @time: 2019/3/14 10:20 PM

class test:
    def __init__(self,name):
        self.name = name

    def SayHi(self):
        print('Hi ,'+ self.name)

if __name__ == '__main__':
    t = test('archerx')
    t.SayHi()
    print(t.__doc__)   #None
    print(t.__module__)   #__main__
    print(t.__dict__)     #{'name':'archerx'}
    print(t.__class__)   #<class '__main__.test'>

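Instances

An instance is the object produced by instantiating a class.

__dict__: a dictionary mapping the instance's attribute names to their values.

__class__: the class [object] of the instance. For a class Cat with an instance cat, cat.__class__ == Cat is True.

Continuing the example above: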
print(t.__class__ == test)    #True

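Built-in functions and methods

By definition, built-in modules are modules written in C and compiled into the interpreter; the builtin_module_names field of the sys module lists which modules are built in. The functions and methods in these modules expose relatively few attributes, but there is usually no need to inspect them in code anyway.

__doc__: the documentation of the function or method.

__name__: the name the function or method was defined with.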

>>> import sys
>>> sys.builtin_module_names
('_ast', '_bisect', '_codecs', '_codecs_cn', '_codecs_hk', '_codecs_iso2022', '_codecs_jp', '_codecs_kr', '_codecs_tw', '_collections', '_csv', '_datetime', '_functools', '_heapq', '_imp', '_io', '_json', '_locale', '_lsprof', '_md5', '_multibytecodec', '_opcode', '_operator', '_pickle', '_random', '_sha1', '_sha256', '_sha512', '_sre', '_stat', '_string', '_struct', '_symtable', '_thread', '_tracemalloc', '_warnings', '_weakref', '_winapi', 'array', 'atexit', 'audioop', 'binascii', 'builtins', 'cmath', 'errno', 'faulthandler', 'gc', 'itertools', 'marshal', 'math', 'mmap', 'msvcrt', 'nt', 'parser', 'signal', 'sys', 'time', 'winreg', 'xxsubtype', 'zipimport', 'zlib')

>>> import math
>>> math.__doc__
'This module is always available.  It provides access to the\nmathematical functions defined by the C standard.'
>>> math.__name__
'math'
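
The session above inspects modules. As a rough sketch of the same idea applied to a built-in function, the snippet below compares len with a function written in Python. It describes CPython behaviour, and py_func is a placeholder name introduced for this example.

```python
import sys


def py_func():
    """A function written in Python, for comparison."""


# Built-in functions are implemented in C (this describes CPython).
kind = type(len).__name__
name = len.__name__
has_doc = isinstance(len.__doc__, str)

# A Python function carries a code object; a built-in function does not.
builtin_has_code = hasattr(len, "__code__")
python_has_code = hasattr(py_func, "__code__")

# sys itself is one of the modules compiled into the interpreter.
sys_is_builtin = "sys" in sys.builtin_module_names

print(kind, name, has_doc, builtin_has_code, python_has_code, sys_is_builtin)
```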

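Functions

__doc__: the function's docstring. (Python 2 also offered the alias func_doc; it was removed in Python 3.)

__name__: the name the function was defined with. (The Python 2 alias func_name was likewise removed in Python 3.)

__module__: the name of the module in which the function is defined; again, this is the module name, not the module object.

__dict__: the function's own attributes. (The Python 2 alias func_dict was likewise removed in Python 3.)
Remember that functions are objects too: you can access attributes as function.attribute_name (assigning to an attribute that does not exist creates it), or use the built-in functions hasattr(), getattr() and setattr(). That said, storing attributes on a function is rarely useful.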

def test():
    n = 1
    def inner():
        print(n)
    n = 2
    return inner

closure = test()
print(dir(closure))
print(closure.__closure__)
print(dir(closure.__closure__))
print(closure.__doc__) #None
print(closure.__name__) #inner
print(closure.__module__) #__main__
print(closure.__dict__) # {} (no custom attributes have been set on the function)
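
To make the __dict__ and __closure__ behaviour more concrete, here is a minimal sketch. The make_counter function and the description attribute are invented for this example.

```python
def make_counter():
    count = 0

    def counter():
        nonlocal count
        count += 1
        return count

    return counter


counter = make_counter()

# Assigning to a missing attribute creates it in the function's __dict__.
counter.description = "counts calls"
has_description = hasattr(counter, "description")

counter()
counter()

# __closure__ is a tuple of cells, one per captured variable.
captured = counter.__closure__[0].cell_contents

print(counter.__name__, counter.__dict__, has_description, captured)
```

The cell holds the current value of the captured variable, so it reflects the two calls made before it is read.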

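Methods

A method is not a plain function, but it can be thought of as a function with a wrapper around it. Once you get hold of the underlying function (through the bound method's __func__ attribute), you can use the function attributes described above.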
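
A short sketch of unwrapping a bound method follows. The Greeter class is hypothetical; __func__ and __self__ are the standard attributes of bound methods in Python 3.

```python
class Greeter:
    def say_hi(self):
        """Return a greeting."""
        return "hi"


g = Greeter()
method = g.say_hi

kind = type(method).__name__                      # the wrapper type
same_function = method.__func__ is Greeter.say_hi  # the wrapped function
bound_to = method.__self__ is g                    # the instance it is bound to

# The usual function attributes are available on the underlying function.
func_name = method.__func__.__name__
func_doc = method.__func__.__doc__

print(kind, same_function, bound_to, func_name, func_doc)
```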

callable

Returns whether the object can be called: True if it can, False otherwise.

>>> callable(print)
True
>>> callable(type)
True
>>> callable(lambda x: x+1)
True
>>>

issubclass

>>> issubclass(bool,int)   # bool is a subclass of int.
True

type()

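Objects have attributes, and dir() returns a list of them. Sometimes, though, we only want to test whether one or more attributes exist, and if the object has the attribute in question, we usually just want to retrieve it. The hasattr() and getattr() functions handle these two tasks.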

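>>> a = str   # assumption: a is bound to str, which matches the output below
>>> dir(a)   # list all attributes of the object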
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count', 'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index', 'isalnum', 'isalpha', 'isdecimal', 'isdigit', 'isidentifier', 'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate', 'upper', 'zfill']
>>> a.__doc__   # read one of the object's attributes
"str(object='') -> str\nstr(bytes_or_buffer[, encoding[, errors]]) -> str\n\nCreate a new string object from the given object. If encoding or\nerrors is specified, then the object must expose a data buffer\nthat will be decoded using the given encoding and error handler.\nOtherwise, returns the result of object.__str__() (if defined)\nor repr(object).\nencoding defaults to sys.getdefaultencoding().\nerrors defaults to 'strict'."
>>> hasattr(a,'__builtins__')  # check whether the attribute exists
False
>>> hasattr(a,'__doc__')   # likewise
True


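Python Coroutine Crawler

Preface

I found a meme image site with no anti-scraping measures at all and roughly 180,000 images. An ordinary crawler was too slow and would probably have needed several hours to finish, so I switched to coroutines: the whole crawl took 40 minutes with the CPU at 100% throughout. With a bit more reworking it becomes something like a small framework that I can reuse directly later on.

Note: in an asynchronous program, try to make every blocking operation asynchronous. Otherwise nothing raises an error, but the program slows down.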
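
To illustrate the note above, the following sketch times three workers that block with time.sleep against three that await asyncio.sleep. The worker names and the 0.2 second delay are arbitrary choices for this example, and asyncio.run requires Python 3.7 or later.

```python
import asyncio
import time


async def blocking_worker():
    time.sleep(0.2)            # blocks the whole event loop


async def async_worker():
    await asyncio.sleep(0.2)   # yields to the event loop while waiting


async def timed(worker):
    start = time.perf_counter()
    await asyncio.gather(worker(), worker(), worker())
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(timed(blocking_worker))
async_elapsed = asyncio.run(timed(async_worker))

print("blocking: {:.2f}s, async: {:.2f}s".format(blocking_elapsed, async_elapsed))
```

The blocking version runs the three sleeps one after another, while the asynchronous version overlaps them, so the first figure should come out roughly three times the second.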

Code

Both network IO and file read/write IO are implemented asynchronously.

#! /usr/bin/env python
# -*- coding: utf-8 -*-
# Author: Archerx
# @time: 2019/1/19 03:56 PM

import asyncio
import aiohttp
import uuid
import logging
import aiofiles


class AsyncSpider(object):
    def __init__(self,urls):
        self.URL = 'http://image.bee-ji.com/'
        self.HEADERS = {
            'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
        }
        self.SEMAPHORENUM = 200
        self.result = {}
        self.urls = urls
        self.THREADS = 200  # number of concurrent coroutines
        self.log_level = logging.DEBUG

    async def OtherFunc(self):
        pass

    async def GetUrl(self,session,url):
        pass

    async def SaveImage(self,content,ename,):
        # with open('F:\pycharm\JsSpider\images\\'+str(uuid.uuid1())[1:6]+'.'+ename.strip().split('/')[1],'wb') as f:
        #     f.write(content)
        # Backslashes are doubled because '\i' is not a valid escape sequence.
        async with aiofiles.open('F:\\images\\'+str(uuid.uuid1())[1:6]+'.'+ename.strip().split('/')[1],'wb') as f:
            await f.write(content)

    async def GetImage(self,url):
        async with aiohttp.ClientSession() as session:
            async with session.get(url=url,headers=self.HEADERS) as response:
                assert response.status == 200
                content = await response.read()
                return content,response.headers

    async def HandleTask(self,queque):
        while not queque.empty():
            url = await queque.get()
            try:
                print('start url: '+url)
                content,headers = await self.GetImage(url=url)
                await self.SaveImage(content,headers.get('Content-Type'))
                print('save successfully')
            except Exception:
                logging.error('HandleTask error',exc_info = True)  # the keyword is exc_info, not exec_info


    def EventLoop(self):
        queque = asyncio.Queue()
        [queque.put_nowait(url) for url in self.urls]
        loop = asyncio.get_event_loop()
        tasks = [asyncio.ensure_future( self.HandleTask(queque=queque) )for _ in range(self.THREADS)]
        # for task in tasks:                      #可以添加回调
        #     task.add_done_callback(callback)
        loop.run_until_complete(asyncio.wait(tasks))
        loop.close()

def GenerateUrl():
    url_list = []
    for i in range(1,2000):
        url_list.append('http://image.bee-ji.com/'+str(i))
    return url_list

def callback(future):
    print(future.result())

if __name__ == '__main__':
    spider = AsyncSpider(GenerateUrl())
    logging.basicConfig(level = spider.log_level,format = '%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    spider.EventLoop()
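
The class above defines SEMAPHORENUM but never uses it. As a hedged sketch of how such a limit might be applied, the snippet below caps the number of simultaneous downloads with asyncio.Semaphore. Everything here is an assumption for illustration: fake_fetch stands in for a real aiohttp request, the example.invalid URLs are placeholders, and MAX_CONCURRENCY is a made-up value.

```python
import asyncio

MAX_CONCURRENCY = 3   # assumed limit, playing the role of SEMAPHORENUM
active = 0            # downloads currently in flight
peak = 0              # highest number seen in flight at once


async def fake_fetch(semaphore, url):
    """Stand-in for a real download; asyncio.sleep simulates network IO."""
    global active, peak
    async with semaphore:          # at most MAX_CONCURRENCY bodies run here
        active += 1
        peak = max(peak, active)
        await asyncio.sleep(0.01)
        active -= 1
        return url


async def main():
    semaphore = asyncio.Semaphore(MAX_CONCURRENCY)
    urls = ['http://example.invalid/{}'.format(i) for i in range(10)]
    return await asyncio.gather(*(fake_fetch(semaphore, u) for u in urls))


results = asyncio.run(main())
print(len(results), peak)
```

Creating the semaphore inside main() keeps it tied to the loop that actually runs the tasks.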

The result looks roughly like this:

Want a meme battle? I have 180,000 of them.


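Python Asynchronous IO: Coroutines

Note: expanded and revised on 2019.3.21

Preface

"Asynchronous IO" means that you start an IO operation without waiting for it to finish; you can carry on with other work and are notified when it completes.

Asyncio is one way of achieving concurrency. In Python, concurrency can also be achieved with threads (threading) and multiple processes (multiprocessing).

Asyncio does not provide true parallelism. Then again, because of the GIL (global interpreter lock), Python's multithreading does not provide true parallelism either.

A task that can be handed to asyncio for execution is called a coroutine. A coroutine can give up execution and let other coroutines run (with await).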

Running coroutines

Calling a coroutine function does not start the coroutine; it only returns a coroutine object.
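
This can be checked directly. In the sketch below, the started list is a helper introduced only to record when the body runs, and asyncio.run (Python 3.7 or later) is used in place of an explicit loop.

```python
import asyncio

started = []


async def do_some_work(x):
    started.append(x)
    await asyncio.sleep(0)
    return "done"


coro = do_some_work(3)                    # only creates a coroutine object
is_coroutine = asyncio.iscoroutine(coro)
ran_before = list(started)                # still empty: the body has not run

result = asyncio.run(coro)                # the event loop actually runs it
ran_after = list(started)

print(is_coroutine, ran_before, result, ran_after)
```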

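There are two ways to get a coroutine object to run:

  • await it from another coroutine that is already running
  • schedule its execution with the ensure_future function

Put simply, a coroutine can only run once the loop is running.
The code below first gets the default loop of the current thread, then hands the coroutine object to loop.run_until_complete, after which the coroutine object runs inside the loop.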

loop = asyncio.get_event_loop()
loop.run_until_complete(do_some_work(3))

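run_until_complete is a blocking call: it does not return until the coroutine has finished, as the name suggests.
The parameter of run_until_complete is a future, yet here we pass it a coroutine object. This works because it checks its argument internally and wraps the coroutine object in a future by calling ensure_future. So the code above can be written more explicitly:

loop.run_until_complete(asyncio.ensure_future(do_some_work(3)))   # wrap it manually

Complete code: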

import asyncio

async def hello(x):
    print("Waiting " + str(x))
    await asyncio.sleep(x)

loop = asyncio.get_event_loop()
loop.run_until_complete(hello(3))
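Output:

Waiting 3
<exits after three seconds>

async defines a coroutine, and await suspends the coroutine on a time-consuming operation; like yield in a generator, the function gives up control. When a coroutine reaches an await, the event loop suspends it and runs other coroutines until they in turn suspend or finish, then moves on to the next one. The purpose of coroutines is precisely to make time-consuming operations asynchronous.

What follows await must be an Awaitable object, or an object that implements the corresponding protocol. The code of the Awaitable abstract class shows that any class implementing the __await__ method produces instances that are Awaitable; the Coroutine class also inherits from Awaitable.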
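
As a minimal sketch of that protocol, the class below becomes awaitable simply by defining __await__. Delay is a hypothetical class written for this example; it delegates the actual waiting to asyncio.sleep.

```python
import asyncio
from collections.abc import Awaitable


class Delay:
    """Hypothetical awaitable that waits and then produces a value."""

    def __init__(self, seconds, value):
        self.seconds = seconds
        self.value = value

    def __await__(self):
        # __await__ must return an iterator; delegating to another
        # awaitable's __await__() is a common way to obtain one.
        yield from asyncio.sleep(self.seconds).__await__()
        return self.value


async def main():
    return await Delay(0.01, "ready")


is_awaitable = isinstance(Delay(0, None), Awaitable)
value = asyncio.run(main())

print(is_awaitable, value)
```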

Multiple coroutines

import asyncio
from time import strftime

async def hello():
    print(strftime('[%H:%M:%S]'), end=' ')
    print("begin")
    await asyncio.sleep(1)
    print(strftime('[%H:%M:%S]'), end=' ')
    print("end")

loop = asyncio.get_event_loop()
tasks = [hello(),hello()]
# loop.run_until_complete(asyncio.wait(tasks))
loop.run_until_complete(asyncio.gather(*tasks))

Output:
[15:08:30] begin
[15:08:30] begin
[15:08:31] end
[15:08:31] end
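  • The two coroutines run concurrently, so the total time is about one second rather than two.
  • gather aggregates: it wraps multiple futures into a single future, because loop.run_until_complete accepts only a single future.
  • asyncio.gather and asyncio.wait serve similar purposes.

Points to note:

  • await can only appear inside a function declared with async.
  • Inside a coroutine function, await suspends the current coroutine and hands control to another one until the awaited coroutine returns a result.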

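Running blocking functions in coroutines

Coroutines are used a lot in crawlers; this example uses requests, a blocking module.

Because requests blocks the single thread shared by the client code and the asyncio event loop, the whole application freezes while a call is in progress. If you really must use requests, use the run_in_executor method of the event loop object, which runs the time-consuming function in a separate thread.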

#! /usr/bin/env python
# -*- coding: utf-8 -*-
# Author: Archerx
# @time: 2019/3/21 11:20 AM

import asyncio
import requests
import functools


async def run(url):
   print("start ", url)
   loop = asyncio.get_event_loop()
   # response = await loop.run_in_executor(None, requests.get, url)
   # use functools.partial to pass multiple (keyword) arguments
   response = await loop.run_in_executor(None, functools.partial(requests.get,url=url,params='',timeout=1))
   print(response.status_code)


url_list = ['https://blog.ixuchao.cn/archives/54.html','https://blog.ixuchao.cn/archives/55.html','https://blog.ixuchao.cn/archives/53.html']
tasks = [asyncio.ensure_future(run(url)) for url in url_list]
loop = asyncio.get_event_loop()
loop.run_until_complete(asyncio.wait(tasks))

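The solution is the run_in_executor method of the event loop object. Behind the scenes, the asyncio event loop maintains a ThreadPoolExecutor; calling run_in_executor hands it a callable to execute, so the time-consuming function runs in a thread from that pool.

In essence this is similar to running IO-bound work on multiple threads.

The parameters of run_in_executor are as follows: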

    
AbstractEventLoop.run_in_executor(executor, func, *args)

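executor should be an Executor instance. If it is None, the default executor is used.
func is the function to execute.
args are the arguments passed to func.

With run_in_executor you can build concurrent coroutines on top of the modules you already know, without needing dedicated libraries for asynchronous IO.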

#! /usr/bin/env python
# -*- coding: utf-8 -*-
# Author: Archerx
# @time: 2019/3/21 11:20 AM

import asyncio
from time import sleep, strftime
from concurrent import futures
executor = futures.ThreadPoolExecutor(max_workers=5)
async def blocked_sleep(name, t):  # the body uses await, so the function must be declared with async
    print(strftime('[%H:%M:%S]'),end=' ')
    print('sleep {} is running {}s'.format(name, t))
    loop = asyncio.get_event_loop()
    await loop.run_in_executor(executor, sleep, t)  # a blocking function has to be called this way
    # await asyncio.sleep(t)   # alternative to the two lines above with the same effect: the 5 sleeps run concurrently
    print(strftime('[%H:%M:%S]'),end=' ')
    print('sleep {} is end'.format(name))
    return t



tasks = [blocked_sleep(i,i) for i in range(1,6)]
tasks = [asyncio.ensure_future(task) for task in tasks]    # wrap manually into task objects; call task.result() to get each result
loop = asyncio.get_event_loop()
# results = loop.run_until_complete(asyncio.wait(tasks))   # similar in function to the line below
results = loop.run_until_complete(asyncio.gather(*tasks))


print('results: {}'.format(results))   # the results are collected into a list



Output:

[19:49:32] sleep 3 is running 3s
[19:49:32] sleep 4 is running 4s
[19:49:32] sleep 1 is running 1s
[19:49:32] sleep 5 is running 5s
[19:49:32] sleep 2 is running 2s
[19:49:33] sleep 1 is end
[19:49:34] sleep 2 is end
[19:49:35] sleep 3 is end
[19:49:36] sleep 4 is end
[19:49:37] sleep 5 is end
results: [1, 2, 3, 4, 5]

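tasks = [blocked_sleep(i, i) for i in range(1,6)] builds a list in which every element is a coroutine object. Each one is wrapped into a task with ensure_future, and the resulting futures are passed to gather.

gather is used as follows (this is the signature from the Python version used here; the loop parameter was removed in Python 3.10):

asyncio.gather(*coros_or_futures, loop=None, return_exceptions=False)

gather returns a list containing the results of the task objects.

  • When calling a blocking function under asyncio, run it on a separate thread from the thread pool maintained by asyncio so that it does not block the thread the event loop runs on.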
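Function | Arguments | Return value | Order of results
asyncio.gather | Multiple coroutines or Futures; coroutines are wrapped into tasks automatically (they can also be wrapped manually), e.g. an unpacked generator of coroutines | A list containing the results of the Futures | Same order as the arguments
asyncio.wait | a list of futures | Two sets of Futures, (done, pending) | Unordered (tentative)
asyncio.as_completed | a list of futures | An iterator of coroutines | Order of completion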
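
The three functions compared above can be tried side by side. The sketch below is an illustration under assumptions: the job coroutine and its delays are invented, and asyncio.run (Python 3.7 or later) replaces the explicit loop used elsewhere in this post.

```python
import asyncio


async def job(name, delay):
    await asyncio.sleep(delay)
    return name


async def main():
    # gather: results come back in the order the awaitables were passed in.
    gathered = await asyncio.gather(job('slow', 0.1), job('fast', 0.01))

    # wait: returns two sets of tasks, (done, pending); sets are unordered,
    # so the results are sorted here before being compared.
    tasks = [asyncio.ensure_future(job('slow', 0.1)),
             asyncio.ensure_future(job('fast', 0.01))]
    done, pending = await asyncio.wait(tasks)
    waited = sorted(t.result() for t in done)

    # as_completed: yields awaitables in the order they finish.
    completed = []
    for fut in asyncio.as_completed([job('slow', 0.1), job('fast', 0.01)]):
        completed.append(await fut)

    return gathered, waited, len(pending), completed


gathered, waited, n_pending, completed = asyncio.run(main())
print(gathered, waited, n_pending, completed)
```

gather is usually the simplest choice when only the results matter; wait and as_completed are more useful when tasks need to be handled as they finish.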

Getting coroutine results

Printing the function's return value directly

import asyncio
async def test1():
    print("1")
    print("2")
    return "stop"

loop = asyncio.get_event_loop()
task = asyncio.ensure_future(test1())
loop.run_until_complete(task)
print(task.result())

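Because a coroutine object cannot run on its own, when it is registered with the event loop it is actually run_until_complete that wraps the coroutine into a task object. A task is a subclass of Future; it keeps the coroutine's state after it has run so that the result can be retrieved later. We can also turn the coroutine object into a task ourselves with task = loop.create_task(test1()).

Wait until the task is done, then call its result method to get the return value.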
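
A small sketch of why the waiting matters: calling result() on a task that has not finished raises asyncio.InvalidStateError. The main wrapper and the too_early flag are introduced only for this example, and asyncio.get_running_loop requires Python 3.7 or later.

```python
import asyncio


async def test1():
    return "stop"


async def main():
    loop = asyncio.get_running_loop()
    task = loop.create_task(test1())     # schedule the coroutine as a task

    try:
        task.result()                    # too early: the task has not run yet
    except asyncio.InvalidStateError:
        too_early = True
    else:
        too_early = False

    await task                           # wait until the task is done
    return too_early, task.done(), task.result()


too_early, done, result = asyncio.run(main())
print(too_early, done, result)
```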

Callbacks

#! /usr/bin/env python
# -*- coding: utf-8 -*-
# Author: Archerx
# @time: 2019/3/21 11:20 AM

import asyncio
import functools

async def test1():
    print("1")
    print("2")
    return "stop"


def callback(param1,param2,future):  # the future passed to the callback is in fact the task object
    print(param1,param2)
    print('CallBack:',future.result())   # the future's result method returns the coroutine's return value


loop = asyncio.get_event_loop()
task = asyncio.ensure_future(test1())   # turn the coroutine object into a task
# task.add_done_callback(callback)    # bind the callback to the task
task.add_done_callback(functools.partial(callback,"param1","param2"))  # if the callback needs extra arguments, bind them with a partial function
loop.run_until_complete(task)

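A future object goes through several states: Pending, Running, Done and Cancelled. When the future is created the task is pending; while the event loop is executing it, it is running; once it finishes it is done. To stop the event loop, the tasks must be cancelled first; the tasks on the event loop can be obtained with asyncio.all_tasks() (asyncio.Task.all_tasks() before Python 3.7).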
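
Cancellation can be sketched as follows. The long_job coroutine and its 10 second sleep are placeholders; the point is only that cancel() makes the awaiting code see asyncio.CancelledError and leaves the task in the cancelled state.

```python
import asyncio


async def long_job():
    await asyncio.sleep(10)


async def main():
    task = asyncio.ensure_future(long_job())
    await asyncio.sleep(0)        # give the task a chance to start
    task.cancel()                 # request cancellation
    try:
        await task
    except asyncio.CancelledError:
        pass                      # raised in the awaiter once the task is cancelled
    return task.cancelled(), task.done()


cancelled, done = asyncio.run(main())
print(cancelled, done)
```

A cancelled task also counts as done, which is why both flags come back true.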

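Summary of concepts and methods

  • event_loop: the program starts an endless loop; functions are registered on the event loop and are called when the conditions for their events are met.
  • coroutine: a function defined with the async keyword. Calling it does not run the function immediately but returns a coroutine object, which must be registered with the event loop and is called by the loop.
  • task: a coroutine object is a natively suspendable function; a task wraps the coroutine further and holds the task's various states.
  • future: represents the result of a task that will be executed or has not been executed yet; it is not fundamentally different from a task.
  • async/await: the keywords introduced in Python 3.5 for defining coroutines. async defines a coroutine, and await suspends on a blocking asynchronous call.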
