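Analyzing the Class Loading Process from the Source Code
Before analyzing how classes are loaded, let's first look at how an application is loaded.
dyld and objc cooperate here: while loading dynamic libraries, the dynamic linker dyld initializes objc, and during its initialization objc calls _dyld_objc_notify_register to register three callbacks (map_images, load_images, and unmap_image) that dyld invokes when images are mapped, initialized, and unmapped.
From the screenshot we can see that the whole flow goes dyld --> libSystem --> libdispatch --> _objc_init in objc, and that _objc_init in turn calls _dyld_objc_notify_register to register the callbacks.
Next, let's walk through the class loading process with the source code.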
/***********************************************************************
* _objc_init
* Bootstrap initialization. Registers our image notifier with dyld.
* Called by libSystem BEFORE library initialization time
**********************************************************************/
void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;

    // fixme defer initialization until an objc-using image is found?
    environ_init();   //!! initializes and reads the environment variables that affect the runtime
    tls_init();       //!! binds the thread keys
    static_init();    //!! runs the C++ static constructors
    lock_init();      //!! empty function
    exception_init(); //!! initializes libobjc's exception handling system

    _dyld_objc_notify_register(&map_images, load_images, unmap_image); //!! registers the dyld callbacks
}
environ_init()
environ_init mainly initializes a set of environment variables and reads the ones that affect the runtime.
/***********************************************************************
* environ_init
* Read environment variables that affect the runtime.
* Also print environment variable help, if requested.
**********************************************************************/
void environ_init(void)
{
    if (issetugid()) {
        // All environment variables are silently ignored when setuid or setgid
        // This includes OBJC_HELP and OBJC_PRINT_OPTIONS themselves.
        return;
    }

    bool PrintHelp = false;
    bool PrintOptions = false;
    bool maybeMallocDebugging = false;

    // Scan environ[] directly instead of calling getenv() a lot.
    // This optimizes the case where none are set.
    for (char **p = *_NSGetEnviron(); *p != nil; p++) {
        ...
    }

    // Special case: enable some autorelease pool debugging
    // when some malloc debugging is enabled
    // and OBJC_DEBUG_POOL_ALLOCATION is not set to something other than NO.
    if (maybeMallocDebugging) {
        ...
    }

    // Print OBJC_HELP and OBJC_PRINT_OPTIONS output.
    if (PrintHelp || PrintOptions) {
        ...
    }
}
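As the source above suggests, we can list the environment variables that affect the runtime either by running the code or by entering export OBJC_HELP=1 in the terminal. The screenshot below shows only part of the output; readers can print the full list themselves.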
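The comment in environ_init about scanning environ[] once, instead of calling getenv() for every option, can be illustrated with a small sketch. This is not the libobjc implementation: collectWithPrefix is a hypothetical helper, and the environment is passed in as a plain array so the example does not depend on _NSGetEnviron().

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not part of libobjc): walk a NULL-terminated
// "NAME=value" array once and keep every entry whose name starts with prefix.
std::vector<std::string> collectWithPrefix(const char* const* envp,
                                           const char* prefix) {
    std::vector<std::string> found;
    const std::size_t prefixLen = std::strlen(prefix);
    for (const char* const* p = envp; p != nullptr && *p != nullptr; ++p) {
        // One pass over the array; nothing is looked up by name.
        if (std::strncmp(*p, prefix, prefixLen) == 0) {
            found.emplace_back(*p);
        }
    }
    return found;
}
```

When no entry carries the prefix, the loop does one cheap comparison per variable, which is the case the original comment says it optimizes for.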
tls_init()
tls_init binds the thread keys and sets up the destructor for per-thread data.
static_init()
static_init runs the C++ static constructor functions.
/***********************************************************************
* static_init
* Run C++ static constructor functions.
* libc calls _objc_init() before dyld would call our static constructors,
* so we have to do it ourselves.
* (Author's note: this happens before _dyld_objc_notify_register is called.)
**********************************************************************/
static void static_init()
{
    size_t count;
    auto inits = getLibobjcInitializers(&_mh_dylib_header, &count);
    for (size_t i = 0; i < count; i++) {
        inits[i]();
    }
}
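static_init runs the system-level C++ static constructors, that is, those of libobjc itself. The comment explains why: libc calls _objc_init() before dyld would run these constructors, so libobjc has to run them by hand. In other words, the system-level C++ constructors run before the C++ static constructors we define ourselves.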
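The loop in static_init is simply "fetch an array of function pointers and call each one in order". A minimal sketch of that pattern follows. The names Initializer, initA, initB, and runInitializers are hypothetical, and the array is built by hand here, whereas libobjc reads it from its own image through getLibobjcInitializers.

```cpp
#include <cstddef>
#include <vector>

using Initializer = void (*)();

// Hypothetical initializers that record the order in which they ran.
static std::vector<int> g_log;
static void initA() { g_log.push_back(1); }
static void initB() { g_log.push_back(2); }

// Call each initializer in array order, mirroring the loop in static_init.
void runInitializers(const Initializer* inits, std::size_t count) {
    for (std::size_t i = 0; i < count; i++) {
        inits[i]();
    }
}
```

Calling runInitializers with {initA, initB} runs initA before initB, which is why running the list by hand gives libobjc control over when its own constructors execute.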
lock_init()
lock_init is an empty function; locking in Objective-C relies on the same mechanisms as C and C++.
exception_init()
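exception_init initializes libobjc's exception handling system. It registers a callback for exception handling so that uncaught exceptions can be monitored.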
/***********************************************************************
* exception_init
* Initialize libobjc's exception handling system.
* Called by map_images().
**********************************************************************/
void exception_init(void)
{
    old_terminate = std::set_terminate(&_objc_terminate);
}
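When we call a method that is declared but not implemented, an exception is raised; if nothing catches it, execution ends up in _objc_terminate.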
static void (*old_terminate)(void) = nil;

static void _objc_terminate(void)
{
    if (PrintExceptions) {
        _objc_inform("EXCEPTIONS: terminating");
    }

    if (! __cxa_current_exception_type()) {
        // No current exception.
        (*old_terminate)();
    }
    else {
        // There is a current exception. Check if it's an objc exception.
        @try {
            __cxa_rethrow();
        } @catch (id e) {
            // It's an objc object. Call Foundation's handler, if any.
            (*uncaught_handler)((id)e);
            (*old_terminate)();
        } @catch (...) {
            // It's not an objc object. Continue to C++ terminate.
            (*old_terminate)();
        }
    }
}
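exception_init saves the handler returned by std::set_terminate so that _objc_terminate can fall through to it. The same chaining pattern in plain C++ looks roughly like this. chainedTerminate and installChainedTerminate are hypothetical names, and the Objective-C @try/@catch part is left out.

```cpp
#include <cstdlib>
#include <exception>

static std::terminate_handler g_previous = nullptr;
static bool g_installed = false;

// Hypothetical handler: do our own reporting here, then defer to the
// handler that was installed before us.
[[noreturn]] static void chainedTerminate() {
    if (g_previous) g_previous();
    std::abort();  // a terminate handler must never return
}

// Install the handler once and remember the previous one.
// The guard keeps a second call from chaining the handler to itself.
void installChainedTerminate() {
    if (g_installed) return;
    g_installed = true;
    g_previous = std::set_terminate(&chainedTerminate);
}
```

std::set_terminate returns the handler it replaces, which is what makes this kind of chaining possible without any global registry.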
_dyld_objc_notify_register()
//
// Note: only for use by objc runtime
// Register handlers to be called when objc images are mapped, unmapped, and initialized.
// Dyld will call back the "mapped" function with an array of images that contain an objc-image-info section.
// Those images that are dylibs will have the ref-counts automatically bumped, so objc will no longer need to
// call dlopen() on them to keep them from being unloaded. During the call to _dyld_objc_notify_register(),
// dyld will call the "mapped" function with already loaded objc images. During any later dlopen() call,
// dyld will also call the "mapped" function. Dyld will call the "init" function when dyld would be called
// initializers in that image. This is when objc calls any +load methods in that image.
//
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped);
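The comment in the source above tells us that _dyld_objc_notify_register is for use by the objc runtime only, and that the registered callbacks are called when objc images are mapped, unmapped, and initialized. The function is declared in dyld. During the call itself, dyld calls the mapped callback with the objc images that are already loaded, passing the image information in as the callback's arguments. The +load methods are called from the init callback.
Next, let's look at the three parameters of this function: map_images, load_images, and unmap_image.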
typedef void (*_dyld_objc_notify_mapped)(unsigned count, const char* const paths[], const struct mach_header* const mh[]);
typedef void (*_dyld_objc_notify_init)(const char* path, const struct mach_header* mh);
typedef void (*_dyld_objc_notify_unmapped)(const char* path, const struct mach_header* mh);
All three function pointers are called back from within dyld, as we can see in its source.
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
    dyld::registerObjCNotifiers(mapped, init, unmapped);
}

void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
    // record functions to call
    sNotifyObjCMapped   = mapped;
    sNotifyObjCInit     = init;
    sNotifyObjCUnmapped = unmapped;

    // call 'mapped' function with all images mapped so far
    try {
        notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
    }
    catch (const char* msg) {
        // ignore request to abort during registration
    }

    // <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
    for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
        ImageLoader* image = *it;
        if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
            dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
            (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
        }
    }
}

static _dyld_objc_notify_mapped     sNotifyObjCMapped;
static _dyld_objc_notify_init       sNotifyObjCInit;
static _dyld_objc_notify_unmapped   sNotifyObjCUnmapped;
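As the code above shows, the three function pointers passed in from libobjc are stored in three static variables local to dyld: sNotifyObjCMapped, sNotifyObjCInit, and sNotifyObjCUnmapped. dyld then calls notifyBatchPartial() to invoke the mapped callback for all images mapped so far.
Inside notifyBatchPartial, the sNotifyObjCMapped function pointer is called to tell objc which images have been mapped. Let's now look at what each of the three callbacks does.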
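The notable behavior of registerObjCNotifiers is that registration also replays the past: images mapped before objc registered are reported immediately, and images that are already initialized get the init callback as well. A toy model of that behavior is sketched below. ToyLoader, Image, and ImageState are hypothetical types and only model the control flow, not dyld's data structures.

```cpp
#include <functional>
#include <string>
#include <utility>
#include <vector>

enum class ImageState { Mapped, Initialized };
struct Image { std::string path; ImageState state; };

// Hypothetical loader that stores callbacks and replays earlier events.
class ToyLoader {
public:
    using MappedFn = std::function<void(const std::vector<std::string>&)>;
    using InitFn = std::function<void(const std::string&)>;

    // Images added after registration are reported right away.
    void addImage(const std::string& path, ImageState state) {
        images_.push_back({path, state});
        if (mapped_) mapped_(std::vector<std::string>{path});
        if (init_ && state == ImageState::Initialized) init_(path);
    }

    void registerNotifiers(MappedFn mapped, InitFn init) {
        // Record the functions to call.
        mapped_ = std::move(mapped);
        init_ = std::move(init);

        // Report every image mapped so far in one batch.
        std::vector<std::string> paths;
        for (const Image& img : images_) paths.push_back(img.path);
        if (!paths.empty()) mapped_(paths);

        // Then report the images that were already initialized.
        for (const Image& img : images_) {
            if (img.state == ImageState::Initialized) init_(img.path);
        }
    }

private:
    std::vector<Image> images_;
    MappedFn mapped_;
    InitFn init_;
};
```

Without the replay step, a runtime that registers late would never hear about libraries loaded before it, which is the situation the comments in registerObjCNotifiers describe.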
map_images is triggered when dyld maps images into memory.
/***********************************************************************
* map_images
* Process the given images which are being mapped in by dyld.
* Calls ABI-agnostic code after taking ABI-specific locks.
* Locking: write-locks runtimeLock
**********************************************************************/
void
map_images(unsigned count, const char * const paths[],
           const struct mach_header * const mhdrs[])
{
    mutex_locker_t lock(runtimeLock);
    return map_images_nolock(count, paths, mhdrs);
}
map_images calls map_images_nolock, where hCount is the number of image files. map_images_nolock calls _read_images() to load the images and to register and fix up all classes.
void
map_images_nolock(unsigned mhCount, const char * const mhPaths[],
                  const struct mach_header * const mhdrs[])
{
    ..... (omitted)
    if (hCount > 0) {
        _read_images(hList, hCount, totalClasses, unoptimizedTotalClasses);
    }
}
_read_images() is the function we care about most, so let's look at what it does.
Because _read_images contains a lot of code, we will go through its implementation step by step.
1. The doneOnce flow:
if (!doneOnce) {
    doneOnce = YES;
    .......
    if (DisableTaggedPointers) {
        disableTaggedPointers();
    }

    initializeTaggedPointerObfuscator();

    if (PrintConnecting) {
        _objc_inform("CLASS: found %d classes during launch", totalClasses);
    }

    // namedClasses
    // Preoptimized classes don't go in this table.
    // 4/3 is NXMapTable's load factor
    int namedClassesSize =
        (isPreoptimized() ? unoptimizedTotalClasses : totalClasses) * 4 / 3;
    gdb_objc_realized_classes =
        NXCreateMapTable(NXStrValueMapPrototype, namedClassesSize);

    allocatedClasses = NXCreateHashTable(NXPtrPrototype, 0, nil);

    ts.log("IMAGE TIMES: first time tasks");
}

/***********************************************************************
* getClass
* Looks up a class by name. The class MIGHT NOT be realized.
* Demangled Swift names are recognized.
* Locking: runtimeLock must be read- or write-locked by the caller.
**********************************************************************/

// This is a misnomer: gdb_objc_realized_classes is actually a list of
// named classes not in the dyld shared cache, whether realized or not.
NXMapTable *gdb_objc_realized_classes;  // exported for debuggers in objc-gdb.h

/***********************************************************************
* allocatedClasses
* A table of all classes (and metaclasses) which have been allocated
* with objc_allocateClassPair.
**********************************************************************/
static NXHashTable *allocatedClasses = nil;
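The doneOnce flow creates two tables, gdb_objc_realized_classes and allocatedClasses. gdb_objc_realized_classes stores every named class that is not in the dyld shared cache, whether realized or not, and is sized at 4/3 of the class count. allocatedClasses stores the classes (and metaclasses) that have been allocated with objc_allocateClassPair.
2. Class remapping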
// Discover classes. Fix up unresolved future classes. Mark bundle classes.
for (EACH_HEADER) {
    classref_t *classlist = _getObjc2ClassList(hi, &count); //!! reads every class from the compiled class list as a classref_t pointer

    if (! mustReadClasses(hi)) {
        // Image is sufficiently optimized that we need not call readClass()
        continue;
    }

    bool headerIsBundle = hi->isBundle();
    bool headerIsPreoptimized = hi->isPreoptimized();

    for (i = 0; i < count; i++) {
        Class cls = (Class)classlist[i];
        Class newCls = readClass(cls, headerIsBundle, headerIsPreoptimized); //!! readClass returns the processed class

        if (newCls != cls && newCls) {
            // Class was moved but not deleted. Currently this occurs
            // only when the new class resolved a future class.
            // Non-lazily realize the class below.
            resolvedFutureClasses = (Class *)
                realloc(resolvedFutureClasses,
                        (resolvedFutureClassCount+1) * sizeof(Class));
            resolvedFutureClasses[resolvedFutureClassCount++] = newCls;
        }
    }
}
readClass returns the processed class, so let's look at how readClass() is implemented.
/***********************************************************************
* readClass
* Read a class and metaclass as written by a compiler.
* Returns the new class pointer. This could be:
* - cls
* - nil  (cls has a missing weak-linked superclass)
* - something else (space for this class was reserved by a future class)
*
* Note that all work performed by this function is preflighted by
* mustReadClasses(). Do not change this function without updating that one.
*
* Locking: runtimeLock acquired by map_images or objc_readClassPair
**********************************************************************/
Class readClass(Class cls, bool headerIsBundle, bool headerIsPreoptimized)
{
    .......
    Class replacing = nil;
    if (Class newCls = popFutureNamedClass(mangledName)) {
        // This name was previously allocated as a future class.
        // Copy objc_class to future class's struct.
        // Preserve future's rw data block.

        if (newCls->isAnySwift()) {
            _objc_fatal("Can't complete future class request for '%s' "
                        "because the real class is too big.",
                        cls->nameForLogging());
        }

        class_rw_t *rw = newCls->data();
        const class_ro_t *old_ro = rw->ro;
        memcpy(newCls, cls, sizeof(objc_class));
        rw->ro = (class_ro_t *)newCls->data();
        newCls->setData(rw);
        freeIfMutable((char *)old_ro->name);
        free((void *)old_ro);

        addRemappedClass(cls, newCls);

        replacing = cls;
        cls = newCls;
    }

    if (headerIsPreoptimized  &&  !replacing) {
        // class list built in shared cache
        // fixme strict assert doesn't work because of duplicates
        // assert(cls == getClass(name));
        assert(getClass(mangledName));
    } else {
        addNamedClass(cls, mangledName, replacing);
        addClassTableEntry(cls);
    }

    // for future reference: shared cache never contains MH_BUNDLEs
    if (headerIsBundle) {
        cls->data()->flags |= RO_FROM_BUNDLE;
        cls->ISA()->data()->flags |= RO_FROM_BUNDLE;
    }

    return cls;
}
Running the source with breakpoints shows that readClass() goes on to call addNamedClass() and addClassTableEntry(). addNamedClass inserts cls into the gdb_objc_realized_classes table.
/***********************************************************************
* addNamedClass
* Adds name => cls to the named non-meta class map.
* Warns about duplicate class names and keeps the old mapping.
* Locking: runtimeLock must be held by the caller
**********************************************************************/
static void addNamedClass(Class cls, const char *name, Class replacing = nil)
{
    runtimeLock.assertLocked();
    Class old;
    if ((old = getClass(name))  &&  old != replacing) {
        inform_duplicate(name, old, cls);

        // getNonMetaClass uses name lookups. Classes not found by name
        // lookup must be in the secondary meta->nonmeta table.
        addNonMetaClass(cls);
    } else {
        NXMapInsert(gdb_objc_realized_classes, name, cls);
    }
    assert(!(cls->data()->flags & RO_META));

    // wrong: constructed classes are already realized when they get here
    // assert(!cls->isRealized());
}
addClassTableEntry inserts cls into the allocatedClasses table.
/***********************************************************************
* addClassTableEntry
* Add a class to the table of all classes. If addMeta is true,
* automatically adds the metaclass of the class as well.
* Locking: runtimeLock must be held by the caller.
**********************************************************************/
static void addClassTableEntry(Class cls, bool addMeta = true) {
    runtimeLock.assertLocked();

    // This class is allowed to be a known class via the shared cache or via
    // data segments, but it is not allowed to be in the dynamic table already.
    assert(!NXHashMember(allocatedClasses, cls));

    if (!isKnownClass(cls))
        NXHashInsert(allocatedClasses, cls);
    if (addMeta)
        addClassTableEntry(cls->ISA(), false);
}
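To summarize readClass: popFutureNamedClass checks whether the name was reserved earlier as a future class. If it was, the future class is taken out, the compiled class is copied into it, and its ro/rw data is set up. addNamedClass then inserts the class into the master table, and addClassTableEntry inserts it into the table of allocated classes.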
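The two-table bookkeeping can be pictured with a small sketch: one table maps names to classes and keeps the old mapping when a name is duplicated, the other is a set of class pointers. ToyClass and ClassTables are hypothetical types built on standard containers instead of NXMapTable and NXHashTable, and the isKnownClass check and the metaclass handling are left out.

```cpp
#include <string>
#include <unordered_map>
#include <unordered_set>

struct ToyClass { std::string name; };

// Hypothetical stand-in for the two runtime tables.
struct ClassTables {
    std::unordered_map<std::string, ToyClass*> named;  // name -> class
    std::unordered_set<ToyClass*> allocated;           // every class pointer seen

    // Returns true if cls was registered under its name, false if the
    // name already existed (the old mapping is kept, as in addNamedClass).
    bool add(ToyClass* cls) {
        bool inserted = named.emplace(cls->name, cls).second;
        allocated.insert(cls);
        return inserted;
    }

    // Name lookup; returns nullptr when the name is unknown.
    ToyClass* lookUp(const std::string& name) const {
        auto it = named.find(name);
        return it == named.end() ? nullptr : it->second;
    }
};
```

Keeping a pointer set next to the name map lets the pointer table answer "is this a known class?" even for a class whose name lost the duplicate check.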
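3. Fixing up remapped classes
This step remaps the class and superclass references that have not been remapped yet. It calls _getObjc2ClassRefs() to get the class references and _getObjc2SuperRefs() to get the superclass references, and then remaps each of them with remapClassRef().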
// Fix up remapped classes
// Class list and nonlazy class list remain unremapped.
// Class refs and super refs are remapped for message dispatching.
if (!noClassesRemapped()) {
    for (EACH_HEADER) {
        Class *classrefs = _getObjc2ClassRefs(hi, &count);
        for (i = 0; i < count; i++) {
            remapClassRef(&classrefs[i]);
        }
        // fixme why doesn't test future1 catch the absence of this?
        classrefs = _getObjc2SuperRefs(hi, &count);
        for (i = 0; i < count; i++) {
            remapClassRef(&classrefs[i]);
        }
    }
}

/***********************************************************************
* remapClassRef
* Fix up a class ref, in case the class referenced has been reallocated
* or is an ignored weak-linked class.
* Locking: runtimeLock must be read- or write-locked by the caller
**********************************************************************/
static void remapClassRef(Class *clsref)
{
    runtimeLock.assertLocked();

    Class newcls = remapClass(*clsref);
    if (*clsref != newcls) *clsref = newcls;
}
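The fix-up pass amounts to "look each reference up in an old-to-new table and overwrite it in place if it moved". A minimal sketch under that reading follows. ToyClass, RemapTable, remap, and remapRefs are hypothetical names, and the weak-linked (nil) case that remapClass also handles is not modeled.

```cpp
#include <unordered_map>
#include <vector>

struct ToyClass { int id; };

// Hypothetical remap table: old class pointer -> replacement pointer.
using RemapTable = std::unordered_map<ToyClass*, ToyClass*>;

// Return the replacement if cls was moved, otherwise cls itself.
ToyClass* remap(const RemapTable& table, ToyClass* cls) {
    auto it = table.find(cls);
    return it == table.end() ? cls : it->second;
}

// Rewrite every stored reference in place; untouched refs are not written.
void remapRefs(const RemapTable& table, std::vector<ToyClass*>& refs) {
    for (ToyClass*& ref : refs) {
        ToyClass* updated = remap(table, ref);
        if (ref != updated) ref = updated;
    }
}
```

Writing only when the pointer actually changed mirrors the comparison in remapClassRef.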
4. Adding SELs to the namedSelectors table
// Fix up @selector references
static size_t UnfixedSelectors;
{
    mutex_locker_t lock(selLock);
    for (EACH_HEADER) {
        if (hi->isPreoptimized()) continue;

        bool isBundle = hi->isBundle();
        SEL *sels = _getObjc2SelectorRefs(hi, &count);
        UnfixedSelectors += count;
        for (i = 0; i < count; i++) {
            const char *name = sel_cname(sels[i]);
            sels[i] = sel_registerNameNoLock(name, isBundle);
        }
    }
}
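This step fixes up the SEL references. It first takes the selLock lock and then iterates over EACH_HEADER. If an image is preoptimized, it is skipped and the loop moves on to the next image. Otherwise _getObjc2SelectorRefs() returns all of the image's SEL references, and each one is registered through sel_registerNameNoLock(), which ultimately calls __sel_registerName().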
SEL sel_registerNameNoLock(const char *name, bool copy) {
    return __sel_registerName(name, 0, copy);  // NO lock, maybe copy
}

static SEL __sel_registerName(const char *name, bool shouldLock, bool copy)
{
    SEL result = 0;

    if (shouldLock) selLock.assertUnlocked();
    else selLock.assertLocked();

    if (!name) return (SEL)0;

    result = search_builtins(name);
    if (result) return result;

    conditional_mutex_locker_t lock(selLock, shouldLock);
    if (namedSelectors) {
        result = (SEL)NXMapGet(namedSelectors, name);
    }
    if (result) return result;

    // No match. Insert.

    if (!namedSelectors) {
        namedSelectors = NXCreateMapTable(NXStrValueMapPrototype,
                                          (unsigned)SelrefCount);
    }
    if (!result) {
        result = sel_alloc(name, copy);
        // fixme choose a better container (hash not map for starters)
        NXMapInsert(namedSelectors, sel_getName(result), result);
    }

    return result;
}
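The flow inside __sel_registerName() is:
1) Check whether the lock needs to be taken.
2) If name is null, return a null SEL.
3) Search with search_builtins to see whether the selector is already registered; if it is found, return it directly.
4) Look the name up in the namedSelectors hash table; if it is found, return it.
5) If namedSelectors has not been initialized yet, create the hash table.
6) If none of the steps above found the selector, call sel_alloc to create the SEL and insert the new SEL into the hash table to fill the cache.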
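The effect of this registration is interning: every distinct selector name maps to exactly one pointer, so selectors can later be compared by address. A rough sketch of that idea with standard containers follows. SelectorTable is a hypothetical class, and it omits the builtin lookup, the locking, and the copy flag of the real function.

```cpp
#include <string>
#include <unordered_set>

// Hypothetical interning table: one stable pointer per distinct name.
class SelectorTable {
public:
    // Returns the unique pointer for name, creating it on first use.
    // A null name yields a null result, as in __sel_registerName.
    const char* registerName(const char* name) {
        if (!name) return nullptr;
        // emplace is a no-op when the name is already present; either way
        // the returned iterator points at the stored string.
        auto result = names_.emplace(name);
        return result.first->c_str();
    }

private:
    // References to elements of an unordered_set stay valid across
    // rehashing, so the returned pointers remain stable.
    std::unordered_set<std::string> names_;
};
```

Registering the same name from two different buffers returns the same pointer, which is the property that makes pointer comparison of selectors meaningful.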
5. Fixing up legacy objc_msgSend_fixup call sites
// Fix up old objc_msgSend_fixup call sites
for (EACH_HEADER) {
    message_ref_t *refs = _getObjc2MessageRefs(hi, &count);
    if (count == 0) continue;

    if (PrintVtables) {
        _objc_inform("VTABLES: repairing %zu unsupported vtable dispatch "
                     "call sites in %s", count, hi->fname());
    }
    for (i = 0; i < count; i++) {
        fixupMessageRef(refs+i);
    }
}

ts.log("IMAGE TIMES: fix up objc_msgSend_fixup");
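This step only runs when FIXUP is enabled. It iterates over EACH_HEADER, uses _getObjc2MessageRefs() to get all message references of the current image, and calls fixupMessageRef on each one to repair it.
6. Adding protocols to the protocol table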
// Discover protocols. Fix up protocol refs.
for (EACH_HEADER) {
    extern objc_class OBJC_CLASS_$_Protocol;
    Class cls = (Class)&OBJC_CLASS_$_Protocol;
    assert(cls);
    NXMapTable *protocol_map = protocols();
    bool isPreoptimized = hi->isPreoptimized();
    bool isBundle = hi->isBundle();

    protocol_t **protolist = _getObjc2ProtocolList(hi, &count);
    for (i = 0; i < count; i++) {
        readProtocol(protolist[i], cls, protocol_map,
                     isPreoptimized, isBundle);
    }
}
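The flow is similar to the previous ones: _getObjc2ProtocolList() returns the protocol list of the current image, and readProtocol() adds each protocol to the protocol_map hash table.
7. Fixing up protocol references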
// Fix up @protocol references
// Preoptimized images may have the right
// answer already but we don't know for sure.
for (EACH_HEADER) {
    protocol_t **protolist = _getObjc2ProtocolRefs(hi, &count);
    for (i = 0; i < count; i++) {
        remapProtocolRef(&protolist[i]);
    }
}
8. Realizing non-lazy classes
// Realize non-lazy classes (for +load methods and static instances)
for (EACH_HEADER) {
    classref_t *classlist =
        _getObjc2NonlazyClassList(hi, &count);
    for (i = 0; i < count; i++) {
        Class cls = remapClass(classlist[i]);
        if (!cls) continue;

        // hack for class __ARCLite__, which didn't get this above
        if (cls->cache._buckets == (void*)&_objc_empty_cache  &&
            (cls->cache._mask  ||  cls->cache._occupied))
        {
            cls->cache._mask = 0;
            cls->cache._occupied = 0;
        }
        if (cls->ISA()->cache._buckets == (void*)&_objc_empty_cache  &&
            (cls->ISA()->cache._mask  ||  cls->ISA()->cache._occupied))
        {
            cls->ISA()->cache._mask = 0;
            cls->ISA()->cache._occupied = 0;
        }

        addClassTableEntry(cls);
        realizeClass(cls);
    }
}
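Classes that implement +load are non-lazy classes. _getObjc2NonlazyClassList() reads them from the __objc_nlclslist section, addClassTableEntry() adds each one to the class table (a class that is already known is not inserted again), and realizeClass() then realizes every non-lazy class, setting up the data of the class object.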
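The difference between the two kinds of classes is only when realization happens: non-lazy classes are realized during image reading, while the others wait until they are first used. A toy sketch of that split follows. ToyClass, realize, realizeNonLazy, and prepareForMessage are hypothetical names, and realization is reduced to setting a flag.

```cpp
#include <string>
#include <utility>
#include <vector>

struct ToyClass {
    std::string name;
    bool implementsLoad;
    bool realized = false;

    ToyClass(std::string n, bool load)
        : name(std::move(n)), implementsLoad(load) {}
};

// Stand-in for the real work of realizing a class.
void realize(ToyClass& cls) { cls.realized = true; }

// Eager pass at image-read time: only classes that implement +load.
// Returns how many classes were realized by this pass.
int realizeNonLazy(std::vector<ToyClass>& classes) {
    int count = 0;
    for (ToyClass& cls : classes) {
        if (cls.implementsLoad && !cls.realized) {
            realize(cls);
            ++count;
        }
    }
    return count;
}

// First-use path: a lazy class is realized when it is first needed.
ToyClass& prepareForMessage(ToyClass& cls) {
    if (!cls.realized) realize(cls);
    return cls;
}
```

Deferring the work for classes without +load keeps launch cost proportional to the classes that must run code before main().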
9. Realizing newly resolved future classes
// Realize newly-resolved future classes, in case CF manipulates them
if (resolvedFutureClasses) {
    for (i = 0; i < resolvedFutureClassCount; i++) {
        realizeClass(resolvedFutureClasses[i]);
        resolvedFutureClasses[i]->setInstancesRequireRawIsa(false/*inherited*/);
    }
    free(resolvedFutureClasses);
}
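10. Processing all categories, for both classes and metaclasses
This completes the analysis of _read_images(); the overall flow is shown in the figure below:
Next, let's analyze load_images, the second argument of _dyld_objc_notify_register. When is load_images called? Searching the dyld source for the corresponding function pointer, sNotifyObjCInit, shows that it is called inside notifySingle. load_images is called once for every image that is loaded.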
/***********************************************************************
* load_images
* Process +load in the given images which are being mapped in by dyld.
*
* Locking: write-locks runtimeLock and loadMethodLock
**********************************************************************/
extern bool hasLoadMethods(const headerType *mhdr);
extern void prepare_load_methods(const headerType *mhdr);

void
load_images(const char *path __unused, const struct mach_header *mh)
{
    // Return without taking locks if there are no +load methods here.
    if (!hasLoadMethods((const headerType *)mh)) return;

    recursive_mutex_locker_t lock(loadMethodLock);

    // Discover load methods
    {
        mutex_locker_t lock2(runtimeLock);
        prepare_load_methods((const headerType *)mh);
    }

    // Call +load methods (without runtimeLock - re-entrant)
    call_load_methods();
}
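load_images processes the +load methods in the images that dyld has mapped. It first checks whether the image has any +load methods and returns immediately if it has none. prepare_load_methods then discovers the +load methods and inserts the classes that need +load called into a table, and call_load_methods calls them. call_load_methods calls the +load methods of both classes and categories: for classes, the superclass's +load is called first, and the categories' +load methods are called after the classes.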
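The ordering rule (superclass before subclass, all classes before categories) can be sketched as a recursive scheduling pass. ToyClass, scheduleClassLoad, and loadOrder are hypothetical names. The sketch assumes every class listed implements +load and produces the call order as strings instead of invoking anything.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

struct ToyClass {
    std::string name;
    const ToyClass* superclass;  // nullptr for a root class
};

// Schedule cls after its superclass chain, at most once per class.
static void scheduleClassLoad(const ToyClass* cls,
                              std::unordered_set<const ToyClass*>& seen,
                              std::vector<std::string>& order) {
    if (!cls || seen.count(cls)) return;
    scheduleClassLoad(cls->superclass, seen, order);  // superclass first
    seen.insert(cls);
    order.push_back("+[" + cls->name + " load]");
}

// Classes are scheduled first, then categories in the order given.
std::vector<std::string> loadOrder(const std::vector<const ToyClass*>& classes,
                                   const std::vector<std::string>& categories) {
    std::unordered_set<const ToyClass*> seen;
    std::vector<std::string> order;
    for (const ToyClass* cls : classes) {
        scheduleClassLoad(cls, seen, order);
    }
    for (const std::string& category : categories) {
        order.push_back("+[" + category + " load]");
    }
    return order;
}
```

Even when a subclass appears before its superclass in the input list, the recursion places the superclass's +load first, and the seen set keeps it from being scheduled twice.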
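At this point we have finished analyzing the path from _objc_init to _dyld_objc_notify_register. The source shown here covers only the key parts; when you have time, download the source and read the full implementation. Understanding how an application and its classes are loaded tells us what happens before main() is called, which helps when optimizing launch time on the client. The overall class loading flow is shown in the figure below: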
1、https://opensource.apple.com/tarballs/dyld/dyld-832.7.3.tar.gz
2、https://opensource.apple.com/tarballs/objc4/
3、https://stackoverflow.com/questions/39863112/what-is-required-for-a-mach-o-executable-to-load
