Under the hood of Wslink’s multilayered virtual machine

ESET researchers describe the structure of the virtual machine used in samples of Wslink and suggest a possible approach to see through its obfuscation techniques

ESET researchers recently described Wslink, a unique and previously undocumented malicious loader that runs as a server and that features a virtual-machine-based obfuscator. There are no code, functionality or operational similarities that suggest this is likely to be a tool from a known threat actor.

In our white paper, linked below, we describe the structure of the virtual machine used in samples of Wslink and suggest a possible approach to see through the obfuscation techniques used in the analyzed samples. We demonstrate our approach on chunks of code of the protected sample. We were not motivated to fully deobfuscate the code, because we discovered a non-obfuscated sample.

Obfuscation techniques are a kind of software protection intended to make code hard to understand and hence conceal its objectives; obfuscating virtual machine techniques have become widely misused for illicit purposes such as obfuscation of malware samples, since they hinder both analysis and detection. The ability to analyze malicious code and subsequently improve our detection capabilities is the driving force behind our motivation to overcome these techniques.

Virtualized Wslink samples do not contain any clear artifacts, such as specific section names, that easily link them to a known virtualization obfuscator. During our research, we were able to successfully design and implement a semiautomatic solution capable of significantly facilitating analysis of the underlying program’s code.

This virtual machine introduced a diverse arsenal of obfuscation techniques, which we were able to overcome to reveal a part of the deobfuscated malicious code that we describe in this blogpost. In the last sections of our white paper, we present parts of the code we developed to facilitate our research.

Our white paper also provides an overview of the internal structure of virtual machines in general, and introduces some important terms and frameworks used in our detailed analysis of the Wslink virtual machine.

In an earlier white paper, we described the structure of a custom virtual machine, along with our techniques to devirtualize the machine. That virtual machine contained an interesting anti-disassembly trick, previously used by FinFisher – spyware with extensive spying capabilities, such as live surveillance through webcams and microphones, keylogging, and exfiltration of files. We additionally presented an approach for its deobfuscation.

This blogpost consists of excerpts from the Under the hood of Wslink’s multilayered virtual machine white paper; we encourage everyone interested in virtual machines and obfuscation techniques to go through the original white paper, as it contains detailed information on the various steps required to see through the obfuscation techniques used in Wslink.

Overview of virtual machine structures

Before diving into the analysis of Wslink’s virtual machine (VM), we provide an overview of the internal structure of virtual machines in general, describe known approaches to deal with such obfuscation, and introduce some important terms and frameworks used in our detailed analysis of the Wslink VM.

General structure of virtual machines

Virtual machines can be divided into two main categories:

  1. System virtual machines – support execution of complete operating systems (e.g., various VMware products, VirtualBox)
  2. Process virtual machines – execute individual programs in an OS-independent environment (e.g., Java, the .NET Common Language Runtime)

Here, we are interested only in the second category – process virtual machines – and we will briefly describe certain parts of their internal anatomy necessary to understand the rest of this paper.

Process virtual machines run as normal applications on their host OSes, and in turn run programs whose code is stored as OS-independent bytecode (Figure 1) that represents a series of instructions – an application – of a virtual instruction set architecture (ISA).

Figure 1. Illustration of bytecode, where all opcodes and operands are virtual

One can also think of bytecode as a kind of intermediate representation (IR): an abstract representation of code consisting of a specific instruction set that resembles assembly more than a high-level language. It is also known as intermediate language.

The use of IR is convenient in terms of code reusability – when one needs to add support for a new architecture or CPU instruction set, it is easier to convert it to the IR than to write all the required algorithms again. Another benefit is that it can simplify the application of some optimization algorithms.

One can generally translate both high- and low-level languages into an IR. Translation of a higher-level language is called “lowering”, and similarly translation of a lower-level one, “lifting”.

The following example lifts an assembly block bb0 into a block with the pseudo-IR code irb0. Each assembly instruction is translated into a set of IR operations, and individual operations within a set do not affect one another; ZF stands for zero flag and CF for carry flag:

  bb0:
  MOV R8, 0x05
  SUB AX, DX

  irb0:
  R8 = 0x05

  EAX[:0x10] = EAX[:0x10] - EDX[:0x10]
  ZF = EAX[:0x10] - EDX[:0x10] == 0x00
  CF = EAX[:0x10] < EDX[:0x10]
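
The independence of operations within one IR set can be illustrated with a small Python sketch. It is not Miasm's IR, only a toy model: the helper names low16, lift_sub_16 and apply_parallel and the register values are assumptions made for this illustration. Every right-hand side is computed from the input state, and only then are the results committed together.

```python
MASK16 = 0xFFFF

def low16(value):
    """Return the low 16 bits of a register value (EAX[:0x10] in the pseudo-IR)."""
    return value & MASK16

def lift_sub_16(state):
    """Toy IR set for a 16-bit subtraction of EDX's low word from EAX's low word.

    Every entry reads the *input* state only, so the entries cannot
    influence one another and their order is irrelevant.
    """
    a, d = low16(state["EAX"]), low16(state["EDX"])
    diff = (a - d) & MASK16
    return {
        "EAX": (state["EAX"] & ~MASK16) | diff,  # upper bits stay untouched
        "ZF": int(diff == 0),
        "CF": int(a < d),                        # unsigned borrow
    }

def apply_parallel(state, assignments):
    """Commit all assignments of one IR set at once."""
    new_state = dict(state)
    new_state.update(assignments)
    return new_state

state = {"EAX": 0x12340005, "EDX": 0x00000007, "ZF": 0, "CF": 0}
result = apply_parallel(state, lift_sub_16(state))
print({name: hex(value) for name, value in result.items()})
```

Because ZF and CF are derived from the operands as they were before the subtraction, writing EAX first does not change the flags.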


Modern process VMs usually provide a compiler that can lower code written in a high-level language – one that is easy to understand and comfortable to use – into the respective bytecode.

An ISA generally defines the supported instructions, data types and registers, among other things, all of which naturally have to be implemented by a virtual ISA as well.

Instructions consist of the following parts:

  • opcodes – operation codes that specify an instruction
  • operands – parameters of the instructions

ISAs often use two well-known virtual registers:

  • virtual program counter (VPC) – a pointer to the current position in the bytecode
  • virtual stack pointer – a pointer to pre-allocated virtual stack space used internally by the VM

The virtual stack pointer does not need to be present in all VMs; it is common only in a certain type of VM – stack-based ones.

We will refer to the instructions and their respective parts of a virtual ISA simply as virtual instructions, virtual opcodes, and virtual operands. We sometimes omit the explicit use of “virtual” when it is obvious that we are talking about the virtual representation.

An OS-dependent executable file (Figure 2) – the interpreter – processes the supplied bytecode and sequentially interprets the underlying virtual instructions, thus executing the virtualized program.

Figure 2. Illustration of the relationship between bytecode and the VM’s interpreter

Transfer of control from one virtual instruction to the next during interpretation needs to be implemented by every VM. This process is generally known as dispatching. There are several documented dispatch techniques such as:

  • Switch Dispatch – the simplest dispatch mechanism where virtual instructions are defined as case clauses and a virtual opcode is used as the test expression (Figure 3)
  • Direct Call Threading – virtual instructions are defined as functions and virtual opcodes contain addresses of these functions
  • Direct Threading – virtual instructions are defined as functions again; however, in comparison to Direct Call Threading, addresses of the functions are stored in a table and virtual opcodes represent offsets to this table. Each function should indirectly call the following one according to the specification (Figure 4)

The body of a virtual opcode in the interpreter’s code is usually called a virtual handler because it defines the behavior of the opcode and handles it when the virtual program counter points to a location in the bytecode that contains a virtual instruction with that opcode.
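
The difference between Switch Dispatch and Direct Threading can be sketched in Python on a toy three-instruction ISA. The ISA (HALT, LOAD, ADD), the two-byte instruction layout and all names here are assumptions made for illustration and have nothing to do with Wslink's ISA. In native code, each Direct Threading handler would jump to the next handler itself; Python has no tail calls, so a small trampoline loop stands in for that indirect jump.

```python
HALT, LOAD, ADD = 0, 1, 2  # toy virtual opcodes; each instruction is [opcode, operand]

def run_switch(bytecode):
    """Switch Dispatch: one central loop tests the opcode and picks a branch."""
    r0, vpc = 0, 0
    while True:
        opcode, operand = bytecode[vpc], bytecode[vpc + 1]
        vpc += 2
        if opcode == HALT:
            return r0
        elif opcode == LOAD:
            r0 = operand
        elif opcode == ADD:
            r0 += operand
        else:
            raise ValueError(f"unknown opcode {opcode}")

class ThreadedVM:
    """Direct Threading: opcodes index a handler table; handlers pick their successor."""

    def __init__(self, bytecode):
        self.bytecode, self.vpc, self.r0 = bytecode, 0, 0
        self.table = [self.h_halt, self.h_load, self.h_add]

    def next_handler(self):
        # Fetch the opcode at the VPC and look up its virtual handler.
        return self.table[self.bytecode[self.vpc]]

    def h_halt(self):
        return None

    def h_load(self):
        self.r0 = self.bytecode[self.vpc + 1]
        self.vpc += 2
        return self.next_handler()

    def h_add(self):
        self.r0 += self.bytecode[self.vpc + 1]
        self.vpc += 2
        return self.next_handler()

def run_threaded(bytecode):
    vm = ThreadedVM(bytecode)
    handler = vm.next_handler()
    while handler is not None:  # trampoline replacing the native indirect jump
        handler = handler()
    return vm.r0

program = [LOAD, 40, ADD, 2, HALT, 0]
print(run_switch(program), run_threaded(program))
```

In the switch variant, all control flow passes through one central loop; in the threaded variant there is no such loop in the handlers themselves, which is one reason threaded interpreters are harder to follow in a disassembler.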

By context, in relation to VMs, we mean a kind of virtual process context: each time a process is removed from access to the processor during process switching, sufficient information on its current operating state – its context – must be stored such that when it is again scheduled to run on the processor, it can resume its operation from an identical position.

Figure 3. Illustration of Switch Dispatch, where R0 is a virtual register

Figure 4. Illustration of Direct Threading

Obfuscation techniques are a kind of software protection intended to make code hard to understand and hence conceal its objectives. Such techniques were initially developed to protect the intellectual property of legitimate software, e.g., to hamper reverse engineering.

Virtual machines used as obfuscation engines are based on process virtual machines, as described above. The primary difference is that they are not intended to run cross-platform applications and they usually take machine code compiled or assembled for a known ISA, disassemble it, and translate that to their own virtual ISA. It is also usually the case that the VM environment and the virtualized application code are contained all in one application, whereas traditional process VMs usually consist of a process that runs as a standalone application that loads separate, virtualized applications.

The strength of this obfuscation technique resides in the fact that the ISA of the VM is unknown to any prospective reverse engineer – a thorough analysis of the VM, which can be very time-consuming, is required to understand the meaning of the virtual instructions and other structures of the VM. Further, if performance is not an issue, the VM’s ISA can be designed to be arbitrarily complex, slowing its execution of virtualized applications, but making reverse engineering even more complex. Understanding of the VM is necessary for decoding the bytecode and making the virtualized code understandable.

Context has a slightly different meaning in regard to obfuscating virtual machines: each time we want to switch from the native to the virtual ISA or vice versa, sufficient information – context – on the current operating state must be stored so that when the ISA has to be switched back, execution can resume with only the relevant data and registers modified.

Additionally, obfuscating VMs usually virtualize only certain “interesting” functions – native context is mapped to the virtual one and bytecode, representing the respective function, is chosen beforehand. The built-in interpreter is invoked afterwards (Figure 5). Beginnings of the original functions contain code that prepares and executes the interpreter – entry of the VM (vm_entry); the rest of their code is omitted in Figure 5.

The interpreter, bytecode, and virtual ISA code with data of obfuscating VMs are usually all stored in a dedicated section of the executable binary, along with the rest of the partially virtualized program.

Figure 5 shows the way a function, Function 1, in the original application targeting a standard ISA, can be virtualized for an obfuscating VM’s ISA. It needs to be converted into bytecode, for example using a generate_bytecode method. Its body is afterwards overwritten by a call into vm_entry and zeroes. The vm_entry function chooses the respective bytecode, for example, based on the calling function’s address, then conducts a context switch, and next interprets the bytecode. Finally, it returns to the code where the virtualized function, Function 1, would return.

Figure 5. Overview of the virtualization process

In VMs hosted on x86 architectures, such context switches usually consist of a series of PUSH and POP instructions. For example:


MOV ECX, context_addr

When the bytecode is fully processed, virtual context is mapped back to native context and execution continues in the non-virtualized code; however, another virtualized function could be executed in the same way, immediately.

Note that multiple context switches can occur in a single virtualized function, for example when a native instruction from the original ISA could not be translated to virtual instructions or an unknown function from the native API needs to be executed.
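
A context switch built from PUSH and POP instructions can be modeled in a few lines of Python. This is a minimal sketch under stated assumptions: the register set, the four-byte slot layout of the context and the function names enter_vm and leave_vm are hypothetical and chosen only to show the mechanism.

```python
import struct

NATIVE_REGS = ("EAX", "EBX", "ECX", "EDX")  # hypothetical subset of registers

def enter_vm(native, stack, context):
    """Native -> virtual: PUSH every register, then POP each value into the context."""
    for name in NATIVE_REGS:
        stack.append(native[name])                        # PUSH reg
    for slot in range(len(NATIVE_REGS)):
        struct.pack_into("<I", context, 4 * slot, stack.pop())  # POP [ctx + 4*slot]

def leave_vm(native, stack, context):
    """Virtual -> native: PUSH every context slot, then POP into the registers."""
    for slot in range(len(NATIVE_REGS)):
        stack.append(struct.unpack_from("<I", context, 4 * slot)[0])
    for name in NATIVE_REGS:
        native[name] = stack.pop()

native = {"EAX": 1, "EBX": 2, "ECX": 3, "EDX": 4}
stack, context = [], bytearray(4 * len(NATIVE_REGS))

enter_vm(native, stack, context)
# The stack is LIFO, so EAX (pushed first) lands in the last slot.
struct.pack_into("<I", context, 12, 0x1337)  # virtualized code changes virtual EAX
leave_vm(native, stack, context)
print(native)
```

Only the register that the virtualized code modified differs after the round trip, which is the property the context switch has to guarantee.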

Wslink’s virtual machine entry – vm_entry

Let’s get to the analysis of Wslink’s VM now. There are several function calls that enter the VM, all of which are followed by some gibberish data that IDA attempts to disassemble – the data most likely just overwrites the function’s original code before virtualization (Figure 6).

Figure 6. Entry point to the virtual machine

The vm_entry of the VM:

  • calculates the actual base address by subtracting the expected relative virtual address from the actual virtual address of a place in the code
  • unpacks code and data related to the VM on the first run; it uses the calculated base address to determine the location of the packed VM and destination of the unpacked data
  • executes an initialization function – one of the vm_pre_init() functions to be described – chosen based on the caller’s relative address, which is mapped to the respective vm_pre_init()
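
The first and last steps above amount to simple address arithmetic, sketched here in Python. All concrete values (the expected RVA, the runtime addresses, the table contents) and the names calc_base and select_pre_init are hypothetical; only the subtraction and the lookup by relative caller address reflect the description.

```python
EXPECTED_RVA = 0x1F40  # hypothetical RVA the code expects for a known spot

# Hypothetical mapping of callers' relative addresses to initialization functions.
PRE_INIT_TABLE = {
    0x2A10: "vm_pre_init_a",
    0x2B80: "vm_pre_init_b",
}

def calc_base(actual_va, expected_rva=EXPECTED_RVA):
    """Base address = actual virtual address of a spot minus its expected RVA."""
    return actual_va - expected_rva

def select_pre_init(caller_va, base, table=PRE_INIT_TABLE):
    """Turn the caller's virtual address into an RVA and look up its vm_pre_init()."""
    return table[caller_va - base]

base = calc_base(0x7FEC91A01F40)
print(hex(base))
print(select_pre_init(0x7FEC91A02B80, base))
```

Working with relative addresses makes the lookup independent of where the module happens to be loaded.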


Wslink’s VM is packed with NsPack to reduce the size of the large executable file; additional obfuscation is probably just a side effect. Similarities between Wslink’s unpacking code and ClamAV’s unspack() function are clearly visible (Figure 7 and Figure 8). Note that Ghidra has optimized out calculation of the base address.

Figure 7. Part of vm_entry of the virtual machine decompiled with Ghidra

Figure 8. Function used to unpack NsPack in ClamAV

The vm_pre_init_dispatch_table in Figure 7 is the structure that maps callers’ addresses of the vm_entry to the respective vm_pre_init() functions that are to be described.

Virtual machine initialization

Initialization of the VM consists of several steps, such as saving values of the native registers on the stack and later moving them to the virtual context, relocation of its internal structures, or preparation of bytecode. We cover these steps more thoroughly in the following subsections.

vm_pre_init() functions

vm_pre_init() functions are meant only to prepare parameters for another stage of initialization (Figure 9). These functions call a single vm_init() function (explained in the next section) with specific parameters. The supplied parameters are:

  • CPU flags, which are stored on the stack with a PUSHF instruction at the beginning of each function
  • hardcoded offset to a virtual instruction table that represents the first virtual instruction to be executed (its opcode)
  • hardcoded address of the bytecode to be interpreted

Figure 9. Miasm’s symbolic execution of a vm_pre_init() showing parameters supplied to vm_init()

vm_init() function

vm_init() pushes all the native registers and the supplied CPU flags from parameters (context) onto the stack. The native context will later be moved to the virtual one which, in addition, holds several internal registers.

One of the internal registers determines whether another instance of the VM is already running – there is only one global virtual context and only one instance of the VM can run at a time. Figure 10 shows the part of the code busy-waiting for the virtual register, where RBP contains the address of the virtual context and RBX the offset of the virtual register – the internal register is stored in [RBX + RBP].
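
The effect of such a busy-wait can be sketched in Python with threads sharing a single context. This is only an illustration of the idea: the register offset BUSY_OFFSET, the class and function names, and the use of a Python lock to make the test-and-set step atomic are all assumptions, and the text above does not say how the sample itself achieves atomicity.

```python
import threading

BUSY_OFFSET = 0x10  # hypothetical offset of the "VM is running" internal register

class VirtualContext:
    """Single global context shared by every entry into the VM."""

    def __init__(self, size=0x200):
        self.mem = bytearray(size)
        self._atomic = threading.Lock()  # stands in for an atomic CPU instruction

    def try_acquire(self):
        """Atomically set the busy register if it is clear; report success."""
        with self._atomic:
            if self.mem[BUSY_OFFSET]:
                return False
            self.mem[BUSY_OFFSET] = 1
            return True

    def release(self):
        with self._atomic:
            self.mem[BUSY_OFFSET] = 0

def vm_run(ctx, work):
    while not ctx.try_acquire():
        pass  # busy-wait until the other instance leaves the VM
    try:
        return work()
    finally:
        ctx.release()

ctx = VirtualContext()
stats = {"active": 0, "max_active": 0}

def work():
    stats["active"] += 1
    stats["max_active"] = max(stats["max_active"], stats["active"])
    stats["active"] -= 1

threads = [threading.Thread(target=vm_run, args=(ctx, work)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(stats["max_active"])
```

However many threads enter, the recorded maximum of simultaneously active interpreters stays at one.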

The entire function is summarized in Figure 11.

Figure 10. Busy-waiting for interpreter in vm_init()

The bytecode’s address, supplied in the parameters, is added to the virtual context along with the address of the virtual instruction table, which is hardcoded. Both have a dedicated virtual register.

The VM calculates the base address again in the same way as was described for vm_entry; in addition, it stores the address in another internal register that is used later, should an API be called. Then the base address is used to relocate the instruction table, its entries, and the bytecode’s address.

The calculated base address is simply added to all the function addresses if they have not already been relocated.
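
A minimal Python sketch of such a one-time relocation follows. How the sample records that relocation has already happened is not stated here, so the boolean flag, the class name InstructionTable and all addresses are assumptions for illustration.

```python
class InstructionTable:
    """Toy virtual instruction table holding handler addresses."""

    def __init__(self, entries):
        self.entries = list(entries)  # initially relative addresses
        self.relocated = False        # hypothetical marker; the real check may differ

def relocate(table, base):
    """Add the base address to every entry, but only on the first call."""
    if table.relocated:
        return
    table.entries = [base + rva for rva in table.entries]
    table.relocated = True

table = InstructionTable([0x1000, 0x1040, 0x10C0])
relocate(table, 0x7FEC91A00000)
relocate(table, 0x7FEC91A00000)  # a second entry into the VM must not relocate again
print([hex(address) for address in table.entries])
```

Without the guard, every further entry into the VM would add the base again and corrupt the handler addresses.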

Figure 11. vm_init() summary

Virtual instructions of the second virtual machine

We start by looking at the first few executed virtual instructions to observe the behavior of the second VM and then try to process the rest of them in a partially automated way.

The diagram in Figure 12 highlights in blue where the virtual instructions of the second VM are in the structure of the VMs.

Figure 12. Virtual instructions in the structure of the virtual machines

The first virtual instruction

The first virtual instruction is, exceptionally, not obfuscated, as can be seen in Figure 13. Finally, we can see some operations in the virtual context.

By inspecting the modified memory and calculated destination address of the instruction, it is clear that the instruction does three things:

  • Zeroes out a virtual 32-bit register at offset 0xB5 in the virtual context, whose address is stored in the RBP register (highlighted in gray in Figure 13)
  • Increases a virtual 64-bit register at offset 0x28 by 0x04: it is the pointer to the bytecode – the virtual program counter. The size of the virtual instruction is hence four bytes (highlighted in purple in Figure 13).
  • Prepares the next virtual instruction to be executed: the offset to the virtual instruction table – the virtual opcode – is fetched from the virtual program counter. The virtual instruction table is at offset 0xA4 (highlighted in green in Figure 13). This means that the VM uses the Direct Threading Dispatch technique.
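
These three effects can be replayed on a flat memory model in Python. The offsets 0x28, 0xA4 and 0xB5 come from the description above; everything else is an assumption for illustration: the addresses CTX, BYTECODE and TABLE, the opcode value 0x10, the handler address, and the choice to read the 16-bit opcode at the already incremented program counter.

```python
import struct

CTX, BYTECODE, TABLE = 0x1000, 0x2000, 0x3000  # hypothetical addresses
VPC_OFF, TABLE_OFF, REG_OFF = 0x28, 0xA4, 0xB5  # offsets named in the text

mem = bytearray(0x4000)

def rd(fmt, addr):
    return struct.unpack_from(fmt, mem, addr)[0]

def wr(fmt, addr, value):
    struct.pack_into(fmt, mem, addr, value)

def first_instruction(ctx):
    """Replay the first virtual instruction; return the next handler's address."""
    wr("<I", ctx + REG_OFF, 0)            # zero the 32-bit virtual register
    vpc = rd("<Q", ctx + VPC_OFF) + 4     # the instruction is four bytes long
    wr("<Q", ctx + VPC_OFF, vpc)
    opcode = rd("<H", vpc)                # two-byte virtual opcode (assumed at new VPC)
    table = rd("<Q", ctx + TABLE_OFF)     # pointer to the virtual instruction table
    return rd("<Q", table + opcode)       # Direct Threading: table entry = handler

# Hypothetical initial state.
wr("<Q", CTX + VPC_OFF, BYTECODE)
wr("<Q", CTX + TABLE_OFF, TABLE)
wr("<I", CTX + REG_OFF, 0xDEADBEEF)
wr("<H", BYTECODE + 4, 0x10)
wr("<Q", TABLE + 0x10, 0x401000)

next_handler = first_instruction(CTX)
print(hex(next_handler), hex(rd("<Q", CTX + VPC_OFF)))
```

The opcode acts as a byte offset into the table rather than as an index into a switch, which is what identifies the dispatch as Direct Threading.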

Figure 13. The initial virtual instruction of the second VM

Note that the size of the next instruction’s opcode is only two bytes and the remaining word is left unused. We can see that it is just a zero when we look at virtual operands (Figure 14). Sizes of the other instructions differ – it is not just padding that preserves the same size for all instructions.

Figure 14. Bytecode of the virtual instruction

The second virtual instruction

The second virtual instruction does not do anything special; it just zeroes out several virtual registers and jumps to the next instruction (Figure 15).

Figure 15. Destination address and memory modified by the second virtual instruction

The third virtual instruction

The third virtual instruction stores the address of the stack pointer in a virtual register (Figure 16); the offset of the register is determined by one of the operands, and its offset is 0x0141 in our case.

Figure 16. Destination address and memory modified by the third virtual instruction

The fourth virtual instruction

The fourth instruction contains two immediately visible anomalies in comparison with previous instructions – the stack pointer’s delta is lower at the end of the function and it contains a conditional branch (Figure 17).

Figure 17. The conditional branch and delta of the stack pointer of the fourth virtual instruction

Symbolic execution of the first block reveals that a value is popped from the stack into a virtual register (Figure 18), which makes sense as the values of the native registers remain on the stack after being saved there by vm2_init(). They are now being moved to the virtual context – the context switch is partially performed by a number of virtual instructions, each of which pops one value off the stack into a different register.

Figure 18. Destination address and memory modified by the fourth virtual instruction

The virtual register, where the value of the native register is to be stored, is determined by an operand and two other virtual registers at offsets 0x0B and 0x70. However, their initial value is already known: they were set to zero by the second virtual instruction (Figure 15), which means that we can calculate the offset of the register and simplify the expressions – they are used just to obfuscate the code.

Rolling decryption

Analysis of other virtual instructions confirmed that the virtual registers at offsets 0x0B and 0x70 are meant just to encode operands. This technique is called rolling decryption and it is known to be used by the VMProtect obfuscator. However, it is the only overlap with that obfuscator and we are highly confident that this VM is different.

The obfuscation technique is certainly one of the reasons for the large number of virtual instructions – use of the technique requires duplication of individual instructions since each uses a different key to decode the operands.
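
The general idea of a rolling key can be shown with a generic Python sketch. It is not Wslink's or VMProtect's actual scheme: the XOR step, the key update rule and the names encode_stream and decode_stream are invented for illustration. The point is only that the key changes after every operand, so a value can be decoded correctly only when all preceding key state is known.

```python
MASK16 = 0xFFFF

def encode_stream(operands, key):
    """Encode 16-bit operands with a key that changes after every operand."""
    encoded = []
    for value in operands:
        enc = (value ^ key) & MASK16
        encoded.append(enc)
        key = (key + enc) & MASK16  # hypothetical update rule: the key "rolls"
    return encoded

def decode_stream(encoded, key):
    """Mirror of encode_stream; needs the same initial key and the same order."""
    decoded = []
    for enc in encoded:
        decoded.append((enc ^ key) & MASK16)
        key = (key + enc) & MASK16
    return decoded

operands = [0x012A, 0x012A, 0x0141]
encoded = encode_stream(operands, key=0x3038)
print([hex(value) for value in encoded])
print([hex(value) for value in decode_stream(encoded, key=0x3038)])
```

The same operand value appears differently at each position of the encoded stream, so static inspection of the bytecode alone reveals little; once the key registers are known, as they are after being zeroed, the operands can be computed directly.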


The expressions can be simplified to the following when we apply the known values of the virtual registers:

IRDst = (-@16[@64[RBP_init + 0x28] + 0x4] ^ 0x3038 == @16[@64[RBP_init + 0x28] + 0x6])?(0x7FEC91ABD1C,0x7FEC91ABCF6)

@64[RBP_init + {-@16[@64[RBP_init + 0x28] + 0x4] ^ 0x3038, 0, 16, 0x0, 16, 64}] = @64[RSP_init]

Now let us take a look at the expression in the conditional block:

@64[RBP_init + {@16[@64[RBP_init + 0x28] + 0x6], 0, 16, 0x0, 16, 64}] = @64[RBP_init + {@16[@64[RBP_init + 0x28] + 0x6], 0, 16, 0x0, 16, 64}] + 0x8

We can now see that the virtual instruction is certainly POP – it moves a value off the top of the stack to a virtual register, whose offset is still obfuscated with a simple XOR; it additionally increases the stack pointer when the destination register is not the stack pointer.

As values in the bytecode are known too, we can apply them and simplify the instruction even further into the following final unconditional expressions:

IRDst = @64[@64[RBP_init + 0xA4] + 0x5A8]
@64[RBP_init + 0x28] = @64[RBP_init + 0x28] + 0x8
@64[RBP_init + 0x141] = @64[RBP_init + 0x141] + 0x8
@64[RBP_init + 0x12A] = @64[RSP_init]
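
The simplification can be checked with a small Python model of the POP handler that follows the expressions above. The constants 0x28, 0xA4, 0x3038, 0x12A, 0x141 and 0x5A8 come from those expressions; the memory addresses, the handler address, the popped value and the bytecode operand 0xCEEE are hypothetical, the latter chosen so that negating it and XORing with 0x3038 gives 0x12A. Reading the next opcode at the incremented program counter is likewise an assumption.

```python
import struct

CTX, BYTECODE, TABLE, STACK = 0x1000, 0x2000, 0x3000, 0x3F00  # hypothetical addresses
mem = bytearray(0x4000)

def rd(fmt, addr):
    return struct.unpack_from(fmt, mem, addr)[0]

def wr(fmt, addr, value):
    struct.pack_into(fmt, mem, addr, value)

def virtual_pop(ctx, rsp):
    """Model of the fourth virtual instruction; returns the next handler's address."""
    vpc = rd("<Q", ctx + 0x28)
    dst_off = (-rd("<H", vpc + 4) ^ 0x3038) & 0xFFFF  # XOR-obfuscated register offset
    sp_off = rd("<H", vpc + 6)                        # offset of the virtual stack pointer
    wr("<Q", ctx + dst_off, rd("<Q", rsp))            # pop the value into the register
    if dst_off != sp_off:                             # the conditional block
        wr("<Q", ctx + sp_off, rd("<Q", ctx + sp_off) + 8)
    wr("<Q", ctx + 0x28, vpc + 8)                     # advance the program counter
    return rd("<Q", rd("<Q", ctx + 0xA4) + rd("<H", vpc + 8))

# Hypothetical state reproducing the concrete values of the final expressions.
wr("<Q", CTX + 0x28, BYTECODE)
wr("<Q", CTX + 0xA4, TABLE)
wr("<Q", CTX + 0x141, STACK)
wr("<H", BYTECODE + 4, 0xCEEE)
wr("<H", BYTECODE + 6, 0x0141)
wr("<H", BYTECODE + 8, 0x05A8)
wr("<Q", TABLE + 0x5A8, 0x401000)
wr("<Q", STACK, 0x1122334455667788)

next_handler = virtual_pop(CTX, STACK)
print(hex(rd("<Q", CTX + 0x12A)), hex(rd("<Q", CTX + 0x141)), hex(next_handler))
```

With concrete bytecode the branch condition is decided, and only the four unconditional assignments remain.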

Automating analysis of the virtual instructions

As doing this for more than 1000 instructions would be very time consuming, we wrote a Python script with Miasm that collects this information for us so we can get a better overview of what is going on. We are particularly interested in modified memory and destination addresses.

Just as in the fourth virtual instruction, we will treat certain virtual registers as concrete values to retrieve clear expressions. These registers are dedicated to the rolling decryption and perform memory accesses that are relative to the bytecode pointer, e.g., [<obf_reg_1>] = [<bytecode_ptr> + 0x05] ^ 0xABCD.

Subsequently we concretize the pointer to the virtual instruction table too and, by the end of the virtual instruction: calculate addresses of the next ones, clear the symbolic state, and start with the following virtual instructions.

We additionally save aside memory assignments that are not related to the internal registers of the VM and gradually build a graph based on the virtual program counter (Figure 19).

Figure 19. Call graph generated from memory assignments and the VPC

We stop when we cannot unambiguously determine the next virtual instructions to be executed; one can automatically process most of the virtual instructions in this way.
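
The overall loop can be outlined in plain Python without Miasm. This is a sketch of the control structure only: the analyze callback is a hypothetical stand-in for the symbolic execution of one virtual instruction with concretized registers, and the toy data replaces real results.

```python
def explore(entry_vpc, analyze):
    """Build a graph keyed by virtual program counter values.

    `analyze(vpc)` must return (effects, successors): `effects` are the memory
    assignments unrelated to VM-internal registers, and `successors` is a list
    of next VPC values, or None when they cannot be determined unambiguously.
    """
    graph = {}
    worklist = [entry_vpc]
    while worklist:
        vpc = worklist.pop()
        if vpc in graph:
            continue  # already processed; also prevents endless revisiting of loops
        effects, successors = analyze(vpc)  # fresh symbolic state per instruction
        graph[vpc] = {"effects": effects, "next": successors}
        if successors is None:
            continue  # stop on this path: needs manual analysis
        worklist.extend(successors)
    return graph

# Toy results standing in for real symbolic execution output.
TOY = {
    0x00: (["[ctx + 0x12A] = [RSP]"], [0x08]),
    0x08: ([], [0x10, 0x18]),        # conditional branch with two known targets
    0x10: (["[ctx + 0x99] = 0"], [0x20]),
    0x18: ([], None),                # next instruction could not be resolved
    0x20: ([], []),                  # end of the virtualized function
}

graph = explore(0x00, lambda vpc: TOY[vpc])
for vpc in sorted(graph):
    print(hex(vpc), graph[vpc])
```

Nodes whose successors are None mark the places where the automatic pass gives up and an analyst has to take over.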

Note that instructions featuring complex loops cannot be processed with certainty and need to be addressed individually due to the path explosion problem of symbolic execution, which is described for example in the paper Demand-Driven Compositional Symbolic Execution: “Systematically executing symbolically all feasible program paths does not scale to large programs. Indeed, the number of feasible paths can be exponential in the program size, or even infinite in presence of loops with unbounded number of iterations.”

For other actions related to virtual instructions and virtual machine initialization, please consult the ESET Research white paper Under the hood of Wslink’s multilayered virtual machine.


We have described internals of an advanced multilayered virtual machine featured in Wslink and successfully designed and implemented a semiautomatic solution capable of significantly facilitating analysis of the program’s code.

This virtual machine introduced several other obfuscation techniques such as junk code, encoding of virtual operands, duplication of virtual opcodes, opaque predicates, merging of virtual instructions, and a nested virtual machine to further impede reverse engineering of the code that it protects, yet we successfully overcame all of them.

To deal with the obfuscation, we modified a known technique that extracts the semantics of the virtual opcodes using symbolic execution with simplifying rules. Additionally, we made concrete the internal virtual registers for obfuscation along with memory accesses relative to the virtual program counter to automatically apply known values and deobfuscate semantics of the virtual instructions – this additionally broke down boundaries between individual virtual instructions.

Boundaries are necessary to prevent path explosion of the symbolic execution; we would lose track of the virtual program counter – our position in the interpreted code – without them.

We defined new boundaries by symbolizing the address of the virtual instruction table, since it is required to get the next instruction, and concretized it only when we needed to move to the following virtual instructions. We subsequently constructed a control flow graph of the original code in an intermediate representation from one of the bytecode blocks based on the virtual program counter, and extracted deobfuscated semantics of individual virtual instructions. We finally extended the approach to process both virtual machines at once by completely concretizing the nested one. Again: for full details, see our white paper.
