Thank you.
Good afternoon, everyone.
I hope that you won't fall asleep, so I will try to make my presentation uneven, jumping
here and there, so you will have to stay vigilant
all the time.
My presentation is about creating plugins for IDA Pro.
That's the outline.
We'll talk about why we need plugins at all.
Second, we'll talk about the API.
If you have already written some plugins, you know that there is good stuff and there are
really ugly things.
And then we will look at some simple plugins.
I hope to have some free time at the end to answer your questions, if you have any.
There's an online copy of this presentation, so if you have your laptop
and you cannot read something, you can go and see it online.
So IDA is an interactive, programmable disassembler.
There are three ways of automating things in IDA. First, you have the keyboard
macros.
Unfortunately, they are available only in the text version of IDA.
There are also scripts, in the script language called IDC, and there are plugins.
We'll talk about the last one, because about the keyboard macros there's
nothing to tell.
They exist, but unfortunately only in the text mode.
And about the IDC language, I can say many bad things, as I think many of you can,
because it's just a...
In fact, I wanted to write a small parser.
I wanted to do it as an exercise.
So I wrote a small language interpreter.
It's not a good thing, and you know that it lacks many modern features like arrays, structures,
and hashes.
All that stuff is not there.
And, again, it's yet another language to learn.
And I know that many of you use IDAPython and others.
I think that now we have Python, Ruby, and many other scripting languages available in
IDA.
I'm sorry, sometimes I say IDA one way, sometimes the other.
But unfortunately, we cannot just dump this language, because there are some useful IDC
scripts, and we have to continue to support it.
Maybe in the future we will have some provisions for seamless integration of other scripting
languages like Python or something like that.
So maybe it will be available out of the box, but I cannot make any promises yet.
And one more thing, and my slide doesn't say it: it's a really slow, really limited language.
You cannot do in IDC everything that you can do in IDA.
So if you are serious about using IDA and reverse engineering many things, you will have to
use the real API, the plugin API.
And I know that many of you say, oh, it's really difficult stuff, you know, to write
a plugin you have to learn many things.
Yeah, this is correct.
You have to know how to program in C or C++.
But on the other hand, it's not that difficult.
With this API, you have access to all subsystems in IDA.
You can write your own processor modules and file loaders, you can improve the analysis, add
user interface stuff and other things.
So there are many, many things you can do.
In my presentation, I will show you how to write these small plugins.
It's not that difficult.
The only problem is the API itself, because it's a collection of different coding styles and
different naming conventions, and maybe it reflects my coding preferences over time.
I started to write IDA in 1990, and that makes 18 years now.
So of course I evolved; I hope, at least, that my coding became better over time.
So there is some very old stuff that maybe doesn't make sense today, or something that
is difficult to understand.
But overall, we try to improve things, to add new, more intuitive
things to the SDK, and to improve it over time.
Second, we have the SDK.
And the SDK comes with, let's say, many, many modules, processor modules, plugins, file
loaders.
And overall, you have more than 170,000 lines.
So you have plenty of source code to copy from and to learn.
Of course, the API is very big.
It's more than 1,000 functions, so it takes some time to learn.
I think it's the same with IDA itself.
You take IDA, you give it to a new guy who hasn't seen it before, and at first it's
really intimidating.
Like, oh, you have so many things.
I don't know where to go, what button to press.
Unfortunately, it's difficult to give an answer saying, you know, in IDA you run this, you
do this, and afterwards you get the result.
You cannot say it like this because the result depends on what you want to achieve with IDA.
Maybe you want to find something.
Maybe you want to reverse engineer the whole application.
Maybe you are looking for vulnerabilities.
There are many, many things you can do with IDA.
And all this depends on your goals, on your methods, on your target processors.
So, unfortunately, the learning curve is very steep
at the beginning with IDA, and it's the same with the API.
Normal programs are written in the following way.
First, there's a design phase.
We make a nice design, maybe with UML diagrams, all that stuff.
We understand the logic.
Then we have a clear understanding of what we need to do, and only then do we start to implement
things.
And after we debug it, we have something, an application.
Unfortunately, or fortunately, I don't know,
that is not the case with IDA.
At the beginning, it started as a disassembler, a very simple disassembler.
And if you ask what a disassembler is, it's a very simple thing.
You just take an opcode and convert it into a mnemonic; for example, EB means
jump.
By the way, I don't remember all those opcodes.
The previous presentation was showing some opcodes and asking us what this opcode means.
I am very bad with this.
That's one of the reasons why I created IDA, because I could not understand all this stuff.
So IDA started as a very simple disassembler, and it was evolving naturally.
Fortunately, the base stuff, like the database and the way that the information is presented
in the database, is good enough.
So we kept the same architecture all the time.
The architecture of IDA is the same as at the beginning.
It hasn't changed.
But we were forced to add new things to IDA, to the API, and that's why it's a collection of
mixed stuff.
So I tried to come up with a list of some unforeseen things we had to add to IDA.
For example, everyone knows that a byte is always eight bits.
Unfortunately, it's not always the case.
There are some processors where addressable quantities are not eight bits.
The next thing I did not foresee, and it's my fault, is that the graphical interface
would be something really nice and the way to go.
That's why IDA stayed with the text interface for many years.
Then there were bytecode machines.
There were 128-bit computers.
A recent addition to IDA is multiple-chunk functions.
They did not exist before.
If you had asked me in 1995, I would have said, you know, it never happens.
The function starts there, and it's one chunk.
And unfortunately, it's not so.
I think that all the assumptions you can make about the code are violated, especially
today with all this obfuscation stuff.
For example, we can say that if you have a mov, the next byte must start an instruction.
IDA makes this assumption, so when it disassembles things, it goes like this: mov, mov, add
something, so it's linear code execution.
But unfortunately, it's not the case with obfuscated code because there might be an
exception thrown, and you never return.
So again, something violated.
And the adversary is a human being now.
So it's not like IDA has a shortcoming, and we just add something to IDA, and it will
solve the problem.
It won't, because as soon as we add a rule or something that makes IDA more complex
in order to understand this particular obfuscation pattern, the adversary will come up with a
new thing. I will give an example, an old example,
but still: we have a call instruction.
When we have a call instruction, and if the call returns, it usually returns to the next
byte, the following byte.
So IDA thinks, okay, we have a call, the next thing must be an instruction.
And it happily continues to disassemble all this stuff.
And what if the called function is a special function that does not return to the next
byte, but it always returns, for example, let's say, four bytes further?
Then all this disassembly is wrong.
You have these junk bytes inserted, IDA tries to analyze them, and you are out of sync,
and nothing works.
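The desynchronization just described can be illustrated with a small sketch. This is only a toy model: the instruction set, the opcode lengths, and the `skip_after_call` parameter are all made up for the illustration and have nothing to do with IDA's real analysis code.

```python
# Toy instruction set (hypothetical, not a real CPU): opcode -> (mnemonic, length)
TOY_ISA = {0x01: ("mov", 2), 0x02: ("add", 2), 0xE8: ("call", 2), 0xC3: ("ret", 1)}

def linear_sweep(code, skip_after_call=0):
    """Decode instructions one after another, as a naive disassembler does.

    skip_after_call models a tuned rule: the called function is known to
    return that many bytes past the call instead of to the next byte.
    """
    listing, pc = [], 0
    while pc < len(code):
        # Bytes that are not valid opcodes are shown as one-byte data ("db").
        mnemonic, length = TOY_ISA.get(code[pc], ("db", 1))
        listing.append((pc, mnemonic))
        pc += length
        if mnemonic == "call":
            pc += skip_after_call
    return listing

# mov; call; four junk bytes; add; ret
program = bytes([0x01, 0x10, 0xE8, 0x05, 0x01, 0xFF, 0xFF, 0x02, 0x02, 0x20, 0xC3])

naive = linear_sweep(program)                      # decodes the junk bytes
tuned = linear_sweep(program, skip_after_call=4)   # skips them
```

In the naive listing the junk bytes are decoded as instructions and the real `add` at offset 8 is never seen; with the tuned rule the sweep stays in sync.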
So you can say, and you will be right, that IDA cannot handle it.
Yes, IDA cannot handle these things, but on the other hand, IDA can be fine-tuned or
configured to handle this stuff, and it can handle many things like this.
I'm kind of diverging from the main thing.
I will return to this stuff, and we will talk about it later.
Anyway, another thing is the debugger.
I would never have said that we would add a debugger to IDA.
It was really a hack to add it, and the graph view and other stuff.
So anyway, the thing is that, yes, the API is a collection of mixed things, but it happened
like this because it was a naturally evolving platform.
If someone comes up with a new disassembler or a reverse engineering platform that can
handle things better, why not?
But unfortunately, that's what we have today.
As for things that we can mention for the future, we got some requests in
this direction.
I'm not saying that we will do all these things, but these are things that we would not even
have thought about: multiple processors, input files, multiple users per database.
Strangely enough, this request of having multiple users per database, so you kind of, you have
a server and you have multiple users connecting to the server and working on the same application,
reverse engineering, it happens quite often.
But if you think about it, it doesn't make much sense.
Do you have a multi-user version of MS Word or Notepad,
where you have a file on the server and multiple users edit the same file at the same time?
You don't, because it's impossible to handle: there are too many things
happening at the same time, and it would be just a mess.
Multiple debugging sessions per debugger server, by the way, will be implemented in the
next version.
Multiple analysis threads, all that stuff.
I'm not saying that we will implement all this stuff; I'm just trying to show you that things
evolve in directions we cannot foresee.
Or we can foresee them, but we cannot implement them at once because it would take too much
time.
Now about the good stuff.
IDA itself is a modular thing, so we have a kernel and we have these different modules:
the loader modules, processor modules, plugins.
So you can improve things in IDA.
This is the best point of IDA, that you can add things to it and you can improve things
in it.
I know that all of you use IDA, and I'm sure that you have gotten frustrated with IDA, saying this
stupid program doesn't understand this or doesn't do this the correct way.
It's perfectly normal.
You are frustrated because you are trying to solve a problem and the tool, instead of
being helpful, kind of hinders you from doing the right thing.
Unfortunately, we cannot handle all the cases, so you will have to do things yourself, program
a bit to make IDA really useful for you.
I will just tell you a little bit more about the architecture and how the information is
stored in the database, so you will understand the next slides much more easily.
The database consists of four files.
The B-tree is the most interesting file because it contains most of the information, like
names, comments, and other stuff.
Type information is temporarily kept there also.
The next file is called the flags file.
For each byte of the program, IDA allocates a 32-bit value which describes that byte:
if it's code or data, if there's a name attached to it, if it has a comment, how the operands
are presented. All this stuff is encoded in this flags value.
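As a rough illustration of the idea of one flags value per program byte, here is a minimal sketch. The bit positions and the names `IS_CODE`, `IS_DATA`, `HAS_NAME`, `HAS_COMMENT`, and `describe` are assumptions made up for this example; the real layout is defined in the SDK headers and is different.

```python
# Hypothetical bit assignments for this sketch only, not IDA's real layout.
IS_CODE, IS_DATA, HAS_NAME, HAS_COMMENT = 1 << 0, 1 << 1, 1 << 2, 1 << 3

def describe(flags):
    """Decode one per-byte flags value into a readable description."""
    if flags & IS_CODE:
        kind = "code"
    elif flags & IS_DATA:
        kind = "data"
    else:
        kind = "unexplored"  # neither code nor data has been assigned yet
    return {
        "kind": kind,
        "named": bool(flags & HAS_NAME),
        "commented": bool(flags & HAS_COMMENT),
    }

# A toy "flags file": one value for each byte of a four-byte program.
flags_file = [0] * 4
flags_file[0] = IS_CODE | HAS_NAME       # start of a named instruction
flags_file[2] = IS_DATA | HAS_COMMENT    # a commented data byte
```

The point of the design is that a question such as "is there a comment here?" becomes a single bit test, without consulting the larger database file.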
The next file is called name pointers.
For the moment, we just ignore it because it's a technical detail.
The last file is called the type library, and the type library has the information about
the function prototypes and other stuff.
These four files comprise the database, and IDA stores all the information in these
files.
If you want to write a plugin, you do it this way.
Each plugin has a descriptor, and the descriptor has pointers to three functions.
The first function is called initialize.
As the name implies, it initializes the plugin.
Then we have the termination function and the function to invoke the plugin, the one that does
the real work of running the plugin.
We also specify the name of the plugin and maybe the hotkey that is attached to the plugin.
This description block practically never changes.
Maybe sometimes you change the flags, but this is something you may ignore for the moment.
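The shape of such a descriptor can be sketched as follows. This is a Python model for illustration only: real plugins are written in C/C++ against the SDK, and the names `PluginDescriptor`, `host_invoke`, and the hotkey value are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class PluginDescriptor:
    """Toy model of a plugin descriptor: three functions plus metadata."""
    init: Callable[[], bool]      # decides whether the plugin wants this database
    term: Callable[[], None]      # cleanup when the plugin is unloaded
    run: Callable[[int], None]    # does the real work when invoked
    name: str
    hotkey: str = ""
    flags: int = 0

messages = []  # stands in for the message window

def hello_init():
    return True  # works with any database

def hello_term():
    pass

def hello_run(arg):
    messages.append("Hello, world!")

HELLO = PluginDescriptor(hello_init, hello_term, hello_run, "Hello world", "Alt-F8")

def host_invoke(plugin, arg=0):
    """Hypothetical host: load the plugin, run it once, unload it."""
    if plugin.init():
        plugin.run(arg)
        plugin.term()

host_invoke(HELLO)
```

The host only ever sees the descriptor, which is why the block stays the same from one plugin to the next.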
About the first function.
When you have a plugin, your plugin will do something for you, for your particular situation.
If you work with obfuscated Windows applications, you are not interested in the SPARC module or
something like that.
The first thing your plugin should check is whether the current processor
is supported by the plugin.
The next question is about the file format; maybe you are also interested in the text mode,
and other questions.
All this is optional, so we can say that, in a very abstract way, the initialization function
should check if the plugin is compatible with the current database.
If it's not, it should say no, I'm not going to work with this database.
Otherwise you say yes, I want this database, and you return the OK code.
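A minimal sketch of that compatibility check, assuming a hypothetical `DatabaseInfo` record and made-up return codes `SKIP` and `OK` (the real initialization function queries the SDK instead):

```python
from dataclasses import dataclass

@dataclass
class DatabaseInfo:
    """Hypothetical summary of the currently loaded database."""
    processor: str
    file_format: str

SKIP, OK = "skip", "ok"  # stand-ins for the codes an init function returns

def init(db, processors=frozenset({"arm"}), formats=frozenset({"macho"})):
    """Accept the database only if the plugin can do something useful with it."""
    if db.processor not in processors:
        return SKIP   # wrong processor: do not load the plugin
    if db.file_format not in formats:
        return SKIP   # wrong file format
    return OK

accepted = init(DatabaseInfo("arm", "macho"))
rejected = init(DatabaseInfo("sparc", "elf"))
```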
The next thing: if your plugin is an interactive plugin, then there
must be a way to invoke it, to start it.
Either you will see your plugin in the plugins submenu, or your plugin may also add a menu
item to any menu in IDA.
For that you use the add_menu_item function call.
Sometimes it's better to create your plugin so it's fully automatic.
It doesn't require any user interaction.
For that you can hook to different events, to different notification points.
You can intercept them.
There are many different events.
I just gave you some names here, but I think that it's better to check the header
files in the SDK to have a better idea of how it works.
But anyway, the thing is that you can hook to events and create your callbacks, and IDA
will call your plugin when something happens.
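The hook-and-callback mechanism can be modeled in a few lines. This is a toy dispatcher, not the SDK's notification API; the class `EventHub` and the event name `file_loaded` are invented for the sketch.

```python
from collections import defaultdict

class EventHub:
    """Toy notification point: callbacks register for named events."""

    def __init__(self):
        self._callbacks = defaultdict(list)

    def hook(self, event, callback):
        self._callbacks[event].append(callback)

    def unhook(self, event, callback):
        self._callbacks[event].remove(callback)

    def fire(self, event, *args):
        # Iterate over a copy so a callback may unhook itself safely.
        for callback in list(self._callbacks[event]):
            callback(*args)

log = []

def on_file_loaded(name):
    log.append(f"loaded {name}")

hub = EventHub()
hub.hook("file_loaded", on_file_loaded)      # typically done at init time
hub.fire("file_loaded", "sample.bin")        # the host reports an event
hub.unhook("file_loaded", on_file_loaded)    # typically done at termination
hub.fire("file_loaded", "other.bin")         # no longer delivered
```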
All this stuff is very long,
so to keep it short:
here is the source code of a very simple plugin that only prints Hello World on the screen.
You see that it's very simple.
At the end we have the description.
It's always the same thing.
The run function will be called when the user selects the plugin from the
menu, and the initialization function returns PLUGIN_OK, saying that it will work with any
IDB.
If you were afraid to create a plugin for IDA, you see that it's not that difficult.
Maybe you will have some difficulties with setting up the environment,
compilers, and stuff, but again, this is something you do once, and after that you are set up and everything
works.
The first plugin was not that useful.
It was just printing Hello World on the screen.
Let's see the next one.
Not that useful either, but I'm just showing you something else.
You know that when you press Alt-X, IDA asks first if you want to save
the database or not.
Imagine you don't want this question asked.
This plugin handles the job.
What it will do: as soon as it starts, it will switch IDA to batch
mode.
You see that it says batch equal to one.
And after that, it will close the window.
So this is how you close IDA without it asking any questions.
In fact, this plugin, again, is just a simple illustration.
It's not useful because, if you didn't know, you can use shift-click.
If you shift-click on the window close button, IDA won't ask any questions.
Now let's move on to more interesting things.
Imagine you want to find something, but not in one database.
You have multiple databases, let's say 10 or even 100, and you want to make a search
in all these databases.
Doing it manually is tedious and really cumbersome, so it's better to write a plugin.
One way is to write a plugin, so let's try to find a function,
let's say, in many databases.
Since a function can be compiled and linked in at different addresses, we cannot use a simple
binary search.
So we'll have to create a signature file, first of all, and then we will run IDA with
a plugin that will load the signature file and try to find the function.
If it finds it, it will log the result, or quit, or switch to interactive mode, anything you
want.
Otherwise, it will silently quit, and this way you will be able to call all this stuff
from a batch file and run it for all databases.
I'm sorry?
Ah, these are the names of the utilities that can be used to create signature files, and
they come with IDA.
Maybe you haven't heard about them.
They are called the FLIRT utilities.
And this is our plugin.
I skipped the description part and the include files, so I just gave you the meat of the plugin,
only the code that does the real thing.
Please note that we don't have any run function here, because we will do everything
in the init function,
at initialization time.
We will do everything there, and if we find something, then we will stop and display a message
on the screen.
Otherwise, we will make IDA exit.
You see what we do here?
First of all, we check if there are any options for us. If there are some options,
it means that IDA was called in a special mode for us. We don't want IDA to do all
this stuff in normal use: when you want to analyze a file in interactive mode, this
plugin should not intervene.
So that's why we check the plugin options first.
If there are some options in the command line for us, then we apply a signature file to
the database.
Then we wait for the analysis to finish, and after that we check if there were any matches.
If there are some matches, we print the information, saying that we found that number of matches.
Otherwise, we just quit.
We don't do anything.
So what happens is that IDA doesn't even have a chance to display any windows on the screen.
It will just disappear from the screen.
And this is a good thing, because this can be used from a batch file.
You can run this from a batch file and run the other things.
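The control flow of that init function can be summarized in a short sketch. The four callables passed in are placeholders for the real SDK operations (applying a signature file, waiting for the analysis, counting matches, printing a message), and the return values "skip", "stay", and "quit" are made-up labels for the three outcomes.

```python
def init_search(options, apply_signature, wait_for_analysis, count_matches, log):
    """Toy model of a plugin that does all its work at initialization time."""
    if not options:
        return "skip"            # interactive session: do not intervene
    apply_signature(options)     # the options name the signature file
    wait_for_analysis()          # let the analysis finish first
    matches = count_matches()
    if matches:
        log(f"found {matches} matches")
        return "stay"            # something was found: stop and show it
    return "quit"                # nothing found: exit silently

report = []
result = init_search(
    "func.sig",                  # hypothetical signature file name
    lambda sig: None,            # stub: pretend to apply the signatures
    lambda: None,                # stub: pretend to wait for the analysis
    lambda: 3,                   # stub: pretend three functions matched
    report.append,
)
```

Passing the operations in as parameters keeps the sketch self-contained; in a real plugin they would be direct SDK calls.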
And this is the command line you have to use to run your plugin.
You see the minus and the uppercase O, then the name of the plugin, SIG, a colon, and the parameter.
This way we run IDA for all the databases we have, and we ask it to do something with each
database.
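A driver for running the same command over many databases could look roughly like this. It only builds the command lines and does not start anything. The executable name `idag`, the plugin name `sigfind`, and the file names are assumptions for the example; the option is written in the form described above, an uppercase O followed by the plugin name, a colon, and the parameter.

```python
def build_commands(ida_exe, plugin_name, plugin_arg, databases):
    """Return one argument list per database, ready for a process launcher."""
    option = f"-O{plugin_name}:{plugin_arg}"
    return [[ida_exe, option, database] for database in databases]

commands = build_commands(
    "idag",                      # assumed executable name
    "sigfind",                   # hypothetical plugin name
    "func.sig",                  # hypothetical signature file
    ["first.idb", "second.idb"],
)
```

Each entry could then be handed to something like `subprocess.run`, one database after another, which is the scripted equivalent of the batch file mentioned in the talk.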
This pattern can be used not only to search for something, but for many other things.
You can look for a specific command, you can look for checksums, you can write a vulnerability
scanner on this base.
Your plugin searches for something.
If it finds something, it prints or logs the results.
Otherwise, it just continues.
You know that IDA uses many rules during the analysis.
Currently, the built-in rules are generic, and sometimes you can come up with better
rules and you can say, okay, why didn't IDA do this?
Because we could not add this rule to the list of generic rules: it would ruin
the disassembly in some cases.
So in these cases, you have to do things yourself and you have to create a plugin.
There are the following approaches, four different things.
First, you do it manually:
either you do it really manually, like pressing the hotkeys, or you write a script,
or you write a plugin and you run the plugin manually.
This is how most plugins work, but this is manual work, which means that it's slow
and sometimes you forget to run your plugin on some of them.
So it's better to use the next three approaches.
When you make your plugin automatic, it's always better, because then it will just
stay in the background waiting for something and doing things for you automatically.
One way is to wait for the file to load, and at that time you scan the database, find
some interesting patterns, and change something in the database.
The second approach would be to wait for the analysis to finish and only then scan.
There are situations when the first approach makes sense and situations when the second
approach makes sense.
But the best approach is to hook the events and to improve things on the fly.
This is the best thing you can do because if you do it at the beginning, maybe it's
too early.
If you do it at the end of the analysis, maybe it's too late because you already missed many
things and let IDA go in the wrong direction.
Let me show you an example.
The iPhone binaries, which are something relatively new, use this instruction.
If you don't know this instruction, it doesn't matter very much; it's just an instruction
that is used as the first instruction of many, many functions.
IDA doesn't know about this.
Because of that, it misses many functions in iPhone binaries.
Let's write a plugin that will address this shortcoming.
What will it do?
It will look for this opcode in ARM binaries, and as soon as it finds it, it
will mark it as the beginning of a function.
This plugin will be fully automatic, so it can stay in the background and never bother you;
it just makes the analysis and the listing better for you.
This is the plugin.
It's very simple.
What it does: there's a loop.
You see the for loop: it searches for the pattern.
As soon as it finds it, it says auto make procedure.
You see that?
Create a procedure at that address.
It's a very simple plugin, and the same pattern can be used for any byte sequence, not only
for this particular instruction; you can add many other byte sequences
as well.
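The search loop itself is easy to model outside IDA. In this sketch the byte pattern `PROLOGUE` is a placeholder, not the real encoding of the instruction on the slide, and the four-byte alignment is an assumption about the code being scanned; instead of creating procedures, the function just returns the addresses where it would do so.

```python
def find_all(image, pattern, alignment=1):
    """Return every aligned offset at which pattern occurs in image."""
    hits = []
    pos = image.find(pattern)
    while pos != -1:
        if pos % alignment == 0:
            hits.append(pos)
        pos = image.find(pattern, pos + 1)
    return hits

def find_function_starts(image, patterns, base=0, alignment=4):
    """Collect candidate function start addresses for several byte patterns."""
    starts = set()
    for pattern in patterns:
        starts.update(base + offset for offset in find_all(image, pattern, alignment))
    return sorted(starts)

PROLOGUE = bytes.fromhex("aabbccdd")  # placeholder bytes, not a real opcode

# Three occurrences: at offsets 4 and 16 (aligned) and 11 (misaligned).
image = bytes(4) + PROLOGUE + bytes(3) + PROLOGUE + bytes(1) + PROLOGUE
starts = find_function_starts(image, [PROLOGUE], base=0x1000)
```

Adding more byte sequences means adding more entries to the `patterns` list; the loop does not change.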
And you see that the plugin works at the beginning, at initialization time, and after that it doesn't
do anything.
It even unloads itself from memory because it's not used anymore.
So you see that this slide is not that useful because it just shows a function that
starts with this UXTB instruction.
That was the first approach,
when we do something at the beginning. The second approach is to do something at the end.
You do it like this: you wait for the analysis queue to be
empty.
At that time, you know that the analysis has finished, and you do something like adding
improvements and changing something in the database.
This is the pattern.
I won't go into it in detail.
Later you can look at the slides on the website and study them at your leisure.
And now the most powerful way of doing things: we do things on the fly.
Say we hook the emulation event; we try to recognize something, and if we recognize
something, we improve the listing at the current address.
This is a very high-level, abstract way of representing things.
We look at the current instruction.
If it's our case, we do something with the listing.
To be more concrete and to give you an example, let's create a plugin
that will find the return instructions for the ARM processor.
The problem is that the ARM processor has many different encodings for the same instruction.
They are even called differently.
It can be a simple return.
It can be BX LR, which again means return, because the ARM processor does not store the
return address on the stack.
It works slightly differently.
You can pop the PC, the program counter, from the stack.
Sometimes they still save the program counter on the
stack, and so on and so on.
Sometimes the return instruction is even encoded as two separate instructions.
So our plugin will detect this.
Our first function is to recognize the pattern.
You see that here we verify that we have the required instruction.
We check for the BX.
Here we check only for this BX and the pop sequence.
And if we find the pattern, then we improve the listing.
There are many different ways of improving the listing.
You can rename things, add comments, patch the database, change the operand type.
Depending on the situation, you choose whatever you want.
In our case, we will just add a small comment saying that it's a return instruction.
So what we do: we check that the flags specify that there are no comments attached to the
current address.
If that is the case, we add the return comment.
Well, our plugin was not very sophisticated, since I tried to keep it simple.
It just adds a small comment like this at the end.
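The recognize-then-improve pattern can be sketched like this. Instructions are plain text strings here and the listing is a dictionary of comments, both simplifications for the example; the names `is_return` and `on_emulate` are hypothetical, and only the two forms mentioned above (BX LR and a pop that loads PC) are recognized.

```python
def is_return(insn):
    """Recognize two ARM return idioms from their textual form."""
    text = " ".join(insn.upper().split())        # normalize case and spacing
    if text == "BX LR":
        return True
    if text.startswith("POP "):
        registers = text[4:].strip("{}").replace(" ", "").split(",")
        return "PC" in registers                  # popping PC returns to the caller
    return False

def on_emulate(address, insn, comments):
    """Callback for each analyzed instruction: annotate returns on the fly."""
    if is_return(insn) and address not in comments:
        comments[address] = "return"              # never overwrite a user comment
        return True
    return False

comments = {0x10: "user note"}
results = [
    on_emulate(0x10, "BX LR", comments),          # already commented: left alone
    on_emulate(0x14, "pop {r4, pc}", comments),   # recognized and annotated
    on_emulate(0x18, "ADD R0, R1", comments),     # not a return
]
```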
So it's really sophisticated.
In fact, there are many events you can hook to, and they can be used to improve many things
in IDA.
The first, the main event to hook to is the emulation event.
Our previous plugin used that.
That event can be used to recognize different patterns and to do something with the database.
By the way, the next version will also add events for cross-references.
Your plugin could check if the instruction is a sane instruction that makes sense.
Otherwise, your plugin can say, do not create this instruction.
Or it can do many other things like create data items.
There are many ways to improve things.
Anyway, there are also events to perform the final pass, for a changed byte value, and other things.
You see that there are many events in IDA, but still they are not enough to do everything,
to replicate everything that IDA does to the database.
Because there are some things that happen, and you cannot do anything about them,
and they will just go unnoticed.
But anyway.
We can continue with this as well, because it's not that important.
One more plugin sample that might be more interesting for you, something that is
closer to you.
The plugin will hook to the rename event, and if the new name starts with this prefix,
it will automatically convert the item to Unicode.
It makes sense since names starting with this prefix usually mean that we have a Unicode
string at that address.
Again, the plugin is very, very simple.
What we do: there's some scaffolding, like you have to retrieve the arguments from
the stack and all that stuff.
Otherwise, the most important line is the if.
We compare the name, and if it's equal, then we create a string, make an ASCII string at that
address, and we specify the type.
It's a Unicode string.
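Stripped of the scaffolding, the callback reduces to one comparison. In this sketch the prefix `u_` is a stand-in, since the actual prefix from the slide is not in the transcript, and the dictionary `string_types` replaces the real call that creates a string in the database.

```python
UNICODE_PREFIX = "u_"  # hypothetical prefix; the one on the slide is not known

def on_rename(address, new_name, string_types, prefix=UNICODE_PREFIX):
    """Rename callback: a name with the prefix marks a Unicode string."""
    if new_name.startswith(prefix):
        string_types[address] = "unicode"   # stands in for creating the string
        return True
    return False

string_types = {}
converted = on_rename(0x40, "u_title", string_types)
ignored = on_rename(0x50, "counter", string_types)
```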
Unfortunately, I don't have the slides here with me showing how to handle obfuscated
code and other stuff, but again, it's doable because you can intercept instruction creation,
and as soon as you see that the created instructions do not make sense, or you recognize a pattern,
you change how the analysis goes.
You destroy the wrong instructions.
You may even patch them out.
You remove them, and you continue the right way.
Yes, and this slide shows the rest of the plugin, but it's nothing special.
At the initialization time, we register our callback function, and at the termination
time, we remove the callback function.
So those are the two things we have, and this slide shows how our plugin works.
You see that it's pretty simple, but unfortunately, you have to write things in C/C++.
That's why maybe we will still kind of... I don't know if it will happen soon or not,
but having all this stuff in Python or Ruby or in Perl will maybe be much more interesting
for you, because I know that most of you use these languages today.
C and C++ are... I wouldn't say they're dying languages, but still, they are very old, and
things are done in a more elegant way when you have a scripting language like Python and
other stuff.
I don't know if I'm late or not.
No, that's okay.
Thank you for your attention.
If you have any questions, please feel free to ask.
Yeah.
Thank you.
Yeah?
Yes, I would return to the same thing that I showed you...
Let me find the slide.
This slide.
This is the pattern I would use for obfuscated code.
You recognize the pattern, the obfuscation pattern, and you do something with it.
It's a very generic way of answering the question, but I think that it's the best way to do it
because you handle things as they come.
It's not like you scan the database once and do things, because the application may
keep parts of itself in encrypted form, and if you just scan, you scan only the decrypted
bytes. With this pattern, you will handle all of them on the fly as they appear.
Of course, you will have to elaborate how you recognize the patterns.
This is something that depends on the obfuscation method.
I know that it's not the answer maybe you would expect because, again, you need something
practical and working.
Any other questions?
Yeah?
The cool stuff for... Ah, for the next version.
Yes, thank you for your question.
It's a very good question.
It allows me to answer all these questions and say, oh, we have a lot of stuff.
The next version, I think, will be released pretty soon.
First, there will be better debugger support.
There will be new debugger modules.
iPhone.
We have a debugger for iPhone and Symbian, in addition to the things that we had before,
like Linux, Mac, and Windows applications.
Second, multi-threaded applications will be handled much better in the debugger.
Before, there were problems with them because IDA was assuming a kind of single-threaded
thing, and as soon as you single-stepped, all threads were frozen.
They were suspended by IDA.
This was a problem with multi-threaded applications.
There will be more controls, like suspend thread, resume thread, check the thread step
and stuff.
The debugger server itself will be multi-threaded as well, so you will be able to connect to
it from many different IDA copies.
There will be a new PDB plugin that will handle almost everything from the PDB files.
You will get names, types, all that stuff.
The listing is much, much better.
I can tell you that, especially for the decompiler, because the decompiler uses this type information
very, very heavily, and it really helps.
Speaking of the types, we will release the utility to create type libraries yourself.
Before, it was not available.
You will be able to create type libraries, change them, and all that stuff.
There will be more events.
I told you that there will be events for creating cross-references and other stuff.
By the way, we will also publish the source code of all the debuggers.
I know that with our debuggers there will be some, maybe not problems, but some
anti-debugging tricks and other stuff, and we cannot cover all of them.
You will have the source code, and you can play with it.
You can recompile your debugger servers the way you want.
Then there will be new signature files and all that stuff.
I hope that there will be less frustration using the next version, because we fixed all
the bugs you reported to us.
In general, we try to fix everything you report.
You have a bug.
You send it to us, and we fix it.
We do this as soon as we can.
Usually it takes one or two days.
In some cases it's more, but in general, do not be afraid to send us a bug report.
As soon as we can reproduce it, it will be fixed, and you will get a fixed update immediately,
your personal copy in your mailbox.
That's what I remember offhand.
Maybe there will be some provisions to make it easier to integrate with Python and
other scripting languages, but this is something to consider.
It's not implemented yet.
One more thing I just remembered now.
You know that for the SDK, this API, plugins, all that stuff, we say, no, you are on your
own.
You implement something.
We don't provide any support, because it's really difficult.
It's a tough thing to do, because there might be a subtle bug in your code, and you say,
it doesn't work.
I don't know why it crashed and so on.
Yes, we still officially do not support it, but again, if you have a problem with your plugin
and you can isolate it, make it simple, and share this code with us, no problem.
Send it to us.
We'll try to help you.
If you have a problem with plugin development, tell us, and we'll
try to come up with a solution.
No more questions?
Okay, great.
Thank you for your attention.
Great work here today.