Our first talk today is by Jason Raber. It's called the Helikaon Linux Debugger, and I hope you enjoy it.
So anyway, my name is Jason Raber, and I'm the team lead for a reverse engineering team that we have in Dayton. We do red teaming, which means evaluating protections. We've got some commercial companies that build protections, and we have embedded customers as well who are now getting into protecting their firmware and hardware. We evaluate that and tell them how to strengthen it. Some of the work we've got coming up, even next month, is with a customer that has a Linux-type protection. So we developed a debugger to help us analyze and get around those types of protections, and also to evaluate malware.
Linux is not immune to malware and viruses. The majority of malicious software tends to focus on Windows, but Linux is a fairly popular operating system and it does get targeted. What these authors want to do is protect their IP. In other words, just by casual inspection you can't understand what they're doing. So they'll do things like packing. But even after you get through the packing and start actually debugging, they'll catch you with anti-debugging tricks that cause you to analyze code you're really not interested in, like honey code.
With COTS debuggers like GDB, or IDA Pro, which works for Linux, the issue is the way Ring 3 debuggers work: this kind of malware embeds code that exploits the very mechanisms that GDB uses. It can check whether a debugger is being used and, if so, crash out. So while it's interesting to break those types of protections and understand what the software is really doing, we generally want to do it pretty fast.
So in doing that, I don't have to see every single one of them. Let me back up just a second here. Basically I want to talk to you about what anti-debugging is and how it works, and show you a little bit of what the output looks like. Then I'll go down into the nuts and bolts of how I developed the debugger and how it really works, and some future work that I want to do.
Okay, so like I was saying before, GDB and IDA Pro register with the OS. If you want to debug something, you add an int 3, whose opcode is CC, and that causes a trap that allows you to analyze a particular address and maybe some registers. Or you can use the hardware debug registers to look at those areas of memory. Again, those things are easily exploited. To keep you out of their IP, they'll cause the program to crash. Some of the better stuff actually doesn't crash; it keeps running and just runs somewhere else. In Linux there's also ptrace, process trace, which is what GDB uses. You can bury ptrace calls in your code in a number of places to catch whether a debugger is being used.
Helikaon is a driver that I developed, so it gets certain access rights from running at that level. What I've done with Helikaon is actually rootkit my own system, which is okay since I own the system. It's not an anti-rootkit tool and it doesn't detect rootkits. I'm rootkitting myself so that I can exploit certain things in the operating system that allow me to stealthily look at memory without being detected.
Okay, so this is just to give you some idea of the stuff that we've looked at, on the malware side too.
These are just a small sample of the tricks they use to trip you up if you're using GDB. Even a hardware emulator or a kernel debugger is going to use debug registers, which again can be caught. If you set an address there, a common technique is to push all the registers, check whether any of the DR registers are being used and, if so, clear them out and keep going so that your breakpoint never hits. You can see there are also timing checks, parent-child debugging, and looking in /proc/self/stat. Obviously we've tried this on a lot of these different samples, and our debugger is not getting caught.
Okay, so what is it really doing? What am I trying to do with the driver? In the simplest sense, I want to analyze some area of code. From kernel land, I have a mechanism that lets me know that the software I want to analyze has been loaded into memory. Then I reroute that code with a jump. It jumps to some slack space, where I push the registers, and that allows me to manipulate them. Maybe I want to change a register or change memory; it allows me to do that. I also have a protocol that lets my driver know that I've hit my breakpoint and that I'm interested in analyzing some area of code. From the driver's perspective, all the injections are dynamic, and I replace them as soon as I'm done analyzing that code. In other words, I'm rerouting the actual instruction, and I put it back as soon as I'm done analyzing it. So no checksum in that regard is going to catch it.
Okay. So, to give you an idea, here is some real simple anti-debugging that can be done in Linux.
You can see here I've got this function called foo that doesn't do anything. It just prints out "foo", but the idea is that maybe this function checks for some sort of key, or gets a password or something, and the reverse engineer wants to analyze that area of code. He would put a breakpoint there using GDB to analyze it. In Windows, too, they do something similar with handlers that they set up. Following down, you can see I've got this handler that looks at a global variable and sets it to zero if the exception gets caught within the process itself. In other words, if a debugger is running and there's an exception, it gets thrown to the debugger and doesn't get passed on to the program. In those cases you would see that trace does not get set to zero. Then I've got another function called test trap, in which I'm causing a trap, an interrupt, which again gets passed to the debugger. Looking down in main, you can see that we register our handler right here and then we actually cause the interrupt. If there's no debugger capturing that exception, the variable is going to be set to zero, because our own exception handler is going to catch it, and the program continues on and doesn't print out that a debugger has been detected.
You're generally not going to see malware that says "debugger detected". Although, believe it or not, I have seen some malware doing this very thing, spitting out that there's a debugger running and then crashing out, which is great for us because then we're able to trigger really fast on where that check was. The other case is looking for CCs: we're dereferencing the memory where that function is and looking at the opcode to see if there's a CC there, which there will be if you apply a software breakpoint there.
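The slide itself is not part of the transcript, so the following is only a rough sketch of the three checks as the talk describes them, not the speaker's code. The function names are invented for illustration, `raise(SIGTRAP)` stands in for the inline int 3, and the ptrace check is the one the talk mentions alongside these two.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>

/* 1 means "a debugger may have swallowed the trap"; our handler clears it. */
static volatile sig_atomic_t trace = 1;

static void trap_handler(int signo)
{
    (void)signo;
    trace = 0;
}

/* Check 1: raise a trap and see whether our own handler received it. */
int trap_was_intercepted(void)
{
    void (*prev)(int) = signal(SIGTRAP, trap_handler);
    assert(prev != SIG_ERR);
    trace = 1;
    raise(SIGTRAP);            /* stand-in for the int 3 on the slide */
    signal(SIGTRAP, prev);     /* put the previous handler back */
    return trace != 0;
}

/* Check 2: does the code at this address begin with the int 3 opcode? */
int starts_with_int3(const void *code)
{
    assert(code != NULL);
    return *(const unsigned char *)code == 0xCC;
}

/* Check 3: PTRACE_TRACEME fails if the process is already being traced. */
int ptrace_says_traced(void)
{
    return ptrace(PTRACE_TRACEME, 0, NULL, NULL) != 0;
}
```

On Linux a protected program might call `starts_with_int3((const void *)foo)` before entering `foo`. Note that a successful `PTRACE_TRACEME` marks the process as traced by its parent, so a check like `ptrace_says_traced` is normally called at most once.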
And then of course there's ptrace: if it returns anything other than zero, a debugger is detected. Using DDD, which is a wrapper for GDB, you can see here I brought up the function foo and its push EBP, which is a 55, and put a little breakpoint there. When I actually run it, you can see that I've got a signal, a trap, and it even tells me the line, which is kind of nice because then you can go find that protection. But maybe there are a hundred of these things scattered all over the place, and it's really annoying to keep finding each one and nopping it, getting around it, or scripting it out. That's why a custom debugger that doesn't fall under the same rules as other debuggers allows you to quickly analyze code without being bothered by a lot of different protections. Anyway, when I hit continue, you can see the printouts that a debugger was detected for the signal, the CC, and the ptrace, and I actually hit my breakpoint too.
Okay, again this is the disassembly of what the program looks like. What I'm going to do in the Helikaon driver is add a breakpoint at this address, 80481D8, so try to remember that address. It will keep coming up. Just to show you how you would install the driver, if you're not Linux heads: you use insmod to load the driver, then run the program, and then remove the driver. And you see here that the "debugger detected" printouts from our example don't show up with Helikaon.
If you go a little bit further down in this area, you can see where the kernel driver is printing out the address. The breakpoint's been added; in other words, the process is up in memory, so go ahead and add the breakpoint. Then the breakpoint is hit at this address, which is the address I showed you before, and then I print out all the registers. For dynamic analysis you can now look at what the registers are. What's interesting too is that when all this is executing, it's really down in the driver itself, so I can actually change the registers themselves, and I'll get into how that works.
Another interesting point that I'll mention here and come back to later: because we rootkit the system, we're redirecting all the syscalls to ourselves, into the Helikaon driver. What's interesting there is that you can mine the code the first time, without even running the driver, to find all the return addresses, because I print out the return addresses for all the syscalls. That gives me a quick toehold into the virtual addresses where the code is being executed. What's also interesting is that if there's some sort of JIT or encryption going on, or even packing for that matter, I get control when those syscalls get executed. So I can dereference that whole region of memory, in other words dump it after it's been decrypted, and I don't have to sit and fight through keys or try to figure out how to brute force it.
There are three things that you need for this driver to work. One is the breakpoint, which you can see right here, BP1. You can have as many as you want, but here it's that same address, 80481D8.
I need some slack space where I'm going to put some handler code, and I need a syscall address. What I do is run the malware through once and collect all the different syscalls and their addresses, so I know exactly where they're being executed in the code. Then I find one that's of interest, one that happens very early, maybe in startup code. If you disassemble a Linux compiled program, you're going to see that there's always some sort of boot loader code or startup code for C that happens early in the game, so you can use that.
Okay, so you can use uname. What's interesting about uname, or anything in the startup code, is that once execution has hit the syscall that you've hooked, all the code above it is really slack space at that point, because it's never going to get executed again, at least not startup code, since the entry point has already happened and execution has moved on. And our handler code only needs around 20 bytes, so it's a very small footprint.
Okay, so what really happens in the Linux kernel when an interrupt or a syscall happens? You see the int 80: that's an interrupt that notifies the kernel that execution is going to go down into the kernel to handle the syscall. This is just the normal operation. Right above the int 80 you've got a move into EAX of 7A, which is the identifier that lets the kernel know which syscall is being called.
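As a rough illustration of that calling convention, here is a small user-space sketch. The function name `raw_uname` is made up for this example. On 32-bit x86 it loads 0x7A (122, the uname number on that architecture) into EAX and issues int 0x80, as the slide describes; on any other architecture it falls back to the C library's `syscall()` wrapper, since both the number and the entry mechanism differ there.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <sys/syscall.h>
#include <sys/utsname.h>
#include <unistd.h>

#define I386_NR_UNAME 0x7A  /* value the slide moves into EAX before int 0x80 */

/* Invoke uname without going through the libc uname() wrapper. */
long raw_uname(struct utsname *buf)
{
    assert(buf != NULL);
#if defined(__i386__)
    long ret;
    /* EAX = syscall number, EBX = first argument, result comes back in EAX
     * (a negative errno value on failure). */
    __asm__ volatile("int $0x80"
                     : "=a"(ret)
                     : "a"(I386_NR_UNAME), "b"(buf)
                     : "memory");
    return ret;
#else
    /* Not 32-bit x86: let libc pick the right number and entry instruction. */
    return syscall(SYS_uname, buf);
#endif
}
```

After a successful call, `buf->sysname` holds the kernel name, which is "Linux" on a Linux system.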
So, by hooking all the different syscalls: this picture is not exactly right. There's still an int 80, and it still goes down to the kernel, not directly to our driver, but the end result is that it goes to our driver before it goes to the actual uname syscall. Then we have a choice: we can execute the original, we can modify the parameters, we can do whatever we want. In this particular case we want to inject a breakpoint, a jump, somewhere in the process.
I'm not going to go into how to rootkit Linux. It's not a very hard thing; you can ask Google and it'll tell you all you need to know. But I put up a little slide here to show, for those who are not familiar with it, that there's a syscall table with indexes, and you can see uname here for example. What I'm doing is copying off the original pointer and saving it, and then I've got a hooked version that I put in its place in the array. So when that int 80 happens, the kernel looks up the pointer for the function it should call, and now I'm saying: give it to me.
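The save-and-swap idea can be modeled in plain user-space C. This is only a toy model, not kernel code and not the speaker's driver: `sys_call_table` here is an ordinary array, and `table_init`, `dispatch`, `install_hook`, `remove_hook`, and `hook_hits` are names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

typedef long (*syscall_fn)(long arg);

#define NR_UNAME   0x7A   /* table index for uname on 32-bit x86 */
#define TABLE_SIZE 256

static syscall_fn sys_call_table[TABLE_SIZE];  /* stand-in for the kernel's table */
static syscall_fn original_uname;              /* saved pointer to the real handler */
int hook_hits;                                 /* how often the hook got control */

static long real_uname(long arg) { return arg; }  /* pretend kernel handler */

static long hooked_uname(long arg)
{
    hook_hits++;                 /* the "driver" sees the call first */
    return original_uname(arg);  /* then forwards to the original */
}

void table_init(void) { sys_call_table[NR_UNAME] = real_uname; }

void install_hook(void)
{
    original_uname = sys_call_table[NR_UNAME];  /* copy off the pointer */
    sys_call_table[NR_UNAME] = hooked_uname;    /* replace it in the array */
}

void remove_hook(void) { sys_call_table[NR_UNAME] = original_uname; }

/* Models the lookup the kernel performs when int 0x80 arrives. */
long dispatch(int nr, long arg)
{
    assert(nr >= 0 && nr < TABLE_SIZE && sys_call_table[nr] != NULL);
    return sys_call_table[nr](arg);
}
```

Callers of `dispatch` see the same result before and after `install_hook`; the only difference is that the hook runs first, which is the property the driver relies on.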
Now what's interesting here is that we wrote these little macros that tell us, depending on whether a program is compiled statically or dynamically, at what offset the return address of the actual function call is. You can see right here that this call to uname has a return address, because every time there's a call the return address gets pushed onto the stack. So we can write this little predicate right here that says: if the return address of the uname call is, in this particular case, 8048322, go ahead and add a breakpoint. What's nice is that uname is getting called all the time, as the kernel swaps different running processes in and out, but as long as this predicate doesn't match where we want to inject a jump, the call just continues on its merry way. You can see right here we're calling the original one and going on.
Okay, this next slide is more for completeness. I put up all the different tasks that happen and in what order, but then I decided to come up with a diagram that I think will describe it a little bit better.
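A minimal sketch of that predicate, again as a user-space model rather than the driver's macros (which the transcript does not show). The return address is passed in as a parameter instead of being read off the user stack, and `add_breakpoint`, `original_uname`, and `breakpoints_added` are placeholders invented here.

```c
#include <assert.h>
#include <stdint.h>

#define TARGET_RET_ADDR 0x8048322u  /* return address of the uname call, from the talk */

int breakpoints_added;  /* counts how often the hook decided to patch */

static long original_uname(void) { return 0; }  /* stand-in for the saved handler */

static void add_breakpoint(void) { breakpoints_added++; }  /* placeholder for the injection */

/* Hooked uname: only the call site we care about triggers the injection. */
long hooked_uname(uint32_t caller_ret_addr)
{
    assert(caller_ret_addr != 0);
    if (caller_ret_addr == TARGET_RET_ADDR)
        add_breakpoint();
    return original_uname();  /* every caller still gets the real syscall */
}
```

Calls from any other address pass straight through, so unrelated processes that call uname are unaffected.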
You can see here we've got the different steps, one, two, three, four, all the way up to step 11. It's a little complicated, but now that we've got it all put together it works well, so you don't have to worry about that. We've got user mode here and kernel mode here, and this whole block represents the malware or the process that we're interested in analyzing. You've got the startup code, and the startup code calls uname. Like I said, it could be anything you want; it doesn't have to be uname, it just depends on what type of boot-up code they're using. So this is the first step: there's an int 80, it passes down to the kernel, the kernel does some things, and because I've got it hooked, I wind up in my hooked uname.
The second step is to steal the bytes at the place where I want to inject the jump, because eventually I want to put those bytes back so I can actually execute that instruction. So I steal the bytes and store them off.
The third step (I lost my mouse; okay, the third step is right here) is to add the breakpoint. At runtime I figure out the jump from this source address to the destination where my slack space is going to be.
The fourth step is to add the handler code. Think of it this way, and let's back up a second: as execution goes on from uname and finally hits my breakpoint, my jump, it's going to jump to slack space. So what am I interested in at that point? At the moment it was getting ready to execute that instruction, the instruction had been replaced by a jump, so everything is live at that point. All the registers are live, so I push them all onto the stack and I
cause another interrupt, in other words I notify my driver that something's happened and that I'm now interested in looking at the registers, changing memory, changing the registers, and so on. To do that I hook another syscall called, what is it called, fdatasync. I don't even know what it does, and I don't care, because again I've got a predicate in there and I'm just looking at the address, which I know at compile time. It's the same thing I showed you with uname: from that predicate you know that you've hit the interesting code that's in your handler.
The other interesting thing about fdatasync is that it takes an unsigned integer. You see where I'm pushing all the registers on the stack. The very last thing I do is look at the register it uses for its unsigned int, which is EBX, and that register is live all the way down, even after the context switch, down to the driver. So I move the stack pointer into EBX, so that I know exactly where all the registers are and I can muck with them and change them or whatever I want to do. Once this other int 80 happens, execution goes right in here.
So step six goes down, and now control is back in our Helikaon driver. Since I know exactly where the registers are, I can dereference the stack pointer at that location and I have all the registers, and I can print them out. Another interesting thing is that I just leave them on the stack, and I can change them on the stack, so that when I return back to slack space I pop them all off. Effectively I'm changing the registers on the stack, so when I pop them off and then jump back into the original code, I've effectively changed the registers. Anyway, after I print all those out, I want to eventually jump back to the original place
that I wanted to analyze, so I have to replace those stolen bytes. Then returning back to slack space sends me back to here, and the last instruction in the slack space is a jump to where I originally had the breakpoint. And that's pretty much it. You saw up above, right here, where the registers were, and I was able to see that fdatasync was hit, so I print out the breakpoint address.
This is what some of the output from the driver looks like. I broke it down a little bit to show step by step what's happening. So, just as a rehash: you see that uname is called, and I'm interested because I've looked at this return address, 804832. Then you see this foo right here, the 55, where I want to inject a jump. I calculate the jump at runtime. You see that the memory was 55 89, which is right here, and when I dereference it again it's now an E9, which is the opcode that begins a jump. Then I found some area for slack space. For this particular example I picked an area that was just a C runtime function that's never called, but again you could use the slack space that's above the call to uname. You can see where the 55 89 was, where the code originally was, and what I inject. I've actually got it where it just does a pushad or something like that, which pushes all the registers, but I did it this way for display, to make it easier to understand. Basically you see where I push all the registers.
It's actually over here. Yeah, so the jump has now been executed, and you can see right here the fdatasync. In other words, I've hit my jump, my rerouted code; it jumps to my slack space, and then I cause another interrupt down to the Helikaon driver so that it knows. More importantly, like I said, I've moved ESP into EBX and
then the 94, like I said, is the identifier for fdatasync. Then the interrupt happens. At that point I can change the registers in memory, and when I return back to slack space you see that I'm popping them all off. The last thing I do is jump to where my original breakpoint was. In this case IDA Pro filled it in for me and put foo in there, but it's where I replaced the stolen bytes.
I mentioned this before: the nice thing about looking at the different syscalls is that on file I maybe can't see anything because everything's packed or encrypted, but by rootkitting the system and getting all the syscalls, I can see where all that stuff is after it's been unpacked and decrypted. So I can easily dump, right from my driver, what the plain text or the code would be.
And if you wanted to loop, rather than this mechanism of replacing the stolen bytes right after the second interrupt, the fdatasync, I can just leave the jump there and put the instruction that I stole down in my slack space and execute it there. It depends on what scenario you want to look at, whether you're looking at a single hit or the area you want to see is inside a loop.
With this type of mechanism, and I haven't done it, I could probably do instruction tracing: start at the very beginning, where the entry point is, and reroute every single instruction. The idea is that once I replace the stolen bytes, I can look at the next instruction and inject a jump there, which again goes down to the slack space and starts the whole thing over again.
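The steal, patch, and restore cycle from the walkthrough can be sketched on an ordinary byte buffer. This is an illustration under stated assumptions, not the driver's code: the helper names are invented, the slack-space address used in any example call is hypothetical, and the stub bytes are a reconstruction from the description (push all registers, move ESP into EBX, load 0x94 for fdatasync, int 0x80, pop, jump back), so the real handler may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define JMP_LEN 5  /* E9 opcode plus a 32-bit relative displacement */

/* Write "jmp dst" into out, as if out lived at virtual address src. */
static void encode_jmp(uint8_t *out, uint32_t src, uint32_t dst)
{
    uint32_t rel = dst - (src + JMP_LEN);  /* relative to the next instruction */
    out[0] = 0xE9;
    out[1] = (uint8_t)rel;
    out[2] = (uint8_t)(rel >> 8);
    out[3] = (uint8_t)(rel >> 16);
    out[4] = (uint8_t)(rel >> 24);
}

struct breakpoint {
    uint32_t addr;            /* virtual address being rerouted */
    uint8_t stolen[JMP_LEN];  /* original bytes, kept for the restore */
};

/* Steal five bytes at addr and overwrite them with a jump to slack space. */
void bp_install(struct breakpoint *bp, uint8_t *code, uint32_t addr, uint32_t slack)
{
    assert(bp != NULL && code != NULL);
    bp->addr = addr;
    memcpy(bp->stolen, code, JMP_LEN);
    encode_jmp(code, addr, slack);
}

/* Put the stolen bytes back so the original instruction can execute. */
void bp_restore(const struct breakpoint *bp, uint8_t *code)
{
    assert(bp != NULL && code != NULL);
    memcpy(code, bp->stolen, JMP_LEN);
}

/* Build the slack-space stub; returns its length in bytes. */
size_t build_stub(uint8_t *out, uint32_t slack, uint32_t bp_addr)
{
    static const uint8_t head[] = {
        0x60,                          /* pusha: save all registers */
        0x89, 0xE3,                    /* mov ebx, esp: tell the driver where they are */
        0xB8, 0x94, 0x00, 0x00, 0x00,  /* mov eax, 0x94: fdatasync */
        0xCD, 0x80,                    /* int 0x80: second interrupt, into the driver */
        0x61                           /* popa: reload the (possibly edited) registers */
    };
    assert(out != NULL);
    memcpy(out, head, sizeof head);
    encode_jmp(out + sizeof head, slack + (uint32_t)sizeof head, bp_addr);
    return sizeof head + JMP_LEN;
}
```

With these encodings the stub comes to 16 bytes, in line with the "around 20 bytes" footprint mentioned in the talk. Because the jump always overwrites five bytes regardless of instruction boundaries, the restore has to happen before execution returns to the breakpoint address, which is the order the walkthrough describes.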
This took about a week to put together, so it wasn't very difficult to write the driver. I just thought it was neat to take the int 80, which gets exploited for rootkitting syscalls, and use it to our advantage to create a little bit of a debugger that lets you look at memory and get around the problems with malware exploiting the fact that the interrupt descriptor table handles the int 3s, or single steps through the EFLAGS. These are the things that debuggers have to operate under, which makes sense, because the majority of code that people are analyzing is not malicious and is not exploiting those weaknesses in a debugger. For us as a red team and reverse engineering team, after you bang your head enough times on code that keeps thwarting you, it's fun for a while but then it starts getting really annoying, so it lends itself to creating your own debugger.
We have another debugger too, also for Linux, built on QEMU. We leveraged that and created a whole-system emulation of Linux and Windows, and we can debug either one. The problem I see with that one is that it works on simple cases of malware, but if there's any kind of hardware component added, you have to emulate the hardware too, so it becomes cumbersome for quick analysis. This kind of driver works out well for getting a quick one-day look at something and understanding what it does without getting caught.
The last thing is that I've got some ideas about the slack space, because that's the only thing that's been bothering me.
It's a small footprint, only about 20 bytes for the handler code. Ideally, after I've analyzed some code down in my driver, I want to restore the slack space bytes as well, because I can steal those off too and then put them back. One thing I haven't tried, and the only idea I've got right now, is that when I'm down in the driver at the second interrupt, I replace those stolen bytes and then possibly jump right from kernel space into user space. I don't know if you can do that or not; I haven't tried it. I know that you can't go from user space down into kernel space in Linux; it doesn't like that. But I might be able to jump from kernel space right into user space. The stack could be a little bit weird, because there are a lot of different calls that happen in the Linux OS to get you down into your driver. But since you know the delta from the original rerouted code, from when you saved the stack pointer, you could just subtract, or in this case, because the stack goes from high to low, add to the stack pointer. In other words, pop everything off the stack from that original jump. If anyone has a lot of Linux experience and has any ideas, I'd like to talk with them about that.
Another thing I'd like to add eventually is some sort of GUI front end. The way it works now, which is nice, is that timing checks don't catch us, because it all happens at runtime and it's very, very fast. There's very little overhead, so the delta is small, and even with the different types of timing checks we don't get caught. However, suppose I add some sort of front end, more like a traditional debugger, where it stops execution and you've got a chance to look at the registers and modify them right then and say next or step into or whatever.
That's the only caveat I'm seeing right now: then I would lose that resistance to timing checks. In some cases, since you're hooking the entire syscall table, there are calls that malware might make to check the time and then check it again later. Obviously, if you've hooked all those calls, you can modify the return values to make the delta much smaller. But there are also assembly instructions, not part of the syscall interface, that do timing checks by reading the timestamp counter. So that's it. Any questions?
Yes. Yes, we've actually done some work with Detours as well. It's a ring 3 type of rootkit, I guess, rootkitting or at least redirecting syscalls or DLL calls. So yeah, we do some Windows work as well. We've got Detours, we use that, and that gave some inspiration for writing this one. Except what I don't want, with Detours for example, is that it creates some trampoline space or slack space, which grows the size of the executable. Ultimately I don't want to do that. I haven't seen it yet, but that's not to say that protections couldn't detect that you're increasing the slack space of the process. I'm always trying to think a little bit ahead of those guys. That's why I'm saying my real problem right now is figuring out how to clean up that slack space completely so the target is oblivious, so I won't get caught by checksums.
All right. Yes.
Right. Absolutely. Like I said, that's where I'm at right now, the whole slack space thing, because if I reroute every single instruction then I'm not going to get caught by a checksum. However, again, that slack space is the part that's haunting me.
It's a small footprint, but it is there. So you either find those checksums and nullify them, in other words get rid of them, or I've got to figure out a way from driver land to replace those stolen bytes and then jump back into the process right from kernel land. I've got to figure out that mechanism; I haven't had a chance to try it.
Yes. Yes, I check that. Before I inject the jumps and all that, I make sure it hasn't been swapped out.
True. True, yes and no. In the scenarios that we run under, we have malware machines, so we don't really care. I'm not going into a system where everything's very sensitive. That's the great thing about being a reverse engineer and having your own lab: all the rules are broken. I've got complete access as it is. I have root, so I do what I want.
Well, that's what I was talking about. I actually know what the stack pointer is, because I move it into EBX. So what I can do is an add with that delta, which effectively pops everything off the stack. I've got that figured out. The hard part is whether you can just do a physical jump right into user space. I don't know if you can do that, and I haven't had a chance to test it.
Yes. Well, what I was alluding to before is that if I reroute every single instruction, that's not the case.
Well, like I said, I don't really care if it's a push, which is one byte, push EBP, followed by some other instructions. I'm just copying five bytes off and replacing them with a jump. Eventually I'm going to hit that jump. When that happens, it goes to slack space, and I have the stolen bytes.
The first thing I do after the jump goes to slack space, when I get the interrupt, is replace those stolen bytes. So it doesn't matter what the size is, because it's going to put the 55 right back, followed by whatever instructions come after it, before I jump back from slack space and hit the push.
I'm not sure I follow. Using the breakpoint is what I'm trying to get around.
Yes. Are you talking about creating a different type of interrupt and registering it? One thing you could do is create your own opcode and then just hook the IDT and add your own handler. That's yet another method. The checks that look for CCs and hardware registers and that type of stuff are not going to affect you either. That's also something I haven't been considering, and I'm thinking about actually doing it.
Sure, sure, sure. I don't think the malware authors are going to run out there and start worrying about me. That would be flattering, but yes.
When I do a two byte, I do a five byte, and so... I guess I don't follow. Oh, I see, I see. Yes, that's a good question. I guess I don't do anything in that particular case. A lot of the time when I'm analyzing code, and I guess that's the other thing I didn't really get a chance to cover, I use IDA Pro mostly for static analysis and getting dumps, and I think the better you get at reverse engineering, the less you need to rely on dynamic analysis. However, there are always some points in the code where you really do need to see what's in the registers. So a lot of times when I'm breaking something, I will quickly identify through static analysis some key points that I want to analyze, and it might come down to me adding three breakpoints total to break something. You know what I mean?
That's true. Fair enough.
Yep. Well, the other thought I had with rerouting every single instruction is something like a data access breakpoint: I'm rerouting them all, but I'm not really analyzing them all. All I'm doing is looking at the registers to see if the instruction pointer is getting close to the point of interest, and then causing a break. So yes.
But I'm not executing it from the slack space, right? I'm going to replace it and then jump back to the place where I originally put the jump. See what I'm saying? I'm rerouting the instruction before it has a chance to execute, intercepting it.
Sure, but it will be right back where it was. I execute it from the place where it originally was. I guess not.
Any other questions?
Yeah. Well, remember I said in the very beginning that I need three different things. I need the breakpoint, meaning where you want to analyze; I need some sort of slack space; and I need to get some sort of a toe... Yes, you do, but again, this is not a debugger that's sold. It's something I use in the lab, and it doesn't take me long to compile it.
Say that again? Yes. Well, I was saying here that this is a little bit more complicated, because now I have to have a little bit of a disassembler. I could just execute the code from my slack space if I wanted to be in some sort of a loop, but if there are relocation issues with relative addresses, I'd have to recalculate them. If it's a simple instruction, I can handle that very easily. Is it a move EAX, EBX? Well, I don't care where I execute that from. But if it's something like a push of some constant, then I have to recalculate where that might be.
Yes, it actually does. I leave the slack space in place, and like I said there has to be a little bit of redesign, because I really want to get rid of that slack space; then my footprint, or at least the risk of getting caught by checksums, would be gone. So right now, yes, it does support multiple breakpoints. What I can't do is put a breakpoint on a push and then another one directly under it, because I'm stealing five bytes. Does that make sense? I have to make sure there's enough space between them to inject another jump. So yes.
Say again? I'm working on that. Is the source code available as open source? I'm working on that right now, so I would say contact me. I have my contact information. You might be able to give it out. We have done that in the past, because we don't sell tools; we're more service based. We analyze your protections or software and give you quick results back.
Any other questions? Okay.