So I'll be talking about methods for analyzing targeted attacks with Office documents. Okay, so we'll start off with an introduction to targeted attacks: what they are, how they work, and how people use them. Then I'll talk about the Office binary file format, which we released to the public a couple of months ago. Then I'll talk about the types of bugs that we have in Office that people exploit. Then we'll talk about the defensive mechanisms, what you can use to, you know, not get owned. Then I'll talk about the exploit structure, how the exploits work, and then about how to analyze the exploits. And finally, we'll talk about AV/IDS detection for these things. All right, I'm going to go through this pretty fast because it's lunchtime right after this, so I'm sure we're all looking forward to it. And I'm very tired. Yeah.
So, targeted attacks. You have an adversary, and they have a target. Typically the target has some type of resource that they need access to, and usually the target is fairly well protected at the network perimeter. So what they do is attempt some client-side exploits. They build some Office document, Adobe PDF, whatever, and they send it in to the individual that they're targeting. So they know ahead of time who is going to be the victim. Most of the time they know what this person does, what kind of work he does, and even the people that report to him. So it means they've done a lot of reconnaissance work. And most of them just spoof the email so it looks like it's coming from your coworker. And it's very effective. It bypasses probably most of the perimeter security devices: firewall, IDS, IPS, most of them don't detect this stuff, so it goes through. And most AV engines can't detect it either. And there's no public information on this topic for some weird reason. The media talks about it, but they never go into details.
So I'll tell you today. All right, the Office file format. All the Office file formats, Office 2003 and below, follow the structured storage format, OLESS. And this is not only used in Office. It's used in a whole bunch of other things; MSIs, for example, are actually OLESS files. It's literally a file system within a binary file. You have two units, storages and streams: a storage is like a directory, and a stream is like a file. The spec for this is pretty small, it's 12 pages. And then each of these applications, for example PowerPoint or Excel, stores its data inside specific streams, and each stream has a name. It can be very frustrating to parse manually, so don't do it unless you have to. Yeah, it's very frustrating.
Instead, we have COM APIs that will do this for you. So I gave here some of the functions that you will need. All you do is call StgOpenStorage on the file and tell it what kind of permissions you want, read, write, access, blah, blah, blah. Then you get back an IStorage object, and you call EnumElements on it, which basically enumerates through the storages and streams. When you find a stream that you want, you just call OpenStream on it, and it returns an IStream object. Then you call Stat on it and you get the stream length, and after that you can read or write to that stream, whatever. So it's very similar to, you know, open, fstat, read, blah, blah, blah. Here is a code example of how to do that. This is somewhat compilable. I think I stripped out some stuff to make this fit, but this will work. Or, if you don't know C or C++, or you don't want to work with the Win32 API, just use Python. If you use the win32com wrapper from Python, it does the same thing. All it does is wrap our Win32 API.
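To make the "file system within a binary file" idea concrete, here is a minimal sketch that lists the storages and streams in an OLESS file using only the Python standard library. It is not the COM route the talk recommends and it is not the code from the slides. The function name `list_directory` is made up for illustration, the header and directory offsets are my reading of the published compound file layout, and the sketch assumes the simple case where the whole FAT is reachable from the first 109 DIFAT slots in the header. It does no hardening against malformed input, which matters if you point it at a hostile document.

```python
import struct

OLESS_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
TYPE_NAMES = {1: "storage", 2: "stream", 5: "root storage"}

def list_directory(data):
    """Return (name, kind, size) for each directory entry in an OLESS file."""
    if data[:8] != OLESS_MAGIC:
        raise ValueError("not an OLESS file")
    # Header fields: sector shift at 0x1E, FAT sector count at 0x2C,
    # first directory sector at 0x30, DIFAT array at 0x4C.
    sector_shift, = struct.unpack_from("<H", data, 0x1E)
    sector_size = 1 << sector_shift
    num_fat_sectors, first_dir_sector = struct.unpack_from("<II", data, 0x2C)

    def sector(n):
        # Sector 0 starts right after the header, which fills one sector.
        start = (n + 1) * sector_size
        return data[start:start + sector_size]

    # Assumption: every FAT sector is listed in the header DIFAT.
    difat = struct.unpack_from("<109I", data, 0x4C)
    fat = []
    for fat_sector in difat[:num_fat_sectors]:
        fat.extend(struct.unpack("<%dI" % (sector_size // 4), sector(fat_sector)))

    entries = []
    sect, seen = first_dir_sector, set()
    # Follow the directory's sector chain through the FAT; stop on an
    # end-of-chain marker (out of range) or a loop.
    while sect < len(fat) and sect not in seen:
        seen.add(sect)
        block = sector(sect)
        for off in range(0, len(block) - 127, 128):
            name_len, kind = struct.unpack_from("<HB", block, off + 0x40)
            if kind not in TYPE_NAMES:
                continue  # unallocated directory slot
            # Names are UTF-16LE; name_len counts the trailing NUL.
            name = block[off:off + max(name_len - 2, 0)].decode("utf-16-le")
            size, = struct.unpack_from("<I", block, off + 0x78)
            entries.append((name, TYPE_NAMES[kind], size))
        sect = fat[sect]
    return entries
```

Called on the bytes of a PowerPoint file, this would be expected to report a root storage plus streams such as the one the talk calls the PowerPoint Document stream. For anything beyond a quick look, the COM functions named above are the safer choice.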
And it's actually very easy to use, as you can see here. And then you get all the facilities that Python gives you. It's very good for experiments. If you want to write a fuzzer or something like that, it's pretty easy.
All right. So I'll give an example here for PowerPoint. PowerPoint stores all its data, for example the slide content, in a stream called the PowerPoint Document stream. So if you call EnumElements on this, you'll see a stream called "PowerPoint Document", and it's in Unicode, so you'll see P, then 0, o, 0, blah, blah, blah. And then PowerPoint internally has two data structures. One is a container, which is analogous to a directory, and the other is an atom, and an atom is like a file. A container can contain atoms, or a container can contain other containers, and so on. The records follow the TLV format, TLV style. MSO objects follow the same format. MSO objects are like these charts, not charts, but these GUI components you can put into Office, and this is MSO.DLL, if you think about it.
And then there's a structure that defines a PowerPoint record, which is, you know, I give you the information there. The way you recognize whether a record is a container or an atom is based on recVer: if it's 0xF, then it's a container, else it's an atom. The type refers to the record type, and the record type is what we released to the public. Like, if the record type is 1000, what is that? You download the file spec, you scroll down to type 1000, and it gives you exactly what the data structure for that record is. There are about 100 of these. Excel has quite a few, and MS Word has quite a few as well, but PowerPoint is very simple, only about 100 of them. And I drew this so it's easier to understand. Here is another one. Can you guys see it pretty well? Yeah, this is an abstract view of the...
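The container and atom rule described above can be sketched in a few lines. This is an illustration, not the speaker's code: `parse_records` is a made-up name, and the 8-byte header layout it assumes (recVer in the low 4 bits of the first 16-bit word, then a 16-bit record type, then a 32-bit length) is my reading of the published PowerPoint format.

```python
import struct

def parse_records(buf, depth=0):
    """Yield (depth, rec_type, rec_len, is_container) for each record in buf."""
    pos = 0
    while pos + 8 <= len(buf):
        # Assumed header: recVer/recInstance word, record type, body length.
        ver_inst, rec_type, rec_len = struct.unpack_from("<HHI", buf, pos)
        rec_ver = ver_inst & 0x000F
        is_container = rec_ver == 0xF   # 0xF means container, anything else is an atom
        yield depth, rec_type, rec_len, is_container
        if is_container:
            # A container's body is itself a sequence of records, so recurse.
            body = buf[pos + 8:pos + 8 + rec_len]
            yield from parse_records(body, depth + 1)
        pos += 8 + rec_len
```

Fed the contents of the PowerPoint Document stream, this prints a tree if you indent by `depth`. A hostile file can nest containers deeply or lie about lengths, so a real parser would want a depth limit and bounds checks that this sketch leaves out.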
When you open up the stream and you read the buffer, that's what it will look like. And that's how you parse it. You have to do it recursively because the containers can contain other containers. But given that, it should be pretty simple. So if you were to parse this, this is what it will look like. This is a real PowerPoint document, and the first record that you'll see when you read from it is going to be a document container, which is type 0x3E8. And then that container will contain an atom of type 0x3E9, as I showed here, and so on and so on. And you notice at the bottom there's a slide container. So all the content for a slide, for example the bullets, the text, is stored per slide in a slide container. So if you had like 100 slides, there'll be 100 slide containers.
For Excel, the data is stored inside the Workbook stream, and it doesn't have containers or atoms. It just has BIFF records, which are still TLV, and a BIFF record can only be up to 8,000 bytes or so. If it's bigger than that, then you need to use another record, a BIFF record of type CONTINUE, that says, hey, there might be more data coming up, and then you follow it. So it's pretty simple.
Bugs. The bugs in Office are, I mean, just like any other app, nothing really out of the ordinary. You still have, you know, the standard stack and heap overflows, uninitialized variables, bad pointer reuse, double frees. I mean, a whole bunch of these are very common. Nothing special.
I'm going to do a demo. This is a pretty innocent demo, but in real life, what happens is the attacker will perform reconnaissance on the target first. Like, for example, if they were to attack me, they would know that, oh, I'm going to REcon today, and I would probably want to look at the REcon schedule.
So what they'll do is craft up the REcon schedule in a Word document and send it to me, and it says it's from Hugo, and it says, hey, Bruce, check out the updated REcon schedule. And I would open it up, you know, blindly, and then I get owned. So... Sorry, my Virtual PC is a little bit slow, so... There you go. So if you notice, there was a little bit of lag there. It actually started, then it exited, and then it relaunched again. So, anyways. On a real, on a faster computer, this stuff would be very fast. You won't even see the application quitting. It would be just like opening a normal document.
All right. The defensive mechanisms. You can use MOICE; it's a free tool that we wrote in response to these things. It requires Office 2003, and it only works on OLESS files. What it does is convert your binary file format into the new Office 2007 file format, and then it opens that up. So even if there were vulns, they won't be there. It'll just crash. Or, if you have Office 2003, go install Office 2003 SP3, which we released, I think, in September or December. I can't remember which one. It's also free, and this was a major security push. We fixed a whole bunch of bugs here, and the fact is, if you had Office 2003 SP3, you wouldn't be vulnerable to any of the zero days that were released or patched since it was released, because none of them affect 2K3 SP3. In fact, there are very few bugs that affect 2K3 SP3 so far. So for your own safety, if you can, go install SP3. And together, SP3 and MOICE actually eliminate probably, I want to say 100%, but you know, you never say never, right? So about 99% of them. In fact, I don't think I've ever seen one where it doesn't work, but I always want to make sure. And always have the latest patches. Or you can use Office 2007. It's pretty cheap, actually.
You buy the home edition, like $60 or something like that.
All right, the exploit structure. It's nothing special. The basic structure is you're going to have some shellcode, then you're going to have some malware that's embedded, sometimes, and then you have a clean document which opens up, and then I'll talk about the techniques. All right, here's another picture that I drew. The picture is actually the complete Office document, I guess a structured view of an Office document. You have an OLESS header, then some Office records, then some shellcode, then some more Office records, and then you have a Trojan or malware. Sometimes you have more than two. And then you have a clean document, which is what they use to reopen. And then you have, you know, some more, which is like the OLESS stuff. There can be variations of this, but the green parts don't change often. Because, for example, if it's not an OLESS file, obviously Office is not going to open it. Or if you have too many corrupted Office data structures, Office sometimes will pick this out and say, oh, your file is corrupted, and not open it.
Most of the time you'll see that the clean document and the Trojan are going to be obfuscated, so you can't just look in a hex editor and say, oh, yeah, that's a PE header right there, and, oh, that's going to be my Trojan. It's not that easy. And they rarely use a URL model. Everything is inside one document. They don't ever call back home to download a new set of Trojans. The Trojan is always delivered along with the document. This is very different from the other client-side exploits, for example the stuff that can be done through JavaScript. Most of that stuff, if you look at it, the ActiveX stuff, in the shellcode they'll just call URLDownloadToFileA or something like that to pick up the Trojan, and then call WinExec on it. They rarely do that for Office documents.
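Since the embedded Trojan usually will not show up as a plain PE header in a hex editor, one quick triage trick is to search for a PE header under a guessed encoding. The sketch below tries every single-byte XOR key. That choice is an assumption made for illustration: the talk does not say which encoders are used, only that they are mostly custom, so a miss here proves nothing. `find_xor_pe` is a hypothetical helper name.

```python
def find_xor_pe(data):
    """Return (offset, key) pairs where a single-byte-XOR'd PE image may start."""
    hits = []
    for key in range(256):          # key 0 means the image is not obfuscated at all
        mz = bytes(b ^ key for b in b"MZ")
        pe = bytes(b ^ key for b in b"PE\0\0")
        start = data.find(mz)
        while start != -1:
            # e_lfanew (offset of the PE signature) lives at MZ + 0x3C.
            field = data[start + 0x3C:start + 0x40]
            if len(field) == 4:
                e_lfanew = int.from_bytes(bytes(b ^ key for b in field), "little")
                if data[start + e_lfanew:start + e_lfanew + 4] == pe:
                    hits.append((start, key))
            start = data.find(mz, start + 1)
    return hits
```

Checking that the decoded `e_lfanew` really points at a decoded "PE" signature is what keeps this from firing on every stray two-byte match.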
The techniques. It's standard GetEIP. And then sometimes they have custom encoders. In fact, about 99% of the time. And then some of the other techniques they use are file handle brute force and application relaunch. I'm going to talk about them in a bit.
So why do they brute force a file handle? When the shellcode runs, it brute forces all the file handles: it calls GetFileSize on each of the file handles and compares that with a known size. And if it matches, it says, ah, yeah, I found myself. And when it matches, it seeks to certain offsets to extract the Trojan or the clean document. And why do they do this? I mean, there are a whole bunch of other ways to find yourself in memory, right? You can imagine there are a lot of ways. But this is the easiest method, it's the most effective, and it's the shortest. You can do this in assembly in probably like 9 or 10 instructions.
All right. So here's what it does. The shellcode runs, decodes itself, and builds up a list of function pointers it needs: CreateFile, SetFilePointer, WriteFile, WinExec, CreateThread. Yeah. Sometimes GetTempPath or something like that to get your temp directory, because that's what it'll do: it'll extract the Trojan and then write it to the temp directory. Pardon? Am I, I'm not, oh yeah. Someone should have told me this. There we go. Thank God I didn't have like notes on there. Anyways, yeah. So basically, the shellcode decodes itself, builds up this list of function pointers that it needs, and then it finds itself through the file handle brute forcing method. Then it calls SetFilePointer to seek to a specific position that it knows is the beginning of its Trojan, reads that buffer, writes it to disk, and then calls WinExec on it.
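The file handle brute force is easier to picture with a small analog. Real shellcode does this in assembly against Win32 handles with GetFileSize and SetFilePointer; the sketch below stands in Python file descriptors and `os.fstat` for those calls, so treat it as a model of the logic rather than of the actual exploit code. The names `find_self` and `extract`, and the descriptor range, are made up for illustration.

```python
import os

def find_self(known_size, max_fd=1024):
    """Return the first open descriptor whose file size equals known_size, else None."""
    for fd in range(3, max_fd):          # skip stdin, stdout, stderr
        try:
            if os.fstat(fd).st_size == known_size:
                return fd                # "I found myself"
        except OSError:
            continue                     # not an open descriptor, keep guessing
    return None

def extract(fd, offset, length):
    """Seek to a hard-coded offset in the matched file and read the embedded blob."""
    os.lseek(fd, offset, os.SEEK_SET)
    return os.read(fd, length)
```

The whole trick rests on two constants baked into the shellcode: the size of the document and the offset of the payload inside it. That is also why any process that holds the same file open will satisfy the search, which the analysis methods later in the talk take advantage of.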
And then it reads the clean document using the same method, writes it to disk, and then literally calls Office again to open that file, which is why, when I opened it up, you saw it quit and then reopen again. That's the result of that. So, pretty interesting. You can apply this method to any other exploit that lives in a binary file, like PDF or even MPEG files. It's a very effective method.
Okay, the analysis techniques. Tools you need: just any hex editor, a disassembler, and, optionally, a debugger; WinDbg is good. The objective is, one, you need to identify the shellcode, understand it, extract the malicious components, and, if you have the time, identify the exact vulnerability that they're trying to exploit.
Okay, so I don't have any assembly, I just have opcodes. Earlier this morning, my friend Pete here recognized, I think, all of these except for two. I don't know if any of you recognize this stuff. Anyone, besides the obvious ones? This is REcon, I mean, this is what people read for a living, right? Anyway, fine, I'll just say it. The EB is going to be a short jump. The E8 is calling backwards. E2 FA, that's a loop, and then you'll see 33 C9, that's XOR ECX, ECX. So you can see that this is the standard GetEIP technique. 64 A1 30 00, that's... anybody know what the 64 prefix is for? No one knows. Yes, yeah, so any instruction involving the FS segment is going to have the 64 prefix. And 64 A1 is to move FS:[30] into the EAX register. If it's EBX or ECX, it's going to be like 8B 1D or 8B 0D or something like that. And what about the last one, D9 74 24 F4? No one knows, okay. You guys are not very excited. I'll probably do this one time soon, so. Okay, so D9 74 24 is a floating point instruction, FNSTENV. It writes the current FPU environment to a memory location, and one of the fields that gets written just happens to be EIP. That's all.
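If you would rather not eyeball a hex dump for those opcode sequences, a few of them are easy to grep for. This is a rough sketch rather than a detector: `find_geteip` is a made-up helper, the 0x100-byte limit on backward calls is an arbitrary threshold I picked to cut noise, and on real documents every pattern here will also match ordinary data, so hits are only candidates to disassemble.

```python
import struct

def find_geteip(buf):
    """Return (offset, description) for byte patterns typical of GetEIP stubs."""
    hits = []
    for i in range(len(buf)):
        # E8 rel32 with a small negative displacement: a call backwards.
        if buf[i] == 0xE8 and i + 5 <= len(buf):
            rel, = struct.unpack_from("<i", buf, i + 1)
            if -0x100 <= rel < 0:
                hits.append((i, "call backwards"))
        # D9 74 24 F4: fnstenv [esp-0xc], leaves the FPU instruction pointer on the stack.
        if buf.startswith(b"\xD9\x74\x24\xF4", i):
            hits.append((i, "fnstenv [esp-0xc]"))
        # 64 A1 30 00 00 00: mov eax, fs:[0x30].
        if buf.startswith(b"\x64\xA1\x30\x00\x00\x00", i):
            hits.append((i, "mov eax, fs:[0x30]"))
    return hits
```

A hit on its own means little; two or three of these within a few dozen bytes of each other is a much stronger hint that you are looking at shellcode.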
All right, so the other one is, how are you going to debug this stuff? You obviously can't open up a DOC or XLS file in a debugger. So what you have to do is be a bit more creative. So, method one, you identify the shellcode, you open up your hex editor. Oh yeah, I forgot to go back here. If you don't know any opcodes at all, if you load the file up in IDA in binary mode, sometimes, not all the time, IDA will actually pick up the shellcode, because apparently IDA has some heuristics to pick out things like push ebp; mov ebp, esp and stuff like that. So it actually picks up some of that stuff. But sometimes it doesn't work. Maybe Ilfak can add more heuristics or something like that. That's a feature request, by the way, Ilfak.
All right, so method one is you open the file up in the hex editor, you find the block that you suspect to be shellcode, you patch the first few bytes with 0xCC, and then you start up Office and attach WinDbg to it. If you did it right, then you hit a breakpoint. If you did it wrong, then you probably got owned, because the shellcode ran and you didn't know it. Yeah, so the way I usually make sure is I open up the file in IDA first and go to the parts that I suspect to be opcodes. I press C, and if it disassembles and it looks likely, then I patch it and see what happens. Yeah, sometimes I fail though.
Method two is pick any executable, copy the shellcode into it, set EIP, and then single step just like with any other executable. This is actually a pretty common method. Most AV analysts use this.
All right, method three is use WinDbg. What you do is save the entire Word document in a temporary file, like a bin file or whatever. You open up Notepad or any other EXE, attach WinDbg to it, and then you call .dvalloc. Anybody use WinDbg here? Lots of people. Anybody know what .dvalloc does? Besides Nathan, yeah.
So what .dvalloc does is allocate a block of memory with read, write, and execute permission, and then .readmem is kind of obvious: it reads the content of a file into a block of memory. So what you do is figure out how large this file is, allocate a block of memory to store it, and read the file into that block. Then you figure out where the shellcode is, you set EIP to that, and then you can single step. So it's very effective. You don't need Office at all. And, yeah, if you have WinDbg, this is probably the best way. So here is an example. That's what I did. You probably won't be able to see all of it.
All right, method four. This is the best one. You do the same thing, except don't just launch something like calc.exe; use Vim or Notepad or anything that opens a file, and open the document in it. The reason we do this is that when the shellcode runs, it will do a brute force on the file handles, and then you get a match, right? Because you used Notepad or whatever to open this file, the process will have a file handle to it, and the file sizes will be the same. And it will work. You can actually repro the entire thing, run the entire shellcode like this, and you'll see the Trojan execute and stuff, without Office at all.
Detection mechanisms. Okay, this is mostly for the AV and IDS people. To reduce false positives, you need to be able to figure out whether a file is OLESS or not. You can do that by just looking at the first eight bytes, the D0 CF 11 E0. If you have those first eight bytes, then it's very likely that it's an OLESS file. Then you have to receive the entire content of this file, and you have to enumerate through all the streams and storages and figure out what type of file you have: a Word document, an Excel document, or a PowerPoint document.
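As a rough sketch of that first triage step, the check below tests the eight-byte signature and then guesses the document type from well-known stream names. The full signature bytes and the "WordDocument" stream name are not given in the talk; I am supplying them from the published formats. Searching the raw bytes for a stream name is a shortcut taken for brevity, not a substitute for actually enumerating the storages and streams. `classify` is a hypothetical function name.

```python
OLESS_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")

# Stream names are stored as UTF-16LE in the OLESS directory.
STREAM_HINTS = {
    "WordDocument": "Word",
    "Workbook": "Excel",
    "PowerPoint Document": "PowerPoint",
}

def classify(data):
    """Return None if data is not OLESS, else a best-guess application name."""
    if data[:8] != OLESS_MAGIC:
        return None
    for stream_name, app in STREAM_HINTS.items():
        if stream_name.encode("utf-16-le") in data:
            return app
    return "unknown OLESS"
```

An IDS would only use something like this to decide which parser to hand the file to; the accuracy comes from the record-level checks, not from this step.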
And then you read the content and you parse it as I showed you earlier. And then, if you have time, you have to figure out what records are affected. Basically, if you do generic detection, you get a lot of false positives, right? In order to get very accurate detection, you need to figure out which BIFF records or which PowerPoint containers and atoms are affected, so that you can detect the bad values. So you'll be more accurate and not have false positives.
We released the Office file format, so it's online. You can go download it. It's available in PDF and in another format, XPS. Given all the information I gave here, those pictures that I drew, and the documents that we have online, you can actually write your own Excel and PowerPoint parser very easily. And if you write a good one, let me know. Sometimes it's very difficult for people; for example, people will do a BinDiff on Office fixes, but most of the time they won't know what records are affected. They just know, oh yeah, there's an integer overflow here, but they won't know exactly how to make a file that will repro that issue. We'll give more information about this at Black Hat Vegas.
And this is the team that I work with, SWI. All the bugs that are reported to Microsoft get assigned to one of us, and we triage them and make sure that the fixes are okay. And we have a blog where we talk about vulnerability details. Sometimes we tell you exactly what the vulnerability is. And it's actually written by the people who worked on these bugs, so I guess you can consider it authoritative. If you want more information about the vulns, send us an email. If you have a bug to report, send it to us, secure@microsoft.com.
And my email, if you have like a malicious Office document or whatever you want to share, send it to me, bdaatmrks.com. And that is it. So, Ilfak must get quite a few of these. People send him IDB files and stuff, so pray that there are no bugs in IDA. Yeah. Any questions? Oh, yeah.
I have a question. How do you know or inform people that what the product will be or what the product will be?
Yeah, you can run it in Virtual PC. I'm just kidding. No, there are like lightweight sandboxes. I mean, they're not made by us, so I don't particularly endorse them. Oh, not currently, no. But maybe sometime soon. There's an app called Sandboxie. I don't know if anybody's heard of it, but it's pretty effective. Basically it's a lightweight VM. You open the document up in it, and when files are created or registry keys are changed, it just logs them. It never commits them to disk, but it tells you, oh, hey, you know, you opened up Office and there's like rootkit.exe running. And you can stop and say, oh, that's a bad Office document, and not open it again. Is that it? All right, thanks for your time.