C语言指针与地址

Wikipedia:Far pointer

For example, in an Intel 8086, as well as in later processors running 16-bit code, a far pointer has two parts: a 16-bit segment value and a 16-bit offset value. A linear address is obtained by shifting the binary segment value four times to the left, and then adding the offset value. Hence the effective address is 20 bits (actually 21-bit, which led to the address wraparound and the Gate A20). There can be up to 4096 different segment-offset address pairs pointing to one physical address. To compare two far pointers, they must first be converted (normalized) to their 20-bit linear representation. On C compilers targeting the 8086 processor family, far pointers were declared using a non-standard far qualifier. For example, char far *p; defined a far pointer to a char. The difficulty of normalizing far pointers could be avoided with the non-standard huge qualifier. Example of far pointer: #include<stdio.h> int main(){ char far *p =(char far *)0x55550005; char far *q =(char far *)0x53332225; *p = 80; (*p)++; printf("%d",*q); return 0; } Output of the following program: 81; Because both addresses point to same location. Physical Address = (value of segment register) * 0x10 + (value of offset). Location pointed to by pointer 'p' is : 0x5555 * 0x10 + 0x0005 = 0x55555 Location pointed to by pointer 'q' is : 0x5333 * 0x10 + 0x2225 = 0x55555 So, p and q both point to the same location 0x55555.

What are far pointers near pointers

"near" and "far" pointers are actually non-standard qualifiers that you'll find only on x86 systems. They reflect the odd segmentation architecture of Intel processors. In short, a near pointer is an offset only, which refers to an address in a known segment. A far pointer is a compound value, containing both a segment number and an offset into that segment. Segmentation still exists on Intel processors, but it is not used in any of the mainstream 32-bit operating systems developed for them, so you'll generally only find the "near" and "far" keywords in source code developed for Windows 3.x, MS-DOS, Xenix/80286, etc. A near pointer is a 16 bit pointer to an object contained in the current segment, be it code segment, data segment, stack segment, or extra segment. The compiler can generate code with a near pointer and does not have to concern itself with segment addressing, so using near pointers is fastest, and generates smallest code. The limitation is that you can only access 64kb of data at a time, because that is the size of a segment - 64kb. A near pointer contains only the 16 bit offset of the object within the currently selected segment. A far pointer is a 32 bit pointer to an object anywhere in memory. In order to use it, the compiler must allocate a segment register, load it with the segment portion of the pointer, and then reference memory using the offset portion of the pointer relative to the newly loaded segment register. This takes extra instructions and extra time, so it is the slowest and largest method of accessing memory, but it can access memory that is larger than 64kb, sometimes, such as when dealing with video memory, a needful thing. A far pointer contains a 16 bit segment part and a 16 bit offset part. Still, at any one instant of time, without "touching" segment registers, the program only has access to four 64kb chunks, or segments of memory. If there is a 100kb object involved, code will need to be written to consider its segmentation, even with far pointers.

C语言程序设计-现代方法

*问：指针总是和地址一样吗？答：通常是，但不总是。考虑用字而不是字节划分内存的计算机。字可以包含36位、60位，或者更多位。如果假设36位的字，那么内存将有如下的显示： Address Contents -------------------------------------- 0 |001010011001010011001010011001010011| -------------------------------------- 1 |001110101001110101001110101001110101| -------------------------------------- 2 |001110011001110011001110011001110011| -------------------------------------- 3 |001100001001100001001100001001100001| -------------------------------------- 4 |001101110001101110001101110001101110| -------------------------------------- | ... | -------------------------------------- n-1 |001000011001000011001000011001000011| -------------------------------------- 当用字划分内存时，每个字都有一个地址。通常整数占一个字长度，所以指向整数的指针可以就是一个地址。但是，字可以存储多于一个的字符。例如，36位的字可以存储6个6位的字符： ------------------------------------------- |001101|110001|101110|001101|110001|101110| ------------------------------------------- 或者4个9位的字符： ----------------------------------------- |001000011|001000011|001000011|001000011| ----------------------------------------- 由于这个原因，可能需要用不同于其他指针的格式存储指向字符的指针。指向字符的指针可以由地址（存储字符的字）加上一个小整数（字符在字内的位置）组成。在一些计算机上，指针可能是“偏移量”而不完全是地址。例如，Intel微处理器（用于IBM PC和其他产品）具有复杂的模式，模式中的地址有时用单独的16位数（偏移量）表示，有时用两个16位数（段：偏移量对）表示。偏移量不是真正的内存地址；CPU必须把它和存储在特殊寄存器中段的值联合起来。用于IBM PC家族的C语言编译器通过提供两种指针的方式处理Intel的分段结构：近指针（16位偏移量）和远指针（32位段：偏移量对）。由于这个原因，PC编译器通常保留单词near和far用于指针变量的声明。

学而不思则罔

可以看出上面说到的指针都是相对于实模式下的，那么保护模式下呢，个人理解也是偏移量。下面举个例子测试： int a=0; int *p; int main() { p=&a; *p=1; p=(int *)0x00123456; *p=1; return 0; } main(): 0040138c: push %ebp 0040138d: mov %esp,%ebp 0040138f: and $0xfffffff0,%esp 4 { 00401392: call 0x4018e4 <__main> 5 p=&a; 00401397: movl $0x405020,0x405024 6 *p=1; 004013a1: mov 0x405024,%eax 004013a6: movl $0x1,(%eax) 7 p=(int *)0x00123456; 004013ac: movl $0x123456,0x405024 8 *p=1; 004013b6: mov 0x405024,%eax 004013bb: movl $0x1,(%eax) 9 return 0; 004013c1: mov $0x0,%eax 10 } 第二段代码为第一段代码对应的汇编代码（采用mingw编译器）,可以看出无论是通过&取地址符还是直接指定地址0x00123456，都是偏移量(因为访问变量时并没有指定DS寄存器，也就是访问变量时使用的是程序装入时的段地址，而0x00123456作为偏移量)，而不是线性地址。可以把指针看做链接地址，在arm中bootloader装载内核内核的装载地址必须与链接地址相同，否则出错，而x86下bootloader装载内核的转载地址与链接地址不同可以通过段寄存器重定位实现运行。其实段：偏移量的形式是x86下的概念，而arm下根本就没有这种概念。虽然80x86微处理器中的分段鼓励程序员把他们的程序划分成逻辑上相关的段，然而很多操作系统已非常有限的方式使用段。实际上分段和分页在某种程度上有点多余。 2.6版的Linux只有在80x86架构下才需要使用分段，言外之意是arm架构根本就没有CS、DS分段的概念。从上面的角度看，指针是应该丢弃与段相关的概念，仅仅是偏移量。