Optimizing Two-Dimensional Continuous Dynamic Programming for Cell Broadband Engine Processors
[[町野]]
Title: Optimizing Two-Dimensional Continuous Dynamic Programming for Cell Broadband Engine Processors.
Authors: Shin-ya Iwazaki, Yuichi Okuyama, Ken-ichi Kuroda, and others
1. Introduction
Two-Dimensional Continuous Dynamic Programming (2DCDP) is a DP matching method specialized for image recognition. 2DCDP has two advantages. First, it tolerates skew and misalignment, because it allows non-linear enlargement or reduction of one image. Second, it automatically segments the object that is matched in an image. One expected application of 2DCDP is object tracking in video. However, video consists of 30 frames per second, so real-time object tracking requires the matching of each frame to finish within about 0.03 seconds. Modern computers with general-purpose processors cannot achieve this in real time.
The Cell Broadband Engine processor (Cell processor) is a heterogeneous multi-core processor developed by STI, an alliance of SONY, TOSHIBA, and IBM. The Cell processor offers high performance not only for multimedia processing but also for scientific computing. In addition, the Cell processor is used in the SONY PLAYSTATION3 (PS3), and a Linux OS and a software development kit that run on the PS3 are publicly available. Therefore, we can obtain a high-performance computer at a lower price than other computers with comparable performance.
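In this paper, we present our approach to optimizing 2DCDP to achieve real-time object tracking in video. We choose the PS3, which includes a Cell processor, as our platform and optimize 2DCDP for it.
This paper is organized as follows. Section 2 gives an overview of 2DCDP. Section 3 describes the Cell processor and the PS3. Section 4 presents our optimizations of 2DCDP for the Cell processor. Section 5 evaluates the optimized implementation, and Section 6 concludes the paper.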
2. 2DCDP
2.1 An overview of 2DCDP
2DCDP extends continuous dynamic programming (CDP) to two dimensions. It handles enlargement ratios from 0.5 to 2.0 and rotation angles of up to ±45 degrees. 2DCDP consists of three parts:
- a CDP for the column direction, which computes the non-linear matching between the columns of the reference image and those of the input image;
- a CDP for the row direction, which determines the points of the rightmost row;
- a backtrace, which determines the corresponding points in the column direction from the results of the column-direction and row-direction CDPs.
2.2 CDP for the column-direction
The CDP for the column direction is an extension of line-image CDP. The luminance difference between the input image U(k,l) and the i-th column Zi(j) of the reference image is calculated by the luminance equation (equation to be inserted). This value is calculated for seven local paths, and the local path with the minimum cost is selected to form the accumulative path Di(j,k,l). These operations are repeated for each column of the reference image.
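The selection step described above can be sketched in C as follows. This is a minimal sketch under stated assumptions. It takes the candidate costs for the seven local paths as already computed, and it does not reproduce the luminance equation itself. The names `select_local_path` and `accumulate` are hypothetical and do not come from the original implementation. The returned index (0 to 6) is the kind of value that a path array such as CPath would record.

```c
#include <assert.h>
#include <float.h>

/* Number of local paths considered at each DP step (from the text). */
#define NUM_LOCAL_PATHS 7

/* Hypothetical helper: return the smallest of the 7 candidate costs and
 * store the index (0..6) of the local path that produced it. */
static float select_local_path(const float cand[NUM_LOCAL_PATHS],
                               unsigned char *best_path)
{
    float best = FLT_MAX;
    unsigned char idx = 0;
    for (unsigned char p = 0; p < NUM_LOCAL_PATHS; ++p) {
        if (cand[p] < best) {   /* ties keep the lower index */
            best = cand[p];
            idx = p;
        }
    }
    *best_path = idx;
    return best;
}

/* One DP update: accumulated cost = local distance + best candidate. */
static float accumulate(float local_dist, const float cand[NUM_LOCAL_PATHS],
                        unsigned char *best_path)
{
    assert(local_dist >= 0.0f);
    return local_dist + select_local_path(cand, best_path);
}
```

Suppose the candidate costs are {5, 3, 4, 3, 9, 8, 7} and the local distance is 2. The sketch then returns 5 and records path 1. Breaking ties toward the lower index is an arbitrary choice made for this sketch.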
2.3 CDP for the row-direction
The CDP for the row direction is computed from the results of the CDP for the column direction. In the 3D space formed by stacking Di(J,k,l) (0 <= i <= I), the accumulative paths DD(i,k,l) are computed from the local paths dd(i,k,l), which are the accumulative paths of the CDP for the column direction. The points where DD(I,k,l) is minimum become the points of the rightmost row.
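Finding the minimum point of DD(I,k,l) amounts to an arg-min over the K x L plane. The C sketch below is minimal and assumes that DD(I,k,l) is stored row-major as a K x L float array. The function name is hypothetical. The sketch keeps only the single global minimum, even though the text speaks of minimum points in the plural.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: locate the minimum of DD(I,k,l), stored row-major
 * as dd_last[k * L + l]. Returns the minimum and writes its (k, l). */
static float find_min_point(const float *dd_last, size_t K, size_t L,
                            size_t *k_min, size_t *l_min)
{
    assert(K > 0 && L > 0);
    float best = dd_last[0];
    *k_min = 0;
    *l_min = 0;
    for (size_t k = 0; k < K; ++k) {
        for (size_t l = 0; l < L; ++l) {
            float v = dd_last[k * L + l];
            if (v < best) {          /* first occurrence wins on ties */
                best = v;
                *k_min = k;
                *l_min = l;
            }
        }
    }
    return best;
}
```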
2.4 Backtrace
The CDP for the row direction determines the points of the rightmost row. Starting from this result, the accumulative paths are traced back over the input image from right to left. The matching is complete once correspondences have been found for all columns.
3. Cell processor
3.1 An overview of a Cell processor
The first-generation Cell processor consists of one Power Processor Element (PPE) and eight Synergistic Processor Elements (SPEs), and the PPE and the SPEs have different architectures. Each processor element executes instructions in order, and the SPEs have no dynamic branch predictor. As a result, a single element performs worse than a recent general-purpose processor. High performance is still possible by vectorizing a program with SIMD instructions and parallelizing it across all of the SPEs. The processor elements and I/O devices are connected by the Element Interconnect Bus (EIB), which consists of four rings.
3.2 PPE
The PPE is based on the 64-bit multi-threaded Power Architecture. The PPE runs the OS and controls the overall execution on the SPEs.
3.3 SPE
An SPE consists of a Synergistic Processor Unit (SPU) and a Memory Flow Controller (MFC), and its architecture differs from that of the PPE. An SPU has 128 registers, each 128 bits wide, and executes vectorized programs with SIMD instructions. In addition, an SPU can issue a pair of instructions in the same cycle, one to its even pipeline and one to its odd pipeline. Instead of a cache, each SPU has 256KB of scratchpad memory called the Local Store (LS). Each SPU has its own local address space and cannot access main memory directly. The MFC transfers data between main memory and the LS by DMA. The MFC also provides mailboxes and signal notification for communication between the PPE and an SPE, or between two SPEs.
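An SPU cannot read main memory directly, so data must be moved into the LS by DMA. A common way to hide the transfer time is double buffering, which the conclusion lists among the optimizations. The sketch below illustrates only the control flow, under the following assumptions:
- `fake_dma_get` is a hypothetical stand-in implemented with a synchronous `memcpy`, so the sketch runs on any machine.
- A real SPE program would instead issue asynchronous MFC requests (for example `mfc_get` from the SDK's `spu_mfcio.h`) and wait on their tags before touching a buffer.
- The chunk size of 256 floats is arbitrary.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 256  /* elements per transfer; hypothetical size that fits the LS */

/* Hypothetical stand-in for a DMA "get" into the Local Store.
 * Synchronous here; on an SPE it would be an asynchronous MFC request. */
static void fake_dma_get(float *ls_buf, const float *main_mem, size_t n)
{
    memcpy(ls_buf, main_mem, n * sizeof *ls_buf);
}

/* Double-buffered traversal: the next chunk is requested into one buffer
 * before the current chunk is processed from the other. "Processing" is a sum. */
static double sum_double_buffered(const float *main_mem, size_t total)
{
    static float buf[2][CHUNK];           /* two LS buffers */
    size_t nchunks = (total + CHUNK - 1) / CHUNK;
    double acc = 0.0;
    if (nchunks == 0)
        return 0.0;

    fake_dma_get(buf[0], main_mem, total < CHUNK ? total : CHUNK);  /* prime */

    for (size_t c = 0; c < nchunks; ++c) {
        int cur = (int)(c & 1);
        size_t off = c * CHUNK;
        size_t n = (total - off) < CHUNK ? (total - off) : CHUNK;

        if (c + 1 < nchunks) {            /* start fetching the next chunk */
            size_t noff = off + CHUNK;
            size_t nn = (total - noff) < CHUNK ? (total - noff) : CHUNK;
            fake_dma_get(buf[cur ^ 1], main_mem + noff, nn);
        }
        for (size_t i = 0; i < n; ++i)    /* compute on the current buffer */
            acc += buf[cur][i];
    }
    return acc;
}
```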
3.4 I/O
Main memory uses XDR DRAM with a bandwidth of 25.6GB/s. It is connected to the Memory Interface Controller (MIC), which is connected to the EIB along with the other devices.
3.5 Advantages and disadvantages of the Cell processor
To obtain high performance from the Cell processor, programs must be optimized for it. First, the original program is parallelized across the SPEs and vectorized with SIMD instructions. On the other hand, an SPE has no out-of-order execution unit and no dynamic branch predictor. Because these units are missing, the program must be optimized further, for example by replacing branch instructions with compare and select instructions.
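The following portable C sketch contrasts a branchy minimum with a compare-and-select version. It only illustrates the idea. On an SPU, the same effect is obtained per lane with compare and select intrinsics such as `spu_cmpgt` and `spu_sel` from `spu_intrinsics.h`; here, plain integer arithmetic builds the mask instead. The function names are hypothetical. Whether a C compiler emits a branch for `(b < a)` depends on the compiler; on typical targets it becomes a compare-and-set.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Branchy version: on an SPE a mispredicted branch stalls the pipeline. */
static float min_branchy(float a, float b)
{
    if (b < a)
        return b;
    return a;
}

/* Compare-and-select version: turn the comparison into an all-ones or
 * all-zeros mask, then blend the bit patterns of a and b with it. */
static float min_select(float a, float b)
{
    uint32_t ua, ub, ur;
    memcpy(&ua, &a, sizeof ua);
    memcpy(&ub, &b, sizeof ub);
    uint32_t mask = (uint32_t)0 - (uint32_t)(b < a);  /* 0xFFFFFFFF if b < a */
    ur = (ua & ~mask) | (ub & mask);
    float r;
    memcpy(&r, &ur, sizeof r);
    return r;
}
```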
3.6 PLAYSTATION3
The PS3 is a video game console that includes a Cell processor. Several Linux distributions designed for the PS3 are available (Fedora, Yellow Dog, etc.), and the software development kit is also publicly available. The Cell processor in a PS3 has a PPE and eight SPEs, but users can use only the PPE and six SPEs.
4. Optimizing 2DCDP for a Cell processor
4.1 An overview of the optimization
We adopt the PS3 as the platform for optimizing 2DCDP. The PS3 has a Cell processor and 256MB of main memory. The current implementation keeps all of the local distances and accumulative distances calculated in the CDP for the column direction and the CDP for the row direction. However, the LS in each SPE is limited to 256KB. In addition, only about 180MB of the PS3's 256MB main memory is available, because the OS and other applications use about 70MB. Therefore, the total data size must be kept below 180MB; otherwise, swapping occurs.
To optimize 2DCDP for the Cell processor, we divide it into three modules:
- the CDP for the column-direction module;
- the CDP for the row-direction module;
- the backtrace module.

The column-direction module runs on five SPEs and the row-direction module runs on one SPE, and these two modules execute in parallel. After the CDP for the row direction has determined the rightmost row, the backtrace module runs on six SPEs.

The implementation by Iwasa et al. uses the double type to calculate the luminance. However, SPE SIMD instructions operate on 128-bit registers, so an SPE can process only two double values at once. Therefore, we convert all double data to float, which allows four values to be processed at once. In addition, the current implementation of 2DCDP contains many branches. We replace these branch instructions with compare and select instructions, which avoids the stalls caused by branch misprediction.
4.2 CDP for the column-direction module
4.2.1 Optimizing memory consumption
The CDP for the column-direction module loads one line of the reference image and calculates di(j,k,l). Calculating di(j,k,l) requires di(j-1,k,l) but no other local paths, so the LS holds buffers only for di(j,k,l) and di(j-1,k,l). After di(j,k,l) has been calculated, Di(j,k,l) is calculated. Only Di(j-1,k,l) is needed to calculate Di(j,k,l), so the other accumulative paths need not be kept. For an 82x84 reference image and a 136x200 input image, this reduces the memory for local and accumulative paths from 9MB to 0.22MB, which fits in the 256KB LS.

However, the backtrace module requires all of the accumulative paths of the CDP for the column direction. The current implementation allocates the CPath array to store them. CPath is an array of char, and each element records the correspondence to Di(j-1,k,l). There are 7 types of local paths, so each can be represented in 3 bits, and one CPath element can therefore hold 2 accumulative paths. This reduces the memory consumption of CPath from 187MB to 94MB for an 82x84 reference image and a 136x200 input image. (Figure to be inserted here.) With these optimizations, the total memory consumption is kept within 180MB.
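A short C sketch can check the 3-bit packing of CPath and the memory figures quoted above. The bit layout (first code in bits 0 to 2, second code in bits 3 to 5) and the function names are assumptions. The text states only that one char holds two paths.

```c
#include <assert.h>
#include <stdint.h>

/* Pack two local-path codes (each 0..6, i.e. 3 bits) into one byte.
 * Assumed layout: first code in bits 0-2, second code in bits 3-5. */
static uint8_t pack_paths(uint8_t first, uint8_t second)
{
    assert(first < 8 && second < 8);
    return (uint8_t)(first | (second << 3));
}

/* Extract code 0 (low bits) or code 1 (next 3 bits) from a packed byte. */
static uint8_t unpack_path(uint8_t packed, int which)
{
    assert(which == 0 || which == 1);
    return (uint8_t)((packed >> (3 * which)) & 0x7u);
}

/* Bytes needed for one path code per (i, j, k, l) cell,
 * storing codes_per_byte codes in each byte. */
static uint64_t cpath_bytes(uint64_t I, uint64_t J, uint64_t K, uint64_t L,
                            unsigned codes_per_byte)
{
    uint64_t cells = I * J * K * L;
    return (cells + codes_per_byte - 1) / codes_per_byte;
}
```

For I = 84, J = 82, K = 136, and L = 200, `cpath_bytes` gives 187,353,600 bytes with one code per byte and 93,676,800 bytes with two. These values match the 187MB and 94MB figures.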
4.2.2 Vectorizing with SIMD instructions
The CDP for the column direction calculates the luminance of the local paths and determines the accumulative path from 7 local paths. The first part calculates di(j,k,l), di(j,k,l+1), di(j,k,l+2), and di(j,k,l+3) at once using SIMD instructions. The next part selects the optimal local path out of the 7 as the accumulative path. This selection uses compare and select instructions instead of branch instructions, avoiding the branch misprediction penalty.
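The 4-lane selection can be emulated portably, as sketched below. The `vec4f` and `vec4u` types and the per-lane loops are hypothetical stand-ins for 128-bit SPU registers and single SPU instructions. The point is the structure. For each of the seven candidates, one comparison produces a lane mask, and two bitwise selects update the running minimum and the index of the winning local path. The algorithm therefore contains no data-dependent branch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LANES 4            /* four floats fit in one 128-bit register */
#define NUM_LOCAL_PATHS 7

typedef struct { float f[LANES]; } vec4f;      /* stand-in float vector */
typedef struct { uint32_t u[LANES]; } vec4u;   /* stand-in mask / index vector */

/* Lane-wise a > b, giving all-ones or all-zeros per lane. */
static vec4u cmpgt(vec4f a, vec4f b)
{
    vec4u m;
    for (int i = 0; i < LANES; ++i)
        m.u[i] = (uint32_t)0 - (uint32_t)(a.f[i] > b.f[i]);
    return m;
}

/* Bitwise select: take b where the mask is set, else a. */
static vec4f sel_f(vec4f a, vec4f b, vec4u m)
{
    vec4f r;
    for (int i = 0; i < LANES; ++i) {
        uint32_t ua, ub, ur;
        memcpy(&ua, &a.f[i], sizeof ua);
        memcpy(&ub, &b.f[i], sizeof ub);
        ur = (ua & ~m.u[i]) | (ub & m.u[i]);
        memcpy(&r.f[i], &ur, sizeof ur);
    }
    return r;
}

static vec4u sel_u(vec4u a, vec4u b, vec4u m)
{
    vec4u r;
    for (int i = 0; i < LANES; ++i)
        r.u[i] = (a.u[i] & ~m.u[i]) | (b.u[i] & m.u[i]);
    return r;
}

/* For four neighbouring l positions, keep the cheapest of 7 candidates
 * and the index of the local path that produced it. */
static vec4f min_of_paths(const vec4f cand[NUM_LOCAL_PATHS], vec4u *best_idx)
{
    vec4f best = cand[0];
    vec4u idx = {{0, 0, 0, 0}};
    for (uint32_t p = 1; p < NUM_LOCAL_PATHS; ++p) {
        vec4u m = cmpgt(best, cand[p]);   /* lanes where cand[p] is smaller */
        vec4u pv = {{p, p, p, p}};
        best = sel_f(best, cand[p], m);
        idx = sel_u(idx, pv, m);
    }
    *best_idx = idx;
    return best;
}
```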
4.2.3 Parallelizing with multiple SPEs
The complexity of the CDP for the column direction is O(I x J x K x L), the largest of the three modules. It is parallelized over five SPEs, because the remaining SPE runs the CDP for the row direction at the same time. The program is parallelized at the level of loop i, because iteration j depends on iteration j-1. With this parallelization, the work per SPE is reduced to O((I/5) x J x K x L).
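The text does not spell out how the reference columns i are distributed among the SPEs. The batch description in Section 4.3.3 (five consecutive columns computed together) suggests a round-robin assignment, and this hypothetical C sketch assumes one. The same helper, called with six SPEs, also describes the backtrace module, which is likewise parallelized over loop i. The function names are illustrative only.

```c
#include <assert.h>

/* Hypothetical assignment of reference-image columns i (0..I-1) to SPEs:
 * round-robin, so SPE s takes i = s, s + nspe, s + 2*nspe, ... and each
 * batch of nspe consecutive columns is computed concurrently. */
static int spe_for_column(int i, int nspe)
{
    return i % nspe;
}

/* Number of columns that SPE s processes out of I. */
static int columns_for_spe(int s, int I, int nspe)
{
    return I / nspe + (s < I % nspe ? 1 : 0);
}
```

With I = 84 and five SPEs, four SPEs handle 17 columns and one handles 16. With six SPEs, each handles 14.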
4.3 CDP for the row-direction module
4.3.1 Optimizing memory consumption
The CDP for the row-direction module calculates its accumulative paths from local paths produced by the CDP for the column direction. The results of the column-direction module, Di(J,k,l) to Di+4(J,k,l), become the local paths dd(i,k,l) to dd(i+4,k,l) of the CDP for the row direction. Therefore, the row-direction module allocates five lines of dd. In addition, the CDP for the row direction must save all of its accumulative paths and the spot points that correspond to the rightmost row. The accumulative paths are recorded in RPath and the spot points in RSpot, in the same way as CPath in the CDP for the column direction.
4.3.2 Vectorizing with SIMD instructions
In the CDP for the row-direction module, the local paths need not be calculated, because they are the accumulative paths of the CDP for the column direction. Therefore, the module only calculates the accumulative paths, using SIMD instructions.
4.3.3 Parallelizing with multiple SPEs
The CDP for the row direction runs in parallel with the CDP for the column direction. Since the column direction uses five SPEs, the row direction runs on the remaining SPE. While the CDP for the row direction calculates DD(i,k,l) to DD(i+4,k,l), the CDP for the column direction calculates Di+5(j,k,l) to Di+9(j,k,l).
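This overlap is a two-stage software pipeline. The C sketch below models only the schedule, under two assumptions: columns are processed in batches of five (one per column-direction SPE), and every step takes the same time. The function names are hypothetical.

```c
#include <assert.h>

#define COL_SPES 5   /* SPEs running the column-direction CDP */

/* At step t, the column-direction SPEs compute batch t (columns 5t..5t+4).
 * Returns the batch index, or -1 once all batches are done. */
static int column_stage_batch(int t, int nbatches)
{
    return (t < nbatches) ? t : -1;
}

/* At step t, the row-direction SPE consumes batch t-1, or idles (-1). */
static int row_stage_batch(int t, int nbatches)
{
    return (t >= 1 && t - 1 < nbatches) ? t - 1 : -1;
}

/* Steps for I columns: one per batch, plus one to drain the pipeline. */
static int pipeline_steps(int I)
{
    int nbatches = (I + COL_SPES - 1) / COL_SPES;
    return nbatches + 1;
}
```

With I = 84, there are 17 batches (the last holding only four columns), so the pipeline needs 18 steps. The row-direction SPE is idle only in the first step, and the column-direction SPEs only in the last.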
4.4 Backtrace module
4.4.1 Optimizing memory consumption
The backtrace module determines the correspondence between the reference image and the input image. It uses RPath and RSpot, recorded by the CDP for the row direction, and CPath, recorded by the CDP for the column direction. From these, it records the column-direction spots that correspond to the reference image. These spot points are recorded in CSpot, in the same way as RSpot.
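The column-direction backtrace for one reference column can be sketched as a walk from j = J-1 down to 0, following the stored local-path codes. This is a simplified hypothetical sketch, with these assumptions:
- Positions are one-dimensional, whereas the real 2DCDP local paths are two-dimensional.
- The offset table `OFFSET` is invented for illustration.
- The starting position stands in for the value obtained from the row-direction result.

Each column depends only on its own codes. Different columns i can therefore be traced on different SPEs, while the j loop stays sequential.

```c
#include <assert.h>

#define NUM_LOCAL_PATHS 7

/* Hypothetical mapping from a stored local-path code (0..6) to the shift
 * in input position when stepping from row j to row j-1. */
static const int OFFSET[NUM_LOCAL_PATHS] = { -3, -2, -1, 0, 1, 2, 3 };

/* Trace one reference column back over j = J-1 .. 0.
 * codes[j * npos + p]: code recorded for row j at input position p (cf. CPath);
 * start: position for j = J-1 from the row-direction result (cf. RSpot);
 * spot[j]: matched position for each j (cf. CSpot). */
static void backtrace_column(const unsigned char *codes, int J, int npos,
                             int start, int *spot)
{
    int p = start;
    for (int j = J - 1; j >= 0; --j) {
        assert(p >= 0 && p < npos);
        spot[j] = p;
        if (j > 0)
            p += OFFSET[codes[j * npos + p]];
    }
}
```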
4.4.2 Parallelizing with multiple SPEs
The backtrace module runs after both the CDP for the column direction and the CDP for the row direction have finished, so it can use all six SPEs. As in the CDP for the column direction, it is parallelized over loop i, because iteration j depends on iteration j-1. This reduces the complexity from O(J x I) to O(J x I/6).
5. Evaluation
5.1 An overview of the evaluation
To evaluate the optimized 2DCDP for the Cell processor, we compare it with the original implementation running on an Intel Pentium4 processor. The reference and input images are taken from the Video Database for Evaluating Video Processing. The reference image is 82x84 pixels and the input image is 136x200 pixels; therefore, I=84, J=82, K=136, and L=200.
6. Conclusion
In this paper, we optimized 2DCDP for the Cell processor. 2DCDP has been proposed as an algorithm that performs segmentation and recognition simultaneously. It has two main problems: a huge amount of computation and a huge memory consumption. To reduce the computation, we used the following techniques:
- vectorization with SIMD instructions;
- parallelization across multiple SPEs;
- loop unrolling;
- branch elimination;
- double buffering of DMA transfers.

To reduce the memory consumption, we reduced the amount of recorded data and redesigned the data structures.
By storing the information for 2 paths in each element, we reduced the char arrays that record accumulative paths from 187MB to 94MB. As a result, the total memory consumption decreased from 200MB to 110MB. We vectorized the calculation of local and accumulative paths in both the column-direction and row-direction CDPs with SIMD instructions. We assigned five SPEs to the CDP for the column-direction module and one SPE to the CDP for the row-direction module, and ran the two modules in parallel. After both CDP modules finish, the backtrace module runs in parallel on six SPEs. These optimizations reduced the execution time from 44.45 seconds to 1.97 seconds, 22.6 times faster than the original implementation.
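However, object tracking requires an execution time below 0.03 seconds, which is 65.7 times faster than the current optimized implementation. Six SPEs with four-way SIMD give a theoretical speedup of about 6 x 4 = 24 times, and the implementation already achieves 22.6 times. Therefore, little further improvement can be expected from these techniques alone. Possible next steps are to build a cluster, and to reduce branch penalties with software-based dynamic branch prediction, which may improve performance further.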