본문 바로가기

Python_Intermediate/Pandas

[EXCEL]Excel File Data Analysis(엑셀 파일 Pandas 분석)

1. Sample Data

mpg.xlsx
0.02MB

 

2. 데이터 작업 절차

3. 사용 모듈

from print_df import print_df
from pandas import ExcelFile
from pandas import DataFrame
from matplotlib import pyplot as plt
import numpy as np

 

4. 데이터 수집

# 엑셀파일 읽기
xls_file = ExcelFile('C:\\python_StudyGroup\\200215\\data\\mpg.xlsx')

# 엑셀의 sheet 이름들 중에서 0번째 sheet를 dataframe으로 변환
df = xls_file.parse(xls_file.sheet_names[0], index_col=0)
print_df(df.head())

> 결과값

<class 'pandas.core.frame.DataFrame'>
(5, 11)
+---+--------------+-------+-------+------+-----+------------+-----+-----+-----+----+---------+
|   | manufacturer | model | displ | year | cyl |   trans    | drv | cty | hwy | fl |  class  |
+---+--------------+-------+-------+------+-----+------------+-----+-----+-----+----+---------+
| 1 |     audi     |   a4  |  1.8  | 1999 |  4  |  auto(l5)  |  f  |  18 |  29 | p  | compact |
| 2 |     audi     |   a4  |  1.8  | 1999 |  4  | manual(m5) |  f  |  21 |  29 | p  | compact |
| 3 |     audi     |   a4  |  2.0  | 2008 |  4  | manual(m6) |  f  |  20 |  31 | p  | compact |
| 4 |     audi     |   a4  |  2.0  | 2008 |  4  |  auto(av)  |  f  |  21 |  30 | p  | compact |
| 5 |     audi     |   a4  |  2.8  | 1999 |  6  |  auto(l5)  |  f  |  16 |  26 | p  | compact |
+---+--------------+-------+-------+------+-----+------------+-----+-----+-----+----+---------+

 

5. 데이터 전처리

# 각 컬럼명 변경
df.rename(
    columns={
        'manufacturer': '제조회사', 'model': '모델명', 'displ': '배기량', 'year': '생산년도', 'cyl': '실린더개수', 'trans': '변속기종류', 'drv': '구동방식', 'cty': '도시연비', 'hwy': '고속도로연비', 'fl': '연료종류', 'class': '자동자종류'
    }, inplace=True
)
print_df(df)

> 결과값

+-----+------------+------------------------+--------+----------+------------+------------+----------+----------+--------------+----------+------------+
|     |  제조회사  |         모델명         | 배기량 | 생산년도 | 실린더개수 | 변속기종류 | 구동방식 | 도시연비 | 고속도로연비 | 연료종류 | 자동자종류 |
+-----+------------+------------------------+--------+----------+------------+------------+----------+----------+--------------+----------+------------+
|  1  |    audi    |           a4           |  1.8   |   1999   |     4      |  auto(l5)  |    f     |    18    |      29      |    p     |  compact   |
|  2  |    audi    |           a4           |  1.8   |   1999   |     4      | manual(m5) |    f     |    21    |      29      |    p     |  compact   |
|  3  |    audi    |           a4           |  2.0   |   2008   |     4      | manual(m6) |    f     |    20    |      31      |    p     |  compact   |
|  4  |    audi    |           a4           |  2.0   |   2008   |     4      |  auto(av)  |    f     |    21    |      30      |    p     |  compact   |
|  5  |    audi    |           a4           |  2.8   |   1999   |     6      |  auto(l5)  |    f     |    16    |      26      |    p     |  compact   |
|  6  |    audi    |           a4           |  2.8   |   1999   |     6      | manual(m5) |    f     |    18    |      26      |    p     |  compact   |
|  7  |    audi    |           a4           |  3.1   |   2008   |     6      |  auto(av)  |    f     |    18    |      27      |    p     |  compact   |
|  8  |    audi    |       a4 quattro       |  1.8   |   1999   |     4      | manual(m5) |    4     |    18    |      26      |    p     |  compact   |
|  9  |    audi    |       a4 quattro       |  1.8   |   1999   |     4      |  auto(l5)  |    4     |    16    |      25      |    p     |  compact   |
|  10 |    audi    |       a4 quattro       |  2.0   |   2008   |     4      | manual(m6) |    4     |    20    |      28      |    p     |  compact   |
|  11 |    audi    |       a4 quattro       |  2.0   |   2008   |     4      |  auto(s6)  |    4     |    19    |      27      |    p     |  compact   |
|  12 |    audi    |       a4 quattro       |  2.8   |   1999   |     6      |  auto(l5)  |    4     |    15    |      25      |    p     |  compact   |
|  13 |    audi    |       a4 quattro       |  2.8   |   1999   |     6      | manual(m5) |    4     |    17    |      25      |    p     |  compact   |
|  14 |    audi    |       a4 quattro       |  3.1   |   2008   |     6      |  auto(s6)  |    4     |    17    |      25      |    p     |  compact   |
|  15 |    audi    |       a4 quattro       |  3.1   |   2008   |     6      | manual(m6) |    4     |    15    |      25      |    p     |  compact   |
|  16 |    audi    |       a6 quattro       |  2.8   |   1999   |     6      |  auto(l5)  |    4     |    15    |      24      |    p     |  midsize   |
|  17 |    audi    |       a6 quattro       |  3.1   |   2008   |     6      |  auto(s6)  |    4     |    17    |      25      |    p     |  midsize   |
|  18 |    audi    |       a6 quattro       |  4.2   |   2008   |     8      |  auto(s6)  |    4     |    16    |      23      |    p     |  midsize   |
|  19 | chevrolet  |   c1500 suburban 2wd   |  5.3   |   2008   |     8      |  auto(l4)  |    r     |    14    |      20      |    r     |    suv     |
|  20 | chevrolet  |   c1500 suburban 2wd   |  5.3   |   2008   |     8      |  auto(l4)  |    r     |    11    |      15      |    e     |    suv     |
|  21 | chevrolet  |   c1500 suburban 2wd   |  5.3   |   2008   |     8      |  auto(l4)  |    r     |    14    |      20      |    r     |    suv     |
|  22 | chevrolet  |   c1500 suburban 2wd   |  5.7   |   1999   |     8      |  auto(l4)  |    r     |    13    |      17      |    r     |    suv     |
|  23 | chevrolet  |   c1500 suburban 2wd   |  6.0   |   2008   |     8      |  auto(l4)  |    r     |    12    |      17      |    r     |    suv     |
|  24 | chevrolet  |        corvette        |  5.7   |   1999   |     8      | manual(m6) |    r     |    16    |      26      |    p     |  2seater   |
|  25 | chevrolet  |        corvette        |  5.7   |   1999   |     8      |  auto(l4)  |    r     |    15    |      23      |    p     |  2seater   |
|  26 | chevrolet  |        corvette        |  6.2   |   2008   |     8      | manual(m6) |    r     |    16    |      26      |    p     |  2seater   |
|  27 | chevrolet  |        corvette        |  6.2   |   2008   |     8      |  auto(s6)  |    r     |    15    |      25      |    p     |  2seater   |
|  28 | chevrolet  |        corvette        |  7.0   |   2008   |     8      | manual(m6) |    r     |    15    |      24      |    p     |  2seater   |
|  29 | chevrolet  |    k1500 tahoe 4wd     |  5.3   |   2008   |     8      |  auto(l4)  |    4     |    14    |      19      |    r     |    suv     |
|  30 | chevrolet  |    k1500 tahoe 4wd     |  5.3   |   2008   |     8      |  auto(l4)  |    4     |    11    |      14      |    e     |    suv     |
|  31 | chevrolet  |    k1500 tahoe 4wd     |  5.7   |   1999   |     8      |  auto(l4)  |    4     |    11    |      15      |    r     |    suv     |
|  32 | chevrolet  |    k1500 tahoe 4wd     |  6.5   |   1999   |     8      |  auto(l4)  |    4     |    14    |      17      |    d     |    suv     |
|  33 | chevrolet  |         malibu         |  2.4   |   1999   |     4      |  auto(l4)  |    f     |    19    |      27      |    r     |  midsize   |
|  34 | chevrolet  |         malibu         |  2.4   |   2008   |     4      |  auto(l4)  |    f     |    22    |      30      |    r     |  midsize   |
|  35 | chevrolet  |         malibu         |  3.1   |   1999   |     6      |  auto(l4)  |    f     |    18    |      26      |    r     |  midsize   |
|  36 | chevrolet  |         malibu         |  3.5   |   2008   |     6      |  auto(l4)  |    f     |    18    |      29      |    r     |  midsize   |
|  37 | chevrolet  |         malibu         |  3.6   |   2008   |     6      |  auto(s6)  |    f     |    17    |      26      |    r     |  midsize   |
|  38 |   dodge    |      caravan 2wd       |  2.4   |   1999   |     4      |  auto(l3)  |    f     |    18    |      24      |    r     |  minivan   |
|  39 |   dodge    |      caravan 2wd       |  3.0   |   1999   |     6      |  auto(l4)  |    f     |    17    |      24      |    r     |  minivan   |
|  40 |   dodge    |      caravan 2wd       |  3.3   |   1999   |     6      |  auto(l4)  |    f     |    16    |      22      |    r     |  minivan   |
|  41 |   dodge    |      caravan 2wd       |  3.3   |   1999   |     6      |  auto(l4)  |    f     |    16    |      22      |    r     |  minivan   |
|  42 |   dodge    |      caravan 2wd       |  3.3   |   2008   |     6      |  auto(l4)  |    f     |    17    |      24      |    r     |  minivan   |
|  43 |   dodge    |      caravan 2wd       |  3.3   |   2008   |     6      |  auto(l4)  |    f     |    17    |      24      |    r     |  minivan   |
|  44 |   dodge    |      caravan 2wd       |  3.3   |   2008   |     6      |  auto(l4)  |    f     |    11    |      17      |    e     |  minivan   |
|  45 |   dodge    |      caravan 2wd       |  3.8   |   1999   |     6      |  auto(l4)  |    f     |    15    |      22      |    r     |  minivan   |
|  46 |   dodge    |      caravan 2wd       |  3.8   |   1999   |     6      |  auto(l4)  |    f     |    15    |      21      |    r     |  minivan   |
|  47 |   dodge    |      caravan 2wd       |  3.8   |   2008   |     6      |  auto(l6)  |    f     |    16    |      23      |    r     |  minivan   |
|  48 |   dodge    |      caravan 2wd       |  4.0   |   2008   |     6      |  auto(l6)  |    f     |    16    |      23      |    r     |  minivan   |
|  49 |   dodge    |   dakota pickup 4wd    |  3.7   |   2008   |     6      | manual(m6) |    4     |    15    |      19      |    r     |   pickup   |
|  50 |   dodge    |   dakota pickup 4wd    |  3.7   |   2008   |     6      |  auto(l4)  |    4     |    14    |      18      |    r     |   pickup   |
|  51 |   dodge    |   dakota pickup 4wd    |  3.9   |   1999   |     6      |  auto(l4)  |    4     |    13    |      17      |    r     |   pickup   |
|  52 |   dodge    |   dakota pickup 4wd    |  3.9   |   1999   |     6      | manual(m5) |    4     |    14    |      17      |    r     |   pickup   |
|  53 |   dodge    |   dakota pickup 4wd    |  4.7   |   2008   |     8      |  auto(l5)  |    4     |    14    |      19      |    r     |   pickup   |
|  54 |   dodge    |   dakota pickup 4wd    |  4.7   |   2008   |     8      |  auto(l5)  |    4     |    14    |      19      |    r     |   pickup   |
|  55 |   dodge    |   dakota pickup 4wd    |  4.7   |   2008   |     8      |  auto(l5)  |    4     |    9     |      12      |    e     |   pickup   |
|  56 |   dodge    |   dakota pickup 4wd    |  5.2   |   1999   |     8      | manual(m5) |    4     |    11    |      17      |    r     |   pickup   |

 

# 평균연비 합격을 의미하는 '연비테스트'컬럼을 mpg 데이터에 추가
# 평균연비 20이상이면 '합격', 그렇지 않으면 '불합격'
# 도시연비, 고속도로 연비를 활용하여 평균연비 데이터 avg을 mpg 데이터에 추가
df['연비테스트'] = np.where((df['도시연비'] + df['고속도로연비']) / 2 >= 20, '합격', '불합격')
print_df(df.head(10))

> 결과값

<class 'pandas.core.frame.DataFrame'>
(10, 12)
+----+----------+------------+--------+----------+------------+------------+----------+----------+--------------+----------+------------+------------+
|    | 제조회사 |   모델명   | 배기량 | 생산년도 | 실린더개수 | 변속기종류 | 구동방식 | 도시연비 | 고속도로연비 | 연료종류 | 자동자종류 | 연비테스트 |
+----+----------+------------+--------+----------+------------+------------+----------+----------+--------------+----------+------------+------------+
| 1  |   audi   |     a4     |  1.8   |   1999   |     4      |  auto(l5)  |    f     |    18    |      29      |    p     |  compact   |    합격    |
| 2  |   audi   |     a4     |  1.8   |   1999   |     4      | manual(m5) |    f     |    21    |      29      |    p     |  compact   |    합격    |
| 3  |   audi   |     a4     |  2.0   |   2008   |     4      | manual(m6) |    f     |    20    |      31      |    p     |  compact   |    합격    |
| 4  |   audi   |     a4     |  2.0   |   2008   |     4      |  auto(av)  |    f     |    21    |      30      |    p     |  compact   |    합격    |
| 5  |   audi   |     a4     |  2.8   |   1999   |     6      |  auto(l5)  |    f     |    16    |      26      |    p     |  compact   |    합격    |
| 6  |   audi   |     a4     |  2.8   |   1999   |     6      | manual(m5) |    f     |    18    |      26      |    p     |  compact   |    합격    |
| 7  |   audi   |     a4     |  3.1   |   2008   |     6      |  auto(av)  |    f     |    18    |      27      |    p     |  compact   |    합격    |
| 8  |   audi   | a4 quattro |  1.8   |   1999   |     4      | manual(m5) |    4     |    18    |      26      |    p     |  compact   |    합격    |
| 9  |   audi   | a4 quattro |  1.8   |   1999   |     4      |  auto(l5)  |    4     |    16    |      25      |    p     |  compact   |    합격    |
| 10 |   audi   | a4 quattro |  2.0   |   2008   |     4      | manual(m6) |    4     |    20    |      28      |    p     |  compact   |    합격    |
+----+----------+------------+--------+----------+------------+------------+----------+----------+--------------+----------+------------+------------+

 

# 각 값별로 수량을 카운트하여 새로운 데이터 프레임 생성
count = df['연비테스트'].value_counts()
count_df = DataFrame(count)
print_df(count_df)

> 결과값

<class 'pandas.core.frame.DataFrame'>
(2, 1)
+--------+------------+
|        | 연비테스트 |
+--------+------------+
|  합격  |    128     |
| 불합격 |    106     |
+--------+------------+

 

6. 데이터 정제

# 결측치 여부 확인
empty_sum = df.isnull().sum()
print_df(empty_sum)

> 결과값

<class 'pandas.core.series.Series'>
제조회사      0
모델명       0
배기량       0
생산년도      0
실린더개수     0
변속기종류     0
구동방식      0
도시연비      0
고속도로연비    0
연료종류      0
자동자종류     0
연비테스트     0
dtype: int64

 

7. 데이터 시각화

# 그래프 만들기
plt.rcParams["font.family"] = 'NanumGothic'      # 한글 폰트 지정(나눔고딕)
plt.rcParams["font.size"] = 14                   # 그래프 폰트 사이즈(14)
plt.rcParams["figure.figsize"] = (10, 10)        # 그래프 사이즈(10 x 10)

# # 전체 컬럼에 대한 시각화
count_df['연비테스트'].plot.pie(autopct='%0.1f%%')  # autopct : 각 범주가 데이터에서 차지하는 비율
plt.title("연비테스트 합격 비율")                   # 그래프 제목
plt.savefig('mpg.png', dpi=200)                     # 그래프 저장
plt.show()                                          # 그래프 보기
plt.close()                                         # 그래프 종료

 

8. 시각화 결과값