t-test

What is a t-test?

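import pandas as pd
from scipy import stats

# Load the preprocessed booking data; the first CSV column is used as the index.
df = pd.read_csv('../data/preprocessed.csv',index_col=0)
col_lst = ['lead_time','adr','days_in_waiting_list','total_of_special_requests']
# Split the numeric columns into two groups by the value of is_canceled (0 vs. 1).
df_0 = df.loc[df['is_canceled']==0,col_lst].copy()
df_1 = df.loc[df['is_canceled']==1,col_lst].copy()
def t_test(x,y):
    # Levene's test for equal variances decides which t-test variant to run.
    lresult=stats.levene(x,y)
    # The "(F)" in the printed labels is only a label: levene returns its W statistic
    # and ttest_ind returns the t statistic.
    print('LeveneResult(F):{:.3f}\np-value:{:.3f}'.format(lresult.statistic,lresult.pvalue))
    # Compare the unrounded p-value; rounding first would send e.g. 0.046 to the wrong branch.
    if lresult.pvalue < 0.05:
        # Variances differ: Welch's t-test
        result = stats.ttest_ind(x,y,equal_var=False)
    else:
        # Variances look equal: Student's t-test with pooled variance
        result = stats.ttest_ind(x,y,equal_var=True)
    print('t-testResult(F):{:.3f}\np-value:{:.3f}'.format(result.statistic,result.pvalue))
# Run the test for each column and print its name first.
for i in col_lst:
    print(i)
    t_test(df_0[i],df_1[i])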
lead_time
LeveneResult(F):3601.800
p-value:0.000
t-testResult(F):-99.075
p-value:0.000
adr
LeveneResult(F):105.482
p-value:0.000
t-testResult(F):-16.171
p-value:0.000
days_in_waiting_list
LeveneResult(F):351.568
p-value:0.000
t-testResult(F):-17.087
p-value:0.000
total_of_special_requests
LeveneResult(F):10696.542
p-value:0.000
t-testResult(F):88.890
p-value:0.000

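When lead_time, adr, days_in_waiting_list, and total_of_special_requests are split by is_canceled, the t-test p-value for every variable is below 0.05, so the null hypothesis of equal means is rejected at the 5% level: the two groups differ in their means.
Levene's test also gives p < 0.05 for every variable, so each comparison used the unequal-variance (Welch) t-test.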
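
The decision rule used in this analysis (Levene's test first, then Student's or Welch's t-test depending on its p-value) can be sketched on synthetic data so that it runs without the CSV file. The function name levene_then_ttest, the alpha parameter, and the generated samples are hypothetical and not part of the original analysis. This version returns its results instead of printing them.

```python
import numpy as np
from scipy import stats

def levene_then_ttest(x, y, alpha=0.05):
    """Hypothetical helper: pick Student's or Welch's t-test from Levene's test."""
    levene = stats.levene(x, y)
    # Treat the variances as equal only if Levene's test does not reject at alpha.
    equal_var = bool(levene.pvalue >= alpha)
    ttest = stats.ttest_ind(x, y, equal_var=equal_var)
    return {
        "levene_stat": float(levene.statistic),
        "levene_p": float(levene.pvalue),
        "equal_var": equal_var,
        "t_stat": float(ttest.statistic),
        "t_p": float(ttest.pvalue),
    }

# Synthetic samples (an assumption for illustration, not the booking data):
# different means and clearly different variances.
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=500)
group_b = rng.normal(loc=1.0, scale=3.0, size=500)

result = levene_then_ttest(group_a, group_b)
print(result)
```

With these samples the spreads differ strongly, so the helper should take the Welch branch (equal_var is False). Returning a dictionary rather than printing makes it easy to collect the results for several columns into one table.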